Image Fusion [1st ed.] 9789811548666, 9789811548673

This book systematically discusses the basic concepts, theories, research, and latest trends in image fusion.


Language: English. Pages: XVIII, 404 [415]. Year: 2020.


Table of contents:
Front Matter ....Pages i-xviii
Front Matter ....Pages 1-1
Introduction to Image Fusion (Gang Xiao, Durga Prasad Bavirisetti, Gang Liu, Xingchen Zhang)....Pages 3-20
Pixel-Level Image Fusion (Gang Xiao, Durga Prasad Bavirisetti, Gang Liu, Xingchen Zhang)....Pages 21-101
Feature-Level Image Fusion (Gang Xiao, Durga Prasad Bavirisetti, Gang Liu, Xingchen Zhang)....Pages 103-147
Decision-Level Image Fusion (Gang Xiao, Durga Prasad Bavirisetti, Gang Liu, Xingchen Zhang)....Pages 149-170
Multi-sensor Dynamic Image Fusion (Gang Xiao, Durga Prasad Bavirisetti, Gang Liu, Xingchen Zhang)....Pages 171-296
Objective Fusion Metrics (Gang Xiao, Durga Prasad Bavirisetti, Gang Liu, Xingchen Zhang)....Pages 297-324
Image Fusion Based on Machine Learning and Deep Learning (Gang Xiao, Durga Prasad Bavirisetti, Gang Liu, Xingchen Zhang)....Pages 325-352
Front Matter ....Pages 353-353
Example 1: Medical Image Fusion (Gang Xiao, Durga Prasad Bavirisetti, Gang Liu, Xingchen Zhang)....Pages 355-373
Example 2: Night Vision Image Fusion (Gang Xiao, Durga Prasad Bavirisetti, Gang Liu, Xingchen Zhang)....Pages 375-386
Simulation Platform of Image Fusion (Gang Xiao, Durga Prasad Bavirisetti, Gang Liu, Xingchen Zhang)....Pages 387-404

Gang Xiao · Durga Prasad Bavirisetti · Gang Liu · Xingchen Zhang

Image Fusion


Gang Xiao • Durga Prasad Bavirisetti • Gang Liu • Xingchen Zhang

Image Fusion

Gang Xiao School of Aeronautics and Astronautics Shanghai Jiao Tong University Shanghai, China

Durga Prasad Bavirisetti School of Aeronautics and Astronautics Shanghai Jiao Tong University Shanghai, China

Gang Liu School of Automation Engineering Shanghai University of Electric Power Shanghai, China

Xingchen Zhang School of Aeronautics and Astronautics Shanghai Jiao Tong University Shanghai, China

ISBN 978-981-15-4866-6    ISBN 978-981-15-4867-3 (eBook)
https://doi.org/10.1007/978-981-15-4867-3

Jointly published with Shanghai Jiao Tong University Press, Shanghai, China.

The print edition is not for sale in China Mainland. Customers from China Mainland please order the print book from: Shanghai Jiao Tong University Press.

© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

This book is dedicated to the memory of my father, who left us during my PhD study.

Preface

Image fusion has been a hot topic for many years, and it has been widely applied in fields ranging from robotics, medical engineering, and surveillance to military applications and many others. However, almost 10 years have passed since the publication of the last book on image fusion. In these years, we have seen rapid progress and more applications of image fusion, which have not been reviewed systematically. For instance, with the emergence and fast growth of artificial intelligence, some researchers have begun to investigate how machine learning and deep learning could benefit image fusion. Therefore, we think the time is ripe to write a new book on image fusion that systematically and thoroughly summarizes its recent development.

The purpose of this book is to provide an extensive introduction to the field of image fusion. Based on the discussion of pixel-level, feature-level, and decision-level fusion, this book systematically introduces the basic concepts, basic theories, and latest research outcomes as well as practical applications of image fusion. There are 10 chapters in total, arranged in two parts. Part I is about image fusion theory, including the basic concepts and principles of image fusion presented in Chap. 1 and multi-source image fusion at the pixel, feature, and decision levels discussed in Chaps. 2, 3, and 4, respectively. Chapter 5 introduces multi-source dynamic image fusion, and Chap. 6 discusses image fusion evaluation metrics. The recent trend in image fusion, namely fusion based on machine learning and deep learning, is presented in Chap. 7. Part II is about the practical applications of image fusion, including medical image fusion discussed in Chap. 8 and night vision image fusion introduced in Chap. 9. Finally, Chap. 10 describes in detail the image fusion simulation platform developed at Shanghai Jiao Tong University.

This book is written based on the authors' research work on image fusion in recent years. It reflects research outcomes from the authors and members of the Advanced Avionics and Intelligent Information Laboratory (AAII Lab), Shanghai Jiao Tong University, and exhibits, to some extent, the latest developments in the image fusion field in China and the international community.

vii

viii

Preface

This book can be used as a textbook or reference book for senior undergraduate and postgraduate students. It should also be useful to researchers working on image fusion and to practicing engineers who wish to use the concepts of image fusion in practical applications. We hope you enjoy reading this book and exploring the field of image fusion.

Shanghai, China    Gang Xiao
Shanghai, China    Gang Liu
January 2020

Acknowledgments

This book would not have been possible without funding from the National Natural Science Foundation of China under Grant 61973212 and Grant 61673270. This book is sponsored by the National Science and Technology Academic Publications Fund (2019).

I would like to sincerely thank Prof. Zhongliang Jing, a distinguished professor of Shanghai Jiao Tong University, for his pioneering work in the field of information fusion. It is my great honor to complete this book with the help of members of the Advanced Avionics and Intelligent Information Laboratory (AAII Lab), Shanghai Jiao Tong University: Dr. Durga Prasad Bavirisetti, Dr. Gang Liu, Dr. Xingchen Zhang, and others.

We would like to thank all the people who have contributed to this book, without whom it would not have been possible. This includes all current and former members of the AAII Lab at Shanghai Jiao Tong University and the students in Prof. Gang Liu's research group. We also greatly appreciate the continuous help from Shanghai Jiao Tong University Press.

I also thank Ms. Shu Wang, my wife, and my daughter, Suyang Xiao. They always understand and support me in my scientific research.

ix

Contents

Part I  Image Fusion Theories

1 Introduction to Image Fusion ..... 3
1.1 History and Development ..... 4
1.1.1 History ..... 5
1.1.2 Development ..... 5
1.2 Image Fusion Fundamentals ..... 6
1.2.1 Necessity to Combine Information of Images ..... 6
1.2.2 Definition of Image Fusion ..... 10
1.2.3 Image Fusion Objective ..... 10
1.3 Categorization ..... 11
1.4 Fundamental Steps of an Image Fusion System ..... 13
1.5 Types of Image Fusion Systems ..... 14
1.6 Applications ..... 15
1.7 Summary and Outline of the Book ..... 18
References ..... 18

2 Pixel-Level Image Fusion ..... 21
2.1 Introduction ..... 21
2.1.1 Single-Scale Image Fusion ..... 22
2.1.2 Multi-Scale Image Fusion ..... 23
2.2 Pyramid Image Fusion Method Based on Integrated Edge and Texture Information ..... 29
2.2.1 Background ..... 29
2.2.2 Fusion Framework ..... 30
2.2.3 Pyramid Image Fusion of Edge and Texture Information-Specific Steps ..... 30
2.2.4 Beneficial Effects ..... 34
2.3 Image Fusion Method Based on the Expected Maximum and Discrete Wavelet Frames ..... 34
2.3.1 Introduction ..... 34
2.3.2 Discrete Wavelet Frame Multi-Resolution Transform ..... 35
2.3.3 Basic Structure of the New Fusion Scheme ..... 37
2.3.4 Fusion of the Low-Frequency Band Using the EM Algorithm ..... 38
2.3.5 The Selection of the High-Frequency Band Using the Informative Importance Measure ..... 43
2.3.6 Computer Simulation ..... 44
2.3.7 Conclusions ..... 48
2.4 Image Fusion Method Based on Optimal Wavelet Filter Banks ..... 51
2.4.1 Introduction ..... 51
2.4.2 The Generic Multi-Resolution Image Fusion Algorithm ..... 52
2.4.3 Design Criteria of Filter Banks ..... 53
2.4.4 Optimization Design of Filter Bank for Image Fusion ..... 54
2.4.5 Experiments ..... 58
2.4.6 Conclusion ..... 62
2.5 Anisotropic Diffusion-Based Fusion of Infrared and Visible Sensor Images (ADF) ..... 62
2.5.1 Anisotropic Diffusion ..... 63
2.5.2 Anisotropic Diffusion-Based Fusion Method (ADF) ..... 65
2.5.3 Experimental Setup ..... 68
2.5.4 Results and Analysis ..... 70
2.6 Two-Scale Image Fusion of Infrared and Visible Images Using Saliency Detection ..... 74
2.6.1 Two-Scale Image Fusion (TIF) ..... 74
2.6.2 Experimental Setup ..... 80
2.6.3 Results and Analysis ..... 82
2.7 Multi-Focus Image Fusion Using Maximum Symmetric Surround Saliency Detection (MSSSF) ..... 84
2.7.1 Maximum Symmetric Surround Saliency Detection (MSSS) ..... 85
2.7.2 MSSS Detection-Based Image Fusion (MSSSF) ..... 87
2.7.3 Experimental Setup ..... 91
2.7.4 Results and Analysis ..... 92
References ..... 99

3 Feature-Level Image Fusion ..... 103
3.1 Introduction ..... 103
3.2 Fusion Based on Grads Texture Characteristic ..... 106
3.2.1 Multi-Scale Transformation Method Based on Gradient Features ..... 106
3.2.2 Multi-Scale Transformation Based on Gradient Feature Fusion Strategy ..... 108
3.3 Fusion Based on United Texture and Gradient Characteristics ..... 110
3.3.1 Joint Texture and Gradient Features of Multi-Scale Transformation Method ..... 110
3.3.2 Multi-Scale Image Fusion Method Based on Gradient Feature ..... 116
3.4 Fusion Algorithm Based on Fuzzy Regional Characteristics ..... 124
3.4.1 Area-Based Image Fusion Algorithm ..... 125
3.4.2 Image Fusion Method Based on Fuzzy Region Features ..... 127
3.4.3 Image Fusion Method Based on Fuzzy Region Features ..... 131
3.5 Multi-Source Face Feature Fusion Recognition Algorithm Based on Genetic Algorithm ..... 137
3.5.1 Feature-Level Fusion Algorithm ..... 138
3.5.2 Fusion Recognition Based on Genetic Algorithm ..... 140
3.5.3 Experimental Results and Evaluation ..... 143
3.6 Summary ..... 145
References ..... 146

4 Decision-Level Image Fusion ..... 149
4.1 Introduction ..... 149
4.2 Fusion Algorithm Based on Voting Method ..... 150
4.3 Fusion Algorithm Based on D-S Evidence Theory ..... 150
4.4 Fusion Algorithm Based on Bayes Inference ..... 152
4.5 Fusion Algorithm Based on Summation Rule ..... 153
4.6 Fusion Algorithm Based on Min-Max Rule ..... 154
4.6.1 Maximum Rules ..... 154
4.6.2 Minimum Rules ..... 155
4.7 Fusion Algorithm Based on Fuzzy Integral ..... 156
4.7.1 ICA Feature Extraction ..... 157
4.7.2 SVM Classification ..... 158
4.7.3 Decision Fusion with Fuzzy Integral ..... 158
4.8 Label Fusion for Segmentation Via Patch Based on Local Weighted Voting ..... 159
4.8.1 Introduction ..... 159
4.8.2 Method ..... 159
4.8.3 Experiments and Results ..... 163
4.9 Summary ..... 169
References ..... 169

5 Multi-sensor Dynamic Image Fusion ..... 171
5.1 Introduction ..... 171
5.2 Multi-sensor Dynamic Image Fusion System ..... 172
5.3 Improved Dynamic Image Fusion Scheme for Infrared and Visible Sequence Based on Image Fusion System ..... 174
5.3.1 Introduction ..... 174
5.3.2 Generic Pixel-Based Image Fusion Scheme ..... 175
5.3.3 Improved Dynamic Image Fusion Scheme Based on Region-Based Target ..... 176
5.3.4 The Platform of the Visible-Infrared Dynamic Image Fusion System ..... 180
5.3.5 Experimental Results and Analysis ..... 181
5.3.6 Conclusion ..... 184
5.4 Infrared and Visible Dynamic Image Sequence Fusion Based on Region Target Detection ..... 184
5.4.1 Introduction ..... 185
5.4.2 Generic Pixel-Based Image Fusion Scheme ..... 185
5.4.3 The Region-Based Target Detection Dynamic Image Fusion Scheme ..... 186
5.4.4 Experimental Results and Analysis ..... 189
5.4.5 Conclusion ..... 191
5.5 Multi-sensor Moving Image Registration Algorithm ..... 191
5.5.1 Dynamic Image Registration ..... 191
5.5.2 Dynamic Image Registration Method ..... 194
5.6 Criteria-Based Wavelet Moving Image Fusion Displacement Analysis ..... 198
5.6.1 Multi-scale Image Fusion Scheme ..... 199
5.6.2 Wavelet Transform-Related Theory ..... 200
5.6.3 Convergence Rules ..... 211
5.6.4 Mobility Issues ..... 214
5.6.5 Performance Evaluation of Fusion Algorithms ..... 218
5.6.6 Summary ..... 224
5.7 Multi-sensor Dynamic Image Fusion Algorithm ..... 224
5.7.1 Image Fusion Algorithm Based on Low-Redundancy Discrete Wavelet Frame ..... 224
5.7.2 Image Fusion Algorithm Based on Plum-Shaped Discrete Wavelet Framework ..... 240
5.7.3 Image Fusion Algorithm Based on Nonlinear Wavelet ..... 257
5.8 Experimental Results and Evaluation ..... 272
5.8.1 LRDWF Image Fusion Algorithm Experiment and Evaluation ..... 272
5.8.2 QWWF Image Fusion Algorithm Experiment and Evaluation ..... 281
5.8.3 Two Kinds of Nonlinear Wavelet Image Fusion Algorithm Experiment and Evaluation ..... 289
5.9 Summary ..... 293
References ..... 294

6 Objective Fusion Metrics ..... 297
6.1 Introduction ..... 297
6.2 Essentiality of Metrics ..... 298
6.3 Research on Evaluation Methods ..... 299
6.3.1 Evaluation Index Based on the Regional Fusion ..... 299
6.3.2 Noise Performance Evaluation Index of Fusion System ..... 303
6.4 Traditional Fusion Metrics ..... 308
6.4.1 Evaluation Metrics Based on the Amount of Information ..... 308
6.4.2 Evaluation Metrics Based on Statistical Characteristics ..... 309
6.4.3 Evaluation Metrics Based on the Visual System ..... 311
6.5 Performance Measure for Image Fusion Considering Region Information ..... 312
6.5.1 Introduction ..... 312
6.5.2 Performance Measure for Image Fusion ..... 313
6.5.3 Experiments ..... 314
6.6 Region Mutual Information-Based Objective Evaluation Measure for Image Fusion Considering Robustness ..... 316
6.6.1 Performance Measure for Image Fusion ..... 316
6.6.2 The Image Segment Algorithm ..... 317
6.6.3 Experiments ..... 319
6.7 Summary ..... 322
References ..... 323

7 Image Fusion Based on Machine Learning and Deep Learning ..... 325
7.1 Introduction ..... 325
7.2 Machine Learning Basics ..... 327
7.2.1 Supervised Learning ..... 328
7.2.2 Unsupervised Learning ..... 329
7.2.3 Reinforcement Learning ..... 329
7.2.4 Important Machine Learning Algorithms ..... 329
7.3 Image Fusion Based on Machine Learning ..... 331
7.4 Deep Learning Basics ..... 337
7.5 Image Fusion Based on Deep Learning ..... 343
7.6 Future Scope on AI-Based Image Fusion ..... 347
7.7 Summary ..... 348
References ..... 348

Part II  Experimental Examples

8 Example 1: Medical Image Fusion ..... 355
8.1 Introduction ..... 355
8.2 Traditional Medical Image Fusion Methods ..... 359
8.2.1 Local Weighted Voting ..... 359
8.2.2 MV Algorithm ..... 361
8.2.3 Global Weighted Voting ..... 362
8.2.4 Semi-Local Weighted Fusion ..... 365
8.3 Recent Medical Image Fusion Methods ..... 367
8.3.1 Patch-Based Local Weighted Voting Segmentation Algorithm ..... 367
8.3.2 Patch-Based Global Weighted Fusion Segmentation Algorithm ..... 369
8.4 Summary ..... 371
References ..... 372

9 Example 2: Night Vision Image Fusion ..... 375
9.1 Introduction ..... 375
9.2 A Novel Night Vision Image Color Fusion Method Based on Scene Recognition ..... 376
9.3 An Evaluation Metric for Color Fusion of Night Vision Images ..... 381
9.4 Summary ..... 385
References ..... 385

10 Simulation Platform of Image Fusion ..... 387
10.1 Image Fusion Platform ..... 387
10.1.1 Introduction to the Image Fusion Simulator ..... 387
10.1.2 Functions of Each Module ..... 388
10.1.3 Demonstration of Simulation ..... 392
10.2 Fusion Tracking Platform ..... 395
10.2.1 Introduction to the Fusion Tracking Simulator ..... 395
10.2.2 Functions of Each Module ..... 397
10.2.3 Demonstration of Simulation ..... 399
10.3 Summary ..... 402
References ..... 403

About the Authors

Gang Xiao received his bachelor's degree, master's degree, and PhD in 1998, 2001, and 2004, respectively. He is currently a full professor at the School of Aeronautics and Astronautics and the director of the Advanced Avionics and Intelligent Information (AAII) Laboratory, Shanghai Jiao Tong University. He was a visiting scholar at the University of California, San Diego (UCSD) in 2010 and at Southern Illinois University Edwardsville (SIUE) in 2014. His current research interests include image fusion, target tracking, and avionics integration and simulation.

Durga Prasad Bavirisetti received his MTech and PhD degrees from VIT University, India, in 2012 and 2016, respectively. He is currently pursuing postdoctoral studies at the School of Aeronautics and Astronautics, Shanghai Jiao Tong University, China, and is a member of the Advanced Avionics and Intelligent Information Laboratory. He has published his research on image fusion in several reputed journals and conferences. Dr. Durga Prasad serves as an active reviewer for Information Fusion, IEEE Transactions on Multimedia, IEEE Transactions on Instrumentation and Measurement, IEEE Sensors, Infrared Physics and Technology, Neurocomputing, International Journal of Imaging Systems and Technology, etc. His research interests are image fusion, target detection, and tracking.

Gang Liu is a full professor at the School of Automation Engineering, Shanghai University of Electric Power. He received his PhD degree from Shanghai Jiao Tong University in 2005. His research interests are image fusion, pattern recognition, and machine learning.


Xingchen Zhang received his BSc degree from Huazhong University of Science and Technology in 2012 and his PhD degree from Queen Mary University of London in 2017. He is currently a postdoctoral research fellow at the School of Aeronautics and Astronautics, Shanghai Jiao Tong University, China. He is also the director of the Artificial Intelligence and Image Processing Group at the Advanced Avionics and Intelligent Information (AAII) Laboratory. His current research interests include object fusion tracking, image fusion, deep learning, and computer vision.

Part I

Image Fusion Theories

Chapter 1

Introduction to Image Fusion

Abstract  Human beings possess a wonderful sense with which to appreciate visuals. The eye plays a key role in supporting various human activities, and an image capture of a visual scene always conveys much more information than any other description attached to it. Human beings have five sensing capabilities or systems, as shown in Fig. 1.1: eyes, ears, nose, tongue, and skin. These sensors are able to acquire independent information. Eyes can visualize a scene, ears can sense data by listening to sounds, the nose can smell the odor of an object, the tongue can sense its taste, and the skin can sense its texture and size. As shown in Fig. 1.2, all five sensing systems act as sensors: the human brain collects data from these individual sensors and fuses or combines the information into a compact representation, or better description, of a scenario. This compact data is useful for decision-making and task execution. Data fusion is a process of combining information from several sources into an optimal or compact representation of a large amount of data, supporting better description and decision-making. The human brain is the best example of a data fusion system. Even with a single sensor, for example the eye, the brain can derive many useful details of a scene by looking at it more than once and can integrate the views to reveal details hidden in any single view; multiple views will always improve the decisions. Whenever we take a snapshot of a scene with a digital camera, we are often not satisfied with a single image, so we take a few more images of the same scene to gain more clarity and information. It is not rare to find that none of these images contains all the required qualities, and it is common to feel that the positive aspects of all of them need to be combined to obtain the desired image. This motivates us to fuse images for the desired output. We can also use different cameras and fuse their images; many such options exist. Image fusion has a long history of development, and research on image fusion has been carried out by adopting various mathematical tools and techniques. It is therefore important to introduce the concepts of image fusion in a systematic way. To this end, this chapter covers the fundamental concepts, categories, types, and applications of image fusion.

Fig. 1.1 Human sensors: (a) eyes, (b) ears, (c) nose, (d) tongue, (e) skin

Fig. 1.2 Human sensor fusion system for decision-making

1.1 History and Development

Image fusion has a history of more than 35 years and has contributed to the image processing community by supporting various application fields. The image fusion research field has been developing progressively by adopting various mathematical tools and techniques. Here, a brief overview of its history and development is presented for an overall understanding of image fusion.

1.1.1 History

In the late twentieth century, the United States Navy installed the first image fusion prototype on the submarine Memphis (SSN-691), enabling the operator to directly observe all captured sensor images from the best position. In the Gulf War, the Low Altitude Navigation and Targeting Infrared for Night (LANTIRN) pod, with its improved combat performance, was an image fusion system. In 1995, the American company TI launched an image fusion system that was able to combine the forward-looking infrared and low-light sensors of the advanced helicopter (AHP) sensor systems. In May 2000, the Boeing avionics flight laboratory successfully demonstrated multi-source information fusion technology as a function of the integrated avionics system of the Joint Strike Fighter (JSF). In October 1999, the CBERS-01 satellite, jointly launched by China and Brazil, used a CCD camera and an infrared multispectral scanner to produce fused remote sensing images.

At the same time, image fusion also played a crucial role in other fields such as medicine, surveillance, and robotics. In cranial radiotherapy and brain surgery, fusion was used to combine the advantages of multi-modal images. In security inspection systems, multi-source image fusion technology offered a good solution to the concealed weapon inspection problem. In computer vision, multi-source image fusion technology was used for scene perception of the surrounding environment and for supporting the navigation of robots.

1.1.2 Development

During the 1990s, major attention in image processing was drawn by pyramid-decomposition-based techniques. In 1980, Burt and Julesz [1] were the first to propose a pyramid-based image fusion algorithm for binocular images. They then proposed another fusion algorithm for multi-modal images [2]. Later, Burt and Kolczynski [3] developed an improved approach to fuse images based on pyramid decomposition. Other well-known image fusion work, by Alexander Toet, fused visible and infrared images for surveillance applications using various pyramid and wavelet transforms [4–7]. Later, neural networks were successfully adopted for visible and infrared image fusion by Ajjimarangsee and Huntsberger [8]. An image fusion device for visible and thermal images was developed by Lillquist in 1988 [9]. An integrated analysis of infrared and visible images was implemented by Nandhakumar and Aggarwal [10] for scene interpretation. Multi-sensor target detection, classification, and segmentation were addressed by Ruck et al. [11] and Rogers et al. [12] using a fusion approach. At the same time, Li et al. [13] and Chipman et al. [14] developed fusion algorithms using the discrete wavelet transform. Koren et al. [15] introduced a new method to fuse multi-sensor images with the help of the steerable dyadic wavelet transform. For night vision, Waxman et al. [16, 17] proposed a visible and thermal imagery fusion algorithm based on biological color vision. Prasad [18] developed a data fusion method for visual and range data for robotics and machine intelligence. Dasarathy [19] implemented various fusion strategies for enhancing decision reliability in multi-sensor environments. Numerous transform-based methods, such as the stationary wavelet [20], complex wavelet [21], curvelet [22], contourlet [23], and non-subsampled contourlet [24] transforms, were utilized for fusion. Optimization-based methods [25, 26] were also adopted for image fusion. More recently, many researchers have implemented image fusion methods based on filtering-based decompositions; techniques based on the cross bilateral filter [27], guided image filter [28], rolling guided image filter [29], anisotropic diffusion [30], and weighted least squares filter [31] are a few notable examples.

1.2 Image Fusion Fundamentals

An image is a two-dimensional quantity. It can be viewed as the combination of illumination and reflectance: illumination stands for the amount of light from the source falling on the object, and reflectance corresponds to the amount of light that is reflected from the same object. A sensor is a device that converts incoming energy into an electrical signal, as shown in Fig. 1.3a. In the case of imaging sensors, the reflected energy is converted into a corresponding electrical signal. As displayed in Fig. 1.3b, the sensor array gives a large number of such signals; sampling in the spatial domain is performed by the sensor array. These signals are then quantized to obtain a digital image representation. This entire process is termed digitization and is shown in Fig. 1.3c. Thus, the visual information present in a scene can be captured as a digital image f(x, y) using a sensor array, as shown in Fig. 1.3b.

All elements in a sensor array are of the same modality; hence, image capture using one sensor array is simply referred to as single-sensor image capture. We may also be interested in details of a scene obtained using multiple sensor arrays, each operating in a different wavelength range; this is termed multi-sensor image capture. In the following discussion, the term sensor is used simply in place of a sensor array.

Next, fundamental concepts such as the problem specification (the necessity to combine information), the definition, and the objective of image fusion are described.

Fig. 1.3 Single-sensor imaging: (a) single sensor, (b) single-sensor array and its corresponding image represented as a matrix, (c) digitization process
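To make the sampling-and-quantization step concrete, here is a minimal Python sketch of our own (not taken from the book; the scene function, grid size, and 256 gray levels are illustrative assumptions) that samples a continuous irradiance pattern on a sensor-like grid and quantizes it into an 8-bit digital image f(x, y).

```python
import numpy as np

def scene_irradiance(x, y):
    """Hypothetical continuous scene: a smooth intensity pattern in [0, 1]."""
    return 0.5 + 0.5 * np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y)

def digitize(rows, cols, levels=256):
    """Sample the scene on a rows x cols sensor grid and quantize to 'levels' gray values."""
    ys, xs = np.meshgrid(np.linspace(0, 1, rows), np.linspace(0, 1, cols), indexing="ij")
    sampled = scene_irradiance(xs, ys)            # spatial sampling by the sensor array
    quantized = np.round(sampled * (levels - 1))  # amplitude quantization
    return quantized.astype(np.uint8)             # digital image f(x, y)

f = digitize(256, 256)
print(f.shape, f.dtype, f.min(), f.max())
```

In a real camera, the sampling is fixed by the sensor geometry and the quantization by the analog-to-digital converter; the sketch only mimics those two operations.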

1.2.1 Necessity to Combine Information of Images

Single-sensor image capture may not always provide complete information about a target scene. Sometimes we need two or more images of the same scene for better visual understanding. These images may be captured using a single sensor or using multiple sensors of different modalities, depending on the application [32]. Such captures provide complementary or visually different information, and a human observer cannot reliably combine and observe a composite image from these multiple image captures.

Fig. 1.4 Image formation model

Useful or complementary information from these images should therefore be integrated into a single image that provides a more accurate description of the scene than any one of the individual source images. Two examples where we need to capture multiple images and combine the required information are discussed below: the first is single-sensor imaging, in which multiple images of the same scene are captured using one sensor to extract more details of a targeted scene; the second is multi-sensor imaging, which requires multiple images captured by different sensors for the same purpose.

1. Example 1: Single-sensor imaging

In digital photography, objects of a scene at different distances cannot all be focused at the same time. If the lens of a camera focuses on an object at a certain distance, then other objects appear blurred. The image formation model of a sensor or camera system is displayed in Fig. 1.4. If a point P1 on an object is in focus, then a dot p1 corresponding to that point is generated on the sensor plane; therefore, all points at the same distance from the lens as P1 appear sharp. This region of acceptable sharpness is referred to as the depth of field (DoF). Consider another point P2 on the object, behind P1. Since P2 is outside the DoF, it generates a dot p2 somewhere before the sensor plane, and as its distance from P1 increases, it appears more blurred. As shown in Fig. 1.4, P3 is located in front of P1 on the object plane. Since P3 also falls outside the DoF, it produces a dot p3 behind the sensor plane, resulting in an unsharp dot (a circle of confusion) on the image plane (sensor plane). For better visual quality, images should have all objects in focus. One of the best approaches is to capture images with different focusing conditions and combine them to generate an all-in-one-focus image.
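As a toy illustration of combining differently focused captures, the following sketch (our own minimal example rather than any method from this book; it assumes NumPy and SciPy are available and uses random arrays as stand-ins for the two captures) selects, pixel by pixel, whichever source image is locally sharper according to a Laplacian-energy focus measure. The multi-focus methods discussed in Chap. 2 are considerably more sophisticated.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def focus_measure(img, win=9):
    """Local energy of the Laplacian: high where the image is in focus."""
    return uniform_filter(laplace(img.astype(np.float64)) ** 2, size=win)

def fuse_multifocus(img_a, img_b, win=9):
    """Per-pixel selection of the locally sharper source image."""
    mask = focus_measure(img_a, win) >= focus_measure(img_b, win)
    return np.where(mask, img_a, img_b)

# Usage with placeholder arrays standing in for foreground- and background-focused captures:
a = np.random.rand(128, 128)
b = np.random.rand(128, 128)
fused = fuse_multifocus(a, b)
```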

Fig. 1.5 The electromagnetic spectrum

Table 1.1 Electromagnetic wavelength range

Electromagnetic waves    Wavelength, λ (m)
Gamma rays               10^-15 to 10^-11
X-rays                   10^-11 to 10^-9
Ultraviolet              10^-9 to 4 × 10^-7
Visible (VI)             4 × 10^-7 to 7 × 10^-7
Infrared (IR)            7 × 10^-7 to 10^-3
Microwave                10^-3 to 0.1
Radio                    0.1 to 10^5

Now we discuss another example, where we need to acquire multiple images using different modalities.

2. Example 2: Multi-sensor imaging

Visual information present in a scene can be captured as an image using a charge-coupled device (CCD). The wavelength of the visible (VI) light that can be captured by a CCD sensor ranges from 4 × 10^-7 to 7 × 10^-7 m. However, in most image processing and computer vision applications, the CCD image alone is not sufficient to provide all the details of the scene. To extract more details, complementary images of the same scene should be captured using multiple sensors of different modalities. This can be done by capturing images in wavelengths other than the VI band of the electromagnetic spectrum. The electromagnetic spectrum is illustrated in Fig. 1.5, and the corresponding wavelengths are presented in Table 1.1.

As discussed before, VI light wavelengths range from 4 × 10^-7 to 7 × 10^-7 m, while the infrared (IR) spectrum ranges from 7 × 10^-7 to 10^-3 m. The IR spectrum is further divided into five sub-bands: near, short-wave, mid-wave, long-wave, and far IR. Any object at a temperature above 0 K emits radiation throughout the IR spectrum. The energy emitted by such objects can be sensed by IR sensors and displayed as images for end users. However, these images alone are not sufficient to provide an accurate description of the targeted scene. Hence, information from the VI spectrum also needs to be integrated for better scene understanding using fusion algorithms.
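As the simplest possible illustration of merging such complementary captures, the sketch below (again our own toy example, not an algorithm from this book; it assumes registered visible and infrared images scaled to [0, 1] and that SciPy is available) blends the two modalities with weights derived from local contrast, so that each pixel leans toward whichever sensor carries more local detail.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_contrast(img, win=7):
    """Local standard deviation as a crude activity/contrast measure."""
    mean = uniform_filter(img, size=win)
    mean_sq = uniform_filter(img ** 2, size=win)
    return np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))

def fuse_vi_ir(vi, ir, win=7, eps=1e-6):
    """Contrast-weighted average of registered visible (vi) and infrared (ir) images in [0, 1]."""
    w_vi = local_contrast(vi, win)
    w_ir = local_contrast(ir, win)
    return (w_vi * vi + w_ir * ir) / (w_vi + w_ir + eps)
```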

1.2.2 Definition of Image Fusion

Image fusion has numerous definitions. Among them, some well-known definitions are presented below.

Produce a single image from a set of input images. The fused image should have complete information which is more useful for human or machine perception [33].

or Generation of a result which describes the scene better than any images captured in a single shot [34, 35].

or Image fusion is a process of combining images, obtained by sensors of different wavelengths simultaneously viewing of the same scene, to form a composite image. The composite image is formed to improve image content and to make it easier for the user to detect, recognize, and identify targets and increase his situational awareness [36].

or Image fusion is the process of merging or combining or integrating useful or complementary information of several source images such that the resultant image provides a more accurate description about the scene than any one of the individual source images [37].

1.2.3

Image Fusion Objective

Image fusion process can also be explained using set theory representation [38, 39] as transferring of information of two sets. The Venn diagram representation of this process is shown in Fig. 1.6. Each set stands for the information contribution corresponding to that particular image. Sets A and B represent the information contribution by the two source images A and B, respectively. Set F corresponds to the information contribution by the fused image F. Ideally, this fused image should contain all the information from source images. However, it is not possible. In practice, not all the source image information is transferred into the fused image. Only required and necessary information will be transferred. Information loss of source images may occur during the fusion process. Simultaneously, fusion process itself may introduce extra information or false information called “fusion artifacts” into the fused image. In Fig. 1.6, blue portion represents the information transferred from source images in the fused image which is simply referred to as “fusion gain or fusion score.” Green portion indicates the information lost during the fusion process which is termed as “fusion loss.” This information in source images is not present in the fused image, and red portion corresponds to unnecessary information (fusion artifacts) introduced in the fused image. It has no relevance to the source images. Hence, fusion algorithm should consider all these factors for better performance.

1.3 Categorization

11

Fig. 1.6 Graphical illustration of image fusion process

The main objective of an image fusion algorithm is to generate a visually good fused image with less computational time, by maximizing the fusion gain and minimizing the fusion loss and fusion artifacts.

1.3

Categorization

Image fusion algorithms are broadly divided into three categories: pixel, feature, and decision levels. Pixel-level fusion is performed on each input image pixel by pixel. Pixel-level fusion methods can be implemented in the spatial domain [40, 41] or in a transform domain [6, 13, 14]. In the spatial domain, these methods can be implemented pixel by pixel. However, transform domain methods work by a coefficient. For a small change in the frequency coefficient, the whole resultant image will be effected. To obtain a better fused image without artifacts, best transform technique with a suitable fusion rule should be chosen. Substantial work has been contributed at pixel level because of their effectiveness and ease of implementation compared to other level fusion schemes. At feature level, fusion is executed on the extracted features of source images. Feature-level fusion schemes usually consider segmented regions based on different properties such as entropy-, variance-, and activity-level measurements [21, 42, 43]. These algorithms give a robust performance in the presence of noise [44]. At decision level, fusion is performed on probabilistic decision information of local decision makers. These decision makers are in turn derived from the extracted features. These fusion techniques integrate information from source images based on decision maps derived from the features. Relational graph matching is used for image fusion by Williams et al. [45], and organization of relational model is also used for decision level fusion by Shapiro [46].

12

1

Introduction to Image Fusion

Number of articles published in Ei Compendex Web

Key words: image fusion 16000

14858

14000 12000 10000

8784

8000 6000 3900 4000 1805

2000

98

236

664

1980-1984

1985-1989

1990-1994

0 1995-1999

2000-2004

2005-2009

2010-2015

(a) Key words: video fusion Number of articles published in Ei Compendex Web

3000 2510 2500 2000

1702

1500 1000 624 500

210 9

12

54

1980-1984

1985-1989

1990-1994

0 1995-1999

2000-2004

2005-2009

2010-2015

(b) Fig. 1.7 Number of articles published in Ei Compendex Web for a duration of 35 years from 1980 to 2015. Keywords used for searching are (a) image fusion and (b) video fusion

Even though image and video fusion research has started 35 years back, today lot of contributions are still happening in this area because of its diverse applications. The number of articles published in Ei Compendex engineering literature database from 1980 to 2015 for a duration of 35 years are displayed in Fig. 1.7a, b. The keywords used for searching are “image fusion” and “video fusion.” From the statistics, it is obvious that image and video fusion is an active research area and is following an increasing trend.

1.4 Fundamental Steps of an Image Fusion System

1.4

13

1.4 Fundamental Steps of an Image Fusion System

Preprocessing

Image registration

Image fusion

PostProcessing

Display Storage

Fig. 1.8 Fundamental steps in image fusion system

14

1

1.5

Introduction to Image Fusion

Types of Image Fusion Systems

Image fusion systems are broadly classified into single-sensor image fusion system (SSIF) (Fig. 1.9a) and multi-sensor image fusion system (MSIF) (Fig. 1.9b). In SSIF, using a single sensor, the sequence of images of the same scene are captured, and useful information of these several images is integrated into a single image by the process of fusion. In noisy environment and improper illumination conditions, human observers may not be able to detect objects of interest which can be easily found from fused images of that targeted scene. Digital photography applications such as multi-focus imaging and multi-exposure imaging [47] come under SSIF. However, these fusion systems have their drawbacks. They depend on conditions like illumination and dynamic range of the sensors. For example, VI sensor like the

(a) Sensor

Scene Fusion

Fused image

Image captures

(b)

Sensor 1

Sensor 2 Scene Sensor n

Fusion

Fused image

Fig. 1.9 Types of image fusion systems: (a) SSIF system and (b) MSIF system

1.6 Applications

15

digital camera can capture visually good images in high-illumination conditions. However, they fail to capture under improper illumination conditions such as night, fog, and rain. To overcome the shortcomings of SSIF, MSIF systems are introduced to capture images in adverse environment conditions. In MSIF, multiple images of the same scene are captured using various sensors of different modalities to acquire complementary information. For example, VI sensors are good in high-lighting conditions. However, IR sensors are able to capture images in low-lighting conditions. Required and necessary information of these images is combined into a single image by the fusion process. Applications such as medical imaging, military, navigation, and concealed weapon detection fall under MSIF category. Various advantages of MSIF systems are as follows: 1. Reliable and accurate information. These MSIF systems provide a reliable and accurate description of the scene compared to source images. 2. Robust performance. Even if one sensor of the MSIF fails, this system generates a composite image by considering the redundant information of other working sensors. So it is robust. 3. Compact representation. The fused image of the MSIF is compact and provides all the necessary information of source images in a single image. 4. Extended operating range. The range of operation is extended by capturing images at different operating conditions of the sensors. 5. Reduced uncertainty. Combined information of various sensors reduces the uncertainty present in individual captures of the scene.

1.6

Applications

Image fusion finds applications in various fields such as digital photography [33, 47], medical imaging [48], remote sensing [34], concealed weapon detection [27], military [49], night vision [16], autonomous vehicles [50], visual inspection in industrial plants [51], ambient intelligence [52], and person re-identification [53]. As shown in Fig. 1.10, we consider four scenarios like digital photography, medical imaging, concealed weapon detection, and military to explain how image fusion is useful in these applications. In digital photography [33, 47], a scene cannot be focused at the same time due to inherent system limitations. If we focus on one object, we may lose information about other objects and vice versa. Figure 1.10a shows foreground and background focused images of a bottle dataset. Foreground focused image provides information about the bottle in the foreground, whereas background focused image gives information of the bottle in the background of the same scene. These individual images do not provide complete information about the targeted scene. For better visual understanding, focused regions of these two images have to be combined to result in an allin-one focused image.

16

1

Introduction to Image Fusion

(a)

(b)

(c)

(d)

Fig. 1.10 Fusion results. (a) Multi-focus imaging, (b) medical imaging, (c) concealed weapon detection, (d) battle field monitoring in military

In medical imaging, different modalities like positron emission tomography (PET), single-photon emission tomography (SPECT), computer tomography (CT), and magnetic resonance imaging (MRI) are used to capture complementary information. These individual image captures do not provide all required details. Therefore, information from different captures has to be incorporated into a single image. Figure 1.10b shows CT and MR images of a human brain. As shown in Fig. 1.10b,

1.6 Applications

17

CT can capture bone structure or hard tissue information, whereas MRI can capture soft tissue information present in the brain. For a radiologist, a fused image obtained from these two images will be helpful in computer-assisted surgery and radio surgery for better diagnosis and treatment. In concealed weapon detection, visible light (VI) and millimeter wave (MMW) sensors are used to capture complementary images. In Fig. 1.10c, the left one is a VI image and the middle one is an MMW image. VI image conveys information of three persons. However, it is not providing any sign of the existence of a weapon. MMW image conveys the weapon information alone. From these individual images, it is difficult to identify, which person concealed the weapon. To accurately locate and detect the weapon, useful information from these complementary images has to be combined in a single image. In military and navigation, VI and IR imaging sensors are used to acquire complementary information of the targeted scene. Due to bad weather circumstances, such as rain and foggy winter, the images captured using VI sensors alone are not sufficient to provide the essential information about a situation. VI image is able to provide background details such as vegetation, texture, area, and soil. In contrast, IR sensors provide information about the foreground of weapons and enemy and vehicle movements. For the detection and localization of a target as well as improvement of situational awareness, information from both IR and VI images needs to be merged in a single image. In Fig. 1.10d, the first image is VI output and the second one is IR image of a battle field. VI image provides the information of a battle field. However, it is incapable of identifying the person near fencing. IR image identifies the person existence but cannot provide sufficient visual information of the battle field. If we integrate useful information from these images in a single image, then we can easily identify and localize enemy or target. Hence, for a better understanding of the scene, we need to combine essential visual information of source images to obtain a meaningful image. As we discussed, image fusion [32] is a phenomenon of integrating useful information of source images into the fused image. In Fig. 1.10, the third images from left are the fused images of various applications. An all-in-one focused image in Fig. 1.10a, obtained from two out-of-focus images provides visually more information. The fused image in Fig. 1.10b would assist a radiologist in better diagnosis and treatment than individual CT and MR images. The combined image in Fig. 1.10c is giving information about the person as well as the concealed weapon. From the fused image in Fig. 1.10c, one can say that the third person from the left concealed the weapon inside his shirt. From the fused image in Fig. 1.10d, one can identify an enemy moment on the battle field near the fencing. In the following chapters, these applications are explained further for in-depth understanding.

1.7 Summary and Outline of the Book

In this chapter, an overview of image fusion is presented. In particular, the history, development, problem specification, definitions, objectives, and categories of image fusion are explained in detail. This chapter also provides an overview of image fusion systems, their components, and their types. In addition, it gives a brief summary of image fusion applications. The remaining contents of the book are organized as follows. Chapters 2–4 provide an in-depth discussion of pixel-, feature-, and decision-level fusion, respectively. Chapter 5 provides a detailed discussion of multi-sensor dynamic image fusion. Chapter 6 summarizes the existing quantitative fusion metrics and also introduces new metrics developed by the authors. Nowadays, machine learning, especially deep learning, is making significant changes in the entire image processing community, including image fusion. In Chap. 7, an attempt is made to give an overview of image fusion based on these concepts. As mentioned before, another important aspect of this book is experimental examples. Chapters 8 and 9 present experimental examples of medical imaging and night vision. Finally, an image fusion platform and a fusion tracking platform are introduced in Chap. 10.

References

1. P. Burt, B. Julesz, A disparity gradient limit for binocular fusion. Science 208, 615–617 (1980)
2. P.J. Burt, E.H. Adelson, The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983)
3. P.J. Burt, R.J. Kolczynski, Enhanced image capture through fusion. ICCV, 173–182 (1993)
4. A. Toet, Image fusion by a ratio of low-pass pyramid. Pattern Recogn. Lett. 9(4), 245–253 (1989)
5. A. Toet, V. Ruyven, Merging thermal and visual images by a contrast pyramid. Opt. Eng. 28(7), 789–792 (1989)
6. A. Toet, Hierarchical image fusion. Mach. Vis. Appl. 3(1), 1–11 (1990)
7. A. Toet, Adaptive multi-scale contrast enhancement through non-linear pyramid recombination. Pattern Recogn. Lett. 11(11), 735–742 (1990)
8. P. Ajjimarangsee, T.L. Huntsberger, Neural network model for fusion of visible and infrared sensor outputs, in Sensor Fusion: Spatial Reasoning and Scene Interpretation, vol. 1003, (1989), pp. 153–161
9. R.D. Lillquist, Composite visible/thermal-infrared imaging apparatus, Google Patents, 14 Jun 1988
10. N. Nandhakumar, J.K. Aggarwal, Integrated analysis of thermal and visual images for scene interpretation. IEEE Trans. Pattern Anal. Mach. Intell. 10(4), 469–481 (1988)
11. D.W. Ruck, S.K. Rogers, J.P. Mills, M. Kabrisky, Multisensor target detection and classification. Sens. Fusion 931, 14–22 (1988)
12. S.K. Rogers, C.W. Tong, M. Kabrisky, J.P. Mills, Multisensor fusion of ladar and passive infrared imagery for target segmentation. Opt. Eng. 28(8), 288881 (1989)


13. H. Li, B.S. Manjunath, S.K. Mitra, Multisensor image fusion using the wavelet transform. Graph. Models Image Process. 57, 235–245 (1995)
14. L.J. Chipman, T.M. Orr, L.N. Graham, Wavelets and image fusion, in Proceedings of International Conference on Image Processing, 1995, vol. 3, (1995), pp. 248–251
15. I. Koren, A. Laine, F. Taylor, Image fusion using steerable dyadic wavelet transform, in Proceedings of International Conference on Image Processing, 1995, vol. 3, (1995), pp. 232–235
16. A.M. Waxman et al., Color night vision: Fusion of intensified visible and thermal IR imagery, in Synthetic Vision for Vehicle Guidance and Control, vol. SPIE-2463, (1995), pp. 58–68
17. A.M. Waxman et al., Color night vision: Opponent processing in the fusion of visible and IR imagery. Neural Netw. 10(1), 1–6 (1997)
18. K.V. Prasad, Data fusion in robotics and machine intelligence. Control. Eng. Pract. 1(4), 753–754 (1993)
19. B.V. Dasarathy, Fusion strategies for enhancing decision reliability in multisensor environments. Opt. Eng. 35(3), 603–616 (1996)
20. O. Rockinger, Image sequence fusion using a shift-invariant wavelet transform. Proc. Int. Conf. Image Process. 3, 288–291 (1997)
21. P. Hill, N. Canagarajah, D. Bull, Image fusion using complex wavelets, in 13th Br. Mach. Vis. Conf., (2002), pp. 487–496
22. M. Choi, R.Y. Kim, M.R. Nam, H.O. Kim, Fusion of multispectral and panchromatic satellite images using the curvelet transform. IEEE Geosci. Remote Sens. Lett. 2(2), 136–140 (2005)
23. M. Qiguang, W. Baoshu, A novel image fusion method using contourlet transform, in International Conference on Communications, Circuits and Systems Proceedings, 2006, vol. 1, (2006), pp. 548–552
24. B.Y. Bin Yang, S.L.S. Li, F.S.F. Sun, Image fusion using nonsubsampled contourlet transform, in Fourth Int. Conf. Image Graph. (ICIG 2007), (2007), pp. 719–724
25. R. Shen, I. Cheng, J. Shi, A. Basu, Generalized random walks for fusion of multi-exposure images. IEEE Trans. Image Process. 20(12), 3634–3646 (2011)
26. M. Xu, H. Chen, P.K. Varshney, An image fusion approach based on Markov random fields. IEEE Trans. Geosci. Remote Sens. 49(12), 5116–5127 (2011)
27. B.K. Shreyamsha Kumar, Image fusion based on pixel significance using cross bilateral filter. Signal Image Video Process., 1193–1204 (2013)
28. S. Li, X. Kang, J. Hu, Image fusion with guided filtering. IEEE Trans. Image Process. 22(7), 2864–2875 (2013)
29. A. Toet, M.A. Hogervorst, Multiscale image fusion through guided filtering, in SPIE Security + Defence, (2016), pp. 99970J–99970J
30. D.P. Bavirisetti, R. Dhuli, Fusion of infrared and visible sensor images based on anisotropic diffusion and Karhunen-Loeve transform. IEEE Sensors J. 16(1), 203–209 (2016)
31. Y. Jiang, M. Wang, Image fusion using multiscale edge-preserving decomposition based on weighted least squares filter. IET Image Process. 8(3), 183–190 (2014)
32. A. Ardeshir Goshtasby, S. Nikolov, Image fusion: Advances in the state of the art. Inf. Fusion 8(2), 114–118 (2007)
33. Z. Zhang, R.S. Blum, A categorization of multiscale-decomposition-based image fusion schemes with a performance study for a digital camera application. Proc. IEEE 87(8), 1315–1326 (1999)
34. C. Pohl, J.L. Van Genderen, Multisensor image fusion in remote sensing: Concepts, methods and applications. Int. J. Remote Sens., 37–41 (2010)
35. M.B.A. Haghighat, A. Aghagolzadeh, H. Seyedarabi, Multi-focus image fusion for visual sensor networks in DCT domain. Comput. Electr. Eng. 37(5), 789–797 (2011)
36. Q. Miao, J. Lou, P. Xu, Image fusion based on NSCT and Bandelet transform, in Proceedings of the 2012 8th International Conference on Computational Intelligence and Security, CIS 2012, (2012), pp. 314–317


37. D.P. Bavirisetti, R. Dhuli, Fusion of infrared and visible sensor images based on anisotropic diffusion and Karhunen-Loeve transform. IEEE Sens. J. 16(1) (2016)
38. C.S. Xydeas, Objective image fusion performance measure. Electron. Lett. 36(4), 308–309 (2000)
39. V. Petrovic, C. Xydeas, Objective image fusion performance characterisation, in Tenth IEEE International Conference on Computer Vision, 2005. ICCV 2005, (2005), pp. 1866–1871
40. A.A. Goshtasby, 2-D and 3-D Image Registration: For Medical, Remote Sensing, and Industrial Applications (John Wiley & Sons, Hoboken, NJ, 2005)
41. S. Li, J.T. Kwok, Y. Wang, Using the discrete wavelet frame transform to merge Landsat TM and SPOT panchromatic images. Inf. Fusion 3(1), 17–23 (2002)
42. Z. Zhang, R. Blum, Region-based image fusion scheme for concealed weapon detection. Annu. Conf. Inf. Sci. Syst., 168–173 (1997)
43. G. Piella, A general framework for multiresolution image fusion: From pixels to regions. Inf. Fusion 4(4), 259–280 (2003)
44. G. Piella, A region-based multiresolution image fusion algorithm, in Proc. Fifth Int. Conf. Inf. Fusion, FUSION 2002 (IEEE Cat. No.02EX5997), vol. 2, (2002), pp. 1557–1564
45. M.L. Williams, R.C. Wilson, E.R. Hancock, Deterministic search for relational graph matching. Pattern Recogn. 32(7), 1255–1271 (1999)
46. L.G. Shapiro, Organization of relational models, in Proceedings-International Conference on Pattern Recognition, (1982)
47. S. Li, X. Kang, Fast multi-exposure image fusion with median filter and recursive filter. IEEE Trans. Consum. Electron. 58(2), 626–632 (2012)
48. Q. Guihong, Z. Dali, Y. Pingfan, Medical image fusion by wavelet transform modulus maxima. Opt. Express 9(4), 184–190 (2001)
49. W. Gan et al., Infrared and visible image fusion with the use of multi-scale edge-preserving decomposition and guided image filter. Infrared Phys. Technol. 72, 37–51 (2015)
50. Q. Li, L. Chen, M. Li, S.L. Shaw, A. Nüchter, A sensor-fusion drivable-region and lane-detection system for autonomous vehicle navigation in challenging road scenarios. IEEE Trans. Veh. Technol. 63(2), 540–555 (2014)
51. B. Majidi, B. Moshiri, Industrial assessment of horticultural products' quality using image data fusion, in Proceedings of the 6th International Conference on Information Fusion, FUSION 2003, vol. 2, (2003), pp. 868–873
52. H. Irshad, M. Kamran, A.B. Siddiqui, A. Hussain, Image fusion using computational intelligence: A survey, in 2009 Second Int. Conf. Environ. Comput. Sci., (2009), pp. 128–132
53. L. Zheng, S. Wang, L. Tian, F. He, Z. Liu, Q. Tian, Query-adaptive late fusion for image search and person re-identification, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (2015), pp. 1741–1750

Chapter 2

Pixel-Level Image Fusion

Abstract Image fusion can be performed at various levels of information representation. A generic classification is given as pixel, feature, and decision levels. In this chapter, our focus is on pixel-level image fusion. Pixel-level fusion mainly finds applications in image processing where human perception is given higher priority than machine vision. Possible applications are digital photography, medical imaging, remote sensing, surveillance, pilot navigation, and so on. In this chapter, first a brief introduction, different categories, and traditional pixel-level image fusion approaches are presented. Next, new image fusion methods based on pyramids, wavelet filter banks, and edge-preserving decomposition methods are introduced. Both qualitative and quantitative analyses are considered for an in-depth discussion of these methods.

2.1 Introduction

If source images are combined by performing pixel-wise operations, the process is referred to as pixel-level image fusion. The main objective of any pixel-level image fusion algorithm is to generate a visually good fused image with low computational time along with the following properties:

1. It has to transfer the complementary or useful information of the source images into the composite image.
2. It should not lose source image information during the fusion process.
3. It should not introduce artifacts into the fused image.

With these objectives in view, various fusion algorithms have been proposed over the past few decades. As shown in Fig. 2.1, image fusion methods are broadly classified as single-scale and multi-scale fusion methods.


Fig. 2.1 A generic classification of image fusion methods (single-scale and multi-scale)

2.1.1 Single-Scale Image Fusion

Here, the fusion is performed on the source images at their present scale, without further decomposition. These methods are also referred to as spatial domain techniques. A brief literature review of spatial domain methods, with their advantages and drawbacks, is given in Table 2.1. Simple operators (Ardeshir and Nikolov [1]) such as average, weighted average, minimum, maximum, and morphological operators are used for fusion. In the simple average method, the fused image F(i, j) is obtained by calculating the pixel-wise average of the input images A(i, j) and B(i, j), as in Eq. (2.1):

F(i, j) = (A(i, j) + B(i, j)) / 2    (2.1)

In the weighted average method, the fused image F(i, j) is obtained by computing the pixel-wise weighted average of the input images, as in Eq. (2.2):

F(i, j) = w A(i, j) + (1 - w) B(i, j),  for i = 0, 1, ..., m and j = 0, 1, ..., n    (2.2)

where w is the weight factor. In the selective maximum method, the fused image F(i, j) is obtained by applying the pixel-wise maximum operation to the input images:

F(i, j) = \max(A(i, j), B(i, j))    (2.3)

In the selective minimum method, the fused image is obtained by calculating the pixel-wise minimum of the input images:

F(i, j) = \min(A(i, j), B(i, j))    (2.4)
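These four simple operators map directly onto array operations. The following minimal sketch (an illustration, assuming NumPy and two pre-registered, equal-size grayscale images stored as floating-point arrays; the function names are not from the original text) implements Eqs. (2.1)–(2.4).

```python
import numpy as np

def simple_average(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pixel-wise average fusion, Eq. (2.1)."""
    return (a + b) / 2.0

def weighted_average(a: np.ndarray, b: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Pixel-wise weighted average fusion, Eq. (2.2)."""
    return w * a + (1.0 - w) * b

def selective_maximum(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pixel-wise maximum selection, Eq. (2.3)."""
    return np.maximum(a, b)

def selective_minimum(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pixel-wise minimum selection, Eq. (2.4)."""
    return np.minimum(a, b)

if __name__ == "__main__":
    # Two toy arrays standing in for registered sensor outputs.
    A = np.random.rand(8, 8)
    B = np.random.rand(8, 8)
    F = weighted_average(A, B, w=0.6)
    print(F.shape, float(F.min()), float(F.max()))
```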

These methods are easy to implement, but they may introduce brightness or color distortions into the fused image. Principal component analysis (PCA) [2], independent component analysis (ICA) [3], and intensity-hue-saturation (IHS) [4] are some of the well-known methods in the spatial domain category.
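As an illustration of this family, the sketch below shows one common PCA-style weighting scheme: the fusion weights are taken from the leading eigenvector of the 2 × 2 covariance matrix of the two source images. This is a minimal, generic sketch rather than the exact formulation of reference [2].

```python
import numpy as np

def pca_fuse(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """PCA-weighted fusion of two registered, equal-size images."""
    data = np.stack([a.ravel(), b.ravel()])      # 2 x N observation matrix
    cov = np.cov(data)                           # 2 x 2 covariance matrix
    _, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    v = np.abs(eigvecs[:, -1])                   # leading eigenvector
    w1, w2 = v / v.sum()                         # normalized fusion weights
    return w1 * a + w2 * b
```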


Table 2.1 Single-scale fusion methods, their advantages, and drawbacks

Method: Average, minimum, maximum, and morphological operators (Ardeshir and Nikolov [1])
Advantages: Easy to implement
Drawbacks: Reduce the contrast or produce brightness or color distortions

Method: Principal component analysis (PCA) (Yonghong [2]), independent component analysis (ICA) (Mitianoudis and Stathaki [3]), intensity-hue-saturation (IHS) (Tu et al. [4])
Advantages: Computationally efficient
Drawbacks: May suffer from spectral distortion; may give desirable results for only a few fusion datasets

Method: Focus measure (Huang and Jing [5]), bilateral sharpness criteria (Tian et al. [6])
Advantages: May produce desirable results
Drawbacks: Applicable to a few datasets; computationally expensive

Method: Optimization methods (Shen et al. [7]; Xu and Varshney [8])
Advantages: May produce desirable results
Drawbacks: Take multiple iterations; computationally expensive; over-smoothen the fused image

These methods may suffer from spectral distortion and give desirable results only for a few fusion datasets. Focus measure-based approaches [5] are popular in this class. Here, the source images are divided into blocks, and various focus measures are employed to select the best among the image blocks. Variance, energy of image gradient, the Tenenbaum algorithm (Tenengrad), energy of Laplacian (EOL), sum-modified Laplacian (SML), and spatial frequency (SF) are focus measures that have been used successfully for fusion. In Huang and Jing [5], it is observed that SML gives superior performance compared to other focus measures; however, it is computationally expensive. To address this problem, the bilateral gradient-based sharpness criterion (BGS) (Tian et al. [6]) was used for fusion, but it fails to produce a better-focused image and is also computationally demanding. To overcome these problems, optimization-based fusion schemes [7, 8] have been proposed. These methods take multiple iterations to find an optimal solution (the fused image) and may over-smooth the fused image because of the multiple iterations.
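The block-based focus-measure idea can be prototyped in a few lines. The sketch below uses spatial frequency (SF) as the focus measure and simply keeps, for each block, the source block with the higher SF; it is an illustrative sketch (block size, measure, and names are assumptions), not the specific algorithm of references [5, 6].

```python
import numpy as np

def spatial_frequency(block: np.ndarray) -> float:
    """Spatial frequency (SF) focus measure of an image block."""
    rf = np.sqrt(np.mean(np.diff(block, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(block, axis=0) ** 2))  # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))

def block_fusion(a: np.ndarray, b: np.ndarray, block: int = 16) -> np.ndarray:
    """Block-wise multi-focus fusion: keep the block with the higher SF."""
    fused = a.copy()
    rows, cols = a.shape
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            pa = a[r:r + block, c:c + block]
            pb = b[r:r + block, c:c + block]
            if spatial_frequency(pb) > spatial_frequency(pa):
                fused[r:r + block, c:c + block] = pb
    return fused
```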

2.1.2 Multi-Scale Image Fusion

Multi-scale fusion methods have been developed to overcome the drawbacks of single-scale fusion. Multi-scale decomposition (MSD) extracts the salient (visually significant) information of the source images for the fusion purpose. MSD methods perform better than single-scale fusion methods due to the following facts:

1. The human visual system (HVS) is sensitive to changes in saliency information such as edges and lines. These features can be well extracted and fused with the help of the MSD.


Fig. 2.2 The multi-scale image decomposition (MSD) (ψ)

2. It offers better spatial and frequency resolution.

In the MSD, source images are decomposed into approximation and detail coefficients/layers at several scales. For example, as displayed in Fig. 2.2, a given source image I is decomposed into an approximation coefficient C_A^1 and a detail coefficient C_D^1 at level 1 (L1). C_A^1 is further decomposed into C_A^2 and C_D^2 at level 2 (L2). In general, the approximation and detail coefficients at the n-th decomposition level can be represented as C_A^n and C_D^n, respectively. This MSD process is called analysis and is represented as ψ; its inverse process is termed synthesis and is indicated as ψ^{-1}.

In multi-scale image fusion, source images I1 and I2 are decomposed into approximation and detail coefficients as shown in Fig. 2.3. The approximation and detail coefficients of source image I1 at the i-th level are represented as {C_A^i}^1 and {C_D^i}^1, respectively; similarly, for I2 they are represented as {C_A^i}^2 and {C_D^i}^2. This decomposition process is called analysis. Fusion is performed on these decomposed coefficients by employing various fusion rules, which yields the final approximation coefficient {C_A^i}^F and detail coefficient {C_D^i}^F.


Fig. 2.3 A general block diagram of the multi-scale image fusion

All these fused coefficients at the different levels are combined to obtain the fused image F. This process is termed synthesis. Multi-scale image fusion methods are further classified as:

1. Pyramid-based fusion
2. Wavelet transform-based fusion
3. Filtering-based fusion
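The analysis–fusion–synthesis structure of Fig. 2.3 can be written as a small generic skeleton that is independent of the particular transform. In the sketch below, `decompose` and `reconstruct` are placeholders for any concrete ψ and ψ^{-1} (pyramid, wavelet, or filter based, as in the following subsections); the fusion rules shown (average for approximations, absolute-maximum for details) are illustrative defaults, not prescriptions from the text.

```python
import numpy as np

def fuse_multiscale(i1, i2, decompose, reconstruct, levels=3):
    """Generic multi-scale fusion: analysis, per-level fusion rules, synthesis.

    decompose(img, levels)      -> (approximation, [detail_1, ..., detail_levels])
    reconstruct(approx, details) -> image
    """
    a1, d1 = decompose(i1, levels)
    a2, d2 = decompose(i2, levels)

    # Low-frequency rule: average the approximation coefficients.
    a_fused = (a1 + a2) / 2.0

    # High-frequency rule: keep the detail coefficient with the larger magnitude.
    d_fused = [np.where(np.abs(x) >= np.abs(y), x, y) for x, y in zip(d1, d2)]

    return reconstruct(a_fused, d_fused)
```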

2.1.2.1 Pyramid-Based Fusion

During the 1990s, pyramid decomposition-based fusion methods were introduced by Akerman [9]. The basic idea of these methods is as follows. First, decompose the source images into successive sub-images using operations such as blurring and down-sampling. Next, apply fusion rules to these decomposed sub-images. Finally, reconstruct the fused image from the fused sub-images. The general block diagram of pyramid-based fusion is depicted in Fig. 2.4. As shown in the figure, the source images I1 and I2 are blurred using a linear filter and down-sampled by 2 along rows and columns. This process is given as

{C_P^{i+1}}^1 = [ {C_P^i(x, y)}^1 * w(x, y) ] ↓2,
{C_P^{i+1}}^2 = [ {C_P^i(x, y)}^2 * w(x, y) ] ↓2,   i = 0, 1, 2, ..., N    (2.5)

where {C_P^{i+1}}^1 represents the sub-image obtained from the pyramid decomposition of source image I1 at the (i + 1)-th level, which depends on its previous-level sub-image {C_P^i(x, y)}^1, and {C_P^0}^1 represents the input image I1. The convolution operation is represented by *, ↓2 denotes down-sampling by 2, w is a linear filter, and N is the number of levels. The same holds for the source image I2. Various fusion rules can be employed on these decomposed sub-images to obtain fused sub-images {C_P^i}^F at the various levels L_i. The pyramid is then reconstructed from these fused sub-images to obtain the fused image F.


Fig. 2.4 A general block diagram of pyramid-based image fusion

Gradient (GRAD) [10], Laplacian [11], morphological difference [12], ratio (RATIO) [13], contrast [14], and filter subtract decimate (FSD) pyramid [10, 11] based methods are well known methods in this class. These methods may produce halo effects near edges.
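As a concrete illustration of this family, the sketch below builds a Laplacian pyramid with OpenCV's pyrDown/pyrUp and fuses the levels with a maximum-absolute rule for the band-pass levels and an average for the coarsest level. It is a minimal sketch assuming OpenCV and NumPy are available and the inputs are registered float32 images of equal size; it is not the specific pyramid variant of any one reference cited above.

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels=4):
    """Return [L0, L1, ..., L_{levels-1}, low_pass_residual]."""
    pyr, cur = [], img
    for _ in range(levels):
        down = cv2.pyrDown(cur)
        up = cv2.pyrUp(down, dstsize=(cur.shape[1], cur.shape[0]))
        pyr.append(cur - up)          # band-pass (Laplacian) level
        cur = down
    pyr.append(cur)                   # low-pass residual (top of the pyramid)
    return pyr

def collapse(pyr):
    """Rebuild an image from its Laplacian pyramid."""
    cur = pyr[-1]
    for lap in reversed(pyr[:-1]):
        cur = cv2.pyrUp(cur, dstsize=(lap.shape[1], lap.shape[0])) + lap
    return cur

def fuse_laplacian(i1, i2, levels=4):
    p1, p2 = laplacian_pyramid(i1, levels), laplacian_pyramid(i2, levels)
    fused = [np.where(np.abs(a) >= np.abs(b), a, b) for a, b in zip(p1[:-1], p2[:-1])]
    fused.append((p1[-1] + p2[-1]) / 2.0)   # average the coarsest level
    return collapse(fused)
```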

2.1.2.2 Wavelet Transform-Based Fusion

The next fusion schemes in the multi-resolution category are discrete wavelet transform (DWT) decompositions [15]. DWT is preferred over pyramids because of various advantages: it provides a compact representation and directional information of a given image. These qualities make DWT suitable for the purpose of fusion, and wavelet methods produce fewer blocking effects than pyramid methods. In DWT, each source image is decomposed into wavelet coefficients at various levels, and these wavelet coefficients are fused using different fusion rules. Finally, the inverse wavelet transform is applied to the fused wavelet coefficients to obtain the desired fused image. A general block diagram of wavelet-based fusion is shown in Fig. 2.5. As depicted in the figure, source image I1 is decomposed into a set of four sub-images in various directions. {C_LL}^1 is the approximation coefficient of I1, which represents the low-frequency content; {C_LH}^1, {C_HL}^1, and {C_HH}^1 indicate the detail coefficients in the horizontal, vertical, and diagonal directions, respectively. The approximation coefficient {C_LL}^1 at level 1 is further decomposed into an approximation coefficient and detail coefficients in the horizontal, vertical, and diagonal directions at level 2, and so on. The same holds for source image I2. By employing various fusion rules, the wavelet coefficients of the source images at the various levels are combined, and the final fused image F is generated by applying the inverse wavelet transform to the fused coefficients. DWT is shift-variant because of its multi-rate operations, and this shift-variant property may introduce artifacts into the fused image. To overcome this problem, the shift-invariant discrete wavelet transform (SIDWT) has been introduced; a SIDWT image fusion method can be found in Rogers [16].

Fig. 2.5 A general block diagram of wavelet-based image fusion


Image fusion is also carried out using more recent transforms such as the curvelet transform [17], the non-subsampled contourlet transform [18], multi-resolution singular value decomposition (MSVD) [19], high-order singular value decomposition [20], empirical mode decomposition [21], the discrete cosine harmonic wavelet transform (DCHWT) [22], shearlets [23], and so on.
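A compact way to prototype the DWT fusion pipeline of Fig. 2.5 is with the PyWavelets package, whose wavedec2/waverec2 functions return exactly the LL approximation plus (horizontal, vertical, diagonal) detail tuples described above. The following sketch assumes pywt is installed and the inputs are registered arrays of equal size; the fusion rules (average LL, maximum-magnitude details) are illustrative and not tied to any particular method cited here.

```python
import numpy as np
import pywt

def dwt_fuse(i1, i2, wavelet="db2", levels=3):
    """DWT-based fusion: average the approximation band, take the
    maximum-magnitude coefficient in every detail band."""
    c1 = pywt.wavedec2(i1, wavelet, level=levels)
    c2 = pywt.wavedec2(i2, wavelet, level=levels)

    fused = [(c1[0] + c2[0]) / 2.0]                     # LL band
    for (h1, v1, d1), (h2, v2, d2) in zip(c1[1:], c2[1:]):
        fused.append(tuple(
            np.where(np.abs(x) >= np.abs(y), x, y)      # LH, HL, HH bands
            for x, y in ((h1, h2), (v1, v2), (d1, d2))
        ))
    return pywt.waverec2(fused, wavelet)
```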

2.1.2.3 Filtering-Based Fusion

The next category in multi-scale fusion is filtering-based techniques. These techniques use image filters such as edge-preserving decomposition (EPD) filters [24] and non-EPD filters (e.g., the average filter) to perform the decomposition process (these filtering concepts are discussed thoroughly in later sections). In this category of fusion, each source image is decomposed into approximation/base layers containing large-scale variations in intensity and detail layers containing small-scale variations in intensity. Separate fusion rules are employed to combine the decomposed base and detail layers. Finally, the fused base and detail layers are combined to generate the fused image. A generic block diagram of filtering-based fusion methods is shown in Fig. 2.6. Here, ψ and ψ^{-1} represent the filtering-based image decomposition and reconstruction processes, respectively. First, the two source images {I_n}_{n=1}^{2} are decomposed into base layers containing large-scale variations in intensity and detail layers containing small-scale variations in intensity as

{B_n^{k+1}}_{n=1}^{2} = {B_n^k}_{n=1}^{2} * w,   k = 0, 1, ..., K    (2.6)

Fig. 2.6 A general block diagram of filtering-based image fusion


where B_n^{k+1} is the base layer of the n-th source image at level k + 1, which depends on its previous-level base layer B_n^k, and B_n^0 represents the n-th input image I_n. The convolution operation is represented by *; w is an image filter, which can be an EPD or a non-EPD filter; and K is the number of levels. The detail layers D_n^{k+1} at the present level k + 1 are obtained by subtracting the base layers B_n^k at the previous level k from the base layers B_n^{k+1} at the present level k + 1:

D_n^{k+1} = B_n^{k+1} - B_n^k    (2.7)
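For a single level and a simple (non-EPD) averaging filter, Eqs. (2.6) and (2.7) reduce to the two-scale sketch below. It assumes SciPy's uniform_filter as the smoothing filter w, uses the usual image-minus-base sign convention for the detail layer, and applies an average rule to the base layers and a max-absolute rule to the detail layers; these choices are illustrative assumptions rather than the schemes of references [24–29].

```python
import numpy as np
from scipy.ndimage import uniform_filter

def two_scale_fuse(i1, i2, size=31):
    """Single-level base/detail fusion in the spirit of Eqs. (2.6)-(2.7)."""
    b1, b2 = uniform_filter(i1, size), uniform_filter(i2, size)  # base layers
    d1, d2 = i1 - b1, i2 - b2                                    # detail layers

    base_fused = (b1 + b2) / 2.0                                 # average the bases
    detail_fused = np.where(np.abs(d1) >= np.abs(d2), d1, d2)    # keep stronger detail
    return base_fused + detail_fused
```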

EPD filters decompose source images into base and detail layers while preserving edge information; thus, these filters may provide desirable decompositions for the purpose of fusion. The weighted least squares filter [25], L1 fidelity using an L0 gradient [26], L0 gradient minimization [27], the combination of weighted least squares and guided image filters [24], the guided image filter [28], and the cross bilateral filter (CBF) [29] are recently proposed EPD-based fusion methods.

In this chapter, new multi-scale image fusion methods are presented to address the problems of existing methods. They are:

1. Pyramid image fusion method based on integrated edge and texture information.
2. Image fusion method based on the expected maximum and discrete wavelet frames.
3. Image fusion method based on optimal wavelet filter banks.
4. Anisotropic Diffusion-based Fusion for infrared and visible sensor images (ADF).
5. Two-scale Image Fusion of visible and infrared images using visual saliency detection (TIF).
6. Maximum Symmetric Surround Saliency detection-based multi-focus image Fusion (MSSSF).

Of these six methods, the first is based on pyramid decomposition, the second and third are developed in the wavelet domain, the fourth is an EPD-based fusion method, and the fifth and sixth are non-EPD-based image fusion methods.

2.2 Pyramid Image Fusion Method Based on Integrated Edge and Texture Information

2.2.1 Background

The simplest method of image fusion is to take a weighted average of the original images. The advantage of this method is that it is simple and has good real-time performance, but it has the negative effect of reducing the contrast of the image. The


image pyramid decomposition technique closely resembles the way the human visual system observes a scene. Image pyramid decomposition techniques include wavelet transforms, multi-rate filter representations, and pyramid transforms. Among them, fusion based on pyramid transforms may be the most promising fusion method. Although the wavelet exhibits many advantages in image representation, such as orthogonality, direction sensitivity, and noise-reduction performance, and in these respects outperforms pyramid decomposition, the asymmetry of the wavelet gives it poorer shift invariance, which degrades the fusion result. Therefore, a new pyramid-based image fusion method is developed to avoid the poor shift invariance of the wavelet and to make up for the shortcomings of traditional pyramid-structure image fusion in extracting texture and edge features.

2.2.2 Fusion Framework

With a pyramid image fusion method that combines edge and texture information, the quality of the fused image can be improved and good practical results can be achieved. The pyramid decomposition of the original image is obtained with a Gaussian filter, exploiting the linear relationship between the binarized Gaussian filter, the texture extraction filters, and the edge extraction filters. The corresponding coefficients of the texture and edge images are obtained using the singular value decomposition method, and each layer of the decomposed image is represented by the feature information of the image at that scale. The fusion process is as follows: first, calculate the similarity and saliency measures of each pair of texture and edge images of the two source images; then adopt a suitable fusion strategy (select-maximum or weighted average) according to the saliency measures, which yields a fused texture and edge pyramid representation of the image; finally, the fused image is reconstructed from this representation.

2.2.3 Pyramid Image Fusion of Edge and Texture Information: Specific Steps

The pyramid image fusion method using integrated edge and texture information is shown in the flowchart of Fig. 2.7, and the decomposition and reconstruction structure used is shown in Fig. 2.8. The specific steps of each part are as follows:

1. Create a structure of pyramid decomposition and reconstruction based on edge and texture information. Decomposing the image into a pyramid representation of texture and edge information needs to satisfy the reconstruction condition


Fig. 2.7 Diagram of image fusion scheme using integrated edge and texture information

Fig. 2.8 Block diagram of a pyramid decomposition and reconstruction based on texture and gradient feature


(1 - \dot{w}_{new}) = \sum_{i=1}^{25} t_i (T_i * T_i) + \sum_{i=1}^{4} c_i (D_i * D_i)    (2.8)

i¼1

Here, t_i and c_i are undetermined coefficients, which can be obtained by the singular value decomposition method; T_i are the Laws texture extraction filters, and D_i are the edge extraction filters. In establishing the pyramid decomposition and reconstruction based on edge and texture information, the main task is to find these undetermined coefficients. The five kernel vectors used for Laws texture extraction are:

l5 = [1 4 6 4 1]
e5 = [-1 -2 0 2 1]
s5 = [-1 0 2 0 -1]
u5 = [-1 2 0 -2 1]
r5 = [1 -4 6 -4 1]    (2.9)

T_i represents the twenty-five 9 × 9 filters obtained by cross-convolution and self-convolution of these kernel vectors followed by convolutional expansion. The four edge extraction filters are:

d1 = [0 0 0; -1 2 -1; 0 0 0]
d2 = [0 0 -0.5; 0 1 0; -0.5 0 0]
d3 = [0 -1 0; 0 2 0; 0 -1 0]
d4 = [-0.5 0 0; 0 1 0; 0 0 -0.5]

Similar to the texture extraction filters, the four edge extraction filters D_i are obtained from these kernels by convolutional expansion. The undetermined coefficients are then determined by singular value decomposition of Eq. (2.8), which yields the pyramid decomposition and reconstruction structure.

2. Pyramid decomposition process of an image. The edge extraction filters and the texture extraction filters together constitute 29 feature extraction filters F_l (l = 1, 2, ..., 29). The pyramid decomposition process can be written as Eq. (2.10):

L_l^k = f_l (F_l * F_l) * [G_k + \dot{w}_{new} * G_k]    (2.10)

Here, F_l and f_l are the feature extraction filters and their corresponding coefficients, respectively, and L_l^k is the decomposed feature image.
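The 1-D Laws kernels and the coefficient fit of Eq. (2.8) can be prototyped as follows. The sketch is an illustrative reading of that construction (the exact expansion used in the book and the binarized Gaussian ẇ_new are assumptions here): it forms the twenty-five Laws masks as outer products, self-convolves each filter to obtain the F_l * F_l terms, and fits the coefficients with numpy.linalg.lstsq, which is SVD-based. In use, one would call fit_coefficients(laws_masks() + [d1, d2, d3, d4], w_new) with the edge filters d1–d4 and the chosen low-pass filter w_new.

```python
import numpy as np
from scipy.signal import convolve2d

# Five 1-D Laws kernels from Eq. (2.9).
L5 = np.array([1, 4, 6, 4, 1], float)
E5 = np.array([-1, -2, 0, 2, 1], float)
S5 = np.array([-1, 0, 2, 0, -1], float)
U5 = np.array([-1, 2, 0, -2, 1], float)
R5 = np.array([1, -4, 6, -4, 1], float)

def laws_masks():
    """Twenty-five 5x5 Laws masks from outer products of the 1-D kernels."""
    kernels = [L5, E5, S5, U5, R5]
    return [np.outer(a, b) for a in kernels for b in kernels]

def self_convolved(filt):
    """The F * F self-convolution terms appearing in Eq. (2.8)."""
    return convolve2d(filt, filt, mode="full")

def centered_pad(arr, n):
    """Zero-pad a square filter to n x n, keeping it centered."""
    p = (n - arr.shape[0]) // 2
    return np.pad(arr, p)

def fit_coefficients(filters, w_new, n=9):
    """Least-squares (SVD-based) fit of the coefficients in Eq. (2.8):
    sum_i coeff_i (F_i * F_i) ~ (delta - w_new)."""
    A = np.stack([centered_pad(self_convolved(f), n).ravel() for f in filters], axis=1)
    delta = np.zeros((n, n))
    delta[n // 2, n // 2] = 1.0                        # discrete identity filter "1"
    b = (delta - centered_pad(w_new, n)).ravel()       # target filter (1 - w_new)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs
```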


3. Layer-by-layer image fusion processing. After the images are decomposed into texture-based and edge-based pyramid forms, the fusion method adopts a strategy based on a similarity measure and a saliency measure. Denote by L_kl the k-th layer in the l-th direction of an image. First, the activity of the pyramid decomposition coefficients of the two images is calculated. Suppose the saliency measures of the two decomposition coefficients are A(p) and B(p), respectively. A window-based measure is used, with a window size of 3 × 3 and window template coefficients

α = (1/16) [1 1 1; 1 8 1; 1 1 1]

The saliency measure is

S(p) = \sum_{s∈S, t∈T} α(s, t) L_kl(m + s, n + t, k, l)^2

The similarity measure is

M_AB(p) = \frac{2 \sum_{s∈S, t∈T} α(s, t) L_kl^A(m + s, n + t, k, l) L_kl^B(m + s, n + t, k, l)}{S_A(p) + S_B(p)}

If the similarity measure M_AB ≥ α (the matching threshold), then

ω_A = 1/2 - (1/2) \frac{1 - M_AB}{1 - α},  and  ω_B = 1 - ω_A

If the similarity measure M_AB < α, then

ω_A = 1 if S_A > S_B, and ω_A = 0 otherwise,  with  ω_B = 1 - ω_A

Finally, the fusion strategy is

L_kl^F(p) = ω_A(p) L_kl^A(p) + ω_B(p) L_kl^B(p)    (2.11)

4. Pyramid reconstruction of fused images based on texture and edge information !

Using L kl obtained by Eq. (2.11) to find pyramid inverse transform based on texture and edge which can fuse image. The top image Gn represents the low-pass information of the image, and the part of the image is interpolated to obtain a

34

2 Pixel-Level Image Fusion

2 M  2 M image. This is equivalent to the dimensionality of the underlying texture and edge information image of the low-pass image. Considering the coefficients ti and ci obtained in step 1 are obtained by satisfying the reconstruction conditions, the texture and edge images must be multiplied by this coefficient and then add the interpolation result of the top image (low-pass image) to obtain a low-pass image Gn  1.

2.2.4

Beneficial Effects

After the pyramid expansion of the image based on the edge and texture information, the image can fully reflect the edge and texture features on each scale. Image fusion at each level based on such pyramid decomposition can make the fused image fully reflect the features of the original image, which are necessary for the subsequent image recognition. The pyramid image fusion method based on texture and edge information greatly improves the image quality after fusion, which is of great significance and practical value for the subsequent processing of application system and image display.

2.3 2.3.1

Image Fusion Method Based on the Expected Maximum and Discrete Wavelet Frames Introduction

The simplest fusion approach is taking an average of the input images, pixel by pixel, to create a new composite image. However, this approach is not appropriate, since it creates a blurred image where the details are reduced. Image fusion approaches based on multi-resolution representations are now widely used. The basic idea of these approaches is to perform a multi-resolution transform (MST) on the source images and to construct a composite multi-resolution representation by using an appropriate fusion rule. The fused image is then obtained by taking the inverse multiresolution transform (IMST). The MST approaches include the Laplacian pyramid, the ratio of low-pass pyramid, and the discrete wavelet transform (DWT). Because of an underlying down sampling process, the fusion results of these approaches are shift-dependent. When there is a slight camera or object movement or when there is misregistration of the source images, the performance of those MST approaches [15, 30] will quickly go bad. For the discrete wavelet frame (DWF) transform, each frequency band will have the same size, because the DWF utilizes a dilate analysis filter instead of down sampling of the input signal. The DWF transform [31], which is more suitable for image fusion, has the properties of freedom from aliasing and shift invariance. Theoretically, the DWF decomposition process can be continued until the

2.3 Image Fusion Method Based on the Expected Maximum and Discrete Wavelet Frames

35

low-pass approximation is just the mean information contained within a (1  1) pixel signal. This is not desirable in practice, because at low resolution feature selection becomes less accurate and prone to bias toward one of the input multi-resolution representations. This can cause large-scale ringing artifacts and significantly degrade the fused image’s quality. Thus, decomposition is halted before the theoretical (1  1) pixel minimum is reached. In this case, the low-frequency band image contains only the very large-scale features that form the background of input images and are important for their natural appearance, while the high-frequency band shows the detailed information. For the fusion rule in the low-frequency band, there are two appropriate choices: weighted average and estimation methods [32, 33]. The weighted-average fusion method averages the input low-frequency bands to compose a single low-frequency band. The estimation fusion methods formulate the fusion result in terms of estimated parameters of an imaging model. The fusion results for the low-frequency band are obtained by maximizing their probability likelihood function. For the fusion rule in the high-frequency band, the basic fusion approach is absolute-value maximum selection, i.e., the largest absolute values in the sub-bands are retained for reconstruction. It is a fact that the largest absolute values correspond to features in the image such as edges, lines, and region boundaries. Recently, estimation theory has been used to improve the efficiency of image fusion algorithms. However, these approaches are all based on the assumption that the disturbance satisfies a Gaussian distribution. Since natural images actually follow a Gaussian scale mixture distribution in multi-resolution space, the Gaussian assumption might mistreat the useful signal as disturbance and hence degrade the quality of the fused image. The registered images are decomposed using the DWF transform. The DWF decomposes the source images into multi-resolution representations with both low-frequency coarse information and high-frequency detail information. We assume that there exists an optimal of the source images. Thus, the low-frequency fusion problem is formulated as a parameter estimation problem. The EM algorithm [33] is used to estimate these parameters. A new measure is used to characterize important image information, and value maximum selection is implemented to improve the robustness of the fusion algorithm. The informative importance measure is applied to the high frequencies. The final fused image is obtained by taking the inverse transform of the fused low-frequency and high-frequency multi-resolution representations.

2.3.2

Discrete Wavelet Frame Multi-Resolution Transform

The DWF MST is aliasing-free and translation-invariant. For this reason, when the MST method is used in an image fusion system, better fusion results may be expected. Figure 2.9 illustrates the ith stage of the two-dimensional (2-D) DWF

36

2 Pixel-Level Image Fusion

introduced in [31], where a particular pair of analysis filters h(x) and g(x) corresponding to a particular type of wavelet are used. Here, S0 is the original image. The processing is applied recursively for each decomposition level. Figure 2.9 shows that after one stage of processing, an image is decomposed into four frequency bands: low-low (LL), low-high (LH), high-low (HL), and high-high (HH). They are the coarse information and the vertical, horizontal, and diagonal highfrequency information, respectively. A DWF transform with N decomposition levels will have M ¼ 3N + 1 such bands. For the ith decomposition level, SLL is processed iþ1 iþ1 iþ1 to produce Siþ1 LL , DLH , DHL , DHL . Because the DWF needs a dilate analysis filter instead of down sampling of the input signal, each frequency band will have the

Fig. 2.9 One stage of 2-D DWF decomposition (a) and reconstruction (b)

2.3 Image Fusion Method Based on the Expected Maximum and Discrete Wavelet Frames

37

same size as the source image. On the contrary, for the DWT the sub-band images do not have the same size as the source images. Figure 2.9 shows the DWF transform of the “Lena” image. The transformed image and the source image have the same size. Diþ1 ðnÞ ¼ ½g"2i  Si ðk Þ

ð2:12Þ

Siþ1 ðnÞ ¼ ½h"2i  Si ðkÞ

ð2:13Þ

where the analysis filters ½h"2i and ½g"2i at level i are obtained by inserting the appropriate number of zeros between the taps of the prototype filters. The reconstruction process is similarly computed via 1-D synthesis filters e hðxÞ and e gð x Þ SðnÞ ¼ e hN  S N ð k Þ þ

N X

e gi Di ðk Þ

ð2:14Þ

i¼1

h i where e h¼ e h

"2N

and e gi ¼ ½e g"2i , and where e hðxÞ and e gðxÞ are the synthesis filters

corresponding to the analysis filters h(x) and g(x), respectively.

2.3.3

Basic Structure of the New Fusion Scheme

The basic structure of the new fusion scheme is shown in Fig. 2.10. The multiresolution process can be summarized as follows: 1. Decompose the source images into a multi-resolution representation with both low-frequency coarse information and high-frequency detail information, including the vertical, horizontal, and diagonal high-frequency information SSðAÞ ¼ fD1 ðAÞ, . . . , Di ðAÞ, . . . , DN ðAÞ, SN ðAÞg,

Fig. 2.10 The proposed multi-resolution image fusion scheme

38

2 Pixel-Level Image Fusion

SSðBÞ ¼ fD1 ðBÞ, . . . , Di ðBÞ, . . . , DN ðBÞ, SN ðBÞg, whereDi(A) and Di(B) are  the high-frequency detail information of input images: Di ¼ DiLH , DiHL DiHH ; and SN(A) and SN(B) are the low-frequency coarse information: SN ¼ SNLL. The high-frequency detail information yields the detailed feature information of the source images, such as edges, texture, and so on. The low-frequency coarse information shows large-scale features that form the background of the source images. The importance of the high frequencies and the low frequencies mainly depends on the effective information content, which varies with the source images. 2. Estimate optimally the low-frequency coarse information SN(F) for the fused image from SN(A) and SN(B). 3. Apply the feature selection rule to the high-frequency detail information, considering the important image information as Di ðF Þ ¼ RuleðDi ðAÞ, Di ðBÞÞ: 4. Construct the multi-resolution representation of the fused image as SSðF Þ ¼ fD1 ðF Þ, . . . , Di ðF Þ, . . . , DN ðF Þ, SN ðF Þg: 5. Perform the inverse discrete wavelet frame transform (IDWF) to obtain the final fusion image F.

2.3.4

Fusion of the Low-Frequency Band Using the EM Algorithm

Before the proposed EM method is applied to the low-frequency band, the image model should be defined as SN ðX, jÞ ¼ αðX, jÞSN ðF, jÞ þ βðX, jÞ þ εðX, jÞ

ð2:15Þ

where X ¼ A or B represents the source image index; j denotes the pixel location in the low-frequency band; SN(X, j) denotes the low-frequency band of the image X at the jth pixel; SN(F, j) represents the optimally fused low-frequency band at the jth pixel, which is a parameter to be estimated; α(X, j) ¼  1 or 0 is the sensor selectivity factor; β(X, j) is the bias of the image, which reflects the mean of the low-frequency image; and ε(X, j) is the random noise, which is modeled by a K-term mixture of Gaussian probability density functions (pdfs), that is,

2.3 Image Fusion Method Based on the Expected Maximum and Discrete Wavelet Frames

f εðX,jÞ ðεðX, jÞÞ ¼

k X k¼1

"

λk,x ð jÞ h

1 2πσ 2k,x ð



i1=2

εðX, jÞ2 exp  2 2σ k,x ð jÞ

39

# ð2:16Þ

where λk, X( j) are the weights of the K-term Gaussian distribution and σ 2k,X ð jÞ is the variance of the distribution. The image formation model is generally different for each location j. However, to a first-order approximation, the selectivity factor and the parameters of the pdf in Eqs. (2.15) and (2.16) can be considered constant over a small region of neighboring j. Estimation is usually performed in a neighborhood with j as the center. In this study, a neighborhood size L ¼ 5  5 is selected, and j scans all pixels in the low-frequency image. When the boundary is reached, symmetric mirror extension is used to extend the boundary by  2 pixels. Inside  the neighborhood, we assume that the model parameters β(X, j), λk,l ð jÞ, σ 2k,l ð jÞ , and α(X, j) are constants. For simplicity, we2 drop  the indices j of these parameters in the sequel, writing β(X), λk,l ð jÞ, σ k,l ð jÞ , and α(X). The EM algorithm is a general method for finding the maximum-likelihood estimate of the parameters of an underlying distribution from a given dataset in which the data are incomplete or have missing values. The first step in deriving the EM algorithm is the specification of sets of complete data and incomplete data. For the image formation model Eqs. (2.15) and (2.16), the incomplete dataset Y consists of the following observed data as Y ¼ fSN ðX, lÞ : X ¼ A or B,

l ¼ 1, . . . , Lg

ð2:17Þ

where l indexes the coefficient location in the small region of neighboring j, X denotes the source image A or B, and S(X, l ) denotes the coefficient at location l in region j of the low-frequency band of the source image in A or B. The complete dataset Yc is defined as Y c ¼ fSN ðX, lÞ, kðX, lÞ : X ¼ A or B

l ¼ 1, . . . , Lg

ð2:18Þ

where k(X, l) identifies which term in the Gaussian mixture pdf Eq. (2.16) produces the additive distortion sample in the observation SN(X, l). The common parameter set   is F ¼ SN ðX, lÞ, βðX Þ, λk,X , σ 2k,X , αðX Þ; X ¼ A or B; l ¼ 1, . . . , L; k ¼ 1, . . . , K . The number of terms, K, in the Gaussian mixture pdf model Eq. (2.16) is assumed to be fixed; we choose K ¼ 2. A standard technique of the SAGE version of the EM algorithm is used to derive the iterative estimation equations. The elements of the incomplete data Y in Eq. (2.17) are independent identical distribution with marginal pdf.

40

2 Pixel-Level Image Fusion

SN ðX, lÞjF hðSN ðX, lÞjFÞ ¼

k X k¼1



λk,X 2πσ 2k,X

) ~( ½SN ðX, lÞ  βðX Þ  αðX ÞSN ðF, lÞ2  exp  1=2 2σ 2k,X ð2:19Þ

where the symbol ~ denotes the two sides are independently and identically distributed, and h(SN(X, l )|F) is the marginal probability density function under the condition of common parameter set F for incomplete data Y. The elements of the complete data Yc in Eq. (2.17) are independent with marginal probability density function ðSN ðX, lÞ, k ðX, lÞÞjF hc ðSN ðX, lÞ, k ðX, lÞjFÞ ¼ 

λk,X 2πσ 2k,X

1=2

) ~( ½SN ðX, lÞ  βðX Þ  αðX ÞSN ðF, lÞ2 exp  2σ 2k,X ð2:20Þ

where hc(SN(X, j), k(X, l )|F) is the marginal probability density function under the condition of common parameter set F for complete data Yc. The conditional distribution k(X, l)|SN(X, l ), F is hc ðSN ðX, lÞ, kðX, lÞjFÞ 

gk,xl ½SN ðX, lÞ ¼

λ

hðSN ðX, lÞjFÞ ¼

exp



½SN ðX, lÞβðX ÞαðX ÞSN ðF, lÞ2



ð2:21Þ

ð2πσ2k,X Þ  P λk,X ½SN ðX, lÞβðX ÞαðX ÞSN ðF, lÞ2 exp  1=2 2σ 2 2 k,X k¼1 ð2πσ k,X Þ 1=2

2σ 2 k,X

K

The joint probability density functions for the incomplete datasets and complete datasets are hðYjF Þ ¼

L Y Y

hðSN ðX, lÞjF Þ

and

hc ðY c jF Þ

X¼A, B l¼1

¼

L Y Y

hc ðSN ðX, lÞ, k ðX, lÞjF Þ

X¼A, B l¼1

Each iteration of the EM algorithm involves two steps: the expectation step (E step) and the maximization step (M step). The E step of the EM algorithm performs an average over complete data, conditioned on the incomplete data to produce the cost function

2.3 Image Fusion Method Based on the Expected Maximum and Discrete Wavelet Frames

41

QðF0 jFÞ ¼ E f ln hc ðY c jF0 ÞjY, Fg ¼

L X X

E f ln hc,Xl ðSN ðX, lÞ, kðX, lÞjF0 ÞjSN ðX, lÞ, FÞg

X¼A, B l¼1

¼Bþ

L X K  X X X¼A, B l¼1 k¼1

L X K X

(

l¼1 k¼1

ln σ 02 k,X

 1 X ln λ0k,X gk,Xl ½SN ðX, lÞ  2 X¼A, B

2 ) SN ðX, lÞ  β0 ðX Þ  α0ðX ÞSN ðF,lÞ þ  gk,Xl ½SN ðX, lÞ σ 02 k,X

ð2:22Þ

where B is a term independent of F0 . The EM algorithm would update the parameter estimates to new values F0 that maximize Q(F0 |F) in Eq. (2.22). This is the M step of the EM algorithm. In order to maximize Q(F0 |F) analytically, we update each parameter one at a time. Because a(X) is discrete, a0 (X) is updated to have the value from the set {0,1, +1} that maximizes Eq. (2.22) with all the other parameters set at their old values: 2 S0N ðF, lÞ ¼ SN ðF, lÞ, λ0k,X ¼ λk,X , and σ 02 k,X ¼ σ k,X . The optimal fused low-frequency 0 coarse information SN ðF, lÞ is obtained from maximizing Eq. (2.22) analytically by solving ∂Q/∂SN(X, l) ¼ 0 using the updated α0(X) and the old for the other parameters values. The update estimate for λ0k,X and σ 02 k,X are obtained from solving ∂Q/ ∂λk, X ¼ 0 and ∂Q/∂σ k, X ¼ 0, respectively, for k ¼ 1,...,K. Initial values for the parameters are required to start the EM algorithm. A simple estimate for SN(F, l) comes from the weighted average for the low frequency of the source images SN ðF, lÞ ¼

X

wX SN ðX, lÞ

ð2:23Þ

X¼A, B

where

P

wX ¼ 1 . The simplest case is using an equal weight for each source

X¼A, B

image, that is, wX ¼ 1/q. A simple initialization for αX ¼ 1 for X ¼ A or B. To model the distortion in a robust way, the distortion is initialized as impulsive. We initialized the distortion parameters with λ1, X ¼ 0.8 and λ2, X¼. . . ¼ λk, X¼0.2/(K  1). Then, we set σ 2k,X ¼ γσ 2k1,X , k ¼ 2, . . . , K, where the choice of σ 21,X is based on an estimate K P of the total variance σ 2X ¼ λk,X σ 2k,X given by k¼1

σ 2X ¼

L X

½SN ðX, lÞ  SN ðF, lÞ2 =L

ð2:24Þ

l¼1

where L ¼ h  h. We choose γ ¼ 10 so that the initial distortion model is fairly impulsive. The sensor bias is

42

2 Pixel-Level Image Fusion

PL

l¼1 SN ðX, lÞ

βX ¼

ð2:25Þ

L

and is equal to the mean value in the small region. The initialization scheme worked very well for the cases we have studied. We observed that the algorithm in our experiments generally converged in less than five iterations to obtain the fusion result SN(F) in each local analysis window. According to the preceding derivation, we can summarize the standard technique of the SAGE version of the EM algorithm in the following iterative procedure: 1. Compute the condition probability density n o 2 exp  ½SN ðX, lÞαðX2σÞS2N ðF, lÞβðX Þ k,X ð Þ  gk,x,l ½SN ðX, lÞ ¼ PK λp,i ½SN ðX, lÞαðX ÞSN ðF, lÞβðX Þ2  1=2 exp p¼1 2σ 2p,X ð2πσ2p,X Þ λ 1=2 2πσ 2k,i

ð2:26Þ

2. Update the parameter α(X) by giving it the value from the set {1, 0, 1} that maximizes Q L K 1 X XX 2 X¼A, B l¼1 k¼1 ( ) ½SN ðX, lÞ  αðX ÞSN ðF, lÞ  βðX Þ2 2  ln σ k,X þ gk,X,l ½SN ðX, lÞ 2σ 2k,X

Q¼

3. Recalculate the condition probability density gk, gk, X, l[SN(X, l)] and the bias β(X) P 0

S ðF, lÞ ¼

PK X¼A,B

k¼1 ½SN ðX, lÞ

P

β 0 ðX Þ ¼

K P

l¼1 k¼1

 βðX Þα0 ðX Þ

PK

X¼A,B L P

X, l[SN(X, l )];

gk,X,l ðSN ðX, lÞÞ σ 2k,X

gk,X,l ðSN ðX, lÞÞ 02 k¼1 α ðX Þ σ2

then update

ð2:28Þ

k,X

g ðS ðX, lÞÞ SN ðX, lÞ  α0 ðX ÞS0N ðF, lÞ k,X,l σN2 k,i

L P

ð2:27Þ

K P

l¼1 k¼1

gk,X,l ðSN ðX, lÞÞ σ 2k,X

ð2:29Þ

4. Recalculate gk, X, l and β(X) and update the model parameters λk, X, σ 2k,X, and β(X)

2.3 Image Fusion Method Based on the Expected Maximum and Discrete Wavelet Frames

λ0k,X ¼

L 1X g ðS ðX, lÞÞ, L l¼1 k,X,l N L

P

l¼1 σ 02 k,X ¼

k ¼ 1, . . . , SN ðF, lÞ . . . K,

X ¼ A, B

43

ð2:30Þ

2 SN ðX, lÞ  α0ðX ÞS0N ðF, lÞ  β0ðX Þ gk,X,l ðSN ðX, lÞÞ L P

,

k

gk,X,l ðSN ðX, lÞÞ

l¼1

¼ 1, . . . , K, . . . X ¼ A, B

ð2:31Þ

5. Repeat steps 1–4 using the new parameters S0N ðF, lÞ, α0(X), λ0k,X , σ 0k,X , and β0(X). When all of the parameters have converged to a fixed range in each location of the low-frequency images, we can achieve the optimal fusion result SN(F).

2.3.5

The Selection of the High-Frequency Band Using the Informative Importance Measure

In the pattern-selective fusion scheme, we proposed a new measure to characterize important image information, which models the early retinal process [34]. A quantitative estimation of this important information can be provided by the measure of uncertainty in pixel-to-neighbors interaction. Two sources of such uncertainty must be considered: luminance uncertainty and topological uncertainty PIX ðm, nÞ ¼ C ðm, nÞI ðm, nÞ

ð2:32Þ

where PIX(m, n) indicates the important information of the wavelet frame coefficient, C(m,n) is the absolute value of wavelets frame coefficient (high-frequency band) and also reflects the luminance uncertainty, I(m,n) denotes the topological uncertainty Cðm, nÞ ¼ jDX ðm, nÞj

ð2:33Þ

  where DX(m, n) is the high-frequency coefficient DiLH , DiHL , or DiHH , and we omit the superscript i, and the subscripts LH, HL, HH that denote the scale and the directions, respectively, as in Fig. 2.9 and Fig. 2.10. The subscript X is added to indicate the source image A or B, and (m,n) is the location of the high-frequency coefficient, m being the row number and n the column number. Considering the relationship of neighborhood coefficients, the sign of the highfrequency coefficient is decided first

44

2 Pixel-Level Image Fusion

signðm, nÞ ¼ signðDX ðm, nÞÞ

ð2:34Þ

where sign() is 1 if the high-frequency coefficient is equal to or greater than zero, and 0 otherwise. We have I ðm, nÞ ¼ PX ðm, nÞ½1  PX ðm, nÞ

ð2:35Þ

where pX(m, n) is the probability of finding surrounding coefficients in the same state (sign) as the central coefficient at position (m,n)

PX ðm, nÞ ¼

8 > > > > > > >
mþ1 P > > > > > > : 1  i¼m1

8 nP þ1

if signðm:nÞ ¼ 1

ð2:36Þ

signði, jÞ

j¼n1

8

if signðm:nÞ ¼ 0

Two extreme situations, when all neighbors and the central pixel are in the same state (a flat region, p ¼ 1) and when none of the neighbors is in the central pixel state (an outlier, p ¼ 0), have the same intuitively expected result I(m,n) ¼ 0 and consequently PIX(m,n) ¼ 0: there is no “edginess” at that location. Then, we can achieve fusion-selective pattern  DF ðm, nÞ ¼

DA ðm, nÞ,

PIA  PIB

DB ðm, nÞ

PIA  PIB

ðm, nÞ 2 E

ð2:37Þ

where DA denotes the high-frequency wavelet frame coefficient of the source image A, DB denotes the high-frequency wavelet frame coefficient of the source image B, and DF is the wavelet frame coefficient or the high-frequency band of the fused image SS(F). Finally, when the optimal fused low-frequency band SN(F) and the fused high-frequency band Di are obtained, the final fused image F can be achieved by performing the inverse discrete wavelet frame transformation as in Fig. 2.9.

2.3.6

Computer Simulation

Performance measures are essential to determine the possible benefits of fusion as well as to compare results obtained with different algorithms. However, it is difficult to evaluate image fusion result objectively. Therefore, three evaluation criteria are used to quantitatively assess the performance of the fusion. The first evaluation measure is the objective performance metric proposed by Petrovic and Xydeas [35]. It models the accuracy with which visual information is

2.3 Image Fusion Method Based on the Expected Maximum and Discrete Wavelet Frames

45

transferred from the source images to the fused image. Important information is associated with edge information measured for each pixel. Correspondingly, by evaluating the relative amount of edge information that is transferred from the input images to the fused image, a measure of fusion performance is obtained. A larger objective performance metric means that more important information in the source images has been preserved. Mutual information has been proposed for fusion evaluation. Given two images xF and xR, we define their mutual information as QðxR ; xF Þ ¼

L X L X

hR,F ðu, vÞ log 2

u¼1 v¼1

hR,F ðu, vÞ hR ðuÞhF ðvÞ

ð2:38Þ

where xR is the ideal reference; xF is the obtained fused image; hR and hF are the normalized gray-level histograms of xR and xF, respectively; hR,F is the joint graylevel histogram of xR and xF; and L is the number of bins. We select L ¼ 100. Thus, the higher the mutual information between xR and xF, the more likely it is that xF resembles the ideal xR. The mutual information evaluation method may be modified into an objective measure according to [36]. This is the second evaluation measure. The third evaluation measure is the entropy EN ¼ 

H X

pi ln pi

ð2:39Þ

i

where pi is the probability that the pixel number of a gray level is i. Since multiple-source images are always corrupted by noise or have register errors, the study of the robustness of image fusion system becomes important. We proposes a measure to evaluate the robustness by using the measures described. We call it the relative difference E¼

jQ0  Qj Q

ð2:40Þ

where Q0 denotes the evaluation measure of image fusion when source images are corrupted by noise or have register errors, and Q denotes the measure when source images have no noise effect. Figure 2.11a, b shows a visual image and a millimeter wave (MMW) image employed in concealed weapon detection (CWD). The size of source images is 200  256. We can see that a weapon is concealed on the third person from Fig. 2.11b. The discrete wavelet method, Yang’s statistical fusion method, the traditional DWF method, and the method proposed in this section are applied separately in the fusion process. In all cases, we perform a three-level decomposition. The fusion results are demonstrated in Fig. 2.11c–f. It is clear that the proposed

46

2 Pixel-Level Image Fusion

Fig. 2.11 (a) Visual image and (b) MMW image, and the fusion results employing (c) the 9/7 wavelet, (d) Yang’s statistical fusion method, (e) the DWF-based fusion method, and (f) the proposed method (Sect. 2.5)

method outperforms the others. The values of the evaluation measures are exhibited in Table 2.2. The entropy, the pixel mutual information, and the edge mutual information are increased. The implementation time is also shown in Table 2.2. Figure 2.12b demonstrates a visual image of a scene in which the background is road, grassland, and fence as well as a house. But the person is hardly found to appear on the infrared image in Fig. 2.12a. The same figure displays the results of fusion by the discrete wavelet, Yang’s statistical fusion method, the traditional DWF

2.3 Image Fusion Method Based on the Expected Maximum and Discrete Wavelet Frames

47

Table 2.2 Fusion results of WMM image and visual image Method 9/7 wavelet Yang’s DWF Proposed

Entropy 4.0415 4.2556 4.0110 4.7009

Pixel mutual information 0.8812 1.6135 0.9622 1.6140

Edge mutual information 0.5465 0.6312 0.5670 0.6975

Implementation time (s) 0.3280 188.3900 1.4690 126.4220

Fig. 2.12 (a) IR image and (b) visual image, and the fusion results employing (c) the 9/7 wavelet, (d) Yang’s statistical fusion method, (e) the DWF-based fusion method, and (f) the proposed method

method, and the method proposed in this section. In all cases, we perform a threelevel decomposition. It is clear that the proposed method outperforms the others. The evaluation measures are exhibited in Table 2.3. The entropy and the pixel and edge mutual information go up.

48

2 Pixel-Level Image Fusion

Table 2.3 Fusion results of IR image and visual image Method 9/7 wavelet Yang’s DWF Proposed

Entropy 4.4520 4.9306 4.4711 4.9643

Pixel mutual information 0.9893 1.4593 1.0089 1.5308

Edge mutual information 0.3686 0.4715 0.4308 0.5042

Figure 2.13 shows CT and MRI image fusion; the fusion results can be found in Table 2.4. Figure 2.14 shows SAR and IR image fusion, and the results are shown in Table 2.5. The filter coefficients of the DWF are those of the 9/7 wavelet; to four decimal places, the low-pass and high-pass analysis filter coefficients are

[0.0378, -0.0238, -0.1106, 0.3774, 0.8527, 0.3774, -0.1106, -0.0238, 0.0378],

[0.0645, -0.0407, -0.4181, 0.7885, -0.4181, -0.0407, 0.0645].

Table 2.6 shows the fusion results of the infrared image and visual image of Fig. 2.12a, b corrupted by noise; the mean of the noise is 0 and the variance is 0.01. To evaluate the robustness of the fusion systems, the discrete wavelet-based fusion method, Yang's statistical fusion method, the traditional DWF-based fusion method, and the proposed method are used. The smaller the relative difference (RD), the more robust the fusion method. When the source images have registration errors, the image fusion system is affected, which leads to changes in the evaluation measures of the various fusion methods. Table 2.7 gives the values of the evaluation measures and the relative differences when there is a 1-pixel registration error in the source images. Both Tables 2.6 and 2.7 indicate that the proposed fusion method is more robust than the existing methods.
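For reference, the sketch below (Python with NumPy, not taken from this chapter's implementation) shows one way to compute the entropy of Eq. (2.39) and the relative difference of Eq. (2.40); the function names, the 256-bin histogram, and the random stand-in images are illustrative assumptions.

```python
import numpy as np

def entropy(image, bins=256):
    """Entropy EN = -sum_i p_i ln p_i of the gray-level histogram (Eq. 2.39)."""
    hist, _ = np.histogram(image.ravel(), bins=bins, range=(0, 256))
    p = hist.astype(float) / hist.sum()
    p = p[p > 0]                      # skip empty bins so ln() is defined
    return float(-np.sum(p * np.log(p)))

def relative_difference(q_degraded, q_clean):
    """Relative difference E = |Q0 - Q| / Q (Eq. 2.40); smaller means more robust."""
    return abs(q_degraded - q_clean) / abs(q_clean)

# Usage: Q is a quality measure for fusion of clean sources, Q0 the same
# measure when the sources are noisy or misregistered. The two images below
# are random stand-ins for such fused results.
rng = np.random.default_rng(0)
fused_clean = rng.integers(0, 256, size=(200, 256)).astype(float)
fused_noisy = np.clip(fused_clean + rng.normal(0.0, 25.5, fused_clean.shape), 0, 255)
print(relative_difference(entropy(fused_noisy), entropy(fused_clean)))
```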

2.3.7

Conclusions

In this section, an image fusion method has been proposed for merging multiple source images based on the expectation maximization (EM) algorithm and the discrete wavelet frame. Experimental results indicate that the proposed method outperforms the methods based on the discrete wavelet transform and the existing wavelet frame transforms. We also proposed a relative difference measure to evaluate the robustness of an image fusion system and illustrated that the proposed method is more robust than the existing image fusion methods.


Fig. 2.13 (a) CT image and (b) MRI image, and the fusion results employing (c) the 9/7 wavelet, (d) Yang's statistical fusion method, (e) the DWF-based fusion method, and (f) the proposed method

Table 2.4 Fusion results of CT image and MRI image

Method        Entropy   Pixel mutual information   Edge mutual information
9/7 wavelet   3.7655    1.4793                     0.5458
Yang's        4.0594    2.3301                     0.7705
DWF           3.7461    1.5958                     0.6663
Proposed      4.2133    2.4954                     0.7715


Fig. 2.14 (a) SAR image and (b) IR image, and the fusion results employing (c) the 9/7 wavelet, (d) Yang’s statistical fusion method, (e) the DWF-based fusion method, and (f) the proposed method

Table 2.5 Fusion results of SAR image and IR image

Method        Entropy   Pixel mutual information   Edge mutual information
9/7 wavelet   4.9012    0.7751                     0.3957
Yang's        5.0762    1.2709                     0.7705
DWF           4.9139    0.8583                     0.5124
Proposed      5.1471    1.2246                     0.5623


Table 2.6 Fusion results of noise-corrupted IR image and noise-corrupted visual image

              Entropy            Pixel mutual information   Edge mutual information
Method        Measure   RD       Measure   RD               Measure   RD
9/7 wavelet   5.0249    0.1287   0.8478    0.1430           0.2608    0.2925
Yang's        4.9747    0.0089   1.1087    0.2403           0.3405    0.2925
DWF           5.0176    0.1222   0.9163    0.0917           0.3142    0.2716
Proposed      5.2703    0.0616   1.5349    0.0027           0.5080    0.0075

Table 2.7 Fusion results of IR image and visual image corresponding to a register error of 1 pixel

              Entropy            Pixel mutual information   Edge mutual information
Method        Measure   RD       Measure   RD               Measure   RD
9/7 wavelet   4.4816    0.0067   1.0305    0.0417           0.3789    0.0279
Yang's        4.9258    0.01     1.4631    0.0026           0.4828    0.0193
DWF           4.4886    0.0039   1.0266    0.0175           0.4324    0.0023
Proposed      4.9663    0.0004   1.4349    0.0028           0.4074    0.0068

2.4

Image Fusion Method Based on Optimal Wavelet Filter Banks

2.4.1

Introduction

Primary image fusion approaches are based on combining the multi-resolution decomposition coefficients of the source images [30]. The basic idea is to perform a multi-scale transform (MST) on all source images and construct a composite multi-scale representation of them; the fused image is then obtained by taking the inverse multi-scale transform (IMST). In a multi-scale transform, a filter bank is applied to split the input signal into a low-frequency approximate signal and a high-frequency detail signal. Therefore, the design of digital filter banks becomes a key issue in image fusion. Existing filter-bank design methods focus on biorthogonal filter banks with perfect reconstruction [15, 37]. A nice property of perfect-reconstruction filter banks is that they do not introduce any errors by themselves. However, in image fusion, the fused image is an incomplete representation of the source images, so small reconstruction errors introduced by the filter bank do not necessarily degrade fusion quality. In this method, we relax the perfect reconstruction condition in designing filter banks and emphasize the overall fusion performance. We formulate the design problem as a nonlinear optimization problem whose design objectives include both performance metrics of the overall image fusion, such as the RMSE to the reference image, and metrics of each individual filter, such as the stopband and passband energies of a low-pass filter. The optimization problem is solved using the simulated annealing nonlinear optimization method.


Fig. 2.15 The generic image fusion scheme

First, filters are designed for each individual training image to maximize the fusion quality; then, the filter bank with the best performance across the training images is selected as the final result.

2.4.2

The Generic Multi-Resolution Image Fusion Algorithm

The basic structure of the new fusion scheme is shown in Fig. 2.15. The multi-resolution process can be summarized as follows [16]:

1. The source images are decomposed into a multi-resolution representation with both low-frequency coarse information and high-frequency detail information:

$$SS(A) = \{D_1(A), \ldots, D_i(A), \ldots, D_N(A), S_N(A)\}, \qquad (2.41)$$

$$SS(B) = \{D_1(B), \ldots, D_i(B), \ldots, D_N(B), S_N(B)\}, \qquad (2.42)$$

where D_i(A) and D_i(B) are the high-frequency detail information of the input images, and S_N(A) and S_N(B) are the low-frequency coarse information.

2. Average S_N(A) and S_N(B) to obtain the low-frequency coarse information S_N(F) of the fused image.

3. Apply a feature selection rule to the high-frequency detail information:

$$D_i(F) = \begin{cases} D_i(A), & \text{if } D_i(A) \ge D_i(B) \\ D_i(B), & \text{if } D_i(A) < D_i(B) \end{cases} \qquad (2.43)$$

4. Construct the multi-resolution representation of the fused image as

$$SS(F) = \{D_1(F), \ldots, D_i(F), \ldots, D_N(F), S_N(F)\}. \qquad (2.44)$$

5. Perform the inverse multi-scale transform (IMST) to obtain the final fused image F.

Different combinations of MSD methods yield different performance.


Fig. 2.16 A two-channel filter bank

A careful study of this issue has been lacking; in fact, there has been very little work comparing various MSD-based fusion algorithms, even at a basic level. Here, we attempt to provide such a study: we search for the filter bank, in the context of a complete image fusion scheme, that maximizes fusion quality.
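For concreteness, the following sketch illustrates steps 1-5 of the generic scheme in Python, using PyWavelets as a stand-in MST/IMST. The 'bior4.4' (9/7) wavelet, the three-level decomposition, and the function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import pywt

def generic_msd_fusion(img_a, img_b, wavelet="bior4.4", levels=3):
    # Step 1: multi-resolution decomposition of both source images (Eqs. 2.41-2.42)
    coeffs_a = pywt.wavedec2(img_a, wavelet, level=levels)
    coeffs_b = pywt.wavedec2(img_b, wavelet, level=levels)

    # Step 2: average the low-frequency coarse information S_N
    fused = [0.5 * (coeffs_a[0] + coeffs_b[0])]

    # Step 3: select the larger high-frequency detail coefficient (Eq. 2.43)
    for da, db in zip(coeffs_a[1:], coeffs_b[1:]):
        fused.append(tuple(np.where(a >= b, a, b) for a, b in zip(da, db)))

    # Steps 4-5: composite representation (Eq. 2.44) and inverse transform
    return pywt.waverec2(fused, wavelet)
```

Note that Eq. (2.43) as written compares signed detail coefficients; many implementations instead select the coefficient with the larger absolute value.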

2.4.3

Design Criteria of Filter Banks

A generic two-channel finite impulse response (FIR) filter bank [38] is shown in Fig. 2.16, where H0(z) and H1(z) represent the low-pass and high-pass filters of the analysis bank, respectively, and G0(z) and G1(z) are the synthesis filters. The filter bank consists of an analysis stage and a synthesis stage. In the analysis stage, according to the Nyquist theorem, each sub-band signal is down-sampled by 2 to form the outputs of the analysis filters; these signals can then be analyzed or processed in various ways depending on the application. When no error occurs in this stage, the input to the synthesis stage equals the output of the analysis stage. In the synthesis stage, each sub-band signal is up-sampled by 2 and processed by the synthesis filters G0(z) and G1(z), and the outputs of the synthesis filters are summed to form the reconstructed signal. The Z-transform of the reconstructed signal, which is a function of x(n) and the filters, is

$$\hat{X}(z) = \frac{1}{2}\Big[G_0(z)H_0(z) + G_1(z)H_1(z)\Big]X(z) + \frac{1}{2}\Big[G_0(z)H_0(-z) + G_1(z)H_1(-z)\Big]X(-z) = T(z)X(z) + S(z)X(-z) \qquad (2.45)$$

There are three types of undesirable distortion in a filter bank. Aliasing distortions include aliasing caused by down-sampling and images caused by up-sampling; the term S(z)X(-z) in Eq. (2.45) represents the aliasing distortion. Amplitude distortions are deviations of the magnitude of T(z) in Eq. (2.45) from unity. Phase distortions are deviations of the phase of T(z) from the desired phase property, such as linear phase. Extensive research has been conducted on removing these distortions: aliasing distortions can be removed by selecting the synthesis filters based on the analysis filters, and infinite impulse response (IIR) filters can be used to address phase distortions. Perfect reconstruction of the original signal by a filter bank requires S(z) = 0 for


all z and T(z) = c z^{-d}, where c and d are constants. Therefore, to reconstruct a signal perfectly, the transfer function must be a pure delay, with no aliasing, no amplitude change, and linear phase. In biorthogonal filter banks, perfect reconstruction and linear-phase filters can be achieved by the proper selection of the filter parameters. First, aliasing is removed by choosing the synthesis filters according to the analysis filters; for example, the two synthesis filters can be defined as

$$G_0(z) = H_1(-z) \quad\text{and}\quad G_1(z) = -H_0(-z). \qquad (2.46)$$

With this choice, the aliasing term in Eq. (2.45) is zero, and the transfer function becomes

$$T(z) = \big(H_0(z)H_1(-z) - H_1(z)H_0(-z)\big)/2 \qquad (2.47)$$

Due to the pure-delay constraint T(z) = c z^{-d} for a constant d, the two analysis filters H0(z) and H1(z) should satisfy the following conditions [38, 39]:

1. The sum of their lengths must be a multiple of 4.
2. Both are FIR filters and must be of even length or of odd length at the same time. When they are of even length, H0(z) should be symmetric and H1(z) antisymmetric; when they are of odd length, both should be symmetric.
3. To make the transfer function a pure delay, i.e., T(z) = z^{-d} for a constant d, the coefficients of H0(z) and H1(z) need to satisfy a set of equations called the perfect reconstruction (PR) condition. When both H0(z) and H1(z) are symmetric and of odd length, the PR condition is

$$\sum_{k=1}^{2i} (-1)^{k-1}\, h_0(2i+1-k)\, h_1(k) = \frac{1}{2}\,\theta\!\left(i - \frac{N_0 + N_1}{4}\right), \qquad i = 1, 2, \ldots, \frac{N_0 + N_1}{4} \qquad (2.48)$$

where h_0(n), n = 1, ..., N_0, are the coefficients of H0(z) with length N_0; h_1(n), n = 1, ..., N_1, are the coefficients of H1(z) with length N_1; and θ(x) = 1 if x = 0, and 0 otherwise.
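The small numerical check below (Python/NumPy, illustrative only) builds synthesis filters from a given analysis pair via Eq. (2.46) and verifies that the aliasing term S(z) of Eq. (2.45) cancels and that T(z) reduces to a pure delay; the Haar pair is used only because its result is easy to verify by hand, not because it is the 9/7 bank used in this section.

```python
import numpy as np

def alternate_signs(h):
    """Coefficients of H(-z) given those of H(z): h[n] -> (-1)^n h[n]."""
    return h * (-1.0) ** np.arange(len(h))

def transfer_and_alias(h0, h1, g0, g1):
    """Coefficient sequences of T(z) and S(z) from Eq. (2.45)."""
    t = 0.5 * (np.convolve(g0, h0) + np.convolve(g1, h1))
    s = 0.5 * (np.convolve(g0, alternate_signs(h0)) + np.convolve(g1, alternate_signs(h1)))
    return t, s

# Haar analysis pair as a tiny illustration
h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)
h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)
g0 = alternate_signs(h1)           # G0(z) =  H1(-z)   (Eq. 2.46)
g1 = -alternate_signs(h0)          # G1(z) = -H0(-z)   (Eq. 2.46)
t, s = transfer_and_alias(h0, h1, g0, g1)
print(t)   # [0. 1. 0.] -> T(z) = z^{-1}, a pure delay
print(s)   # [0. 0. 0.] -> the aliasing term cancels
```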

2.4.4

Optimization Design of Filter Bank for Image Fusion

In this method, the goal is to find the filter bank that produces the best image fusion quality. We use one metric to measure the performance of image fusion, the RMSE (root mean square error) with respect to a reference image, defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\big(R(i,j) - F(i,j)\big)^2} \qquad (2.49)$$

where R is the reference image, F is the fused image, and M x N is the size of the two images.

When the filter length is fixed, the filter coefficients form the search space. We apply the design criteria of both the filter bank and the individual filters to constrain this search space. First, the synthesis filters are determined by the analysis filters according to Eq. (2.46); the design of the filter bank thereby becomes the design of a pair of analysis filters H0(z) and H1(z). Second, the two analysis filters are FIR filters whose total length is a multiple of 4. In addition, they are of even length or of odd length at the same time: when they are of even length, H0(z) should be symmetric and H1(z) antisymmetric, and when they are of odd length, both should be symmetric. Note that these are the conditions of a biorthogonal filter bank except for the PR condition; we focus on high fusion quality, for which the PR condition is not critical. In addition, the individual filters should possess the regularity and smoothness properties of wavelet theory. Regularity requires that the iterated low-pass filter converge to a continuous function. For a low-pass filter H0(z), regularity of order m requires at least m zeros of its amplitude response H0(w) at w = pi; similarly, for a high-pass filter H1(z), regularity of order m requires at least m zeros of its amplitude response H1(w) at w = 0 [40, 41]. It has been shown that regularity of order 2 is good enough, and it can be achieved by letting H0(w = pi) = 0 for the low-pass filter and H1(w = 0) = 0 for the high-pass filter, i.e.,

$$\sum_{n=1}^{N_0} (-1)^{n+1} h_0(n) = 0 \quad\text{and}\quad \sum_{n=1}^{N_1} h_1(n) = 0. \qquad (2.50)$$

Furthermore, the following condition is necessary for infinite iteration of the low-pass filter to converge:

$$\sum_{n=1}^{N_0} h_0(n) = 1 \quad\text{and}\quad \sum_{n=1}^{N_1} (-1)^{n+1} h_1(n) = 1. \qquad (2.51)$$

These conditions are linear equations and can be used to remove variables through substitution. For example, in the design of a pair of symmetric analysis filters of lengths 9 and 7, we have a total of 5 free variables, because the symmetry property reduces the number of variables to 5 + 4 = 9 and the wavelet conditions remove another 4 variables. After constraining the search space, we then search for the filter coefficients that maximize RMSE to the reference image. Thus, the design problem becomes a


nonlinear optimization problem whose objective is maximizing RMSE. This design formulation is general and can be extended to other design objectives. Empirically, we find that it is difficult to get good filters based on RMSE alone: the majority of the search space corresponds to filter coefficients that lead to bad fusion results, and RMSE itself does not provide enough guidance for the search to escape from bad regions. To overcome this difficulty, we introduce a second objective that consists of the stopband and passband energies of the individual filters; these energies measure a filter's proximity to the ideal step filter. For an odd-length symmetric low-pass filter H0(z) with coefficients h_0(n), n = 1, ..., N_0, the Fourier transform of H0(z) is [42]

$$F_0\!\left(e^{j\omega}\right) = H_0(\omega)\, e^{-j\frac{N_0-1}{2}\omega}, \qquad (2.52)$$

where

$$H_0(\omega) = h_0\!\left(\frac{N_0+1}{2}\right) + \sum_{n=1}^{(N_0-1)/2} 2\,h_0(n)\cos\!\left(\left(\frac{N_0+1}{2} - n\right)\omega\right). \qquad (2.53)$$

The stopband energy E_S(h_0) with stopband cut-off frequency omega_s is

$$
\begin{aligned}
E_S(h_0) = \int_{\omega_s}^{\pi} H_0^2(\omega)\, d\omega
&= h_0^2\!\left(\tfrac{N_0+1}{2}\right)(\pi - \omega_s)
 - 4\,h_0\!\left(\tfrac{N_0+1}{2}\right)\sum_{n=1}^{(N_0-1)/2} h_0(n)\,
   \frac{\sin\!\left(\left(\frac{N_0+1}{2}-n\right)\omega_s\right)}{\frac{N_0+1}{2}-n} \\
&\quad + 2\sum_{n=1}^{(N_0-1)/2} h_0(n)^2\left[(\pi-\omega_s)
   - \frac{\sin\!\left((N_0+1-2n)\,\omega_s\right)}{N_0+1-2n}\right] \\
&\quad - 2\sum_{n=1}^{(N_0-1)/2}\ \sum_{\substack{m=1\\ m\neq n}}^{(N_0-1)/2} h_0(n)\,h_0(m)
   \left[\frac{\sin\!\left((n-m)\,\omega_s\right)}{n-m}
       + \frac{\sin\!\left((N_0+1-n-m)\,\omega_s\right)}{N_0+1-n-m}\right]
\end{aligned}
\qquad (2.54)
$$

The passband energy E_P(h_0) with passband cut-off frequency omega_p is


$$
\begin{aligned}
E_P(h_0) = \int_{0}^{\omega_p}\big(H_0(\omega)-1\big)^2 d\omega
&= \left(h_0\!\left(\tfrac{N_0+1}{2}\right)-1\right)^2\omega_p
 + 4\left(h_0\!\left(\tfrac{N_0+1}{2}\right)-1\right)\sum_{n=1}^{(N_0-1)/2} h_0(n)\,
   \frac{\sin\!\left(\left(\frac{N_0+1}{2}-n\right)\omega_p\right)}{\frac{N_0+1}{2}-n} \\
&\quad + 2\sum_{n=1}^{(N_0-1)/2} h_0(n)^2\left[\omega_p
   + \frac{\sin\!\left((N_0+1-2n)\,\omega_p\right)}{N_0+1-2n}\right] \\
&\quad + 2\sum_{n=1}^{(N_0-1)/2}\ \sum_{\substack{m=1\\ m\neq n}}^{(N_0-1)/2} h_0(n)\,h_0(m)
   \left[\frac{\sin\!\left((n-m)\,\omega_p\right)}{n-m}
       + \frac{\sin\!\left((N_0+1-n-m)\,\omega_p\right)}{N_0+1-n-m}\right]
\end{aligned}
\qquad (2.55)
$$

The second objective is formulated to minimize the total stopband and passband energies of both analysis filters, i.e.,

$$\min_{h_0,\, h_1} E(h_0, h_1), \qquad (2.56)$$

where E(h_0, h_1) = E_S(h_0) + E_S(h_1) + E_P(h_0) + E_P(h_1) is the total energy of both filters. With this objective, filters approximating the ideal step filters are sought. The overall objective is a combination of the main objective, maximizing RMSE and entropy, and the secondary objective:

$$\min_{h_0,\, h_1} \; -w\,\mathrm{RMSE}(A, R, F) + (1-w)\,E(h_0, h_1), \qquad (2.57)$$

where 0 <= w <= 1 is a constant weight, A is the image fusion algorithm, R is the reference image, and F is the fused image. Note that when w = 1 the objective reduces to the RMSE term alone. The nonlinear optimization problem in Eq. (2.57) can be solved by various nonlinear optimization methods, such as random search, simulated annealing, and evolutionary algorithms; simulated annealing was used in our experiments. The filters designed based on Eq. (2.57) depend on the training images R and F, and different training images lead to different filter designs. Instead of obtaining the best filter bank for each training image, we want a filter bank that performs well on many images. To achieve this goal, the symmetric improvement ratio is used to measure how well a filter bank performs across multiple images. The symmetric improvement ratio [43] is a normalization method that avoids the inconsistent orderings of two hypotheses that can arise from the choice of the baseline hypothesis. For two hypotheses,

58

2 Pixel-Level Image Fusion

A and B, with performance values (A_1, ..., A_m) and (B_1, ..., B_m), respectively, on m test cases, the symmetric improvement ratio is defined as follows:

$$S_i = \begin{cases} A_i/B_i - 1, & \text{if } A_i \ge B_i \\ 1 - B_i/A_i, & \text{if } A_i < B_i \end{cases} \qquad (2.58)$$

$$\bar{S} = \frac{1}{m}\sum_{i=1}^{m} S_i \qquad (2.59)$$

In the regular improvement ratio Ai/Bi , degradations are between 0 and 1, whereas improvements are between 1 and infinity. Consequently, when improvement ratios are averaged, degradations carry less weight than improvements. This problem does not exist in symmetric improvement ratio because it puts equal weight on degradations and improvements. In our method, the symmetric improvement ratio is used to aggregate the performance values of a filter bank on different training images. The filter bank that has the best symmetric improvement ratio is selected as the final result.
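A minimal sketch of Eqs. (2.58)-(2.59) in Python/NumPy is given below, under the assumption that larger performance values are better; the quality values in the usage example are made up for illustration.

```python
import numpy as np

def symmetric_improvement(a, b):
    """Symmetric improvement ratio of Eq. (2.58): a 2x improvement gives +1 and
    a 2x degradation gives -1, so both directions carry equal weight."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.where(a >= b, a / b - 1.0, 1.0 - b / a)

def mean_symmetric_improvement(a, b):
    """Average over the m training images (Eq. 2.59)."""
    return float(symmetric_improvement(a, b).mean())

# Candidate filter bank A vs. baseline B on five training images (made-up values)
quality_a = [0.91, 0.88, 0.95, 0.84, 0.90]
quality_b = [0.90, 0.92, 0.93, 0.86, 0.90]
print(mean_symmetric_improvement(quality_a, quality_b))
```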

2.4.5

Experiments

In our experiments, we applied the optimization- and generalization-based method to design biorthogonal filter banks for image fusion and obtained promising results. Our goal is to find generalizable filter banks that perform better than the best existing solutions. The five sets of images used in the experiments, shown in Fig. 2.17, consist of five pairs of natural images; their reference images were picked arbitrarily as the training images used in the optimization and generalization phases of the method, and the other images were used to test the performance of the final result. In the experiments, the performance of the filter banks was evaluated using the generic image fusion method. In the design of 9/7 biorthogonal filter banks, the widely used Antonini 9/7 filter bank was used as the baseline solution, and in the optimization phase the multi-start ASA was applied. In calculating the passband and stopband energies, the passband cut-off frequency omega_p was set to 1 and the stopband cut-off frequency omega_s was set to pi - 1. The initial points of ASA were always the baseline solution, and it usually found better solutions. In the first experiment of designing 9/7 biorthogonal filter banks, ASA was used as the optimization method, and Eq. (2.57) with w = 1, 0.8, 0.2, 0.1, and 0.02 provided the objectives; the best result was obtained with w = 0.02. To speed up the convergence of ASA, we restricted the search ranges of the variables. In the design of 9/7 filter banks, the search ranges of the first three coefficients of H0(z) were limited to [0.01, 0.1], [-0.1, -0.01], and [-0.12, -0.11], respectively.


Fig. 2.17 Image pairs and ideal fusion results used to optimize the filter banks. The top four images are part of "clock" and the bottom images are "cameraman." They are differently focused images, and the right column shows their reference images


Fig. 2.18 Amplitude responses of the pair of analysis filters in Antonini (9/7) filter bank [44] (left) and those in the new (9/7) filter bank obtained by the proposed method (right)

The search ranges of the first two coefficients of H1(z) were limited to [0.01, 0.1] and [-0.1, -0.01], respectively. The execution of ASA was limited to 10,000 function evaluations in each run, which corresponds to about 50 h for a pair of images. The amplitude responses of the Antonini 9/7 filter bank and of the new filter bank designed by our method are shown in Fig. 2.18. Since the Antonini filters were designed to have high-order regularity, their amplitude responses are very smooth; our new filters, on the other hand, have sharper transitions. An image fusion example is shown in Fig. 2.19: the input images in Fig. 2.19a, b are fused using the proposed filter bank with the generic fusion scheme into the fused image in Fig. 2.19d, and the fusion result using the Antonini 9/7 filter bank is shown in Fig. 2.19c. Tables 2.8, 2.9 and 2.10 compare the performance of our new 9/7 filter bank with that of the Antonini 9/7 filter bank using the generic fusion scheme; they show that the new filter bank improves on the Antonini filter bank on most images. For some images, such as "disk" and "cameraman," the improvement is significant. Although the new filter bank was designed based on five training images, it performs well on other images and under different image fusion schemes. Furthermore, in the experiments we found that the entropy of the fused image is increased. The evaluation measure entropy is defined as

$$\mathrm{EN} = -\sum_{i=1}^{H} p_i \ln p_i \qquad (2.60)$$

where p_i is the probability of gray level i. The entropy indicates how much information an image contains, so the larger the value, the better the quality of the fused image. Table 2.8 shows the comparison of our new 9/7 filter bank with the Antonini 9/7 filter bank.


Fig. 2.19 The fusion results of "disk" images: (a) and (b) are the input multi-focus images, (c) is the fused image using the optimal filter bank, and (d) is the fused image using the Antonini 9/7 filter bank

Table 2.8 The fusion results of "disk" images

Filter selection      RMSE      Entropy
9/7 wavelet           12.3364   4.9880
Optimal 9/7 wavelet   12.2813   4.9965

Table 2.9 The fusion results of "clock" images

Filter selection      RMSE      Entropy
9/7 wavelet           10.6446   4.8247
Optimal 9/7 wavelet   10.0970   4.8393

Table 2.10 The fusion results of "cameraman" images

Filter selection      RMSE      Entropy
9/7 wavelet           14.2595   4.8842
Optimal 9/7 wavelet   13.9386   4.8907


2.4.6


Conclusion

The design of digital filter banks is important because filter banks are used in many applications, including modems, data transmission, speech and audio coding, and image fusion. The method presented here is general enough to be applicable to the design of other types of filter banks, such as multi-rate and multi-band filter banks for various applications. In the experiments, the fusion results obtained with the optimal filter banks outperform those obtained with the traditional 9/7 filter bank.

2.5

Anisotropic Diffusion-Based Fusion of Infrared and Visible Sensor Images (ADF)

In applications such as military, navigation, and concealed weapon detection, different imaging systems such as CCD/VI and forward-looking infrared (FLIR)/IR are used to monitor a targeted scene. In these applications, details of the target are very difficult to detect from the VI image alone because of low visual contrast, but they can easily be obtained from the IR image. The VI image, in turn, provides background details such as vegetation, texture, area, and soil. It is tedious to derive meaningful information by consecutively looking at multiple images; for a better understanding of the scene, the useful information needs to be integrated from these multiple images into a single image. As discussed in Chap. 1, a fusion algorithm can integrate information from such complementary images into a composite image, transferring the useful information of the source images with little fusion loss and few artifacts.

As briefed in Sect. 2.1, many MSD fusion techniques, such as pyramid- and wavelet-based methods, have been developed with this fusion objective in mind. It has been observed that edge-preserving filters can extract more salient information, such as lines and edges, than the pyramid and wavelet methods, and they can also generate more appreciable fused images than the remaining MSD methods. These advantages of EPD motivated us to explore an edge-preserving decomposition process called anisotropic diffusion for multi-scale fusion. As shown in Fig. 2.20, anisotropic diffusion is used to decompose the source images into approximation and detail layers. The final detail and approximation layers are calculated with the help of the Karhunen-Loeve transform (KL transform) and linear superposition, respectively, and the fused image is generated from the linear combination of the final detail and approximation layers. The advantages of the ADF method are outlined as follows:

1. The method is very effective and easy to implement.
2. It transfers most of the information from the source images to the fused image.
3. The fusion loss is very low.
4. The fusion artifacts introduced in the fused image are almost negligible.
5. The computational time is low.

Fig. 2.20 Schematic diagram of the ADF method

2.5.1

Anisotropic Diffusion

In the context of image processing, diffusion is a kind of smoothing. It is classified into isotropic diffusion and anisotropic diffusion. In isotropic diffusion, smoothing is applied uniformly over the whole image without regard to edge information. The anisotropic diffusion process (Perona and Malik [45]), in contrast, smooths a given image in homogeneous regions while preserving non-homogeneous regions (edges) using partial differential equations (PDEs), and thus overcomes the drawbacks of isotropic diffusion. Isotropic diffusion uses inter-region smoothing, so edge information is lost; anisotropic diffusion uses intra-region smoothing to generate coarser-resolution images, and at each coarser resolution the edges remain sharp and meaningful. For illustration, the isotropic and anisotropic diffusion processes are demonstrated in Fig. 2.21: Fig. 2.21a is the original Lena image, and Fig. 2.21b, c are the isotropic and anisotropic diffused images, respectively. It can be observed that the isotropic diffused image is completely blurred, whereas in the anisotropic diffused image smoothing is applied only where required while the edge information is preserved. Anisotropic diffusion belongs to the EPD techniques, whereas average and Gaussian filtering are non-EPD filtering techniques. The anisotropic diffusion equation uses a flux function to control the diffusion of an image I as


Fig. 2.21 Demonstration of isotropic and anisotropic diffusion process

$$I_t = c(x, y, t)\,\Delta I + \nabla c \cdot \nabla I, \qquad (2.61)$$

where c(x, y, t) is the flux function or rate of diffusion, Delta is the Laplacian operator, nabla is the gradient operator, and t is the time, scale, or iteration. Equation (2.61) is also called the heat equation. A forward-time central-space (FTCS) scheme is used to solve it; the solution of this PDE is given directly by

$$I_{i,j}^{t+1} = I_{i,j}^{t} + \lambda\Big[c_N \cdot \nabla_N I + c_S \cdot \nabla_S I + c_E \cdot \nabla_E I + c_W \cdot \nabla_W I\Big]_{i,j}^{t} \qquad (2.62)$$

In Eq. (2.62), I_{i,j}^{t+1} is the coarser-resolution image at scale t + 1, which depends on the previous coarser-scale image I_{i,j}^{t}; lambda is the stability constant satisfying 0 <= lambda <= 1/4; and the superscript and subscripts apply to all terms enclosed in the square bracket. nabla_N, nabla_S, nabla_E, and nabla_W are the nearest-neighbor differences in the north, south, east, and west directions, respectively, defined as

$$\begin{aligned}
\nabla_N I_{i,j} &\equiv I_{i-1,j} - I_{i,j}, \\
\nabla_S I_{i,j} &\equiv I_{i+1,j} - I_{i,j}, \\
\nabla_E I_{i,j} &\equiv I_{i,j+1} - I_{i,j}, \\
\nabla_W I_{i,j} &\equiv I_{i,j-1} - I_{i,j}.
\end{aligned} \qquad (2.63)$$

Similarly, c_N, c_S, c_E, and c_W are the conduction coefficients or flux functions in the north, south, east, and west directions, respectively.

$$\begin{aligned}
c_{N_{i,j}}^{t} &= g\!\left(\big\|(\nabla I)_{i+\frac{1}{2},j}^{t}\big\|\right) = g\!\left(\big\|\nabla_N I_{i,j}^{t}\big\|\right), \\
c_{S_{i,j}}^{t} &= g\!\left(\big\|(\nabla I)_{i-\frac{1}{2},j}^{t}\big\|\right) = g\!\left(\big\|\nabla_S I_{i,j}^{t}\big\|\right), \\
c_{E_{i,j}}^{t} &= g\!\left(\big\|(\nabla I)_{i,j+\frac{1}{2}}^{t}\big\|\right) = g\!\left(\big\|\nabla_E I_{i,j}^{t}\big\|\right), \\
c_{W_{i,j}}^{t} &= g\!\left(\big\|(\nabla I)_{i,j-\frac{1}{2}}^{t}\big\|\right) = g\!\left(\big\|\nabla_W I_{i,j}^{t}\big\|\right).
\end{aligned} \qquad (2.64)$$

In Eq. (2.64), g(.) is a monotonically decreasing function with g(0) = 1. Perona and Malik [45] suggested the two functions

$$g(\nabla I) = e^{-\left(\|\nabla I\|/k\right)^2}, \qquad (2.65)$$

$$g(\nabla I) = \frac{1}{1 + \left(\|\nabla I\|/k\right)^2}. \qquad (2.66)$$

These functions offer a trade-off between the smoothing and edge preservation. The first function is useful if the image consists of high-contrast edges over low-contrast edges. The second function is preferred if the image consists of wide regions over smaller regions. Both the functions consist of a free parameter k. This constant k is used to decide the validity of a region boundary based on its edge strength. In subsequent discussions, the anisotropic diffusion for a given image I is denoted as aniso(I).
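A plain-NumPy sketch of the diffusion process of Eqs. (2.62)-(2.65) is given below. The parameter defaults follow the values reported later in Sect. 2.5.3.4, the exponential flux function of Eq. (2.65) is used, and the periodic boundary handling is a simplification, so this is illustrative rather than the authors' implementation.

```python
import numpy as np

def aniso(image, iterations=10, k=30.0, lam=0.15):
    """Perona-Malik anisotropic diffusion with the exponential flux of Eq. (2.65)."""
    img = image.astype(float)
    for _ in range(iterations):
        # nearest-neighbour differences in the four compass directions (Eq. 2.63);
        # np.roll gives periodic borders, kept here only for brevity
        dn = np.roll(img, 1, axis=0) - img    # north
        ds = np.roll(img, -1, axis=0) - img   # south
        de = np.roll(img, -1, axis=1) - img   # east
        dw = np.roll(img, 1, axis=1) - img    # west
        # conduction coefficients g(|grad I|) = exp(-(|grad I|/k)^2)  (Eqs. 2.64-2.65)
        cn, cs = np.exp(-(dn / k) ** 2), np.exp(-(ds / k) ** 2)
        ce, cw = np.exp(-(de / k) ** 2), np.exp(-(dw / k) ** 2)
        # explicit FTCS update of Eq. (2.62), with 0 <= lam <= 1/4 for stability
        img = img + lam * (cn * dn + cs * ds + ce * de + cw * dw)
    return img
```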

2.5.2

Anisotropic Diffusion-Based Fusion Method (ADF)

The main steps of the ADF method are outlined below and discussed in detail in the following sub-sections.

1. Extract base and detail layers from the source images using anisotropic diffusion.
2. Fuse the detail layers based on the KL transform.
3. Fuse the base layers using weighted superposition.
4. Add the final detail and base layers.

2.5.2.1

Extracting Base and Detail Layers

Consider source images {I_n(x, y)}, n = 1, ..., N, of size p x q. All images are assumed to be co-registered. These images are passed through the edge-preserving anisotropic diffusion smoothing process to obtain the base layers:


Fig. 2.22 Base and detail layer decomposition of kayak dataset. (a) IR image, (b) VI image, (c) base layer of IR image, (d) base layer of VI image, (e) detail layer of IR image, (f) detail layer of VI image

$$B_n(x, y) = \mathrm{aniso}\big(I_n(x, y)\big), \qquad (2.67)$$

where B_n(x, y) is the n-th base layer and aniso(I_n(x, y)) denotes the anisotropic diffusion process applied to the n-th source image; see Sect. 2.5.1 for details of the anisotropic diffusion process. The detail layers are obtained by subtracting the base layers from the source images:

$$D_n(x, y) = I_n(x, y) - B_n(x, y) \qquad (2.68)$$

Base and detail layer decomposition of a kayak dataset is shown in Fig. 2.22.
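The base/detail split of Eqs. (2.67)-(2.68) then amounts to the following few lines, reusing the aniso() sketch from Sect. 2.5.1 and assuming the co-registered sources are NumPy arrays (again an illustrative sketch, not the authors' code).

```python
def base_detail_layers(sources):
    """Base layers via anisotropic diffusion (Eq. 2.67) and detail layers as residuals (Eq. 2.68)."""
    bases = [aniso(img) for img in sources]
    details = [img - base for img, base in zip(sources, bases)]
    return bases, details
```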

2.5.2.2

Detail Layer Fusion Based on KL Transform

Detail layers are fused with the help of KL transform. This technique transforms correlated components into uncorrelated components. It provides a compact representation for the given dataset. Other names for KL transform are Hotelling


transform or principal component analysis. The KL transform basis vectors depend on the dataset, unlike those of the fast Fourier transform (FFT) and the discrete cosine transform (DCT). The algorithm used for detail layer fusion is as follows:

1. Take the two detail layers D1(x, y) and D2(x, y) corresponding to the two input images I1(x, y) and I2(x, y), and arrange them as the column vectors of a matrix X.
2. Find the covariance matrix C_XX of X, treating each row as an observation and each column as a variable.
3. Calculate the eigenvalues sigma_1, sigma_2 and eigenvectors xi_1 = [xi_1(1); xi_1(2)] and xi_2 = [xi_2(1); xi_2(2)] of C_XX.
4. Compute the uncorrelated components KL1 and KL2 corresponding to the largest eigenvalue (sigma_max = max(sigma_1, sigma_2)). If xi_max is the eigenvector corresponding to sigma_max, then KL1 and KL2 are given by

$$KL_1 = \frac{\xi_{\max}(1)}{\sum_i \xi_{\max}(i)}, \qquad KL_2 = \frac{\xi_{\max}(2)}{\sum_i \xi_{\max}(i)}. \qquad (2.69)$$

5. The fused detail layer D is given by

$$D(x, y) = KL_1\, D_1(x, y) + KL_2\, D_2(x, y). \qquad (2.70)$$

The generalized expression for N detail layers is

$$D(x, y) = \sum_{n=1}^{N} KL_n\, D_n(x, y). \qquad (2.71)$$
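The KL-transform (PCA) weighting of Eqs. (2.69)-(2.71) can be sketched as follows in Python/NumPy; the sign normalization of the dominant eigenvector is an added safeguard and not part of the description above.

```python
import numpy as np

def kl_detail_fusion(detail_layers):
    """Fuse detail layers with weights from the dominant eigenvector (Eqs. 2.69-2.71)."""
    x = np.column_stack([d.ravel() for d in detail_layers])   # detail layers as columns of X
    cov = np.cov(x, rowvar=False)                             # covariance matrix C_XX
    _, eigvecs = np.linalg.eigh(cov)                          # eigenvalues in ascending order
    xi_max = eigvecs[:, -1]                                   # eigenvector of the largest eigenvalue
    if xi_max.sum() < 0:                                      # fix the eigenvector's arbitrary sign
        xi_max = -xi_max
    weights = xi_max / xi_max.sum()                           # KL_n = xi_max(n) / sum_i xi_max(i)
    return sum(w * d for w, d in zip(weights, detail_layers)) # Eq. (2.71)
```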

2.5.2.3

Base Layer Fusion

Here, the base layer information in each image is chosen by assigning proper weights w_n. The final base layer is calculated as

$$B(x, y) = \sum_{n=1}^{N} w_n\, B_n(x, y), \qquad n = 1, \ldots, N \qquad (2.72)$$

where sum_n w_n = 1 and 0 <= w_n <= 1.


If w_1 = w_2 = ... = w_N = 1/N, then this process reduces to averaging the base layers. The final detail layer is obtained using the KL transform in Eq. (2.71), and the final base layer using the weighted superposition in Eq. (2.72).

2.5.2.4

Superposition of Final Detail and Base Layers

The fused image F is given by a simple linear combination of the final base layer B and detail layer D:

$$F = B + D \qquad (2.73)$$

2.5.3

Experimental Setup

This section presents the image database on which experiments are carried out, other fusion methods that are used for comparison, objective fusion metrics, and free parameter analysis of the ADF algorithm.

2.5.3.1

Image Database

The ADF method is applied to various IR and VI image pairs. As shown in Fig. 2.23, qualitative and quantitative analyses of the ADF method are done for ten image datasets. These datasets consist of nine gray-scale image pairs and one color image pair.

Fig. 2.23 VI and IR image dataset. (a) Battle field, (b) tree, (c) forest (d) industry, (e) kayak, (f) garden, (g) gun, (h) pedestrian, (i) traffic, (j) house image pairs


These image datasets are referred to by the names shown in Fig. 2.23 in the later part of the discussion. Qualitative analysis is presented for two image pairs (kayak and gun), whereas quantitative analysis is done for all ten image datasets.

2.5.3.2

Fusion Metrics

Gradient-based objective fusion performance characterization (Petrovic and Xydeas [46]) is considered for the analysis of the ADF method. Fusion is quantified through the information contribution of each sensor: the fusion gain or fusion score (Q^XY/F), the fusion loss (L^XY/F), and the fusion artifacts (N^XY/F). The metrics Q^XY/F, L^XY/F, and N^XY/F represent complementary information, and their sum should be unity. However, this does not hold for all types of source images, so the modified fusion artifacts (N_k^XY/F) are considered to make the summation unity. For better performance, the Q^XY/F value should be high, and the L^XY/F, N^XY/F, and N_k^XY/F values should be low.

2.5.3.3

Methods for Comparison

This method is compared with six existing multi-scale fusion methods. Out of these, three (GRAD [10], FSD [11], and RATIO [14]) are pyramid-based methods, and two (SIDWT [47] and MSVD [19]) are transform domain-based methods. The recently proposed EPD-based method CBF [29] is also considered for comparison. For all of these methods, default parameter settings are adopted.

2.5.3.4

Effect of Free Parameters on the ADF Method

Here, with the help of objective fusion metrics, the effect of the free parameters on the ADF method is assessed. The individual source images are filtered using the anisotropic diffusion process, and the degree of smoothing depends on the constant k, the number of iterations t, and the stability constant lambda. The effect of these parameters on the ADF is analyzed by considering average fusion metric values calculated over the 10 image datasets. The effect of k on the ADF algorithm is shown in Fig. 2.24, which demonstrates the behavior of Q^XY/F, L^XY/F, N^XY/F, and N_k^XY/F with respect to k. It can be observed that Q^XY/F is almost constant after k = 20, and the same is true for L^XY/F; the metrics N^XY/F and N_k^XY/F are almost constant for any value of k. A similar analysis can be done for t. In Fig. 2.25, as the number of iterations t increases, Q^XY/F, L^XY/F, N^XY/F, and N_k^XY/F change, but for t >= 10 they are almost constant. Similarly, as demonstrated in Fig. 2.26, above lambda = 0.1 the fusion metrics show consistent performance. Hence, from the above analysis, an optimal performance of the ADF method is obtained for the free parameters t = 10, lambda = 0.15, and k = 30.


Fig. 2.24 Fusion score, fusion loss, and fusion artifacts with respect to k

Fig. 2.25 Fusion score, fusion loss, and fusion artifacts with respect to t

Equation (2.65) is used for the g(.) function, and the weights w_1 = w_2 = 0.5 are taken for the base layer fusion.

2.5.4

Results and Analysis

In this section, a comparative analysis of various image fusion algorithms with ADF algorithm for ten image datasets is presented in terms of visual quality and fusion metrics.


Fig. 2.26 Fusion score, fusion loss, and fusion artifacts with respect to λ

2.5.4.1

Qualitative Analysis

In this section, a qualitative analysis of the fused images of the different methods, along with the ADF method, is presented. Figure 2.27 displays the visual quality of the various fusion methods on the kayak image dataset. Figure 2.27a, b are the VI and IR images; they provide an idea of the seashore, persons, ship, and sky, but no single image gives complete information about the scene. Through fusion, one can get a complete idea of the scene. Figure 2.27c-i present the results of the different methods. The RATIO method introduces artifacts into the fused image, and CBF introduces gradient-reversal artifacts. The GRAD, FSD, MSVD, and SIDWT images are acceptable, but they do not convey the entire information of the scene. The fused image of the ADF method, however, gives a clear idea of the scene (seashore, ship, sky, and persons) without introducing extra information into the combined image compared to the other methods.

A qualitative comparison on the gun dataset is presented in Fig. 2.28. Figure 2.28a, b show the VI and MMW images. Information about three persons, with some object held by the middle person, can be found in the VI image, while information about the weapon is conveyed by the MMW image; no single image conveys the entire information about the scene, such as which person concealed the pistol. Through the fusion process, one can understand the scene. Figure 2.28c-i show the fused images of the various methods. The RATIO fused image does not provide the pistol information. The GRAD, FSD, MSVD, and SIDWT methods integrate source image information into the fused image, but the results are not visually significant enough to provide the complementary details of the source images. Even though the CBF image is visually good, it produces artifacts in the background of the combined image. The ADF method, however, transfers all the necessary image content of the input imagery into the combined image with reduced artifacts.


Fig. 2.27 Comparison of visual quality of fused images of various methods for a kayak dataset. (a) VI image, (b) IR image, (c) GRAD, (d) FSD, (e) RATIO, (f) SIDWT, (g) MSVD, (h) CBF, (i) ADF

Fig. 2.28 Comparison of visual quality of fused images of various methods for a gun dataset. (a) VI image, (b) MMW image, (c) GRAD, (d) FSD, (e) RATIO, (f) SIDWT, (g) MSVD, (h) CBF, (i) proposed ADF

From the fused image, one can tell that the third person from the left has concealed the weapon under his shirt.

2.5.4.2

Quantitative Analysis

Quantitative analysis is done with the help of Petrovic metrics. Figure 2.29 displays the bar chart comparison of average fusion metric values calculated over 10 image datasets of various image fusion methods. As demonstrated in Fig. 2.29, ADF


Fig. 2.29 Quantitative analysis of various MSD fusion methods. (a) Fusion score Q^XY/F, (b) fusion loss L^XY/F, (c) fusion artifacts N^XY/F, (d) modified fusion artifacts N_k^XY/F

method gives superior performance on all fusion metrics. Note that the summation of these metrics should tend to unity.

2.5.4.3

Computational Time

A comparison of the computational time of the various image fusion methods for the IR-VI datasets is shown in Table 2.11. The experiments are carried out on a computer with 4 GB RAM and a 2.27 GHz CPU. Each dataset is processed 25 times, and the average of the 25 computational times is taken for better accuracy; the average computational time is then calculated over the 10 image datasets. The ADF computational time is far lower than that of CBF, although higher than that of the pyramid-based and transform-domain methods. Overall, the ADF method gives superior results to the pyramid, transform-domain, and edge-preserving methods for IR-VI images in terms of visual quality and Petrovic metrics. Next, we discuss another MSD fusion technique based on two-scale image decomposition and saliency extraction.


Table 2.11 Average computational time in seconds of various fusion methods

Method    GRAD     FSD      RATIO    SIDWT    MSVD     CBF      ADF
Time (s)  0.3274   0.1210   0.1749   0.7681   1.4042   73.64    4.5875

2.6

Two-Scale Image Fusion of Infrared and Visible Images Using Saliency Detection

Here, an image fusion method based on saliency detection and two-scale image decomposition, referred to as TIF, is presented to fuse IR and VI images. This method is beneficial because of the visual saliency extraction process introduced in the fusion algorithm, which highlights the saliency information of the source images very well. A new weight map construction process based on visual saliency is developed; this process is able to integrate the visually significant information of the source images into the fused image. Unlike most multi-scale fusion techniques, this method uses a two-scale decomposition into base and detail layers. In the ADF method, anisotropic diffusion is used for this purpose; in the TIF method, to reduce the computational time further, an average filter (a non-EPD filter) is used instead of anisotropic diffusion. Hence, it is computationally fast and efficient. The TIF method is tested on several image pairs and is evaluated qualitatively by visual inspection and quantitatively using objective fusion metrics. Its outcomes are compared with state-of-the-art multi-scale fusion techniques, and the results reveal that this method outperforms the existing methods, including the ADF method.

2.6.1

Two-Scale Image Fusion (TIF)

TIF method needs three steps to perform fusion: image decomposition/analysis, fusion, and image reconstruction/ synthesis. Decomposition is done by using an average filter (mean filter) to obtain base and detail layers. These decomposed base and detail layers are fused using different fusion rules. Fused image is reconstructed from the final base and detail layers. Block diagram of this method is shown in Fig. 2.30.

2.6.1.1

Two-Scale Image Decomposition

Consider two co-registered source images phi_1(x, y) and phi_2(x, y) of the same size. These source images are decomposed into base layers containing large-scale variations and detail layers containing small-scale variations. For this purpose, a simple mean filter is employed. This operation is represented by

Fig. 2.30 Schematic diagram of the TIF method

$$\phi_1^B(x, y) = \phi_1(x, y) * \mu(x, y), \qquad (2.74)$$

$$\phi_2^B(x, y) = \phi_2(x, y) * \mu(x, y), \qquad (2.75)$$

where phi_1^B and phi_2^B are the base layers corresponding to the source images phi_1 and phi_2, respectively, mu is the mean filter of square window size w_mu, and * denotes convolution. The detail layers are extracted by subtracting the base layers from the source images:

$$\phi_1^D(x, y) = \phi_1(x, y) - \phi_1^B(x, y), \qquad (2.76)$$

$$\phi_2^D(x, y) = \phi_2(x, y) - \phi_2^B(x, y), \qquad (2.77)$$

where phi_1^D and phi_2^D are the detail layers. These detail layers are fused with the help of saliency maps, which are discussed in the next section.
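A minimal sketch of the two-scale split of Eqs. (2.74)-(2.77) is given below, assuming SciPy's uniform (mean) filter and the window size w_mu = 35 reported later in Sect. 2.6.2.4; it is illustrative rather than the authors' code.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def two_scale_decomposition(image, w_mu=35):
    """Base layer = mean-filtered image (Eqs. 2.74-2.75); detail layer = residual (Eqs. 2.76-2.77)."""
    img = np.asarray(image, dtype=float)
    base = uniform_filter(img, size=w_mu)
    detail = img - base
    return base, detail
```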

2.6.1.2

Visual Saliency Detection

Visual saliency detection, or saliency detection (SD) [48], is the process of detecting or identifying regions (such as persons, objects, or pixels) that are more significant than their neighbors. Such salient regions attract more human visual attention than the other regions present in the scene. In this method, a simple yet effective saliency map detection algorithm is introduced to extract the visual saliency of VI and IR images for the purpose of fusion. As shown in Fig. 2.31, a mean filter is applied to each source image to reduce the intensity variations between a pixel and its neighbors; this is a linear (non-EPD) filter that smooths the entire image without regard to edge information. A median filter is also applied to each source image to remove noise or artifacts; this is a nonlinear filter, so it smooths each source image while preserving the edge information. The saliency map of each source image is calculated as the difference of the mean and median filtering outputs, because this difference highlights saliency information such as edges and lines that are more significant than their neighbors. A norm of the difference is taken since we are interested only in the magnitude of the differences. As shown in Fig. 2.31, the saliency map xi of a source image phi is given as

$$\xi(x, y) = \left|\phi_\mu(x, y) - \phi_\eta(x, y)\right|, \qquad (2.78)$$

Fig. 2.31 Schematic diagram of the visual saliency detection

where |.| is the absolute value, phi_mu is the output of a mean filter with a square window of size w_mu, and phi_eta is the output of a median filter with a square window of size w_eta. This saliency detection algorithm can be extended to color image processing. The visual saliency of a color image phi is given as

$$\xi(x, y) = \left\|\phi_\mu(x, y) - \phi_\eta(x, y)\right\|, \qquad (2.79)$$

where ||.|| is the L2 norm or Euclidean distance. This equation can be expanded as

$$\xi(x, y) = d_e\!\left(\phi_\mu, \phi_\eta\right) = \sqrt{\big(\phi_\mu^R(x, y) - \phi_\eta^R(x, y)\big)^2 + \big(\phi_\mu^G(x, y) - \phi_\eta^G(x, y)\big)^2 + \big(\phi_\mu^B(x, y) - \phi_\eta^B(x, y)\big)^2}, \qquad (2.80)$$

where d_e is the Euclidean distance, phi_mu = [phi_mu^R; phi_mu^G; phi_mu^B], and phi_eta = [phi_eta^R; phi_eta^G; phi_eta^B]. Here, phi_mu^R, phi_mu^G, and phi_mu^B are the R, G, and B components of the mean filtering output, and phi_eta^R, phi_eta^G, and phi_eta^B are the R, G, and B channels of the median filtering output. For source images phi_1 and phi_2, the saliency maps are denoted as xi_1 and xi_2, respectively. To understand this better, the analysis of the battlefield images is presented in Fig. 2.32: the VI and IR images are illustrated in Fig. 2.32a, b, and the corresponding saliency maps are presented in Fig. 2.32c, d, respectively.
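The saliency map of Eqs. (2.78)-(2.80) thus reduces to a mean-minus-median magnitude. The sketch below assumes SciPy's uniform and median filters and handles both gray-scale and RGB inputs; it is an illustrative sketch, not the authors' code.

```python
import numpy as np
from scipy.ndimage import uniform_filter, median_filter

def saliency_map(image, w_mu=35, w_eta=3):
    """|mean-filtered - median-filtered| saliency (Eq. 2.78);
    L2 norm over the channels for RGB images (Eqs. 2.79-2.80)."""
    img = np.asarray(image, dtype=float)
    if img.ndim == 2:                                  # gray-scale source image
        return np.abs(uniform_filter(img, size=w_mu) - median_filter(img, size=w_eta))
    diff = np.stack([uniform_filter(img[..., c], size=w_mu)
                     - median_filter(img[..., c], size=w_eta)
                     for c in range(img.shape[-1])], axis=-1)
    return np.sqrt(np.sum(diff ** 2, axis=-1))         # Euclidean distance over R, G, B
```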


2.6.1.3


Weight Map Construction

VI and IR images provide complementary information, i.e., information available in one image may not be available in the other. For example, Fig. 2.32a is the VI image, which provides information about the battlefield but fails to reveal the existence of a person, whereas the IR image in Fig. 2.32b reveals the person but provides insufficient battlefield information. The visually significant information therefore needs to be integrated from both source images into a single image. This can be done by assigning proper weights to the detail layers of the source images (pixels with insignificant information receive low weights and pixels with significant information receive high weights), because the HVS is more sensitive to the detail-layer information of an image than to its base-layer information, as is evident from Fig. 2.32. Making use of this observation, a new weight map construction technique based on saliency information is developed to fuse the detail layers. The weight maps are calculated by normalizing the saliency maps, because normalization brings the range to [0, 1] while satisfying the requirements relevant to the information. These weight maps are given by

$$\psi_1(x, y) = \frac{\xi_1(x, y)}{\xi_1(x, y) + \xi_2(x, y)}, \qquad (2.81)$$

Fig. 2.32 Saliency maps and weight maps of a battlefield dataset. (a) and (b) are source images, (c) and (d) are saliency maps, (e) and (f) are weight maps


$$\psi_2(x, y) = \frac{\xi_2(x, y)}{\xi_1(x, y) + \xi_2(x, y)}, \qquad (2.82)$$

where psi_1 and psi_2 are the weight maps corresponding to phi_1^D and phi_2^D, calculated from xi_1 and xi_2, respectively. As shown in Fig. 2.32e, f, these weights are complementary to each other, i.e., psi_1 + psi_2 = 1. Therefore, if this weight map construction process assigns more weight to a pixel with significant information in one detail image, it assigns correspondingly less weight to the pixel at the same location in the other detail image, and vice versa.

2.6.1.4

Detail Layer Fusion

Significant information from the detail layers is integrated into a single image by multiplying the weight maps psi_1 and psi_2 with the detail layers phi_1^D and phi_2^D, respectively:

$$\phi^D(x, y) = \psi_1(x, y)\,\phi_1^D(x, y) + \psi_2(x, y)\,\phi_2^D(x, y), \qquad (2.83)$$

where ϕD is the final detail layer.

2.6.1.5

Base Layer Fusion

An average fusion rule is employed to combine the base layers:

$$\phi^B(x, y) = \frac{1}{2}\left(\phi_1^B(x, y) + \phi_2^B(x, y)\right), \qquad (2.84)$$

where ϕB is the final base layer.

2.6.1.6

Two-Scale Image Reconstruction

Finally, the fused image is constructed from the linear combination of the final base and detail layers:

$$\gamma(x, y) = \phi^B(x, y) + \phi^D(x, y). \qquad (2.85)$$
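Putting Eqs. (2.74)-(2.85) together, a complete TIF pass can be sketched as below, reusing the two_scale_decomposition() and saliency_map() sketches above; the small epsilon that guards against division by zero in the weight maps is an added assumption.

```python
def tif_fuse(phi1, phi2, w_mu=35, w_eta=3, eps=1e-12):
    """Two-scale, saliency-weighted fusion of two co-registered gray-scale images."""
    b1, d1 = two_scale_decomposition(phi1, w_mu)
    b2, d2 = two_scale_decomposition(phi2, w_mu)
    xi1 = saliency_map(phi1, w_mu, w_eta)
    xi2 = saliency_map(phi2, w_mu, w_eta)
    psi1 = xi1 / (xi1 + xi2 + eps)          # weight maps, Eqs. (2.81)-(2.82)
    psi2 = 1.0 - psi1
    detail = psi1 * d1 + psi2 * d2          # detail-layer fusion, Eq. (2.83)
    base = 0.5 * (b1 + b2)                  # base-layer average, Eq. (2.84)
    return base + detail                    # reconstruction, Eq. (2.85)
```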


Fig. 2.33 Color image fusion process

2.6.1.7

Color Image Fusion

The TIF algorithm can also be applied to color images by performing the fusion process on the individual color channels (red, green, and blue) and then reconstructing the fused image by concatenating the fused channels. As shown in Fig. 2.33, the red channel of the fused image is obtained by applying the TIF fusion method to the red channels of the source images; the same procedure is applied to the remaining channels, and the fused image is obtained by concatenating these color channels.
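The channel-wise extension can be sketched as follows, reusing the tif_fuse() sketch above (illustrative only; it simply fuses the R, G, and B channels independently and re-stacks them).

```python
import numpy as np

def tif_fuse_color(rgb1, rgb2):
    """Fuse two co-registered RGB images channel by channel, as in Fig. 2.33."""
    fused_channels = [tif_fuse(rgb1[..., c], rgb2[..., c]) for c in range(3)]
    return np.stack(fused_channels, axis=-1)
```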

2.6.2

Experimental Setup

This section describes the image database on which the experiments are carried out, the other fusion methods used for comparison, the objective fusion metrics, and the free parameter analysis of the TIF algorithm.

2.6.2.1

Image Database

The TIF method is applied to various IR and VI image pairs. As shown in Fig. 2.23, qualitative and quantitative analyses of this method are carried out for ten image datasets. Qualitative analysis is presented for two image pairs (battlefield and house), whereas quantitative analysis is done for all ten image pairs.


2.6.2.2


Other Methods for Comparison

TIF method is compared with seven state-of-the-art multi-scale fusion methods. Out of these, three (GRAD [10], FSD [11], and RATIO [14]) are pyramid-based methods. Two (SIDWT [47] and MSVD [19]) are transform domain-based methods. Recently proposed EPD-based method (CBF [22]) is also considered. Along with these, ADF method in Sect. 2.5 is also considered for comparison. For all of these methods, default parameter settings are adopted.

2.6.2.3

Objective Fusion Metrics

As discussed in Sect. 2.5.3.2, the Petrovic metrics [46] are considered for quantitative analysis.

2.6.2.4

Parameter Analysis

As discussed in Sect. 2.6.1.2, saliency map extraction in the TIF is done with the help of mean and median filters. If the window sizes of these filters vary, the performance of the method also changes, so the performance needs to be analyzed with respect to the window sizes w_mu and w_eta. This is done with the help of the Petrovic fusion metrics [46]. The analysis is performed on the ten image pairs, as shown in Fig. 2.34, and the average of the metric values is considered for accuracy. While examining the effect of w_mu on the TIF, the median filter window size is fixed at w_eta = 3; while inspecting the influence of w_eta, w_mu = 35 is used. As shown in Fig. 2.34a, when w_mu is increased from 3 to 35 the performance of the algorithm increases, but when it is increased beyond 35 the performance gradually decreases. The effect of w_eta on the TIF is shown in Fig. 2.34b: at w_eta = 3 the TIF gives maximum performance, and if w_eta increases further, the performance decreases. Hence, the parameters for this experiment are taken as w_mu = 35 and w_eta = 3.

Fig. 2.34 Effect of free parameters w_mu and w_eta on the performance of the proposed TIF method. (a) Effect of w_mu on the performance when w_eta = 3, (b) effect of w_eta on the performance when w_mu = 35


2.6.3


Results and Analysis

TIF is evaluated both qualitatively (by visual display) and quantitatively (by measuring fusion metrics) to verify its effectiveness. Computational time analysis of the TIF in comparison with other methods is also carried out.

2.6.3.1

Qualitative Analysis

The visual quality comparison for the battlefield dataset is presented in Fig. 2.35. In this figure, (a) is the VI image, (b) is the IR image, (c)-(i) are the outputs of the GRAD, FSD, RATIO, SIDWT, MSVD, CBF, and ADF methods, respectively, and (j) is the resultant image of the TIF. In Fig. 2.35a, the VI image provides the field information, but there is no sign of the existence of a person because of the low lighting conditions. Figure 2.35b is the IR image, which is able to provide information about the person; however, we need the total information in a composite image. As shown in Fig. 2.35, the outputs of GRAD (Fig. 2.35c), FSD (Fig. 2.35d), SIDWT (Fig. 2.35f), and MSVD (Fig. 2.35g) provide visually less information, the RATIO image (Fig. 2.35e) is visually distorted, and blocking artifacts can be observed in the CBF result (Fig. 2.35h). The ADF result (Fig. 2.35i) is good. However, compared with the remaining methods, the TIF method integrates both the battlefield and the person information from the VI and IR images most effectively into the fused image (Fig. 2.35j). Figure 2.36a is a VI image that gives the visual information about the scene, and Fig. 2.36b is the IR image, which gives information that is not available in the VI image; a fused image has to contain this complementary information. In Fig. 2.36, GRAD, FSD, SIDWT, MSVD, and ADF are not able to give complete information about the scene, and the RATIO and CBF outputs are not clear because of artifacts, whereas the output of the TIF method conveys the complementary information of the source images very well.

Fig. 2.35 Comparison of visual quality of fused images of various methods for battlefield dataset. (a) VI image, (b) IR image, (c) GRAD, (d) FSD, (e) RATIO, (f) SIDWT, (g) MSVD, (h) CBF, (i) ADF, (j) TIF


Fig. 2.36 Comparison of visual quality of fused images of various methods for a house dataset. (a) VI image, (b) IR image, (c) GRAD, (d) FSD, (e) RATIO, (f) SIDWT, (g) MSVD, (h) CBF, (i) ADF, (j) TIF

2.6.3.2

Quantitative Analysis

Sometimes visual quality alone is not sufficient to judge the effectiveness of a fusion algorithm. It has to be assessed both qualitatively and quantitatively. So far, visual quality analysis is discussed. Now, the TIF method will be assessed using Petrovic fusion metrics on ten VI and IR image datasets. Bar chart comparison of various methods along with this method is presented in Fig. 2.37. Average Petrovic metric

Fig. 2.37 Quantitative analysis of the TIF method in comparison with other MSD methods. (a) Fusion score Q^XY/F, (b) fusion loss L^XY/F, (c) fusion artifacts N^XY/F, (d) modified fusion artifacts N_k^XY/F


Table 2.12 Average computational time comparison

Method    GRAD     FSD      RATIO    SIDWT    MSVD     CBF       ADF      TIF
Time (s)  0.2918   0.1077   0.1544   0.6916   1.1710   56.7938   4.0574   0.4846

values calculated over 10 image datasets are considered for quantitative analysis. From this bar chart comparison, it can be observed that TIF is outperforming all the existing methods in terms of fusion metrics.

2.6.3.3

Computational Time

For real-time implementation, an image fusion method should have a low computational time along with good visual quality and fusion metric values. The computational time comparison of the various methods for the different image pairs is presented in Table 2.12. Experiments are conducted on a computer with a 2.27 GHz CPU and 4 GB RAM, and the average computational time is calculated over the 10 image pairs, as shown in Table 2.12. From the average computational time, one can conclude that this method has a longer execution time than the GRAD, FSD, and RATIO methods, and a shorter computational time than the SIDWT, MSVD, CBF, and ADF methods. From the above simulations and analysis, it can be observed that, compared to the existing methods, TIF transfers most of the useful source information into the fused image with less information loss and fewer fusion artifacts. It also consumes considerably less computational time and is thus preferable for real-time implementation. So far, we have discussed the ADF and TIF methods in Sects. 2.5 and 2.6; these methods were developed for VI and IR images. In the next section, a fusion algorithm for multi-focus images is discussed.

2.7 Multi-Focus Image Fusion Using Maximum Symmetric Surround Saliency Detection (MSSSF)

In digital photography, two or more objects of a scene cannot be focused at the same time, as discussed before. If one object is focused, then information about the other objects may be lost, and vice versa. Multi-focus fusion (MFF) is the process of generating an all-in-focus image from several out-of-focus images. In this section, a recently proposed multi-focus image fusion algorithm [49] based on visual saliency and weight map construction is presented. This method is advantageous because the saliency map used here can highlight the salient information present in the source images with well-defined boundaries. A weight map construction process based on this saliency information is developed. This process can identify the focused and defocused regions present in the source images very well. It can also integrate only


focused region information into the fused image. We now discuss the maximum symmetric surround based visual saliency detection process.

2.7.1 Maximum Symmetric Surround Saliency Detection (MSSS)

Saliency detection (SD) [50] is the process of detecting and highlighting visually significant regions that draw human visual attention compared to the other regions present in the scene. SD is useful in many applications such as object segmentation, object recognition, and adaptive compression. Here, SD is utilized for multi-focus fusion. A good SD method exhibits the properties mentioned below.

• It should highlight large salient regions rather than only small ones.
• It should uniformly highlight salient regions.
• Boundaries need to be well-defined.
• It should ignore texture or noise artifacts.

Many SD methods have been proposed to this end. Some SD algorithms (Frintrop et al. [51]; Harel et al. [52]; Hou and Zhang [43]; Itti et al. [53]; Ma and Zhang [54]) produce low-resolution saliency maps, and some (Harel et al. [52]; Hou and Zhang [43]; Ma and Zhang [54]) generate ill-defined object boundaries. Because of these limitations, the saliency maps of these methods are not useful for generating weight maps for the purpose of fusion. Achanta et al. [50] proposed a frequency-tuned SD method which overcomes the limitations of the existing saliency methods and is able to generate uniformly highlighted full-resolution saliency maps with well-defined boundaries. However, it fails if the image contains a complex background or large salient regions. To solve these problems, Achanta and Susstrunk [48] proposed another SD algorithm, called the maximum symmetric surround SD method, which can highlight the salient object along with well-defined boundaries. The MSSS algorithm is preferred for fusion over other SD methods because: (1) it gives saliency maps with full resolution and well-defined boundaries; and (2) salient regions are calculated based on symmetric surrounds, so it can effectively highlight salient regions in images with a complex background. In multi-focus images, focused regions provide more visual information than defocused regions because focused regions are more salient. So, these salient (visually significant) regions should be identified from the source images using SD algorithms; hence, an SD algorithm is adopted here for visual saliency extraction for the purpose of fusion. Achanta et al. [50] first proposed a frequency-tuned saliency detection algorithm that utilizes almost all low-frequency content and most of the high-frequency content to obtain perceptually good saliency maps at full resolution. This saliency


map is obtained by taking the Euclidean distance between the average $I_\mu$ of an image and each pixel of the Gaussian-blurred version $I_f(u, v)$ of the same image:

$S(u, v) = \lVert I_\mu - I_f(u, v) \rVert,$   (2.86)

where S(u, v) is the saliency map value at pixel location (u, v). A Gaussian blur of size 3 × 3 is chosen to obtain $I_f(u, v)$. However, this method fails when the source image contains a complex background: it highlights the background along with the salient object, since it treats the entire image as the common surround for all pixels, which is not desirable. To detect a pixel at the center of the salient object, a small lower cut-off frequency is needed, whereas a high lower cut-off frequency is required to detect a pixel near the boundary. So, as we approach the image boundaries, we should use local surround regions instead of a common surround region. This can be achieved by defining a symmetric surround around the center pixel of its own sub-image near the boundary, which increases the lower cut-off frequency. The MSSS saliency map (Achanta and Susstrunk [48]) of an image I of width w and height h is defined as

$S_{ss}(u, v) = \lVert I_\mu(u, v) - I_f(u, v) \rVert.$   (2.87)

Here, $I_\mu(u, v)$ is the average of the sub-image centered at pixel (u, v), given as

$I_\mu(u, v) = \dfrac{1}{A} \sum_{i=u-u_0}^{u+u_0} \; \sum_{j=v-v_0}^{v+v_0} I(i, j),$   (2.88)

where $u_0$, $v_0$ represent offsets and A indicates the area of the sub-image:

$u_0 = \min(u, w - u), \quad v_0 = \min(v, h - v).$   (2.89)

The sub-images obtained using Eqs. (2.88) and (2.89) are the maximum symmetric surround regions for a given central pixel. In multi-focus images, focused regions provide visually more information than defocused regions; in other words, focused regions are more salient than defocused regions. So, salient regions need to be detected from these out-of-focus images using SD algorithms. It can be observed that the MSSS saliency detection algorithm is able to extract the salient regions of multi-focus images. The multi-focus datasets used for simulations are shown in Fig. 2.38, and their corresponding saliency maps are displayed in Fig. 2.39. The process of saliency extraction using the MSSS algorithm is denoted as


Fig. 2.38 Multi-focus image datasets: (a) flower, (b) book, (c) bookshelf, (d) clock, (e) aircraft, (f) pepsi, (g) bottle, (h) parachute, (i) leopard, (j) flower wage

Fig. 2.39 Saliency maps of multi-focus image datasets: (a) flower, (b) book, (c) bookshelf, (d) clock, (e) aircraft, (f) pepsi, (g) bottle, (h) parachute, (i) leopard, (j) flower wage

$S = \mathrm{MSSS}(I),$   (2.90)

where I is the input image and S is the output saliency map. Using this saliency map, a new multi-focus image fusion algorithm is developed in the following section.
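The computation in Eqs. (2.86)–(2.90) can be sketched in a few lines of Python. The following is a minimal illustration using NumPy and OpenCV, not the authors' reference implementation; the function name `msss_saliency` and the integral-image evaluation of the sub-image means are our own choices, and for a single-channel image the Euclidean distance reduces to an absolute difference.

```python
import cv2
import numpy as np

def msss_saliency(img_gray):
    """Maximum symmetric surround saliency, a sketch of Eqs. (2.86)-(2.89)."""
    img = img_gray.astype(np.float64)
    h, w = img.shape
    # Gaussian-blurred version I_f of the image (3 x 3 kernel, as in the text).
    img_f = cv2.GaussianBlur(img, (3, 3), 0)
    # Integral image so that each sub-image mean I_mu(u, v) costs O(1).
    integral = cv2.integral(img)          # shape (h + 1, w + 1)
    sal = np.zeros_like(img)
    for v in range(h):                    # v indexes rows (height)
        for u in range(w):                # u indexes columns (width)
            u0 = min(u, w - 1 - u)        # horizontal offset, Eq. (2.89)
            v0 = min(v, h - 1 - v)        # vertical offset
            x1, x2 = u - u0, u + u0 + 1
            y1, y2 = v - v0, v + v0 + 1
            area = (x2 - x1) * (y2 - y1)
            total = (integral[y2, x2] - integral[y1, x2]
                     - integral[y2, x1] + integral[y1, x1])
            i_mu = total / area           # mean of the symmetric surround, Eq. (2.88)
            sal[v, u] = abs(i_mu - img_f[v, u])   # Eq. (2.87)
    return sal
```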

2.7.2 MSSS Detection-Based Image Fusion (MSSSF)

The key idea of this method is illustrated in the block diagram (Fig. 2.40). It is summarized in the following steps:


Fig. 2.40 General block diagram of the MSSSF algorithm

A. Decompose the source images into base and detail layers using an average filter.
B. Calculate the saliency map of each source image using the MSSS detection algorithm.
C. Compute weight maps from the extracted saliency map of each source image.
D. Scale the detail layers with these weight maps and combine all the scaled detail layers to obtain the final detail layer.
E. Compute the final base layer by taking the average of all base layers.
F. Take a linear combination of the final base and detail layers to get the fused image.

2.7.2.1 Two-Scale Image Decomposition

Let us consider co-registered source images $\{I_n(x, y)\}_{n=1}^{N}$ of the same size p × q. These N images are decomposed into base layers $B_n$ containing large-scale variations:

$B_n = I_n * A,$   (2.91)

where A is an average filter of size w and * denotes the convolution operation. The detail layers $D_n$ containing small-scale variations are obtained by subtracting the base layers $B_n$ from their corresponding source images $I_n$:

$D_n = I_n - B_n.$   (2.92)

Base and detail layer decomposition for a flower dataset is shown in Fig. 2.41.


Fig. 2.41 Base and detail layers of a flower dataset: (a) left focused flower image, (b) and (c) are base and detail layers of (a), (d) right focused flower, (e) and (f) are base and detail layers of (d)

2.7.2.2 Saliency Detection Algorithm

Saliency information from the out-of-focus images is extracted using the MSSS detection algorithm [50], which was reviewed in Sect. 2.7.1. The process of saliency extraction from the source images $I_n$ is represented as

$S_n = \mathrm{MSSS}(I_n),$   (2.93)

where $S_n$ is the saliency map of the n-th source image. Saliency maps of a flower dataset are shown in Fig. 2.42.

Fig. 2.42 Saliency maps of a flower dataset. (a) and (b) are saliency maps of left and right focused images


Fig. 2.43 Weight maps of a flower dataset (red rectangle indicates the focused region; green rectangle indicates the defocused region). (a) and (b) are weight maps of left and right focused images

2.7.2.3 Weight Map Calculation

In digital photography, each multi-focus image provides information about a particular focused region. We need to integrate all focused regions into a single fused image. This can be done by properly choosing the weight map of each source image. These weight maps should highlight the focused and defocused regions of the source images. Figure 2.43 shows the weight maps of the flower dataset. These weight maps represent the complementary information, i.e., focused and defocused regions. For example, as shown in Fig. 2.43, weight maps of focused and defocused regions are highlighted in red and green rectangles, respectively, representing the complementary information. These weight maps are calculated by normalizing the saliency maps as follows:

$w_i = \dfrac{S_i}{\sum_{n=1}^{N} S_n}, \quad \forall\, i = 1, 2, \ldots, N.$   (2.94)

2.7.2.4 Detail Layer Fusion

Here, the detail layers are scaled with the weight maps $w_n$ calculated from the MSSS detection algorithm, and these scaled detail layers are combined to get the final detail layer D as shown below:

$D = \sum_{n=1}^{N} w_n D_n.$   (2.95)


Fig. 2.44 Visual display of final base, detail layers and fused image. (a) Final detail layer, (b) final base layer, (c) MSSSF fused image

2.7.2.5 Base Layer Fusion

The final base layer is generated by taking the average of the base layers:

$B = \dfrac{1}{N} \sum_{n=1}^{N} B_n.$   (2.96)

2.7.2.6 Two-Scale Image Reconstruction

The fused image is synthesized by taking the linear combination of B and D:

$F = B + D.$   (2.97)

The reconstruction of the flower dataset is displayed in Fig. 2.44.
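Putting Eqs. (2.91)–(2.97) together, the MSSSF pipeline can be sketched as below. This is an illustrative outline rather than the authors' code: it reuses the `msss_saliency` sketch given in Sect. 2.7.1, uses OpenCV's box filter for the average filter A, and adds a small constant to avoid division by zero in the weight normalization.

```python
import cv2
import numpy as np

def msssf_fuse(images, w=5):
    """Sketch of the MSSSF pipeline (Eqs. 2.91-2.97) for N grayscale images."""
    imgs = [im.astype(np.float64) for im in images]
    # Two-scale decomposition with a w x w average filter (Eqs. 2.91 and 2.92).
    bases = [cv2.blur(im, (w, w)) for im in imgs]
    details = [im - b for im, b in zip(imgs, bases)]
    # Saliency maps (Eq. 2.93) normalized into weight maps (Eq. 2.94).
    sals = [msss_saliency(im) for im in imgs]       # sketch from Sect. 2.7.1
    total = np.sum(sals, axis=0) + 1e-12
    weights = [s / total for s in sals]
    # Weighted detail fusion (Eq. 2.95) and averaged base fusion (Eq. 2.96).
    fused_detail = np.sum([wm * d for wm, d in zip(weights, details)], axis=0)
    fused_base = np.mean(bases, axis=0)
    # Two-scale reconstruction (Eq. 2.97).
    return fused_base + fused_detail
```

The default w = 5 here follows the parameter analysis reported later in Sect. 2.7.3.4.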

2.7.3 Experimental Setup

In this section, the image database, the existing MFF methods used for comparison, and the free parameter analysis (the effect of w on the performance of MSSSF) are discussed.

2.7.3.1 Image Database

Experiments are conducted on several multi-focus image datasets. Results and analysis for 10 image datasets, viz. flower, leopard, bookshelf, clock, aircraft, pepsi, bottle, parachute, book, and flower wage, are presented. These image datasets are shown in Fig. 2.38. They are available at https://sites.google.com/view/durgaprasadbavirisetti/home.

2.7.3.2 Fusion Metrics

Assessment of a fusion method is really a challenging task when there is no ground truth available. Any image fusion algorithm can be assessed qualitatively by visual inspection and quantitatively by measuring the fusion metrics. Petrovic fusion metrics [46] are considered for the quantitative analysis.

2.7.3.3 Methods for Comparison

The MSSSF method is compared with the spatial-domain MFF method BGS [6] and the multi-scale MFF methods DCT with variance measure (DCT + var.) (Haghighat et al. [42]), DCT with variance measure and consistency verification (DCT + var. + cv) (Haghighat et al. [55]), DCHWT (Shreyamsha Kumar [22]), DWT with absolute measure (DWT + AB) [20], CBF (Shreyamsha Kumar [29]), and GFF [28]. Default parameter settings are adopted for all of these methods.

2.7.3.4 Effect of Free Parameters on the MSSSF Method

The MSSSF method uses an average filter to extract base and detail layer information from the source images. The size of the average filter affects the performance of this algorithm, so it has to be tuned for the best performance. The Petrovic metrics QXY/F, LXY/F, NXY/F, and NkXY/F, averaged over 10 multi-focus datasets, are plotted in Fig. 2.45. It can be observed that w = 5 is the best choice.

2.7.4 Results and Analysis

The aim of any MFF method is to obtain a properly focused image with less execution time. Performance of the MFF algorithm can be verified qualitatively by visual inspection, quantitatively using fusion metrics, and by measuring the computational time.

2.7.4.1 Qualitative Analysis

Qualitative analysis is presented for two multi-focus image datasets, namely flower and book, shown in Fig. 2.38. The visual quality of MSSSF along with various MFF methods is presented in Figs. 2.46 and 2.47. For these datasets, a zoomed portion of a particular region of the fused image is also presented for in-depth qualitative analysis. In Figs. 2.46 and 2.47, subfigures (a)–(h) give the visual display of our MSSSF, GFF, CBF, DCHWT, DWT + AB, BGS, DCT + var. + cv, and DCT + var.,


Fig. 2.45 Effect of w on the MSSSF method

respectively. Subfigures (i)–(p) illustrate the zoomed portions of (a)–(h), respectively. In Fig. 2.46, the fused images for the flower dataset are displayed. As emphasized in Fig. 2.46a with red rectangles, MSSSF is able to generate more focused regions (such as the flower wage and the switch). For example, the switch portions (Fig. 2.46m, n) of the DWT + AB and BGS methods are blurred. The remaining methods (Fig. 2.46j–l, o, p) are visually good, but they are not able to provide a sharply focused switch region. However, as shown in Fig. 2.46i, the MSSSF method obtains a sharper switch region compared to the remaining methods. Figure 2.47 displays the visual quality of the results on the book dataset. Zoomed portions of the various MFF algorithms for the book dataset, highlighted in Fig. 2.47a–h with red rectangles, are displayed in Fig. 2.47i–p. The zoomed portions of GFF (Fig. 2.47j), CBF (Fig. 2.47k), DCHWT (Fig. 2.47l), DCT + var. + cv (Fig. 2.47o), and DCT + var. (Fig. 2.47p) are visually good, but these methods are not able to provide sharply focused details. The zoomed portions of DWT + AB (Fig. 2.47m) and BGS (Fig. 2.47n) are visually distorted. However, as shown in Fig. 2.47i, MSSSF gives sharpened information about the text, objects, and book present in the fused image. Hence, MSSSF integrates both foreground and background regions of the source images into the fused image more effectively than the state-of-the-art methods.


Fig. 2.46 Comparison of visual quality of fused images of various methods for a flower dataset. (a) Proposed MSSSF, (b) GFF, (c) CBF, (d) DCHWT, (e) DWT + AB, (f) BGS, (g) DCT + var. + cv, (h) DCT + var. Subfigures (i)–(p) show the zoom version of switch portion of (a)–(h), respectively

2.7.4.2 Quantitative Analysis

It is difficult to judge the performance of a fusion algorithm by visual inspection alone; it has to be evaluated both qualitatively and quantitatively for a better assessment. In the previous section, qualitative analysis was performed. Here, quantitative analysis is conducted by evaluating the Petrovic fusion metrics (QXY/F, LXY/F, NXY/F, and NkXY/F, averaged over 10 datasets) of the other MFF algorithms (GFF (Li et al. [28]), CBF (Shreyamsha Kumar [29]), BGS (Tian et al. [6]), DCT + var. (Haghighat et al. [42]), DCT + var. + cv (Haghighat et al. [55]), DCHWT (Shreyamsha Kumar [22]), and DWT + AB [20]) along with the MSSSF algorithm. A bar chart comparison of the fusion metrics for the various MFF algorithms is shown in Fig. 2.48a–d. From this bar chart comparison, it is easy to observe that MSSSF obtains superior values for all fusion metrics.


Fig. 2.47 Comparison of visual quality of fused images of various methods for a book dataset. (a) Proposed MSSSF, (b) GFF, (c) CBF, (d) DCHWT, (e) DWT + AB, (f) BGS, (g) DCT + var. + cv, (h) DCT + var. Subfigures (i)–(p) show the zoom version of a particular focused region of (a)–(h), respectively

2.7.4.3 Computational Time

The computational time in seconds for the various MFF methods is shown in Table 2.13. These experiments are performed on a computer with 4 GB RAM and a 2.2 GHz CPU. The time is calculated by averaging the run time over the 10 multi-focus image datasets. As shown in the table, the MSSSF computational time is less than that of BGS, DCHWT, DWT + AB, and CBF and more than that of the DCT + var., DCT + var. + cv, and GFF methods. From the above results and analysis, one can conclude that the MSSSF method integrates more focused and sharpened regions of the source images. The quantitative analysis of MSSSF against several state-of-the-art MFF methods using the Petrovic fusion metrics shows that it outperforms the existing MFF methods, and its computational time is promising for real-time implementation.


Fig. 2.48 Quantitative analysis of various MFF methods along with the MSSSF (average fusion metrics over 10 image datasets are considered). (a) Fusion score QXY/F, (b) fusion loss LXY/F, (c) fusion artifacts NXY/F, (d) modified fusion artifacts NkXY/F

2.7.4.4 Summary

In this chapter, six new image fusion algorithms based on the authors' contributions were introduced after discussing the shortcomings of the traditional state-of-the-art pyramid- and wavelet-based image fusion methods. New pyramid, discrete wavelet frame, optimal wavelet filter bank, and edge-preserving decomposition-based image fusion methods were introduced in Sects. 2.2–2.7, respectively. In Sect. 2.2, a new image fusion method based on pyramid decomposition was developed to avoid the poor shift invariance of wavelets and to make up for the shortcomings of the traditional pyramid structure in extracting texture and edge features. MST approaches, including the Laplacian pyramid, the ratio of low-pass pyramid, and the discrete wavelet transform (DWT), are shift-dependent: when there is a slight camera or object movement, or when there is misregistration of the source images, their performance degrades. To address this issue, a new image fusion method based on DWT was introduced in Sect. 2.3.

Table 2.13 Average computational time comparison of various MFF methods along with the MSSSF method

Method   | BGS    | DCT + var. | DCT + var. + cv | DCHWT  | DWT + AB | CBF     | GFF    | MSSSF
Time (s) | 22.654 | 2.2423     | 2.5194          | 6.6002 | 17.7984  | 66.8385 | 2.4904 | 2.8708


Existing filter-bank design methods depend on biorthogonal filter banks with perfect reconstruction. Usually, these approaches do not introduce any errors themselves because of perfect reconstruction. However, in image fusion, the fused image is an incomplete representation of the source images; hence, small reconstruction errors introduced by the filter banks do not necessarily lead to worse fusion quality. In the method of Sect. 2.4, the perfect reconstruction condition in designing the optimal wavelet filter banks is therefore relaxed to emphasize the overall fusion performance.

In the ADF method (Sect. 2.5), a new fusion algorithm was developed using anisotropic diffusion and the KL transform. Base and detail layers were extracted using anisotropic diffusion. The base layers were averaged to get the final base layer, and the final detail layer was calculated using the KL transform. Finally, the final base and detail layers were added to get the fused image. The ADF method was evaluated qualitatively by visual inspection, quantitatively using the Petrovic fusion metrics, and by measuring its computational time, and its performance was compared with state-of-the-art MSD fusion methods. The results justify that this method can generate visually good fused images with good fusion metric values and appreciable computational time.

In the ADF method, base and detail layers were extracted using anisotropic diffusion. Even though it is used only for a two-scale decomposition for the purpose of fusion, it takes ten iterations (t = 10) to achieve this decomposition with good results; hence, this process carries some computational burden. To reduce the computational time further and improve the visual quality of the fused images, we presented the TIF algorithm (Sect. 2.6) based on two-scale decomposition using an average filter and visual saliency detection. In this method, a new visual saliency detection algorithm was developed to extract visually significant regions of the source images, and a new weight map construction process was developed based on these visual saliencies. In the detail layers, this process assigns larger weights to visually significant information and smaller weights to visually insignificant information. The base layers were averaged to get the final base layer, and the fused image was obtained by combining the final base and detail layers. This algorithm was assessed with the help of the Petrovic fusion metrics, and its performance was compared with state-of-the-art MSD fusion techniques along with the ADF method. From the results and analysis, we find that the TIF method is able to generate visually good fused images with better fusion performance than the remaining methods. Its computational time is more than that of the pyramid-based methods and less than that of the remaining methods, including the ADF method.

In addition to multi-sensor fusion, we also developed a new fusion algorithm (MSSSF) (Sect. 2.7) for multi-focus images based on saliency detection and two-scale image decomposition. A maximum symmetric surround (MSSS) saliency detection was explored for the purpose of fusion; this SD algorithm is able to highlight visually significant regions. A new weight map construction based on MSSS saliency detection was implemented, which is able to identify the focused and defocused regions of the source images. Hence, we developed a new image fusion algorithm (MSSSF) which integrates only the visually significant and focused regions of the source images into a single image. This algorithm was


applied on various multi-focus images. The results reveal that the MSSSF method is more reliable than the remaining MFF algorithms.

References 1. A.A. Goshtasby, S. Nikolov, Image fusion: advances in the state of the art. Inform. Fusion 2(8), 114–118 (2007) 2. J. Yonghong, Fusion of landsat TM and SAR images based on principal component analysis. Remote Sens. Technol. Appl. 13(1), 46–49 (2012) 3. N. Mitianoudis, T. Stathaki, Pixel-based and region-based image fusion schemes using ICA bases. Inform. Fusion 8(2), 131–142 (2007) 4. T.M. Tu, S.C. Su, H.C. Shyu, P.S. Huang, A new look at IHS-like image fusion methods. Inform. Fusion 2(3), 177–186 (2001) 5. W. Huang, Z. Jing, Evaluation of focus measures in multi-focus image fusion. Pattern Recogn. Lett. 28(4), 493–500 (2007) 6. J. Tian, L. Chen, L. Ma, W. Yu, Multi-focus image fusion using a bilateral gradient-based sharpness criterion. Optics Commun. 284(1), 80–87 (2011) 7. R. Shen, I. Cheng, J. Shi, A. Basu, Generalized random walks for fusion of multi-exposure images. IEEE Trans. Image Process. 20(12), 3634–3646 (2011) 8. M. Xu, H. Chen, P.K. Varshney, An image fusion approach based on Markov random fields. IEEE Trans. Geosci. Remote Sens. 49(12), 5116–5127 (2011) 9. A. Akerman III, Pyramidal techniques for multisensor fusion. Sensor Fusion V. Int. Soc. Optics Photon. 1828, 124–131 (1992) 10. P.J. Burt, A gradient pyramid basis for pattern-selective image fusion. Proc. SID 1992, 467–470 (1992) 11. P. Burt, E. Adelson, The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983) 12. A. Toet, A morphological pyramidal image decomposition. Pattern Recogn. Lett. 9(4), 255–261 (1989) 13. A. Toet, Image fusion by a ratio of low-pass pyramid. Pattern Recogn. Lett. 9(4), 245–253 (1989) 14. A. Toet, L.J. Van Ruyven, J.M. Valeton, Merging thermal and visual images by a contrast pyramid. Opt. Eng. 28(7), 287789 (1989) 15. H. Li, B.S. Manjunath, S.K. Mitra, Multisensor image fusion using the wavelet transform. Graph. Models Image Process. 57(3), 235–245 (1995) 16. T.A. Wilson, S.K. Rogers, M. Kabrisky, Perceptual-based image fusion for hyperspectral data. IEEE Trans. Geosci Remote Sens. 35(4), 1007–1017 (1997) 17. M. Choi, R.Y. Kim, M.G. Kim, The curvelet transform for image fusion. Int. Soc. Photogr. Remote Sens. 35(Part 88), 59–64 (2004) 18. B. Yang, S. Li, F. Sun, Image fusion using nonsubsampled contourlet transform, in Fourth International Conference on Image and Graphics (ICIG 2007), (IEEE, Piscataway, NJ, 2007), pp. 719–724 19. V.P.S. Naidu, Image fusion technique using multi-resolution singular value decomposition. Defence Sci. J. 61(5), 479–484 (2011) 20. J. Liang, Y. He, D. Liu, X. Zeng, Image fusion using higher order singular value decomposition. IEEE Trans. Image Process. 21(5), 2898–2909 (2012) 21. D. Looney, D.P. Mandic, Multiscale image fusion using complex extensions of EMD. IEEE Trans. Signal Process. 57(4), 1626–1630 (2009)


22. B.S. Kumar, Multifocus and multispectral image fusion based on pixel significance using discrete cosine harmonic wavelet transform. Signal Image Video Process. 7(6), 1125–1143 (2013) 23. Q.G. Miao, C. Shi, P.F. Xu, M. Yang, Y.B. Shi, A novel algorithm of image fusion using shearlets. Optics Commun. 284(6), 1540–1547 (2011) 24. W. Gan, X. Wu, W. Wu, X. Yang, C. Ren, X. He, K. Liu, Infrared and visible image fusion with the use of multi-scale edge-preserving decomposition and guided image filter. Infrared Phys. Technol. 72, 37–51 (2015) 25. Y. Jiang, M. Wang, Image fusion using multiscale edge-preserving decomposition based on weighted least squares filter. IET Image Process. 8(3), 183–190 (2014) 26. G. Cui, H. Feng, Z. Xu, Q. Li, Y. Chen, Detail preserved fusion of visible and infrared images using regional saliency extraction and multi-scale image decomposition. Optics Commun. 341, 199–209 (2015) 27. J. Zhao, H. Feng, Z. Xu, Q. Li, T. Liu, Detail enhanced multi-source fusion using visual weight map extraction based on multi scale edge preserving decomposition. Optics Commun. 287, 45–52 (2013) 28. S. Li, X. Kang, J. Hu, Image fusion with guided filtering. IEEE Trans. Image Process. 22(7), 2864–2875 (2013) 29. B.S. Kumar, Image fusion based on pixel significance using cross bilateral filter. Signal Image Video Process. 9(5), 1193–1204 (2015) 30. Z. Zhang, R.S. Blum, A categorization of multiscale-decomposition-based image fusion schemes with a performance study for a digital camera application. Proc. IEEE 87(8), 1315–1326 (1999) 31. M. Unser, Texture classification and segmentation using wavelet frames. IEEE Trans. Image Process. 4(11), 1549–1560 (1995) 32. R.S. Blum, R.J. Kozick, B.M. Sadler, An adaptive spatial diversity receiver for non-Gaussian interference and noise. IEEE Trans. Signal Process. 47(8), 2100–2111 (1999) 33. J. Yang, R.S. Blum, A statistical signal processing approach to image fusion for concealed weapon detection, in Proceedings. International Conference on Image Processing, vol. 1, (IEEE, Piscataway, NJ, 2002), pp. I–I 34. E. Diamant, Single-pixel information content, in Image Processing: Algorithms and Systems II, vol. 5014, (International Society for Optics and Photonics, Bellingham, WA, 2003), pp. 460–465 35. V.S. Petrović, C.S. Xydeas, Sensor noise effects on signal-level image fusion performance. Inform. Fusion 4(3), 167–183 (2003) 36. G. Qu, D. Zhang, P. Yan, Information measure for performance of image fusion. Electr. Lett. 38 (7), 313–315 (2002) 37. L.J. Chipman, T.M. Orr, L.N. Graham, Wavelets and image fusion, in Proceedings, International Conference on Image Processing, vol. 3, (IEEE, Piscataway, NJ, 1995), pp. 248–251 38. T.Q. Nguyen, P.P. Vaidyanathan, Two-channel perfect-reconstruction FIR QMF structures which yield linear-phase analysis and synthesis filters. IEEE Trans. Acoust. Speech Signal Process. 37(5), 676–690 (1989) 39. B.R. Horng, A.N. Wilson, Lagrange multiplier approaches to the design of two-channel perfectreconstruction linear-phase FIR filter banks, in International Conference on Acoustics, Speech, and Signal Processing, (IEEE, Piscataway, NJ, 1990), pp. 1731–1734 40. M. Antonini, M. Barlaud, P. Mathieu, I. Daubechies, Image coding using wavelet transform. IEEE Trans. Image Process. 1(2), 205–220 (1992) 41. O. Rioul, Simple regularity criteria for subdivision schemes. SIAM J. Math. Anal. 23(6), 1544–1576 (1992) 42. M.B.A. Haghighat, A. Aghagolzadeh, H. 
Seyedarabi, Real-time fusion of multi-focus images for visual sensor networks, in 2010 6th Iranian Conference on Machine Vision and Image Processing, (IEEE, Piscataway, NJ, 2010), pp. 1–6


43. D.P. Bavirisetti, R. Dhuli, Multi-focus image fusion using multi-scale image decomposition and saliency detection, Ain Shams Eng. J. (2016) 44. C.H. Anderson, 4,718,104. 5. U.S. Patent (1988) 45. P. Perona, J. Malik, Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12(7), 629–639 (1990) 46. V. Petrovic, C. Xydeas, Objective image fusion performance characterisation, in Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1, vol. 2, (IEEE, Piscataway, NJ, 2005), pp. 1866–1871 47. O. Rockinger, Image sequence fusion using a shift-invariant wavelet transform, in Proceedings of International Conference on Image processing, vol. 3, (IEEE, Piscataway, NJ, 1997), pp. 288–291 48. R. Achanta, S. Süsstrunk, Saliency detection using maximum symmetric surround, in 2010 IEEE International Conference on Image Processing, (IEEE, Piscataway, NJ, 2010), pp. 2653–2656 49. D.P. Bavirisetti, R. Dhuli, Multi-focus image fusion using maximum symmetric surround saliency detection. ELCVIA: Electr. Lett. Comput. Vision Image Anal. 14(2), 58–73 (2015) 50. R. Achanta, S. Hemami, F. Estrada, S. Süsstrunk, Frequency-tuned salient region detection, in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2009), vol. CONF, (2009), pp. 1597–1604 51. S. Frintrop, M. Klodt, E. Rome, A real-time visual attention system using integral images, in International Conference on Computer Vision Systems: Proceedings, (2007) 52. J. Harel, C. Koch, P. Perona, Graph-based visual saliency, in Advances in Neural Information Processing Systems, (2007), pp. 545–552 53. L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 11, 1254–1259 (1998) 54. Y.F. Ma, H.J. Zhang, Contrast-based image attention analysis by using fuzzy growing, in Proceedings of the eleventh ACM international conference on Multimedia, (ACM, New York, 2003), pp. 374–381 55. M.B.A. Haghighat, A. Aghagolzadeh, H. Seyedarabi, Multi-focus image fusion for visual sensor networks in DCT domain. Comput. Electr. Eng. 37(5), 789–797 (2011)

Chapter 3

Feature-Level Image Fusion

Abstract Fusion at the feature level is also referred to as an intermediate level of image fusion. This process can represent and analyze multi-sensor data for classification and recognition tasks. Multi-resolution techniques are important theoretical and analytical tools in signal and image processing; they mainly include pyramid- and wavelet transform-based methods. The present chapter studies feature-level image fusion methods based on pyramid decomposition. New feature-level image fusion algorithms in the pyramid domain are introduced based on multi-resolution gradient, texture, and fuzzy region features.

3.1 Introduction

Feature-level fusion is an intermediate-level fusion process that uses the feature information extracted from the original information of each source for comprehensive analysis and processing. Generally, the extracted feature information should be a sufficient representation, or sufficient statistics, of the original information, so that the multi-source information can be classified, collected, and synthesized. The idea of feature-level fusion is to first extract useful features from the original multi-sensor images and then merge these features into new feature vectors for further processing. Typical image features include edges, corners, lines, and the like. Compared with pixel-level fusion, feature-level fusion has more information loss but less computation. Feature-level fusion is a fusion of information at the intermediate level: it retains a sufficient amount of important information while compressing the information, which is beneficial to real-time processing. Lying between pixel-level fusion and decision-level fusion, feature-level fusion is generally regarded as the level of fusion preceding decision-level fusion. Although a variety of sensors (such as forward-looking infrared, laser imaging radar, synthetic aperture radar, etc.) that can obtain high-quality images


have been developed, and fairly advanced research results have been obtained for pixel-level fusion algorithms, in the case of large differences in sensor characteristics and limited data transmission bandwidth the best practice is to extract the useful features of the images and perform feature-level image fusion. Some methods designed for pixel-level or decision-level fusion can also be used for feature-level fusion.

Feature-level image fusion is mainly based on the comprehensive analysis and processing of scene feature information, such as edge shape, contour, direction, area, and distance, obtained after preprocessing and feature extraction. This kind of intermediate-level information fusion not only retains a sufficient amount of important information but also compresses the information, which is beneficial to real-time processing. Therefore, the study of feature fusion has focused on the extraction of the information contained in the image and the optimization of the fusion rules applied to the extracted feature information. Since feature extraction against a complex background is a difficult and active area in the field of image understanding, the preliminary results of image feature fusion have been most successfully applied to fusion against simple backgrounds, such as face recognition and character recognition. In the field of remote sensing applications, research on the fusion of image features is basically limited to the extraction, fusion, and recognition of linear features and airports, or the extraction and recognition of planar objects with typical features.

A feature-level image fusion system consists of four parts: information acquisition, information processing, feature extraction, and information fusion. This process is consistent with human cognition of things. Based on the obtained target characteristics, a sample bank of all kinds of target characteristics is established. On this basis, the model samples are preprocessed, and feature extraction highlights useful information and suppresses useless information, thereby forming distinctive features that do not vary with distance, delay, or direction. Invariant feature extraction and fusion algorithms are therefore the focus of feature information fusion. Feature extraction is very important for the design of a pattern recognition system: it can effectively compress the information of the target model, highlight the structural differences among the target models, and reduce the dimension of the model space and the size of the target template library, so as to improve the generalization ability of the recognition system, reduce its computational load, and improve its real-time performance.

McMichael D. [1] used early information fusion methods to detect image edges and perform feature extraction, and then used a neural network to carry out multi-source image fusion. Jocelyn Chanussot's feature-level morphological image fusion method based on neural networks and the hierarchical neural network feature-level fusion method are suitable for road network detection and feature extraction. Subsequently, Heene G. proposed a multi-source image fusion method for coastline detection, and Pigeon L. developed a knowledge-based multi-source image fusion method.

Based on the IHS color space transform and wavelet multi-resolution analysis, Liu Zhe defined multiple eigenvalues based on the features of the high-frequency wavelet


coefficients of the image and, using the product of the eigenvalues as a basis, proposed a new image fusion algorithm. Geng Bo Ying proposed an edge image fusion method based on multi-resolution wavelet analysis and Gaussian Markov random field theory: based on the multi-resolution wavelet analysis of the image, a set of statistical parameters is extracted in the corresponding regions of the different images by regression analysis under the Gaussian Markov random field model. These parameters represent the local structural features of the image. After calculating a similarity measure, the similarity matrix of the input images and their features is used to generate the fused edge image.

Jiang Xiao Yu, aiming at the general principles of feature-level image fusion, realized multi-resolution segmentation of the image with a wavelet transform algorithm, extracted multi-resolution feature vectors of the target, and completed the information fusion of CCD and thermal images at the feature-level classification stage. Dennis used multi-layer neural networks for feature-level fusion and classification, and the classification results are more accurate than K-nearest-neighbor classification and nonparametric Bayesian classification. Anne H. et al. used the gray-level co-occurrence matrix to extract and fuse statistical features of the texture information of SAR images, then classified the features and fused the multi-sensor data at the feature level, and applied this to the detection of buried mines; the results show that the detection rate of feature-level fusion is higher than that of single-sensor detection and decision-level fusion.

Yu Xiu lan, Qian Guo hui, Jia Xiao guang, et al., aiming at feature-level classification fusion of TM and SAR image information, proposed an iterative classification method based on Markov random fields and a BP neural network. Li Jun, Lin Zong, et al. used multi-resolution wavelet decomposition for data fusion between high-resolution and multispectral images; the fusion of the sub-band data at different scales was carried out using the maximum of the regional variance, realizing feature-level remote sensing image data fusion of the corresponding baseband data. Chen Xiaozhong, Sun Huayan, et al. used the inherent multi-scale property of the wavelet transform to detect edge features of different scales and different precisions, and then used them to fuse images. Li Qin Shuang, Chen Dong Lin, and others proposed a spectral feature fusion method based on spectral feature knowledge. In addition, Gao Xiu Mei et al. applied feature fusion to face recognition, and Ju Yan et al. applied it to handwritten Chinese character recognition.

This chapter consists of four parts. The first part briefly describes the concept of feature-level fusion; the second part introduces a multi-scale image fusion method based on gradient features from the perspective of multi-resolution transformation; the third part introduces a fusion method combining texture and gradient features; and the fourth part,


focusing on image fusion rules, introduces an image fusion method based on fuzzy region features.

3.2 Fusion Based on Grads Texture Characteristic

The gradient feature of an image is reflected in its transform coefficients. This section describes a multi-scale image fusion method based on gradient features: first, the gradient-based multi-scale transformation of the image is introduced, and then the corresponding image fusion strategy is described.

3.2.1 Multi-Scale Transformation Method Based on Gradient Features

The gradient of the image is calculated at each pixel, and each pixel reflects the gradient in four directions, namely horizontal, vertical, and the two diagonal directions. In order to reflect these gradient features at each scale of the image, a connection must be established between this gradient operator and the traditional Laplacian pyramid multi-scale transformation. The Laplacian pyramid is a set of band-pass images obtained by calculating the difference between adjacent levels of a Gaussian pyramid, that is,

$L_N = G_N,$   (3.1)

$L_l = G_l - \mathrm{EXPAND}(G_{l+1}) = G_l - G_{l+1,\uparrow}, \quad 0 \le l < N.$   (3.2)

Suppose the matrix $G_0$ represents the source image. After low-pass filtering, the image is sampled at intervals to obtain $G_1$, whose length and width are only half of those of $G_0$; $G_1$ is then low-pass filtered and down-sampled to obtain $G_2$, and so on, so that the Gaussian pyramid is a sequence of images repeatedly reduced in size. The inter-level operation can be expressed by the function REDUCE:

$G_l = \mathrm{REDUCE}(G_{l-1}).$   (3.3)

That is, for a point (i, j) on the l-th level image (1 ≤ l ≤ N, where N is the total number of levels of the multi-scale pyramid expansion), with 0 ≤ i ≤ $C_l$ and 0 ≤ j ≤ $R_l$ ($C_l$, $R_l$ being the size of the level-l image),

$G_l(i, j) = \sum_{m=-2}^{2} \sum_{n=-2}^{2} \omega(m, n)\, G_{l-1}(2i + m, 2j + n),$   (3.4)

where ω(m, n) is a Gaussian template, usually separable as ω(m, n) = ω(m)ω(n), with the ω functions symmetric and normalized. It can be seen that the REDUCE function is equivalent to convolving the source image with a Gaussian template, followed by down-sampling. The size of two adjacent pyramid images differs by a factor of 1/4, and the frequency band is gradually reduced, so the Gaussian pyramid can be regarded as a multi-resolution low-pass filter. The function EXPAND is the inverse operation of REDUCE; it is used to expand an image of a certain level of the Gaussian pyramid to the size of the previous level by interpolation. Let $G_{l,k}$ denote the image obtained after applying the EXPAND operation to $G_l$ k times (0 ≤ k ≤ l); then

$G_{l,0} = G_l,$   (3.5)

$G_{l,k} = \mathrm{EXPAND}(G_{l,k-1}),$   (3.6)

$G_{l,k}(i, j) = 4 \sum_{m=-2}^{2} \sum_{n=-2}^{2} \omega(m, n)\, G_{l,k-1}\!\left(\dfrac{i + m}{2}, \dfrac{j + n}{2}\right).$   (3.7)
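The REDUCE and EXPAND operations of Eqs. (3.3)–(3.7) can be illustrated with the short NumPy/SciPy sketch below. It assumes the common 5 × 5 separable Burt–Adelson weighting template for ω(m, n); it is a didactic approximation, not the implementation used in this book.

```python
import numpy as np
from scipy.ndimage import convolve

# Separable 5 x 5 Gaussian-like template w(m, n) = w(m) w(n).
w1d = np.array([0.05, 0.25, 0.4, 0.25, 0.05])
W = np.outer(w1d, w1d)

def reduce_level(g):
    """REDUCE (Eq. 3.4): smooth with the template, then down-sample by 2."""
    return convolve(g, W, mode="reflect")[::2, ::2]

def expand_level(g, shape):
    """EXPAND (Eq. 3.7): zero-insertion up-sampling, smoothing, and a gain of 4."""
    up = np.zeros(shape)
    up[::2, ::2] = g
    return 4.0 * convolve(up, W, mode="reflect")

def laplacian_pyramid(img, levels):
    """Band-pass levels of Eqs. (3.1)-(3.2) plus the final low-pass level."""
    gauss = [img.astype(np.float64)]
    for _ in range(levels):
        gauss.append(reduce_level(gauss[-1]))
    lap = [gauss[l] - expand_level(gauss[l + 1], gauss[l].shape)
           for l in range(levels)]
    lap.append(gauss[-1])        # L_N = G_N
    return lap
```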

After a series of EXPAND and subtraction operations, a set of band-pass images is obtained that forms the Laplacian pyramid. The Laplacian pyramid can completely represent a source image: $G_0$ can be accurately reconstructed by the inverse of the pyramid construction process, and the reconstructed image is unique. Define $G_N = L_N$, perform an EXPAND operation on $G_N$ and add the result to $L_{N-1}$, then perform an EXPAND operation on the recovered level and add it to $L_{N-2}$, and so on until $G_0$ is restored. The gradient pyramid decomposition is obtained by applying gradient direction filters to all levels of the Gaussian pyramid (except the highest level):

$D_{lk} = d_k * (G_l + w * G_l), \quad 0 \le l < N, \; k = 1, 2, 3, 4,$   (3.8)

where * is the convolution operation, $D_{lk}$ is the k-th gradient pyramid image of the l-th level, $G_l$ is the l-th level image of the Gaussian pyramid, and $d_k$ is the k-th gradient filter operator, defined as

$d_1 = [\,1 \;\; -1\,], \quad d_2 = \dfrac{1}{\sqrt{2}}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \quad d_3 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \quad d_4 = \dfrac{1}{\sqrt{2}}\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix},$   (3.9)

where w is a weight function satisfying ω = w * w, with ω the Gaussian template. After the gradient direction filtering of the Gaussian pyramid images, four decomposition images, in the horizontal, vertical, and two diagonal directions, are obtained at each level.


The gradient pyramid needs to be transformed into a Laplacian pyramid to reconstruct the image. Each $D_{lk}$ is transformed into a corresponding second-order partial derivative (directional) pyramid, namely

$\vec{L}_{lk} = -\dfrac{1}{8}\, d_k * D_{lk}.$   (3.10)

The sum over all directions forms the Laplacian pyramid level $L_l$:

$L_l = \sum_{k=1}^{4} \vec{L}_{lk}.$   (3.11)
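As a companion to Eqs. (3.8)–(3.11), the sketch below builds the four directional gradient images of one Gaussian pyramid level and accumulates them back into a Laplacian level. The filter definitions follow the reconstruction of Eq. (3.9) above, and the weighting template W can be the one from the previous sketch; this is an illustrative outline, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import convolve

# Gradient direction filters d_1..d_4 as reconstructed in Eq. (3.9).
D_FILTERS = [
    np.array([[1.0, -1.0]]),                           # d1: horizontal
    np.array([[0.0, -1.0], [1.0, 0.0]]) / np.sqrt(2),  # d2: diagonal
    np.array([[-1.0], [1.0]]),                         # d3: vertical
    np.array([[-1.0, 0.0], [0.0, 1.0]]) / np.sqrt(2),  # d4: anti-diagonal
]

def gradient_pyramid_level(G_l, W):
    """Eq. (3.8): D_lk = d_k * (G_l + w * G_l) for the four directions."""
    smoothed = G_l + convolve(G_l, W, mode="reflect")
    return [convolve(smoothed, d, mode="reflect") for d in D_FILTERS]

def laplacian_from_gradients(D_lks):
    """Eqs. (3.10)-(3.11): accumulate -1/8 d_k * D_lk over the four directions."""
    L_l = np.zeros_like(D_lks[0])
    for d, D_lk in zip(D_FILTERS, D_lks):
        L_l += -0.125 * convolve(D_lk, d, mode="reflect")
    return L_l
```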

3.2.2 Multi-Scale Transformation Based on Gradient Feature Fusion Strategy

The concrete fusion process includes six steps:

1. Input the spatially registered original images x1(n) and x2(n) (two input images are taken as an example).
2. According to the given gradient filters $d_i$ (i = 1, ..., 4), establish the multi-resolution decomposition algorithm based on gradient features.
3. Apply the multi-resolution pyramid decomposition based on joint texture and gradient features to the original images.
4. After the images are decomposed into their multi-resolution form based on texture and edge features, the fusion rule adopts a strategy based on a similarity measure and a saliency measure. Let $\vec{L}_{ki}$ denote the multi-resolution image of the k-th layer in the i-th direction. The saliency and similarity measures are calculated as follows (a code sketch of this weighting rule is given after this list). A 3 × 3 window is opened around each pixel, with the window template coefficients taken as

$\alpha = \dfrac{1}{16}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 8 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$   (3.12)

The saliency measure is

$S(m, n, k, i) = \sum_{s=-1}^{1} \sum_{t=-1}^{1} \alpha(s, t)\, \vec{L}_{ki}(m + s, n + t, k, i)^2.$   (3.13)

1 1 P P s¼1 t¼1

!A

!B

αðs, t ÞL ki ðm þ s, n þ t, k, iÞL ki ðm þ s, n þ t, k, iÞ S2A ðm, n, k, iÞ þ S2B ðm, n, k, iÞ ð3:14Þ

Set a threshold β (0  β  1), if the similarity measure MAB  β, then   1 1 1  M AB ωA ¼  2 2 1β

ω B ¼ 1  ωA

If the similarity measure MAB < β, then

ωA ¼ 1 if SA > SB ωA ¼ 0

else

ω B ¼ 1  ωA

ð3:15Þ

Finally, the fusion strategy !F L ki ðm, n, k, iÞ

!A

!B

¼ ωA ðm, n, k, iÞL ki ðm, n, k, iÞ þ ωB ðm, n, k, iÞL ki ðm, n, k, iÞ

ð3:16Þ

5. In order to avoid the situation that a certain point and its neighborhood come from different input original images, we verify the consistency of each direction image !F L ki ðm, n, k, iÞ after the fusion. The so-called consistency is that if a point from the image A and most of its neighborhood from the image B, then the point will be changed to the corresponding value of the image B. !F

6. The fused multi-resolution expanded image L ki ðm, n, k, iÞ is obtained by multiresolution inverse transformation based on gradient features. Specific examples of fusion and fusion evaluation will be introduced in the next section.

110

3.3

3 Feature-Level Image Fusion

Fusion Based on United Texture and Gradient Characteristics

Texture is a very important feature in many images, such as SAR images. For example, most aeronautical and satellite remote sensing images, medical microscopic images, artificial seismic profiling images from petroleum geophysical prospecting, etc. can be thought of as consisting of different types of textures, and therefore, the study of textures is image processing in the field of important theoretical research topics and has a wide range of applications. There are three main signs of texture: (1) some local sequence repeats over a larger area than the sequence; (2) the sequence is composed of non-random basic elements; and (3) a uniform body of parts, approximately the same size of the structure anywhere in the texture area. The basic part of the series is often referred to as texture primitives. It is also thought that textures are arranged by texture primitives according to certain laws or only some statistical laws. The former is called deterministic texture, and the latter is called random texture. Texture description methods can be divided into three categories: statistical methods, structural methods, and filtering methods. The Laws texture extraction method belongs to a method of extracting features by using filtering analysis. The texture filter uses five extracted texture kernel vectors of Laws [2]. l5 ¼ ½ 1 4 6 4 1  e5 ¼ ½ 1 2 0 2 1  s5 ¼ ½ 1

0

2

0 1 

u5 ¼ ½ 1 2 r 5 ¼ ½ 1 4

0 6

2 4

ð3:17Þ

1 1

These kernel vectors are referred to as flatness, edge measure, speckle measure, waviness, and graininess, respectively.

3.3.1

Joint Texture and Gradient Features of Multi-Scale Transformation Method

The pyramid decomposition method based on texture and edge is obtained by improving the gradient pyramid method. Considering the important role of texture features in image processing and image segmentation and classification, the texture features are added to make the pyramid decomposition. The resolution transformation domain can include the texture information in the original image. Thus, providing a more comprehensive measure of information for further integration. Each layer of the Gaussian pyramid is filtered using a texture extraction filter and an edge-gradient filter template to generate a series of texture and edge images. The

3.3 Fusion Based on United Texture and Gradient Characteristics

111

L0(x) L0(x)

L0

L1(x) L2(x)

L2(x)

L2(x)

L1(x)

L1(x)

Gi Fig. 3.1 Multi-scale pyramid decomposition method based on texture and gradient features

image pyramid can be completely represented by these tower signals. When designing pyramid algorithms that combine texture and gradient features, you should first consider the reconstruction conditions. This is mainly because the selected texture, edge filter template, and pyramid decomposition used by the two filters must ensure that the part of the dashed box in Fig. 3.1 has the same role, so as to find the reconstruction conditions. Reconstruction Conditions: When a set of coefficients (ti, ci) can be found to make the system P P4 P29 satisfy ð1  w_ new Þ ¼ 25 i¼1 t i T i  T i þ i¼1 ci Di  Di ¼ i¼1 vi U i  U i , the multiresolution transformation method of joint texture and gradient features can be reconstructed. Wherein, w_ new is a new kernel window function constructed by two convolutions of the kernel window function w_ in Eq. (3.3), w_ new ¼ ðw_  w_ Þ  ðw_  w_ Þ; Ti is a texture extraction filter, which is obtained by filtering each texture extraction filter in Eq. (3.10), Di is the edge-gradient filter, which is obtained by performing two convolutions through the four gradient filters of Eq. (3.5), Ti and Di are collectively referred to as a feature extraction filter, and ti and ci are coefficients to be determined that satisfy the reconstruction conditions. Suppose vi ¼

ti

i  25

ci

25 < i  29

,

Ui ¼

Ti

i  25

Di

25 < i  29

ð3:18Þ

It is proved that the so-called reconstruction refers to that when a signal ω is equal e obtained by performing to the reconstructed signal

* *ω inverse transform in the * * e . Where transforming domain V 0 , V 1 , . . . , V k , . . . , V N1 , GN , that is, ω ¼ ω

* * * * V 0 , V 1 , . . . , V k , . . . , V N1 is the high-frequency part based on multi-scale expan* sion of texture and gradient features, V k ¼ ½V k1 , . . . , V ki , . . . , V k29 , and (GN) is its low-frequency part. It has been previously demonstrated that the FSD Laplacian pyramid can be approximately completely reconstructed [3], and here only the multi-resolution

112

3 Feature-Level Image Fusion

pyramid based on texture and gradient features can be constructed to make the filter resampling Laplacian Si pyramid form. V ki ¼ U i  ½Gk þ w_ new Gk  ð1  w_ new Þ ¼

29 X

ð3:19Þ

vi U i  U i

ð3:20Þ

i¼1

Direction Laplacian pyramid: Lk ¼ ð1  w_ new  w_ new ÞGk ¼ ð1  w_ new Þð1 þ w_ new ÞGk ¼

29 X i¼1

vi U i  ½U i  ð1 þ w_ new ÞGk  ¼

29 X

vi U i  V ki

ð3:21Þ

i¼1

Therefore, as long as a set of coefficients are found for the equation to be true, the multi-resolution pyramid based on texture and gradient features can produce the form of the filter resampling Laplacian pyramid. However, it is important to note here that since the FSD Laplacian pyramid is approximately reconstructed, that is, there is some error in the reconstructed result from the original image, the multi-resolution pyramid based on texture, and the gradient features described in this book. It also belongs to the approximate reconstruction of multi-resolution transformation method. The steps of multi-resolution pyramid transformation based on texture and gradient features are as follows: 1. To establish Gaussian multi-resolution pyramid Similar to the Gaussian multi-resolution pyramid, the difference is that the nuclear filter window function ω needs to be replaced by ω_ new  ω_ new . 2. To create multi-resolution pyramid based on texture and gradient features. The pyramid image representation is obtained by filtering each layer of the Gaussian pyramid in 29 directions. V ki ¼ ½ð1 þ w_ new Þ  Gk   U i

ð3:22Þ

where U ¼ [T1, T2, . . ., T25, D1, . . ., D4] is a combination of a texture filter and a directional gradient filter, the subscript i is used to designate a certain feature extraction filter, Gk is a Gaussian pyramid decomposition result of the k-th layer, and w_ new a 9  9 filter kernel window function.

3.3 Fusion Based on United Texture and Gradient Characteristics

113

w_ new ¼ ðw_  w_ Þ  ðw_  w_ Þ

ð3:23Þ

Directional gradient filter D is a given 5  5 filter obtained by convolving the filter twice (Table 3.1). 2

0

6 60 6 6 D1 ¼ 6 61 6 60 4 0 2 0 6 60 6 6 D3 ¼ 6 60 6 60 4 0

0

0

0

0

4

6

0

0

0 0

0 1

0 4 0

6

0 4 0

1

0

0

3

7 07 7 7 4 1 7 7, 7 0 07 5 0

0 0 3 0 0 7 0 07 7 7 0 07 7, 7 0 07 5 0 0

2

0

0

0

0

6 6 0 0 0 1 6 6 D2 ¼ 6 0 1:5 0 6 0 6 6 0 1 0 0 4 0:25 0 0 0 2 0:25 0 0 0 6 6 0 1 0 0 6 6 6 D4 ¼ 6 0 0 1:5 0 6 6 0 0 0 1 4 0

0

0

0

0:25

3

7 0 7 7 7 0 7 7, 7 0 7 5

0 0

3

7 0 7 7 7 0 7 7 7 0 7 5

0:25 ð3:24Þ

By performing a convolution operation on the Gaussian pyramid layers based on joint texture and gradient direction operators, a decomposition image containing 29 characteristic information is available at each level. Therefore, the pyramid decomposition algorithm based on the texture and the gradient direction can well represent the texture information and the edge information of the image under the multi-resolution transform domain. This gives the image processing, and fusion operation provides a good condition. 1. Reconstruction of multi-resolution pyramid based on texture and gradient features Reconstruction of the pyramid is equivalent to obtaining the undetermined coefficient of Eq. (3.20). Pending coefficients can be obtained using the singular value decomposition method in matrix theory. Singular value decomposition (SVD) method is a commonly used mathematic method of matrix inversion. Any M  N (M  N ) matrix A can be written as the product of three matrices A ¼ UWV T

ð3:25Þ

where W is a N  N diagonal matrix whose diagonal elements are the singular values of A. U and V are M  N and N  N matrices, respectively, columns are orthogonal to each other, UTU ¼ VTV ¼ I and I is an identity matrix. Since V is a square matrix, it is also row-orthogonal, that is, VVT ¼ I.

Table 3.1 Texture filter structure



T 1 ¼ e5  lT5 =96 T 2 ¼ s5  lT5 =64



T 7 ¼ s5  eT5 =24 T 6 ¼ e5  eT5 =36



T 12 ¼ s5  sT5 =16 T 11 ¼ e5  sT5 =24



T 17 ¼ s5  uT5 =24 T 16 ¼ e5  uT5 =36



T 22 ¼ s5  r T5 =64 T 21 ¼ e5  r T5 =96

T 3 ¼ u5  lT5 =96

T 8 ¼ u5  eT5 =36

T 13 ¼ u5  sT5 =24

T 18 ¼ u5  uT5 =36

T 23 ¼ u5  r T5 =96

T 4 ¼ r 5  lT5 =256

T 9 ¼ r 5  eT5 =96

T 14 ¼ r 5  sT5 =64

T 19 ¼ r 5  uT5 =96

T 24 ¼ r 5  r T5 =256

T 5 ¼ l5  eT5 =96

T 10 ¼ l5  sT5 =64

T 15 ¼ l5  uT5 =96

T 20 ¼ l5  r T5 =256

T 25 ¼ l5  lT5 =256

114 3 Feature-Level Image Fusion

3.3 Fusion Based on United Texture and Gradient Characteristics

115

When M < N, singular value decomposition can also be performed. At this time, the singular value wj ( j ¼ M + 1, . . ., N ) on the diagonal of the matrix W is equal to zero. The columns corresponding to matrix U with wj are also zero. It can be shown that Eq. (3.15) holds true at any time and is almost unique. Equation (3.20) can be converted to w ¼* vΩ

!

ð3:26Þ

where w is ð1  w_ new Þ 1  81 vector of b transform; * v is a vector of coefficients to be determined, 1  29 in size; and Ω is a 29  81 matrix vector for transforming Ui  Ui by a texture and gradient filter. !

By a singular value, decomposition can be written as Ω, and substituting Ω ¼ USVT into Eq. (3.26), we get

* v ¼* w  VS1 U T

ð3:27Þ

where S1 is the pseudo inverse of S, so that the undetermined coefficient can be obtained as 2

0:5625, 6 0:0808, 6 * v¼6 4 0:5625, 0:0478,

0:3750,

0:5625,

0:4767,

0:5625,

0:3164,

0:2109,

0:2125, 0:0808, 0:2125,

0:3750, 0:2109, 0:2445,

0:2109, 0:3164, 0,

0:0243, 0:2125, 0:0024,

0:2109, 0:0477, 0:0133,

0:1417, 0:2109, 0:0240,

...

3

... 7 7 7 ... 5 0:0133 ð3:28Þ

Reconstruction of the pyramid and the directional gradient pyramid reconstruction methods are basically the same. Pyramid reconstructions based on texture and gradient features are relatively complex, and the directional Laplacian pyramid and Filter-Subtract-Decimated (FSD) Laplacian pyramid images are constructed as intermediate results. Defining the direction, the Laplacian pyramid is !

1 L ki ¼  U i  GPki , 8

! L ki

ð3:29Þ

is the Laplacian pyramid image of the i-th feature of the k-th layer. Directions Laplace pyramid can be converted into FSD Laplace pyramid through accumulation, P ! to form FSD Laplacian pyramid, Lk ¼ 4l¼1 L kl convert Laplacian pyramid to Laplacian pyramid image, LPk ¼ [1  ω]  Lk Si pyramid algorithm reconstruction to get the image, where ω ¼ ω_ new  ω_ new .

116

3.3.2

3 Feature-Level Image Fusion

Multi-Scale Image Fusion Method Based on Gradient Feature

For multi-focus image fusion, we obtain the standard reference fusion image of multi-focus image by artificial shear splicing, and use the following two evaluation criteria to objectively determine the merits of the fusion results. 1. Root mean square error of standard reference fusion image and fusion image vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi u M X N  u 1 X RSME ¼ t F 0 ði, jÞ  F ði, jÞÞ2 M  N i¼1 J¼1

ð3:30Þ

where M, N is the number of rows and columns of the image; F0(i, j) is the pixel gray value of the algorithm fusion result at (i, j); and F(i, j) is the pixel gray value of the standard reference fusion image at (i, j). The smaller the p value, the better the fusion result. Conversely, the larger the p value, the poorer the fusion result. 2. The common information between the standard reference fusion image and the fused image MI ¼

LN X LN X i¼1

j¼1

hR,X ði, jÞ ln

hR,X ði, jÞ hR ðiÞhX ð jÞ

ð3:31Þ

where hR, X(i, j) is the joint probability f when the pixel gray in image R is i and the pixel with the same name in image X is j; hR(i), hZ( j) is the pixel gray value of image R (or Z ); and LN is the gray level. In general, the larger the MI, the more common information the two have. 1. For different sensor image fusion such as infrared and visible light, it is more difficult to evaluate the fusion results quantitatively. We cannot find an ideal fusion result as a reference image to measure the fusion result. Here, in order to be able to evaluate the fusion results objectively, two objective measures (Object Measure) are used: one is objective evaluation measure of image fusion proposed by Xydeas and Petrovic [4] in 2000, and for convenience, we call the mutual information of the edge; and another by the Chinese scholar Qu [5] and other information proposed measure, we call the pixel mutual information. Objective evaluation index of mutual information of edge measures how much edge information of “inherited” input image in fusion image. Based on the edge extraction of input image and fusion image, this method calculates the amount of edge information stored and uses the weighted amount of edge information as a measure to evaluate the fusion result. Therefore, the larger the mutual information of edge, the more edge information. Specific steps are as follows: Calculate the edge feature amplitude and phase image of the original image A, B and the fused image F.

3.3 Fusion Based on United Texture and Gradient Characteristics

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi sxA ðn, mÞ2 þ syA ðn, mÞ2 y  s ðn, mÞ αA ðn, mÞ ¼ tan 1 Ax sA ðn, mÞ

gA ðn, mÞ ¼

117

ð3:32Þ ð3:33Þ

The above is the A-image to get the edge of the amplitude and phase images of the formula, sxA ðn, mÞ and syA ðn, mÞ, the application of Sobel operator to filter the horizontal and vertical results. 2. Calculate the relative amplitude value GAF(m, n) and phase value AAF(m, n) of the original image A and the fused image F, and the relative amplitude values and phase values of the B image and the fused image F.

GAF ðm, nÞ ¼

8 gF ðm, nÞ > > < g ðm, nÞ if

gA ðm, nÞ > gF ðm, nÞ

A

> g ðm, nÞ > : A gF ðm, nÞ

AAF ðm, nÞ ¼ 1 

else

jαA ðm, nÞ  αF ðm, nÞj π=2

ð3:34Þ

3. Calculate edge amplitude and phase retention value QAF g ðm, nÞ ¼ QAF α ðm, nÞ ¼

Γg 1þe

K g ðGAF ðn,mÞσ g Þ

1þe

Γα K α ðAAF ðn,mÞσ α Þ

ð3:35Þ ð3:36Þ

AF QAF g ðm, nÞ and Qα ðm, nÞ describe the marginal amplitude and phase reservation between A and F images, respectively. Γg, Kg, σ g and Γα, Kα, σ α are the adjustable parameters. 4. Calculate edge information retention

AF QAF ðm, nÞ ¼ QAF g ðm, nÞQα ðm, nÞ:

ð3:37Þ

118

3 Feature-Level Image Fusion

5. Finally get the objective evaluation of mutual information edge measures N P M P AB=F QF ðm, nÞ

¼

QAF ðn, mÞwA ðn, mÞ þ QBF ðn, mÞwB ðn, mÞ

n¼1 m¼1

ð3:38Þ

N P M P

wA ðn, mÞ þ wB ðn, mÞ

i¼1 j¼1

where wA(n, m) ¼ |gA(n, m)|L, wB(n, m) ¼ |gB(n, m)|L. In order to measure the performance of image fusion, according to the definition of Petrovic V., this book takes all the parameters as a fixed value Γg ¼ 0:9994,

K g ¼ 15,

σ g ¼ 0:5,

Γα ¼ 0:9879,

K α ¼ 22,

σ α ¼ 0:8:

The objective evaluation index of mutual information of pixels adopts the mutual information as the evaluation index to quantitatively evaluate the overall effect of the fusion image. Mutual information describes the similarity of the information contained in images. For the input images A and B and the resulting fused image F, mutual information of F and A and B is calculated separately, and how much of the input image information contained in the fused image can be obtained MIFA ¼

L X L X i¼1

MIFB ¼

hF,A ði, jÞ hF ðiÞhA ð jÞ

ð3:39Þ

hF,B ði, jÞ ln

hF,B ði, jÞ hF ðiÞhB ð jÞ

ð3:40Þ

j¼1

L X L X i¼1

hF,A ði, jÞ ln

j¼1

In the formula, hF, A(i, j) is the joint probability that the pixel gray in the image is i and the pixel gray in the same position in the image is j; hF(i), hA( j) is the probability that the image pixel gray value is i; and L is the gray level. In general, the larger the MI, the more mutual information the two have. Similarly, the definition of a variable in Eq. (3.40) is consistent with the definition of a variable in equation. Then, the evaluation index of the fusion result can be defined as MIAB F ¼ MIFA þ MIFB

ð3:41Þ

The above equation can reflect how much the fused image F contains the input image A, B information. Figure 3.2 shows the structure of the image fusion algorithm based on texture and gradient multi-resolution pyramid (with a single-layer decomposition as an example). Specific integration steps are as follows: 1. Register the original images x1(n) and x2(n) spatially (take the two input images as examples).

3.3 Fusion Based on United Texture and Gradient Characteristics

119

Image B

Image A Multi-scale transformation

L k,1 L k,6 L k,11

Lk,2 Lk,7 Lk,12

L k,3 L k,8 L k,13

L k,4 L k,9 L k,14

L k,5

Significan ce measure Similarity measure

Significan ce measure Similarity measure

L k,10

decision

L k,15

L k,16

Lk,17

L k,18

L k,19

L k,20

L k,21

Lk,22

L k,23

L k,24

L k,25

L k,26

Lk,27

L k,28

L k,29

operation Lk,1

Lk,2

L k,3

L k,4

L k,5

Lk,6

Lk,7

L k,8

L k,9

L k,10

Lk,11

Lk,12

L k,13

L k,14

L k,15

Lk,16

Lk,17

L k,18

L k,19

L k,20

Lk,21

Lk,22

L k,23

L k,24

L k,25

Lk,26

Lk,27

L k,28

L k,29

Multi-scale transformation

L k,1

L k,2

Lk,3

L k,4

L k,5

L k,6

L k,7

Lk,8

L k,9

L k,10

L k,11

L k,12

Lk,13

L k,14

L k,15

L k,16

L k,17

Lk,18

L k,19

L k,20

L k,21

L k,22

Lk,23

L k,24

L k,25

L k,26

L k,27

Lk,28

L k,29

Inverse transform Fusion image Fig. 3.2 Schematic diagram of image fusion based on joint texture and gradient feature multiresolution method

2. Construct the filter vector Ui (i ¼ 1, . . ., 29) from the given texture filter Ti (i ¼ 1, . . ., 25) and the directional gradient filter Di; and seek w_ new . 3. According to the reconstruction condition, using the singular value decomposition method, find the coefficient of undetermined coefficient* v. 4. Multi-resolution pyramid decomposition based on joint texture and gradient features on the original image. After the image is decomposed into multi-resolution forms based on texture and edge, the fusion method adopts the strategy of fusion based on similarity measure and saliency measure. The multi-resolution flag indicates that the image of the kth layer in the i direction is signal Lki. The methods for calculating the similarity and significance measures are as follows: To each pixel as the starting point to open up a 3  3 small window, the window template coefficient is taken as

120

3 Feature-Level Image Fusion

3 1 1 7 1 8 15 16 1 1

2

1 6 α ¼ 41 1

ð3:42Þ

Significance measures are Sðm, n, k, iÞ ¼

1 1 X X

!

αðs, t ÞL ki ðm þ s, n þ t, k, iÞ2

ð3:43Þ

s¼1 t¼1

Similarity measures are 2 M AB ðm, n, k, iÞ ¼

1 1 P P s¼1 t¼1

!A

!B

αðs, t ÞL ki ðm þ s, n þ t, k, iÞL ki ðm þ s, n þ t, k, iÞ S2A ðm, n, k, iÞ þ S2B ðm, n, k, iÞ ð3:44Þ

Set a threshold β (0  β  1). If the similarity measure MAB  β, then ωA ¼   1 1 1  M AB  ω B ¼ 1  ωA . 2 2 1β If the similarity measure MAB < β, then

ωA ¼ 1

if SA > SB

ωA ¼ 0

else

ω B ¼ 1  ωA

ð3:45Þ

Finally draw fusion strategy !F L ki ðm, n, k, iÞ

!A

!B

¼ ωA ðm, n, k, iÞL ki ðm, n, k, iÞ þ ωB ðm, n, k, iÞL ki ðm, n, k, iÞ

ð3:46Þ

5. In order to avoid the situation that the point of a certain point and its neighborhood come from different input original images respectively, we verify the !F consistency of each direction image L ki ðm, n, k, iÞ after fusion. The so-called consistency is that if a point is from the image A and most of its neighborhood from the image B, then the point will be changed to the corresponding value of the image B. !F 6. The fused multi-resolution expanded image L ki ðm, n, k, iÞ is obtained by multiresolution inverse transformation based on joint texture and gradient features. This section presents the results of the fusion of multi-focus images, infrared and visible images based on the combined texture and gradient features of the multiresolution image fusion method and is compared with the representative fusion method. These fusion methods include evaluation and comparison based on the

3.3 Fusion Based on United Texture and Gradient Characteristics

121

Fig. 3.3 (a) Texture feature extraction filter frequency response; (b) Gaussian directional gradient filter frequency response

Fig. 3.4 Original images and the fusion results. (a) Multi-focus original image 1; (b) multi-focus original image 2; (c) fusion results based on the LP method; (d) fusion results based on the FSD method; (e) fusion results based on the GP method; (f) fusion results based on the TGP method

Laplacian pyramid fusion method, the filter-subtract-decimation (FSD) Laplacian pyramid fusion method, and the gradient feature based multi-resolution pyramid fusion method. The filter used in the experiment is the Law’s texture feature extraction filter T1, T5, T8, T11 and the Gaussian direction gradient filter D1, D2, D3, D4. Figure 3.3 shows the frequency responses of the four texture feature extraction filters and the four sets of Gaussian direction gradient filters, which uses 3-layer decomposition. Figure 3.4a, b show an image with two clocks, respectively. The two clocks are located at different distances and one of the two images has a focused focus.

122

3 Feature-Level Image Fusion

Table 3.2 Image fusion results of index evaluation

Image fusion method Method based on the LP Method based on the GP Method based on the FSD Method based on the TGP

RSME 9.8688 10.6518 10.6619 9.6868

MI 2.6611 2.3352 2.3568 2.6613

Table 3.3 Image fusion results of index evaluation

Image fusion method Based on the LP method Based on the GP method Based on the FSD method Based on the TGP method

EMI 0.4344 0.4189 0.4179 0.4930

PMI 0.6456 0.6411 0.6427 0.6599

Figure 3.4c shows the fusion results obtained by using the fusion method (LP) based on Laplace’s tower transformation. Its multi-resolution decomposition method adopts the Laplace pyramid transformation method, and the fusion rules adopt the fusion rules consistent with those in this book. Fusion rules consistent with those described in this chapter are used to determine which multi-resolution method is more suitable for image fusion system. Figure 3.4d adopts the tower transformation fusion method based on the Laplace of filter subtract and decimate. The fusion rules are consistent with the method in this book, which is abbreviated as FSD in Table 3.2. Figure 3.4e shows the fusion results obtained by using the fusion method of directional gradient tower type transformation. The fusion rules are consistent with the method in this book, which is abbreviated as GP in Table 3.2. Figure 3.4f shows the result of fusion based on the multi-scale transform image fusion method with joint texture and gradient features, which is abbreviated as TGP in Table 3.2 which gives visually good fused result than rest of. From the fusion results, we can clearly notice that both the fusion method and the LP-based fusion method mentioned in this chapter are better than the FSD-based fusion method and the GP-based fusion method. Especially in the upper left part of the back clock, both the FSD-based method and the GP-based method show significant distortion of the fused image. The proposed fusion method can get a better fusion effect. The results of the evaluation from Table 3.3 are consistent with our intuitive qualitative assessment. Figure 3.5a, b show the infrared and visible images of the same scene, respectively. Figure 3.5c–e show the results of fusion based on LP-based fusion method, DWT-based fusion method, and DWF-based fusion method, respectively. Figure 3.5f shows the result of fusion based on TGP fusion method. Table 3.3 shows the objective evaluation of the fusion results. The infrared and visible image fusion can clearly observe the specific position of the target, which is convenient for the computer to detect the target or manual detection.

3.3 Fusion Based on United Texture and Gradient Characteristics

123

Fig. 3.5 Original images and the fusion results. (a) Infrared images; (b) visible light images; (c) fusion results based on the LP method; (d) fusion results based on the FSD method; (e) fusion results based on the GP method; (f) fusion results based on the TGP method

124

3.4

3 Feature-Level Image Fusion

Fusion Algorithm Based on Fuzzy Regional Characteristics

We know that in the image fusion method based on multi-resolution decomposition, the fusion rule has a direct impact on the speed and quality of the fusion image, so the fusion rule is a very important part of the image fusion method. In 1984, Burt [6] proposed a fusion rule based on pixel selection. Based on the decomposition of the original image into different resolution images, the gray value of the pixel with the largest absolute value was selected as the pixel gray value after fusion. This is based on the fact that pixels with larger grayscale values contain more information in different resolution images. For example, pixels with larger coefficient values contain edges, lines, and area boundaries in the image. While improving the contrast pyramid, Zhou [7] also improved the fusion rules based on pixel selection and improved the image fusion quality. Petrovic and Xydeas [4] proposed a pixel-selection fusion rule that considers the correlation of each image within and between the decomposition layers. The selection of pixels for multiple images in the decomposition layer is not a separate selection like the previous fusion rule, but considers the interrelationship between each image within the layer and each image between the layers. When Pu Tian [27] uses wavelet transform to perform image fusion, based on the characteristics of the human visual system sensitive to local contrast, a pixel-based fusion rule based on contrast is adopted. However, the pixel-based fusion selection is only a fusion rule that uses a single pixel as a fusion object. It does not consider the correlation between adjacent pixels of the image. The fusion result is not ideal. Considering the correlation between adjacent pixels of the image, Burt and Kolczynski [8] proposed a weighted average fusion rule based on the selection of the regional characteristics in 1993, linking the pixel gray value fusion selection with the window region in which it is located. According to the matching degree of the window area, different fusion rule operations are performed; the matching degree is greatly different, and the fusion point pixel values are directly selected according to the energy value; the difference in matching degree is small and the weighted average rule operation is used. In the fusion rule proposed by Li et al. [9], they use the i maximum gray value in the selected window area as the fused pixel value; usually the maximum gray value in the window area is selected as the fused pixel value, but taking into account the influence of noise can also be selected by fusion selection in the case of i ¼ 2 or i ¼ 3. While choosing the gray value of the pixel, this fusion rule also considers the correlation of pixels in the window area. Koren et al. [10] proposed a fusion rule based on the local direction energy. The local directional energy is obtained from the integral pairs of the directional filter. This rule is based on the characteristic that human vision is sensitive to energy in the largest local direction.

3.4 Fusion Algorithm Based on Fuzzy Regional Characteristics

125

Chibani and Houacine [11] determine the fusion pixel selection by calculating the number of absolute values of the pixels in the corresponding window area of the input original image in its fusion rule. The fusion rule based on the window region reduces the erroneous selection of fused pixels by considering the correlation of adjacent pixels. The fusion effect is improved. Zhang et al. [12] proposed a region-based fusion rule, in which each pixel in the image is regarded as part of an area or an edge, and image information such as regions and boundaries is used to guide fusion and selection. The fusion effect obtained by adopting this fusion rule is better, but this rule is more complicated than other fusion rules. For complex images, this rule is not easy to implement. Other fusion rules include statistical and estimation methods. This method considers image fusion from the perspective of signal and noise and is very suitable for image fusion with noise. From the past research on image fusion rules, we can see that the fusion rules have evolved from fusion based on single pixel point selection to fusion strategy based on window measurement, and then a region-based fusion method has emerged. This chapter analyzes the method proposed by Zhang and Blum [12], and Piella [13] proposes an area-based fusion rule and discusses some of the problems. Therefore, an image fusion method based on fuzzy region features is proposed. This method guarantees important areas. While there is a good consistency information with the background area, the sub-important area has significant high-frequency characteristics. This avoids the contradiction caused by simply pursuing consistency in all areas of the image. The image fusion method based on fuzzy region features is based on multi-resolution analysis, and K-means clustering is performed according to the low-frequency components of each layer of the image, and the low-frequency image is decomposed into important regions, sub-important regions, and background regions; each region of the image is different. The attributes are blurred, and the fusion strategy of each partial region is determined according to the respective fuzzy membership degree of each region. Finally, the multi-resolution representation of the fused image is obtained, and then the multi-resolution inverse transform is performed to obtain the image fusion result. Experiments show that our image fusion method based on fuzzy region features presented has a good fusion result.

3.4.1

Area-Based Image Fusion Algorithm

Zhang Z. and G. Piella’s [13] region-based multi-resolution image fusion methods are basically similar, and they are the first to multi-resolution decomposition of the image, after decomposition of the corresponding low-frequency parts of the image for image segmentation; sensor image fragmentation results can be merged to get a unique segmented image. Then based on this segmentation image, the fusion strategy of multi-resolution high-frequency components is determined, and the

126

3 Feature-Level Image Fusion

Fig. 3.6 The result of the two region-based fusion results. (a) Original goal, (b) Zhang Zhong method, (c) G. Piella method

lowest frequency component is selected according to the selection rule of the highfrequency component. The multi-resolution representation of the fused image is obtained, and the corresponding multi-resolution inverse transform (reconstruction) is finally performed to obtain the fusion result. Although their multi-resolution transform methods and image segmentation methods are different, the fusion criteria selection and fusion measure are basically similar in design, and the fusion structure is similar. Therefore, the fused images obtained by these methods have the problem of the consistency of regional features. That is, the local area of the fused image cannot completely reflect the distribution characteristics of the pixels inside the corresponding area of the original image. Since the regional significance measure is obtained after each multi-scale expansion in each frequency band, the selection in each frequency band may be inconsistent. That is, instead of selecting all the frequency bands in the corresponding area of the same image, this area appears. The inconsistency, as shown in Fig. 3.6, decreases both the contrast and the regional consistency of the target area, thus affecting the overall characteristics of the target area. The purpose of image fusion is to obtain a single image from multiple images. This image should be able to reflect all the important information of the original image. If artificially selecting important features of an image is to be fused, the first consideration is to select an important image region, then an important edge feature, and finally consider the fusion of pixel points. The purpose of doing so is often to preserve the consistency of important regions, that is, to obtain all the information of the region from the corresponding part of an image; for less important regions, the requirements of regional consistency are not strong, and you can choose those comparisons. The high-frequency information of the pixels that can reflect the “edge” feature is used as a fusion result. This shows that the consistency of the area is more important than the significance of a single pixel. Therefore, the image fusion process should also have the feature of the “first region after pixel.” In other

3.4 Fusion Algorithm Based on Fuzzy Regional Characteristics

127

words, the important areas must be merged first, and the pixels of the unimportant areas are merged. However, because the importance of this region is a vague and uncertain concept, the proposed region-based multi-resolution fusion method is also performed in a fuzzy space.

3.4.2

Image Fusion Method Based on Fuzzy Region Features

3.4.2.1

K-Means Algorithm Image Segmentation

During the application and research of images, people are often interested in certain parts of the image. These parts generally correspond to specific areas of the image that have unique properties and thus need to separate and extract these areas. Image segmentation refers to the technique of dividing an image into regions with specific features and extracting the objects of interest. Image segmentation is an essential part of image understanding and analysis. The area in the image refers to a connected set of pixels with consistent “meaningful” properties. The so-called “significant” attributes depend on the specific conditions of the image to be analyzed, such as the color, grayscale, statistical properties, or texture properties of the neighborhood of the image. “Consistency” requires that each zone has the same or similar feature attributes. Image segmentation methods generally include threshold segmentation method, cluster segmentation method, statistical segmentation method, and regional growth method and separate merger method. The K-means segmentation algorithm belongs to one of the cluster segmentation methods. The K-means algorithm divides n vectors xj into c classes of Gi and finds the cluster centers of each class so that the objective function of the dissimilarity (or distance) index is minimized. When the metric between the vector xk of the first class Gi and the corresponding cluster center ci is a Euclidean distance, the objective function can be defined as J¼

c X i¼1

Here J i ¼

P k, xk 2Gi

Ji ¼

c X

X

i¼1

k, xk 2Gi

! kx k  c i k

2

ð3:47Þ

kxk  ci k2 is an objective function within the class Gi. The value of

Ji depends on the geometry of the Gi and the position of the ci. Obviously, the smaller the value of j, the better the clustering effect. The basic idea of the K-means algorithm: 1. First randomly select c vectors as the center of each class. 2. Let U be a c  n two-dimensional membership matrix. If the l-th vector xj belongs to class I, the element uij in U is 1; otherwise, the element takes 0, which is

128

3 Feature-Level Image Fusion

( uij ¼

1, for each k 6¼ i,

2 2 if x j  ci  x j  ck

0

else

ð3:48Þ

3. Calculate the value of the objective function formula (3.48) according to uij. If it is lower than a given minimum threshold or if the difference between two consecutive values is less than one parameter threshold, the operation stops. P 4. Update each cluster center according to uij: ci ¼ jG1i j k,xk 2Gi xk , where jGi j ¼ Pn j¼1 uij represents the number of elements in the class Gi. Then return to step (2).

3.4.2.2

Fuzzy Theory and Regional Feature Fuzzification

Randomness is only a kind of uncertainty in the real world. In addition to this, there is another kind of more general uncertainty, which is ambiguity. In order to portray and deal with this uncertainty, Zadeh et al. [14] conducted a lot of research on the representation and processing of vagueness from the perspective of set theory in 1965 and proposed fuzzy sets, membership functions, and linguistic variables. Concepts such as language truth and fuzzy inference have created a new mathematical branch of fuzzy mathematics, thus providing a new way for quantitative description and processing of fuzziness. The fuzzy method shows great advantages in classifying pixels and extracting features. The so-called ambiguity refers to the undifferentiation of the objective things in terms of their forms and generics. The root cause is the existence of a series of transitional states between similar things. They infiltrate and interpenetrate each other, so that there is no clear dividing line between them. Ambiguity is a characteristic of something in the objective world. It is essentially different from randomness. For randomness, the meaning of the thing itself is clear, it may or may not happen under certain conditions, and it cannot be predicted in advance. Therefore, a number on [0,1] indicates the occurrence of the event possibility. The nature of ambiguity is ambiguous. Whether a specific object meets a fuzzy concept cannot be clearly determined. Definition Let U be the domain, μA is a function that maps any PPPP to a value on [0,1], i.e., μA : U ! ½0, 1

u ! μ A ð uÞ

ð3:49Þ

Then μA is defined as a membership function defined on U. The set A formed by μA(u) (u 2 U ) is called a fuzzy set on U, and μA(u) is called the membership degree of u to A. From the above definition, it can be seen that the fuzzy set A is completely characterized by its membership function. The membership function μA maps each element u in U to a value μA(u) on [0, 1], indicating the degree to which the element

3.4 Fusion Algorithm Based on Fuzzy Regional Characteristics

129

belongs to A, and a larger value means a higher degree of membership. When the μA(u) value is only 0 or 1, the fuzzy set A degenerates into a normal set, and the membership function degenerates into a characteristic function. When multiple sensors image a scene, the real scene area of each sensor image can be roughly divided into three types. These three areas can be divided by a certain target-sensitive sensor (such as an infrared imaging sensor), including important areas of the target, sub-important areas with rich edges or texture information, and background areas containing background information. From the concept in the above-mentioned fuzzy theory, the complete set of these three regions defines a regional fuzzy set A on the real scene U. Due to the sensitivity of the area-divided sensor to the target, we collectively call it the target sensor, and other imaging sensors are called background sensors. First, we must determine the fusion rule for each element in the regional fuzzy set A when it is fused. If the area is divided into important areas, the importance of this area of the target sensor image is stronger than the corresponding area of the background sensor image, and the method of fusion is to use all the multi-resolution coefficients of this part of the target sensor image as the corresponding image after fusion. Regional section: If it is an important area, it means that the multi-sensor image shows significant features in this area. This only needs to fuse the information with edge features, and does not need to fuse all the information. In the process, the multi-resolution coefficient with relatively significant significance can be selected as a fusion result, and the destruction of the consistency of the region will not deteriorate the final fusion result. If the area belongs to the background area, it means that the background sensor image is more important in the area than the target sensor image. Similar to the importance area, the fusion method uses all the multiresolution coefficients of this part of the background sensor image as the fused image. The corresponding area section: Fig. 3.7 shows the target sensor image and its image segmentation results. The figures, roads, and lawns in the figure are

(a) Original infrared image

(b) Segmented image

Fig. 3.7 Zone division of the target sensor image. (a) Original infrared image. (b) Segmented image

130

3 Feature-Level Image Fusion

Area attribute membership

µ

A1

A3

A2

µ A2 (u ) µA (u ) 1

µA3 (u ) u

Regional characteristics

Fig. 3.8 Schematic diagram of the membership function of the area attribute

the target importance area, the sub-important area, and the background area, respectively. Let the elements of regional fuzzy set A be A1, A2, A3, respectively, which represent the target importance area, the sub-important area, and the background area. The regional feature attribute is u(u 2 U ), then μA(u) is called u to the membership of regional fuzzy set A. Degrees, so when determining regional attributes, of the regional convergence strategy should be 8 > < A image F ¼ f ðA, BÞ > : B image

area

μA1 ðuÞ ¼ 1 μA2 ðuÞ ¼ 1

area

μA3 ðuÞ ¼ 1

ð3:50Þ

Since the importance of the image area is relative, that is, it cannot be judged whether the area is important or not important according to a certain feature of the image. The importance of the area is a vague concept, so it is necessary to blur the importance attribute of the image, and the fusion process is performed in the fuzzy space. This method is the more commonly used normal distribution membership function, as shown in Fig. 3.8. The definition of this function is "

2 #  MEðuÞ  E A j μA j ðuÞ ¼ exp Lmax  Lmin

ð3:51Þ

2

Among them, μA j ðuÞ indicates that the u area belongs to the Aj membership function; Lmax and Lmin are the ideal clustering centers for the important areas and background areas of the image, E(A1) ¼ Lmin, E(A3) ¼ Lmax; E(Aj) indicates that Aj is the ideal clustering center, and Lmax and Lmin are the maximum and minimum gray levels of min the target sensor image, EðA2 Þ ¼ Lmax þL ; ME(u) is the actual clustering center of 2

3.4 Fusion Algorithm Based on Fuzzy Regional Characteristics

131

the u area. We obtain a series of fuzzy region membership degrees obtained from a certain sensor image as the fuzzy region features of the image. In Fig. 3.8, A1, A2, A3, respectively, represent the three elements of the image fuzzy region set, which correspond to the three ideal fusion results, respectively. When ME(u) ¼ A1 indicates that the image area u is the background, the corresponding region of the background sensor image can be directly used as the fusion result and set to F1; when ME(u) ¼ A2 indicates that the region u is the next most important region, we adopt the pixel-based image fusion method [15]. The result of F2 fusion is obtained. When ME(u) ¼ A3 indicates that the area u is an important area, the corresponding area of the target sensor image is directly used as the fusion result F3. Finally, according to the characteristics of each area, μA j ðuÞð j ¼ 1, 2, 3Þ points of the membership degree space of all image pixels are defined. Based on these degrees of membership, the fusion results of the images are determined. 3 P

μAi ðuÞF i F ¼ i¼13 P μ A i ð uÞ

ð3:52Þ

i¼1

F in the formula is a multi-resolution representation of the fusion result. Corresponding multi-resolution inverse transform can get the final fusion result.

3.4.3

Image Fusion Method Based on Fuzzy Region Features

Based on the principle of multi-resolution image fusion, it can make the fusion result without artificial patchwork traces, and can show the characteristics of the original image while achieving a natural transition between features. For this reason, this book adopts the multi-resolution analysis method of the optimal filter bank wavelet frame or the multi-resolution analysis method based on the joint texture and edgegradient features mentioned in the previous section, to decompose and reconstruct the image and optimize the filtering. The design of the instrument group and the establishment of the wavelet framework have been elaborated in this chapter. The fusion rules are based on the fusion method based on fuzzy region features. Figure 3.9 shows the structure of multi-resolution image fusion (FRF_MIF) based on fuzzy region features. The basic idea is to first decompose the original image into an optimal filter bank based on the original image registration and then calculate the image segmentation result under multi-resolution transformation, according to the fuzzy region feature fusion rule for the decomposed coefficient. The fusion algorithm is used to obtain new fusion coefficients. Finally, the fusion image is obtained through the corresponding inverse transformation.

132

3 Feature-Level Image Fusion

Background sensor image

Target sensor image Multiresolution transformation Decomposition factor

High frequency component

Low-frequency component

Multiresolution transformation Decomposition factor

Low-frequency component

High frequency component

K mean segmentation Clustering results

Measurement index

Regional fuzzy membership function

Fusion decision

Fusion operation

After fusion coefficient Multiresolution inverse Fusion image Fig. 3.9 Image fusion algorithm structure based on fuzzy region features

Measurement index

3.4 Fusion Algorithm Based on Fuzzy Regional Characteristics

133

The fusion algorithm may specifically include the following steps: 1. Registration of the input original image in space (taking two input original images as an example). 2. Each input original image is decomposed by the optimal filter bank wavelet frame to obtain its own multi-resolution image sequence. 3. The low-frequency component of the target sensor image is clustered and divided into three categories, which are respectively represented as an important target area, a sub-important area, and a background area. 4. According to the ideal cluster center, the fuzzy membership function is obtained for the segmented region, and the fuzzy region feature is obtained. The background sensor image can participate in the process of obtaining the fuzzy region feature, or it cannot participate in the process. 5. According to the characteristics of the fuzzy region and the measurement, the fusion decision is obtained after getting parameters of the high-frequency part of the multi-sensor image. 6. On the basis of obtaining the measurement index, a multi-resolution representation of the fused image is established according to the fusion rule based on the fuzzy region feature. 7. Consistent verification of the multi-resolution representation of the merged image. 8. The fusion image is obtained by inverse wavelet frame transform of the optimal filter bank. Here, fusion experiments of millimeter-wave images and visible light images, infrared and visible light images, and infrared and SAR images have been conducted, respectively. The performance evaluation of image fusion results adopts two objective evaluation indicators: pixel mutual information and edge mutual information. Figure 3.10a, b are millimeter-wave images and visible-light images for inspecting concealed weapons. From (a), we can see that the firearms are imaged. According to the application, guns are an important target in the detection of dangerous goods. The rest of the area is the sub-important area and background area. The experiment uses a millimeter-wave image as a target sensor image. Through the image fusion technique, it can be clearly seen that the gun is hidden on the third person from the left. It uses the region-based fusion method of Zhang [12], the region-based fusion method of Piella [13] the method based on fuzzy region features introduced in this chapter, and the multi-scale method based on joint texture and gradient features described in previous section. Fuzzy feature region fusion method is used to fuse images. The multi-scale decomposition layers are all three layers. The fusion results are shown in Fig. 3.10c–f. It can be clearly seen that the image fusion method based on fuzzy region features has better regional consistency (or integrity) than other methods. The evaluation indicators are shown in Table 3.4. Both pixel mutual information and edge mutual information have improved significantly.

134

3 Feature-Level Image Fusion

Fig. 3.10 Input original image and fusion image. (a) Millimeter-wave image 1; (b) Visible light image 2; (c) Fusion result using Zhang Z region fusion method; (d) Fusion result using Piella G [13] region fusion; (e) Based on fuzzy region feature image fusion result 1; (f) Image fusion results based on fuzzy region feature 2

3.4 Fusion Algorithm Based on Fuzzy Regional Characteristics Table 3.4 Index evaluation of image fusion results

Image fusion method Using the method of Zhang Z Using the method of Piella G Based on FRF methoda Based on FRF methodb

135 EMI 0.5178 0.6021 0.6169 0.6134

PMI 1.3912 1.5668 1.6503 1.7702

a

Represents the fusion method based on fuzzy region features proposed in this book b Represents a multi-scale fuzzy region fusion method based on joint texture and gradient features

Among them, in Zhang Z’s [12] region-based image fusion method in the wavelet domain, the image segmentation algorithm is formed by first detecting the edges and then linking the edges, and the fusion rules are in the area size and the activity-level measurement (the activity-level measure). The fusion rule combines the size and the activity-level measurement of the region, and is formed under certain priority conditions. The segmentation algorithm is more complex to implement, but the final fusion result of the image is determined by its corresponding region fusion rule. Therefore, a relatively simple segmentation algorithm is used to generate the image region. The region fusion algorithm uses the method of obtaining the regional significance measure to determine that a relatively close fusion result will be obtained. The current method uses a K-means algorithm instead of an edge-linked region segmentation method. This algorithm is relatively simple to implement and can also reflect the performance of region fusion in the wavelet domain to some extent. The specific algorithm is as follows: 1. Wavelet transform of a certain number of layers of two images with good registration. 2. In the wavelet transform, each layer is decomposed using the low-frequency component of the layer; we divide the low-frequency component to achieve the region; the K-means algorithm is used to obtain the segmentation results of each layer. 3. Using two regions of the divided image to overlap, a sub-region image is obtained; taking into account the saliency measurement of each region in the wavelet domain of the two original images, a decision surface can be obtained. 4. According to decision-making face fusion, the fusion results in the wavelet domain are obtained. 5. Consistency verification of the multi-resolution representation of the merged image. 6. The wavelet inverse transform of the fusion results in the wavelet domain results in the final fusion result. The region-based image fusion method of Piella G [13] is performed in the Laplacian pyramid decomposition space. The segmentation algorithm of each layer adopts a multi-resolution “inheritance” segmentation strategy, and the method of fusion rules is similar to that of Zhang [12]. In order to get a better fusion effect, in this chapter, the algorithm also needs to verify the consistency.

136

3 Feature-Level Image Fusion

Fig. 3.11 Input original image and fused image. (a) Infrared image; (b) SAR image; (c) Fusion result using Zhang Z region fusion method; (d) Using Piella G region fusion result; (e) Fuzzy region feature fusion result; (f) Based on fuzzy region feature fusion result

It can be seen from Fig. 3.11a that the road is very clear, and this area can be regarded as an important target area. From Fig. 3.11b, it can be seen that the background information is very rich. Experiments use infrared images as target sensor images. Figure 3.11c–f are Zhang et al. [12] region-based fusion method, Piella [13] region-based fusion method, fuzzy region feature-based fusion method, and the second chapter is based on texture and gradient features. The scale method combines the fusion results obtained by the fusion method of fuzzy region features. The multi-scale decomposition layer is of three layers.

3.5 Multi-Source Face Feature Fusion Recognition Algorithm Based on Genetic. . . Table 3.5 Image fusion result indicator evaluation

Image fusion method Using the method of Zhang Z Using the method of Piella G Based on FRF methoda Based on FRF methodb

EMI 0.5207 0.5178 0.5692 0.5835

137 PMI 1.4870 1.5969 1.6400 1.6515

a

Represents the fusion method based on fuzzy region features proposed in this book b Represents a multi-scale fuzzy region fusion method based on joint texture and gradient features

Compared to other methods, the image fusion method based on fuzzy region features has relatively complete target information and background information. In addition, according to the evaluation index, the fusion result still retains more edge information, as shown in Table 3.5. It can also be seen from the above that when the texture information of the original image is relatively rich, a multi-scale transformation method based on texture and gradient features can obtain better fusion effects.

3.5

Multi-Source Face Feature Fusion Recognition Algorithm Based on Genetic Algorithm

Feature-level fusion recognition refers to the synthesis and processing of information obtained after preprocessing and feature extraction, and then classify and identify. Commonly used feature extraction methods include principal component analysis (PCA) [16] linear discriminant analysis (LDA) [17] and independent component analysis (ICA) [18]. However, when these methods are applied to infrared human faces, they do not take full advantage of the details provided by the infrared human face and do not achieve satisfactory results. At present, Gabor wavelet is widely used in feature detection because of its excellent time–frequency aggregation and good directional selectivity [19]. However, the maximum bandwidth of the Gabor wavelet is limited to one frequency, and the widest possible spectrum information with the best spatial location cannot be obtained. The Log-Gabor function is a good alternative to the Gabor function [20]. The bandwidth of the Log-Gabor wavelet can be arbitrarily constructed, which overcomes the disadvantages of the general Gabor wavelet for over-representation of low-frequency components and insufficient representation of high-frequency components. The study of feature-level fusion algorithms has not received due attention in recent years compared with other-level fusion algorithms [21]. However, featurelevel fusion is very important in information fusion processing. First, it extracts more efficient feature information from the original data space and reduces the spatial dimension. Second, it eliminates redundant information between the feature representation vectors obtained by each data source, thereby facilitating subsequent decision-making. In short, feature-level fusion can achieve an effective, low-dimensional feature representation vector that is conducive to the final decision.

138

3 Feature-Level Image Fusion

The existing feature-level fusion algorithms can be divided into two categories: feature selection and feature combination. The so-called feature selection is first of all the feature representation vector together and then uses a suitable method to generate a new feature representation vector, the elements of each position of the new vector are selected from the same position of the original vector elements. The fusion method based on dynamic programming proposed by Zhang [22] and the fusion method based on supervised neural network proposed by Battiti [23] all fall into this category. The so-called feature combination, that is, all feature representation vectors are directly combined into a new vector. The most typical feature combination method is a serial fusion strategy that serializes two or more feature representation vectors into a large vector [21].

3.5.1

Feature-Level Fusion Algorithm

1. Serial Strategy (Serial Strategy) [6, 21] It is assumed that A and B are two feature spaces defined in the pattern sample space Ω. For any sample Γ 2 Ω, the corresponding feature representation vector is α 2 A and β 2 B. The serial fusion strategy chains the two feature representation vectors into a large vector γ.   α γ¼ β

ð3:53Þ

Obviously, if α is the n dimension and β is the m dimension, the synthesized vector γ is the (n + m) dimension. In this way, all serially synthesized vectors form a feature space of a (n + m) dimension. 2. Parallel Strategy [21, 24] It is assumed that A and B are two feature spaces defined in the pattern sample space Ω. For any sample Γ 2 Ω, the corresponding feature representation vector is α 2 A and β 2 B. The parallel fusion strategy represents these two features into a complex vector γ. γ ¼ α þ iβ

ð3:54Þ

Among them, i is an imaginary unit. It should be noted that if the dimensions of α and β are inconsistent, the dimension is consistent by zeroing the low dimensional vectors. For example, α ¼ (a1, a2, a3)T and β ¼ (b1, b2)T first convert β to (b1, b2, 0)T and then synthesize vector γ ¼ (a1 + ib1, a2 + ib2, a3 + i0)T. Define a parallel fusion feature space Ω on C ¼ {α + iβ|α 2 A, β 2 B}. Obviously, this is an n-dimensional complex vector space, where n ¼ max {dimA, dimB}. In this space, the inner product can be defined as

3.5 Multi-Source Face Feature Fusion Recognition Algorithm Based on Genetic. . .

ðX, Y Þ ¼ X H Y

139

ð3:55Þ

Among them, X, Y 2 C and H indicate a total transfer. The complex vector space that defines the inner product above is called unitary space. In the space, you can introduce the following norms vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi n   pffiffiffiffiffiffiffiffiffi u uX a2j þ b2j kZ k ¼ Z H Z ¼ t

ð3:56Þ

j¼1

Among them Z ¼ (a1 + ib1,   , an + ibn)T. Correspondingly, the distance (unitary distance) between the complex vector Z1 and Z2 can be defined as kZ 1  Z 2 k ¼

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðZ 1  Z 2 ÞH ðZ 1  Z 2 Þ

ð3:57Þ

Compared with the serial fusion strategy, the parallel fusion strategy reduces the dimension of the fused vector. More importantly, it introduces the concept of unitary space, which transforms the fusion problem of two real vector spaces into a mathematical problem of a complex vector space. Compared with the serial fusion strategy, the parallel fusion strategy reduces the dimension of the fused vector. More importantly, it introduces the concept of unitary space, which transforms the fusion problem of two real vector spaces into a mathematical problem of a complex vector space. 3. Genetic Algorithm (GA) [1, 25] Genetic algorithms mimic the evolution of living things. In the process of biological evolution, each species is more and more adapted to the environment in the process of continuous development; the basic characteristics of each individual of the species are inherited by its descendants, but the descendants are not exactly the same as their fathers; individual characteristics that are more adaptable to the environment in the survival and development of individuals can be preserved, reflecting the principle of survival of the fittest. The basic idea of genetic algorithms is based on this. The algorithm interprets the possible solutions of the problem into 0 and 1 code strings called chromosomes. Given a set of initial chromosomes, the genetic algorithm manipulates them using genetic operators to generate a new generation. The new generation of chromosomes may contain better solutions than previous generations. Every chromosome needs to be evaluated by its fitness function. The goal of the genetic algorithm is to find the most suitable chromosome. The general genetic algorithm consists of four parts: coding mechanism, fitness function, genetic operator, and control parameters. The coding mechanism is the basis of the genetic algorithm. The genetic algorithm is not a direct discussion of the research object, but through a certain coding mechanism to the object is unified to a specific symbol (letter) arranged in a certain sequence of strings (chromosomes). In the commonly used genetic algorithm, the chromosome consists of 0 and 1, and the code is a binary string.

140

3 Feature-Level Image Fusion

The code of the genetic algorithm can have a very broad understanding. In the optimization problem, one chromosome corresponds to one possible solution. The survival of the fittest is the principle of natural evolution. Good and bad must have standards. In genetic algorithms, the degree of fitness for each chromosome is described by a fitness function. The purpose of introducing the fitness function is to evaluate and compare the chromosomes according to their fitness and determine the degree of good or bad. There are three most important operators of genetic algorithms: selection, crossover, and mutation. The role of selection is to determine whether it is eliminated or copied in the next generation based on the degree of chromosomes. In general, by selecting, there will be a greater chance of the chromosomes with good fitness, while the chance that the chromosomes with low fitness, i.e., inferior chromosomes, will continue to exist. Crossover operators allow different chromosomes to exchange information. A mutation operator is a value that changes a position on a chromosome. In the actual operation of the genetic algorithm, certain parameters need to be properly determined to improve the effect of selection. These parameters are: population size per generation, crossover rate (probability of performing crossover operator), mutation rate (probability of performing mutation operator), in addition to genetic algebra, or other indicators that can be used to determine the discontinuation of reproduction. The main calculation process of the genetic algorithm is as follows: • • • • • • • • • • •

Begin. Determine fitness function and coding rules. Identify genetic algebra, population size, crossover rate, and mutation rate. Randomly generate the initial population and calculate its fitness. Repeat. Selecting two (or more) chromosomes in the population to perform crossover at the crossover rate. Choosing chromosomes in the population to mutate at the mutation rate. Calculate the fitness of each chromosome. Select offspring groups (properly breed; those who are disqualified). Until the specified algebra is reached or a satisfactory result is obtained. End.

3.5.2

Fusion Recognition Based on Genetic Algorithm

The previous section explored three representative feature-level fusion algorithms. The serial fusion strategy is very simple, but it increases the number of dimensions. In addition, the direct vector cascades the effect of the fusion. The parallel fusion strategy introduces unitary space, and the dimension after the fusion remains unchanged. However, it is clear that the final calculation results are the same for the parallel fusion distance and the Euclidean distance of the serial fusion. In this

3.5 Multi-Source Face Feature Fusion Recognition Algorithm Based on Genetic. . . Visible face

Log -Gabor

141

ICA GA

Top -match

Result

IR face Log -Gabor

ICA

Feature Extraction

Feature Fusion

Classification

Fig. 3.12 Fusion identification framework

In this way, parallel fusion does not seem to make much sense. In fact, in practical applications, whether it is a large vector obtained serially or a complex vector obtained in parallel, it is often necessary to perform further feature extraction on these vectors in order to fuse the information effectively [21], which undoubtedly increases the workload. The genetic algorithm, because it uses random operations, places no special requirements on the search space; it has the advantages of simple operation, fast convergence, and global optimization, has developed rapidly in recent years, and has been widely applied. Therefore, this chapter proposes a multi-source face fusion recognition algorithm based on the genetic algorithm.

Figure 3.12 shows the fusion recognition framework based on the genetic algorithm. Assume that each face corresponds to a pair of face images acquired at the same time, one a visible light face and the other an infrared face. They are acquired synchronously and undergo strict registration, which can be ensured through the use of an integrated multi-source sensor. The various parts of the recognition framework are described in detail below.

1. Extracting Independent Log-Gabor Features

The Log-Gabor wavelet and independent component analysis (ICA) are used to compute independent Log-Gabor features for the visible and infrared face images. By convolving the face image with a set of Log-Gabor wavelets, a multi-scale Log-Gabor representation of the image is obtained. Figure 3.13 shows a test image and the amplitudes of its convolution outputs with a set of Log-Gabor wavelets (2 scales, 6 directions). Each output can be represented as a column vector by serializing the columns of the output matrix. Before concatenation, each output is first down-sampled by a factor of ρ to reduce the dimensionality of the vector space. The output column vectors obtained at the different frequencies and directions are finally concatenated into one large vector, the Log-Gabor feature vector O, which represents the input face image. Since the dimension of O is very high, it is reduced by PCA to obtain a low-dimensional feature vector y. By using the transformation matrix F obtained by the ICA method, an independent Log-Gabor feature z of the face image can be obtained, as shown in the following equation

y = Fz                                                        (3.58)

Since ICA does not provide an ordering of the independent components, the components are ordered by the following ratio of the between-class variance to the within-class variance [26]


Fig. 3.13 Test pattern and its amplitude with a set of Log-Gabor wavelets (2 scales, 6 directions) convolution output. (a) Visible light test chart; (b) infrared test chart; (c) the amplitude of the convolution output of a visible light test chart with a set of Log-Gabor wavelets; (d) the amplitude of the convolution output of an infrared test chart with a set of Log-Gabor wavelets


r = σ_between / σ_within = [ Σ_k (x̄_k − x̄)² ] / [ Σ_k Σ_i (x_ki − x̄_k)² ]                (3.59)

Among them, σ_between is the between-class variance and σ_within is the sum of the within-class variances. Figure 3.13 shows the sampling images and the magnitudes of their convolution outputs with the Log-Gabor wavelets: (a) the visible sampling image, (b) the LWIR sampling image, (c) the convolution outputs of the visible sampling image, and (d) the convolution outputs of the LWIR sampling image.

2. Feature Fusion Using the Genetic Algorithm (GA)

Let zV and zI represent the independent Log-Gabor features obtained from the visible and infrared face images, respectively. Through the genetic algorithm (GA), the fused feature representation vector Z can be obtained as

Z = f(zV, zI, x)                                              (3.60)

Here x is the optimal chromosome found by the genetic algorithm. Each bit of x corresponds to the feature at a particular location; the value of the bit determines whether the feature at that location is selected from zV (value 1) or from zI (value 0). In this chapter, the fitness function is designed based on the fusion recognition rate.

3. Top-Match Classification

The vector Z obtained by feature fusion is used to classify the input face. Top-match, i.e., the nearest neighbor, is a simple and effective classification method. The face is assigned to the class k for which the Euclidean distance is minimized

ε_k = ‖Z − C_k‖                                               (3.61)

Here, C_k is the fused feature vector describing the kth face.
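To make the chromosome-driven fusion of Eqs. (3.60) and (3.61) concrete, the following Python sketch shows how a binary chromosome selects, element by element, between the visible and infrared feature vectors and how the fused vector is then classified by the top-match (nearest-neighbor) rule. It is an illustrative sketch under the stated assumptions, not the authors' implementation; the feature dimensionality and the class templates are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 60                      # feature dimensionality (illustrative)
N_CLASSES = 5               # number of face classes (illustrative)

# Independent Log-Gabor features of one test face (placeholders).
z_vis = rng.normal(size=D)  # visible-light feature vector zV
z_ir = rng.normal(size=D)   # infrared feature vector zI

# Binary chromosome x: 1 -> take the element from zV, 0 -> take it from zI.
x = rng.integers(0, 2, size=D)

# Eq. (3.60): fused feature vector Z = f(zV, zI, x).
Z = np.where(x == 1, z_vis, z_ir)

# Fused class templates C_k (placeholders for the gallery faces).
C = rng.normal(size=(N_CLASSES, D))

# Eq. (3.61): top-match classification by minimum Euclidean distance.
eps = np.linalg.norm(C - Z, axis=1)
k_best = int(np.argmin(eps))
print("assigned class:", k_best, "distance:", eps[k_best])
```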

3.5.3 Experimental Results and Evaluation

In order to verify the performance of the fusion recognition algorithm based on the genetic algorithm proposed in this section, the following experiments were performed. The experimental data come from the Equinox database [3], in which the visible light images have a grayscale resolution of 8 bits and the long-wave infrared (LWIR) images 12 bits. All data were obtained from newly designed integrated CCD and LWIR sensors, which can acquire visible and infrared images simultaneously with an image registration accuracy of 1/3 pixel. The original image size is 320 × 240, and the face region is 180 × 140. Sixty objects are selected, and one training library, one verification library, and five test libraries are created for the visible light faces and the infrared faces. There are two training charts for each object in the training library, and ten test charts for each object in the test libraries and the verification library. The training, test, and verification libraries are selected according to the following criteria:

• Training set: forward lighting, no glasses, no expression.
• Test_no set: forward lighting, no glasses, no expression.
• Test_illumination set: side lighting, no glasses, no expression.
• Test_eyeglasses set: forward lighting, glasses, no expression.
• Test_expression set: forward lighting, no glasses, expression.
• Test_all set: lateral lighting, glasses, expression.
• Validation set: lateral lighting, glasses, expression.

Table 3.6 Recognition performance of PCA, ICA, Log-Gabor, and the proposed scheme (%). For each test set, the two values per method are the visible light result (left) and the infrared result (right)

Test set            PCA            ICA            Log-Gabor+PCA   Log-Gabor+ICA   Proposed algorithm
Test_no             98.83 / 97.83  97.83 / 97.83  97.83 / 97.83   97.83 / 98.83   100
Test_illumination   30.17 / 93.83  72.67 / 87.83  83.83 / 91.83   87.83 / 95.83   97.00
Test_eyeglasses     86.83 / 71.67  87.83 / 69.67  97.83 / 85.83   97.83 / 90.00   97.83
Test_expression     97.83 / 96.83  95.83 / 96.83  95.83 / 96.83   95.83 / 98.83   99.50
Test_all            28.67 / 63.50  62.00 / 68.17  78.67 / 83.83   83.33 / 85.83   92.83

The genetic algorithm finds the optimal chromosome based on the image data of the verification library. Each test library corresponds to one test condition, and the performance of the fusion recognition algorithm proposed in this chapter was evaluated under the various test conditions. In the experiment, the Log-Gabor wavelets used have four scales and six directions. The parameters are set as follows: the wavelength of the smallest filter is 3, the scale factor between adjacent filters is 2, the ratio of the standard deviation of the radial Gaussian function to the center frequency of the filter is 0.65, and the ratio of the angular spacing of the filter directions to the standard deviation of the angular Gaussian function is 1.5. The down-sampling factor ρ is 4. The control parameters of the genetic algorithm are set as follows: the population size is 100, the number of generations is 100, the crossover rate is 0.96, and the mutation rate is 0.02.

Table 3.6 compares the performance of the algorithm proposed in this chapter with the single-sensor face recognition algorithms (PCA, ICA, Log-Gabor). For each test set, the visible light result is listed before the infrared result. In all cases, the number of feature elements taken by PCA and ICA is 60, and the classification method is top-match. Comparing the second, third, and fourth rows of Table 3.6 with the first row, respectively, shows the effects of lighting, glasses, and expression on face recognition; comparing the fifth row with the first row shows the influence of the joint interference. According to the table, the Log-Gabor+ICA algorithm performs best among the single-sensor algorithms for both the visible and the infrared face. Even so, under the influence of lighting, the recognition rate of the visible light face is only 87.83%; under the influence of glasses, the recognition rate of the infrared face is only 90%; and under joint interference, the recognition rates of the visible light and infrared faces are 83.33% and 85.83%, respectively. Under all test conditions, the performance of the proposed algorithm is higher than 90%, which is clearly better than the single-sensor face recognition algorithms.

Table 3.7 Fusion recognition performance using different feature extraction techniques and GA (%), under the Test_all condition

PCA     ICA     Log-Gabor+PCA   Log-Gabor+ICA
71.67   78.67   87.83           92.83

Table 3.7 compares the face recognition performance after feature-level fusion using different face features (PCA, ICA, Log-Gabor) under the Test_all condition (i.e., under the combined interference condition). After feature extraction of the multi-source faces, the genetic algorithm (GA) was used for the fusion processing. The experimental results show that the independent Log-Gabor features used by the algorithm proposed in this chapter achieve better recognition performance than the fusion of the other face features.

3.6 Summary

Image feature-level fusion belongs to the intermediate level of image fusion. It comprehensively analyzes and processes the multiple kinds of feature information obtained by multiple sensors in order to realize the classification, collection, and synthesis of multi-sensor data. In general, the extracted feature information should be a sufficient representation and a sufficient statistic of the pixel information, including the edges, texture, and regional characteristics of the target. This chapter discussed multi-resolution gradient features (corresponding to edge information), texture features, and multi-resolution fuzzy region features, together with their corresponding fusion algorithms, from the perspective of multi-resolution transform spaces.

The multi-resolution representation of an image is an important analytical tool of signal processing and an important theoretical tool of image fusion. Such transforms generally fall into two major branches: multi-resolution transforms based on pyramid transformation and multi-resolution transforms based on wavelets. This chapter studied pyramid-based transformation methods and proposed a multi-scale transformation method based on texture and gradient features. This transform displays the salient features of the image, such as texture and edge information, as much as possible in the transform coefficients, so that the fused image contains the feature information of each scale and direction of the original images. When performing multi-scale image fusion based on texture and gradient features, the image is first decomposed into sub-band images with different features according to the texture filter and the gradient filter, and then the sub-band images are fused according to a contrast-based fusion rule so as to obtain a new set of fused sub-band images. Finally, the fused image is obtained by the multi-scale inverse transform. The fusion results based on the Laplacian pyramid transform, the FSD pyramid transform, and the gradient pyramid transform were compared to show the effectiveness of the image fusion algorithm based on joint texture and gradient features.

The image fusion algorithm based on fuzzy region features focuses on the study of image fusion rules. Fusion rules remain an important research topic in image fusion: their quality directly affects the speed and quality of the fused image, so they are crucial to any image fusion method. In this chapter, a multi-sensor image fusion method based on fuzzified regional features was also presented, in which the regions are ranked by importance and the region attributes are classified according to the regional features. The different attributes of the regions are used to fuse the images in the fuzzy space, and the contrast of the image is improved while preserving the regional consistency of the important regions and the background regions. Experiments show that this method can obtain a better fused image, and its fusion result is better than the region-based fusion method of Zhang et al. [12] and the region-based fusion method of Piella et al. [13]. The selection of the regional features and the fuzzy membership functions is crucial to the fusion performance; therefore, suitable regional features and fuzzy membership functions should be designed according to the application and the imaging characteristics of the original images.

References

1. M.N. Do, M. Vetterli, Frame reconstruction of the Laplacian pyramid, in 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 01CH37221), vol. 6 (IEEE, Piscataway, NJ, 2001), pp. 3641–3644
2. D.R. Barron, O.D.J. Thomas, Image fusion through consideration of texture components. Electron. Lett. 37(12), 746–748 (2001)
3. A. Toet, L.J. Van Ruyven, J.M. Valeton, Merging thermal and visual images by a contrast pyramid. Opt. Eng. 28(7), 287789 (1989)
4. C.A. Xydeas, V. Petrovic, Objective image fusion performance measure. Electron. Lett. 36(4), 308–309 (2000)
5. G. Qu, D. Zhang, P. Yan, Information measure for performance of image fusion. Electron. Lett. 38(7), 313–315 (2002)
6. P.J. Burt, The pyramid as a structure for efficient computation, in Multiresolution Image Processing and Analysis (Springer, Berlin, 1984), pp. 6–35
7. Y.T. Zhou, Multi-sensor image fusion, in Proceedings of 1st International Conference on Image Processing, vol. 1 (IEEE, Piscataway, NJ, 1994), pp. 193–197
8. P.J. Burt, R.J. Kolczynski, Enhanced image capture through fusion, in 1993 (4th) International Conference on Computer Vision (IEEE, Piscataway, NJ, 1993), pp. 173–182
9. H. Li, B.S. Manjunath, S.K. Mitra, Multisensor image fusion using the wavelet transform. Graph. Models Image Process. 57(3), 235–245 (1995)
10. I. Koren, A. Laine, F. Taylor, Image fusion using steerable dyadic wavelet transform, in Proceedings, International Conference on Image Processing, vol. 3 (IEEE, Piscataway, NJ, 1995), pp. 232–235
11. Y. Chibani, A. Houacine, On the use of the redundant wavelet transform for multisensor image fusion, in ICECS 2000. 7th IEEE International Conference on Electronics, Circuits and Systems (Cat. No. 00EX445), vol. 1 (IEEE, Piscataway, NJ, 2000), pp. 442–445
12. Z. Zhang, R.S. Blum, Region-based image fusion scheme for concealed weapon detection, in Proceedings of the 31st Annual Conference on Information Sciences and Systems (1997), pp. 168–173
13. G. Piella, A general framework for multiresolution image fusion: from pixels to regions. Inform. Fusion 4(4), 259–280 (2003)
14. L.A. Zadeh, Fuzzy sets. Inf. Control. 8(3), 338–353 (1965)
15. V.S. Petrovic, C.S. Xydeas, Cross-band pixel selection in multiresolution image fusion, in Sensor Fusion: Architectures, Algorithms, and Applications III, vol. 3719 (International Society for Optics and Photonics, Bellingham, WA, 1999), pp. 319–326
16. Y. He, Multi-Sensor Information Fusion with Application (Publishing House of Electronics Industry, Beijing, 2007)
17. T. Liu, Data Mining Technology and Application (National University of Defence Technology Press, Hunan, 1998)
18. A.N. Steinberg, Data fusion system engineering, in Proceedings of the Third International Conference on Information Fusion, vol. 1 (IEEE, Piscataway, NJ, 2000), pp. MOD5–MOD3
19. Y. Kang, Data Mining Theory and Application (Xidian University Press, Xi'an, 1997)
20. A. Farina, F.A. Studer, Radar Data Processing: Introduction and Tracking, vol. 2 (Research Studies Press, Baldock, 1985)
21. E. Waltz, J. Llinas, Multisensor Data Fusion, vol. 685 (Artech House, Boston, 1990)
22. D.L. Hall, S.A. McMullen, Mathematical Techniques in Multisensor Data Fusion (Artech House, Boston, 2004)
23. P.J. Burt, E.H. Adelson, The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983)
24. A. Toet, Image fusion by a ratio of low-pass pyramid. Pattern Recogn. Lett. 9(4), 245–253 (1989)
25. B. Aiazzi, L. Alparone, F. Argenti, S. Baronti, I. Pippi, Multisensor image fusion by frequency spectrum substitution: subband and multirate approaches for a 3:5 scale ratio case, in IGARSS 2000. IEEE 2000 International Geoscience and Remote Sensing Symposium. Taking the Pulse of the Planet: The Role of Remote Sensing in Managing the Environment. Proceedings (Cat. No. 00CH37120), vol. 6 (IEEE, Piscataway, NJ, 2000), pp. 2629–2631
26. B. Aiazzi, L. Alparone, A. Barducci, S. Baronti, I. Pippi, Multispectral fusion of multisensor image data by the generalized Laplacian pyramid, in IEEE 1999 International Geoscience and Remote Sensing Symposium. IGARSS'99 (Cat. No. 99CH36293), vol. 2 (IEEE, Piscataway, NJ, 1999), pp. 1183–1185
27. P. Tian, Q. Fang, Contrast-based multiresolution image fusion. Acta Electron. Sin. 28(12), 116–118 (2000)

Chapter 4

Decision-Level Image Fusion

Abstract Image fusion can be performed at three levels: pixel level, feature level, and decision level. Among them, decision-level fusion is a high-level information fusion, which is less explored and is a hot spot in the field of information fusion. Compared with low- and middle-level fusion, high-level fusion is accurate, supports real-time operation, and can better overcome the disadvantages of single-sensor imaging; its main disadvantage, however, is a higher loss of information. This chapter mainly introduces decision-level information fusion algorithms, including voting, Bayesian inference, evidence theory, fuzzy integral, and other specific methods. Taking SAR and FLIR images as an example, the algorithm process and implementation of decision-level fusion are presented.

4.1 Introduction

Decision-level fusion is a high-level information fusion [1, 2]. It is performed in four steps: first, multi-sensor imaging and processing; second, decision generation at each sensor; third, convergence of the decisions in the fusion center; and finally, the concluding fusion process. In the information processing architecture, decision-level fusion, also known as high-level fusion, sits at the top of the fusion hierarchy. The way people combine different types of current information with prior knowledge to arrive at an intelligent (best) decision is a good example of decision-level fusion. In general, decision-level fusion is more complete than the other levels and can better overcome the shortcomings of each individual sensor, whereas at the other fusion levels the failure of one sensor can mean the failure of the entire system. Compared with pixel-level fusion [3, 4] and feature-level fusion [5, 6], decision-level fusion has the best real-time performance, but its main drawback is the loss of information. Before the fusion, each sensor completes its own decision-making task; then, according to certain fusion criteria and the credibility of each decision, the best overall decision is made.


Decision-level information fusion algorithms include voting, Bayes inference, evidence theory, fuzzy integrals, and various other specific methods. This chapter first introduces these fusion methods and then presents the algorithm implementation of the fusion between SAR and FLIR images at decision level.

4.2 Fusion Algorithm Based on Voting Method

The voting method is conceptually very simple: similar to a democratic election in daily life, the minority obeys the majority and a proposal passes with more than half of the votes. Voting methods (including the Boolean "and" and "or") are therefore the simplest techniques for combined multi-sensor identity declaration. Each sensor provides an input statement about the observed identity of the entity; voting then searches for a statement that more than half of the sensors agree on (or that satisfies some other simple decision criterion) and announces the voting result as the joint description. Of course, it is sometimes necessary to introduce weighting, thresholding, and other decision techniques, which increases the complexity of the voting method to a certain extent. Voting is very useful when accurate prior probabilities are not available, and it is especially suitable for real-time fusion.
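As an illustration, the following Python sketch shows majority voting with optional per-sensor weights and a pass threshold; it is a minimal sketch of the idea described above (the sensor labels and weights are invented for the example), not an implementation taken from this book.

```python
from collections import defaultdict

def vote_fusion(declarations, weights=None, threshold=0.5):
    """Fuse per-sensor identity declarations by (weighted) voting.

    declarations: list of class labels, one per sensor.
    weights: optional list of per-sensor weights (defaults to equal weights).
    threshold: fraction of the total weight a label must exceed to win.
    Returns the winning label, or None if no label passes the threshold.
    """
    if weights is None:
        weights = [1.0] * len(declarations)
    totals = defaultdict(float)
    for label, w in zip(declarations, weights):
        totals[label] += w
    total_weight = sum(weights)
    best_label, best_score = max(totals.items(), key=lambda kv: kv[1])
    return best_label if best_score > threshold * total_weight else None

# Example: three sensors declare the target identity.
print(vote_fusion(["tank", "tank", "truck"]))                     # -> "tank"
print(vote_fusion(["tank", "truck", "car"]))                      # -> None (no majority)
print(vote_fusion(["tank", "truck", "truck"], [0.5, 0.3, 0.3]))   # -> "truck" (weighted majority)
```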

4.3 Fusion Algorithm Based on D-S Evidence Theory

Among the various information fusion methods, the distinguishing characteristic of D-S evidence theory is its ability to reason under uncertainty. It treats the information collected by each sensor as evidence and assigns a corresponding basic credibility over the set of decision targets. In this way, evidential reasoning can use Dempster's combination rule to combine different pieces of information into a unified representation within the same decision framework. Evidence theory allows credibility to be assigned directly to sensor information, which avoids simplifying assumptions about unknown probability distributions and preserves information. These advantages have made evidential reasoning widely used in the multi-source information fusion of various intelligent systems.

D-S evidence theory uses a recognition framework Θ to represent the set of propositions of interest, on which a function m: 2^Θ → [0, 1] is defined satisfying

(1)  m(Φ) = 0
(2)  Σ_{A⊆Θ} m(A) = 1                                         (4.1)

We call m the basic credibility distribution over the recognition framework. If A belongs to the recognition framework, then m(A) is called the basic credible number of A, which reflects the credibility of A itself. For any proposition set, D-S evidence theory defines the plausibility (Pl) and credibility (Bel) functions

Pl(A) = 1 − Bel(Ā)                                            (4.2)
Bel(A) = Σ_{B⊆A} m(B)                                         (4.3)

The pair (Bel(A), Pl(A)) can be used to describe the uncertainty of A. Given several credibility functions based on different pieces of evidence over the same recognition framework, Dempster's synthesis rule yields the credibility function produced by the combined effect of the different pieces of evidence. Suppose Bel1 and Bel2 are two credibility functions over the same recognition framework based on independent evidence, m1 and m2 are their corresponding basic credibility distributions, and their focal elements are A1, A2, ..., Ak and B1, B2, ..., Bl, respectively. By Dempster's synthesis rule, we get

m(A) = [ Σ_{Ai∩Bj=A} m1(Ai) · m2(Bj) ] / [ 1 − Σ_{Ai∩Bj=Φ} m1(Ai) · m2(Bj) ],   if A ≠ Φ
m(A) = 0,                                                                       if A = Φ
                                                              (4.4)

where i = 1, 2, ..., k and j = 1, 2, ..., l. Evidence theory is an algorithm based on approximate reasoning. Compared with other pattern recognition algorithms, it is simple and effective and can make good use of expert experience and knowledge for pattern recognition. The Dempster–Shafer method is a generalization of Bayesian decision theory: evidence theory satisfies weaker axioms than probability theory and shows great flexibility in handling the uncertainty of the evidence-gathering process.
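The combination rule in Eq. (4.4) is straightforward to implement when the focal elements are represented as sets. The following Python sketch (a minimal illustration, not code from this book) combines two basic credibility distributions defined over a small recognition framework; the example masses are invented.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two basic credibility (mass) assignments by Dempster's rule.

    m1, m2: dicts mapping frozenset focal elements to masses that sum to 1.
    Returns the combined mass assignment (Eq. 4.4).
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb          # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    # Normalize by 1 - conflict so that the combined masses sum to 1.
    return {a: w / (1.0 - conflict) for a, w in combined.items()}

# Recognition framework {tank, truck}; two sensors report their evidence.
theta = frozenset({"tank", "truck"})
m1 = {frozenset({"tank"}): 0.6, theta: 0.4}
m2 = {frozenset({"tank"}): 0.5, frozenset({"truck"}): 0.3, theta: 0.2}
for focal, mass in dempster_combine(m1, m2).items():
    print(set(focal), round(mass, 3))
```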

4.4 Fusion Algorithm Based on Bayes Inference

Bayesian theory was published by Thomas Bayes in 1763. Its basic principle is that, given a prior likelihood estimate of a hypothesis, the Bayesian method updates the likelihood of the hypothesis as new evidence (observational data) arrives. Bayesian inference is an important method for dealing with stochastic patterns, and many scholars have studied information fusion methods based on Bayesian decision theory for different application backgrounds [7, 8]. Assume that X is a group of information sources, X = {x1, x2, ..., xR}, and that the target is assigned to class ωj according to the maximum a posteriori probability (MAP):

Z → ωj  if  P(ωj | x1, ..., xR) = max_k P(ωk | x1, ..., xR)                    (4.5)

According to Bayesian theory, the maximum a posteriori probability (MAP) can be expressed as

P(ωk | x1, ..., xR) = p(x1, ..., xR | ωk) P(ωk) / p(x1, ..., xR)               (4.6)

p(x1, ..., xR) is the joint probability density, which can be expressed as

p(x1, ..., xR) = Σ_{j=1}^{m} p(x1, ..., xR | ωj) P(ωj)                         (4.7)

Assuming that the information sources are statistically independent, we get

p(x1, ..., xR | ωk) = Π_{i=1}^{R} p(xi | ωk)                                   (4.8)

From formulas (4.6), (4.7), and (4.8), we obtain

P(ωk | x1, ..., xR) = P(ωk) Π_{i=1}^{R} p(xi | ωk) / Σ_{j=1}^{m} P(ωj) Π_{i=1}^{R} p(xi | ωj)        (4.9)

Substituting (4.9) into (4.5), we can get the following decision rule

Z → ωj  if  P(ωj) Π_{i=1}^{R} p(xi | ωj) = max_k P(ωk) Π_{i=1}^{R} p(xi | ωk)                        (4.10)

Using the posterior probability of each sensor, we obtain

P^{−(R−1)}(ωj) Π_{i=1}^{R} P(ωj | xi) = max_k P^{−(R−1)}(ωk) Π_{i=1}^{R} P(ωk | xi)                  (4.11)

The Bayesian approach is actually a special case of D-S evidence theory, so every data fusion problem solved with the Bayesian approach can also be handled with D-S evidence theory. Evidence theory can describe practical decision fusion problems well; however, many publications question the evidence-independence condition in Dempster's synthesis rule, and the rule also suffers from an exponential explosion, so its efficient implementation is a key issue. With the increasing demands on the performance of data fusion systems, it is not enough to rely on a single method; the joint use of multiple complementary methods is becoming a development trend and can yield satisfactory results.
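The decision rules (4.10) and (4.11) amount to multiplying per-sensor likelihoods or posteriors and taking the arg max. The following Python sketch illustrates this with invented per-sensor posteriors for three classes; it is only a numerical illustration of the rule, not code from this book.

```python
import numpy as np

# Posteriors P(omega_k | x_i) reported by R = 2 sensors for m = 3 classes
# (rows = sensors, columns = classes); the numbers are invented.
posteriors = np.array([[0.70, 0.20, 0.10],
                       [0.55, 0.35, 0.10]])
priors = np.array([1 / 3, 1 / 3, 1 / 3])   # class priors P(omega_k)
R = posteriors.shape[0]

# Product rule, Eq. (4.11): score_k = P(omega_k)^{-(R-1)} * prod_i P(omega_k | x_i).
scores = priors ** (-(R - 1)) * np.prod(posteriors, axis=0)
decision = int(np.argmax(scores))
print("fused scores:", np.round(scores, 4), "-> class", decision)
```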

4.5 Fusion Algorithm Based on Summation Rule

The summation rule, together with the maximum and minimum rules introduced later, can be regarded as an evolution of Bayesian inference and plays an important role in practical applications [9]. Not only does it simplify the calculations, it also works well. From the above description, Bayesian reasoning gives

Z → ωj  if  P^{−(R−1)}(ωj) Π_{i=1}^{R} P(ωj | xi) = max_k P^{−(R−1)}(ωk) Π_{i=1}^{R} P(ωk | xi)       (4.12)

Assume that X is a group of information sources, X = {x1, x2, ..., xR}, and that the target is assigned to ωj according to the maximum a posteriori probability (MAP). Taking the logarithm of both sides of (4.12) and applying a first-order approximation yields the summation rule of decision fusion

Z → ωj  if  (1 − R) P(ωj) + Σ_{i=1}^{R} P(ωj | xi) = max_k [ (1 − R) P(ωk) + Σ_{i=1}^{R} P(ωk | xi) ]   (4.13)

4.6 Fusion Algorithm Based on Min-Max Rule

Bayesian inference and the summation rule construct the basic framework of sensor fusion. Other fusion strategies can be deduced from the following inequality

Π_{i=1}^{R} P(ωk | xi)  ≤  min_{i=1,...,R} P(ωk | xi)  ≤  (1/R) Σ_{i=1}^{R} P(ωk | xi)  ≤  max_{i=1,...,R} P(ωk | xi)        (4.14)

Equation (4.14) shows that the Bayesian (product) rule and the summation rule can be approximated by their bounds, which leads to the minimum and maximum rules below.

4.6.1 Maximum Rules

According to the summation rule and formula (4.14), we can get the maximum rule

Z → ωj  if  (1 − R) P(ωj) + R max_{i=1,...,R} P(ωj | xi) = max_k [ (1 − R) P(ωk) + R max_{i=1,...,R} P(ωk | xi) ]        (4.15)

Assuming the same prior probability of occurrence for every class, we get

Z → ωj  if  max_{i=1,...,R} P(ωj | xi) = max_k max_{i=1,...,R} P(ωk | xi)                                               (4.16)

4.6.2 Minimum Rules

According to Bayesian inference and formula (4.14), we can get the minimum rule

Z → ωj  if  P^{−(R−1)}(ωj) min_{i=1,...,R} P(ωj | xi) = max_k P^{−(R−1)}(ωk) min_{i=1,...,R} P(ωk | xi)                 (4.17)

Assuming the same prior probability of occurrence for every class, we get

Z → ωj  if  min_{i=1,...,R} P(ωj | xi) = max_k min_{i=1,...,R} P(ωk | xi)                                               (4.18)
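A compact way to see how the sum, max, and min rules of Eqs. (4.13), (4.16), and (4.18) behave is to apply them to the same matrix of per-sensor posteriors. The sketch below does this in Python with invented numbers and equal class priors; it is an illustration of the rules, not code from this book.

```python
import numpy as np

# Per-sensor posteriors P(omega_k | x_i): rows = sensors, columns = classes.
# The values are invented for illustration.
posteriors = np.array([[0.60, 0.30, 0.10],
                       [0.20, 0.45, 0.35],
                       [0.50, 0.40, 0.10]])

# With equal priors, the prior terms in Eqs. (4.13) and (4.15) are the same
# for every class and do not change the arg max, so they can be dropped.
rules = {
    "product (4.10)/(4.11)": np.prod(posteriors, axis=0),
    "sum (4.13)":            np.sum(posteriors, axis=0),
    "max (4.16)":            np.max(posteriors, axis=0),
    "min (4.18)":            np.min(posteriors, axis=0),
}
for name, scores in rules.items():
    print(f"{name:22s} scores={np.round(scores, 3)} -> class {int(np.argmax(scores))}")
```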

4.7 Fusion Algorithm Based on Fuzzy Integral

Fuzzy integrals have the ability to incorporate both the importance of multi-source information (fuzzy measures) and the objective evidence (h-functions) provided by each source. The fuzzy integral is a nonlinear function defined on the basis of a fuzzy measure, and it has the ability to fuse multi-source information. Fuzzy measures and fuzzy sets are two different concepts: a fuzzy set reflects the degree to which a known element belongs to a collection without a distinct boundary, whereas a fuzzy measure describes the degree of trust, or probability, that an undetermined element belongs to a (fuzzy or crisp) set. In the mid-1970s, the Japanese scholar Sugeno extended the classical probability measure by replacing the additivity condition of classical probability with the weaker constraint of monotonicity, thus proposing the concept of the fuzzy measure.

Suppose X is a group of information sources, P(X) is the power set of X, and g is a Sugeno fuzzy measure on X. Then g satisfies:

1. Boundary conditions: g(Φ) = 0, g(X) = 1.
2. Monotonicity: ∀A, B ∈ P(X), if A ⊆ B, then g(A) ≤ g(B).
3. Continuity: if ∀Ai ∈ P(X) and {Ai}_{i=1}^{∞} is monotonic, then lim_{i→∞} g(Ai) = g(lim_{i→∞} Ai).
4. For ∀A, B ∈ P(X) with A ∩ B = Φ,

   g(A ∪ B) = g(A) + g(B) + λ g(A) g(B),   λ > −1

5. If g^i = g({xi}) is the fuzzy density function, then

   λ + 1 = Π_{i=1}^{n} (1 + λ g^i)                                             (4.19)

Assuming hk(xi) is the evidence from source xi that the target belongs to class Ck, the fuzzy integral can be expressed as

∫_A hk(x) ∘ g(·) = sup_{E⊆X} min[ min_{x∈E} hk(x), g(A ∩ E) ]
                 = sup_{α∈[0,1]} min[ α, g(A ∩ F_α) ]                           (4.20)

where F_α = {x | hk(x) ≥ α}. Consider two sources x1 and x2, and assume hk(x1) ≥ hk(x2) (if not, rearrange the sources of information). Then the fuzzy integral can be expressed as

ek = max[ min(hk(x1), g(A1)), min(hk(x2), g(A2)) ]                              (4.21)

where A1 = {x1} and A2 = {x1, x2}. When the fuzzy integral is used for fusion, the importance g^i of each sensor can be determined subjectively by experts or estimated from data; in the experiment, the classification success rate of each sensor is taken as the fuzzy measure [10]. Experimental results show that the fuzzy integral is an effective fusion algorithm for fuzzy decisions. The improvement in fusion performance comes from the mutual compensation between the sensors: if the sensors partition the decision space in roughly the same way, multi-sensor fusion cannot significantly improve the system performance. The superiority of data fusion lies precisely in the fact that the sensors can compensate for each other.

Figure 4.1 depicts a fusion recognition framework based on fuzzy integrals. It is assumed that each face corresponds to a pair of face images taken at the same moment, one a visible face and the other an infrared face, which are acquired synchronously and undergo strict registration.

Fig. 4.1 Flowchart of the fusion recognition scheme (visible and IR images → ICA feature extraction → SVM classification → fuzzy-integral decision fusion → result)
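In practice, the fuzzy densities g^i (e.g., the per-sensor classification success rates) are given and λ has to be recovered from Eq. (4.19) before any Sugeno measure g(A) can be evaluated. The Python sketch below does this by simple root finding (bisection); it is a minimal illustration under invented densities, not the authors' code.

```python
def solve_lambda(densities, lo=-0.999999, hi=1e6, iters=200):
    """Solve lambda + 1 = prod_i (1 + lambda * g_i), Eq. (4.19), for lambda > -1.

    When sum(g_i) > 1 the root lies in (-1, 0); when sum(g_i) < 1 it lies in
    (0, inf); when sum(g_i) == 1 the measure is additive and lambda = 0.
    """
    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (lam + 1.0)

    s = sum(densities)
    if abs(s - 1.0) < 1e-12:
        return 0.0
    lo, hi = (lo, -1e-12) if s > 1.0 else (1e-12, hi)
    for _ in range(iters):                 # bisection on the bracketed root
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Invented fuzzy densities, e.g., per-sensor recognition rates.
g = [0.7, 0.6]
lam = solve_lambda(g)
# Sugeno measure of the union of both sources: g(X) should come out as 1.
g_union = g[0] + g[1] + lam * g[0] * g[1]
print("lambda =", round(lam, 6), "  g({x1, x2}) =", round(g_union, 6))
```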

4.7.1 ICA Feature Extraction

Independent component analysis (ICA) [11] is applied to the face images to extract features from the visible and infrared face images, respectively. In order to reduce the computational complexity, the signal must be dimensionally reduced before the ICA processing; that is, PCA dimensionality reduction is performed first. Assuming that the vector representation of the face image is Γ, the low-dimensional feature vector y is obtained by PCA, and the ICA feature representation vector z of the face image is then obtained by using the resulting transformation matrix F, as shown in the following formula:

y = U^T (Γ − Ψ) = Fz                                                            (4.22)

where U is the eigenface space constructed from the face database and Ψ is the mean face of the face database.
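A common way to realize this PCA-then-ICA pipeline is with scikit-learn, as in the hedged sketch below. The random data, the number of retained components, and the use of FastICA are illustrative choices, not details taken from this book; the essential point is that PCA reduces the image vectors Γ to y and ICA then yields statistically independent features z.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
n_images, n_pixels = 120, 180 * 140            # face images flattened to vectors
faces = rng.normal(size=(n_images, n_pixels))  # placeholder for real face data

# PCA: project the centered face vectors onto the eigenface space (Eq. 4.22).
pca = PCA(n_components=60)
y = pca.fit_transform(faces)                   # low-dimensional features y

# ICA on the PCA features: z are the independent feature representations.
ica = FastICA(n_components=60, random_state=0, max_iter=1000)
z = ica.fit_transform(y)

print("y shape:", y.shape, " z shape:", z.shape)
```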

4.7.2 SVM Classification

The vector z output by the ICA step is fed into SVMs for classification. Before using the SVMs, the data must be trained in the following steps: scaling the data proportionally, selecting the kernel function, finding the best parameter values by cross-validation, and training the SVMs. Of course, when the SVMs are used, the test data must be scaled in the same way. Here, the radial basis function (RBF) is chosen as the kernel function

K(xi, xj) = exp(−γ ‖xi − xj‖²),   γ > 0                                         (4.23)

Thus there are two parameters for the SVMs: C and γ. Each attribute of the training data and the test data is proportionally scaled to the range [−1, +1]. The best parameter values found by cross-validation are C = 1 and γ = 0.01. In this work, the sigmoid model is used to map the standard output of the SVMs to matching degrees for identification [12].
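The following scikit-learn sketch mirrors that setup: features are scaled to [−1, +1], an RBF-kernel SVM with C = 1 and γ = 0.01 is trained, and a sigmoid-calibrated (Platt-style) probability output serves as the matching degree. It is a minimal illustration on random data under these assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_train, n_classes, dim = 100, 5, 60
z_train = rng.normal(size=(n_train, dim))       # placeholder ICA features
labels = rng.integers(0, n_classes, size=n_train)

# Scale every attribute to [-1, +1], and reuse the same scaler for test data.
scaler = MinMaxScaler(feature_range=(-1, 1))
z_train_s = scaler.fit_transform(z_train)

# RBF-kernel SVM with C = 1, gamma = 0.01; probability=True enables the
# sigmoid (Platt) mapping from SVM outputs to matching degrees.
clf = SVC(kernel="rbf", C=1.0, gamma=0.01, probability=True, random_state=0)
clf.fit(z_train_s, labels)

z_test = rng.normal(size=(1, dim))
matching_degrees = clf.predict_proba(scaler.transform(z_test))[0]
print("matching degrees per class:", np.round(matching_degrees, 3))
```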

4.7.3 Decision Fusion with Fuzzy Integral

In the fusion recognition, assume that x1 represents the visible light sensor and x2 the infrared sensor. The fuzzy density g^i = g({xi}) is obtained from a statistical analysis of the recognition rate of sensor xi. In this way, the fuzzy integral Fk can be expressed as follows:

Fk = max[ min(SkV, g^1), SkI ]   if SkV ≥ SkI
Fk = max[ min(SkI, g^2), SkV ]   otherwise                                      (4.24)

where SkV and SkI are the classification evidence (matching degrees) obtained from the visible face image and the infrared face image, respectively. The test face Z is classified as ωj if its fuzzy integral is the largest:

Z → ωj,   Fj = max_k Fk                                                         (4.25)
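Putting Eqs. (4.24) and (4.25) together, the per-class fuzzy integral only needs the two matching degrees and the two fuzzy densities. The Python sketch below shows this decision fusion step with invented matching degrees and densities; it is an illustrative sketch, not the book's code.

```python
def fuzzy_integral_fusion(s_vis, s_ir, g1, g2):
    """Fuse per-class matching degrees from visible (s_vis) and IR (s_ir) sensors.

    g1, g2 are the fuzzy densities (e.g., recognition rates) of the two sensors.
    Implements Eq. (4.24) per class and the arg-max decision of Eq. (4.25).
    """
    fused = []
    for skv, ski in zip(s_vis, s_ir):
        if skv >= ski:
            fk = max(min(skv, g1), ski)
        else:
            fk = max(min(ski, g2), skv)
        fused.append(fk)
    best_class = max(range(len(fused)), key=lambda k: fused[k])
    return fused, best_class

# Invented matching degrees for 4 classes and invented sensor densities.
s_vis = [0.20, 0.70, 0.55, 0.10]   # SkV from the visible-light SVM
s_ir = [0.25, 0.40, 0.80, 0.05]    # SkI from the infrared SVM
fused, k = fuzzy_integral_fusion(s_vis, s_ir, g1=0.90, g2=0.85)
print("fuzzy integrals:", [round(f, 3) for f in fused], "-> class", k)
```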

4.8 Label Fusion for Segmentation Via Patch Based on Local Weighted Voting

4.8.1 Introduction

Segmentation is one of the fundamental problems in biomedical image analysis. The traditional approach to segmenting a given biomedical image involves the manual delineation of the regions of interest (ROIs) by a trained expert. Because this process is highly labor-intensive and poorly reproducible, accurate automatic segmentation techniques are often desirable. Early segmentation algorithms mainly dealt with tissue classification, in which the local image intensity contains important information; however, these algorithms cannot guarantee accuracy. Multi-atlas segmentation sometimes first selects a subset of the best atlases for a given target image based on a predefined measure of anatomical similarity. It then consists of the following two steps: first, the registration step, in which all selected atlases and their corresponding label maps are aligned to the target image; second, the label fusion step, in which the registered label maps from the selected atlases are fused into a consensus label map for the target image by a suitable algorithm. Label fusion methods based on multiple atlases can effectively exploit prior knowledge without manual intervention and automatically segment specific anatomical structures with high accuracy.

A novel patch-driven level set method for label fusion takes advantage of a probabilistic model and a locally weighted voting scheme. First, patches of the target image and of the training atlases are extracted. Second, probabilistic models for label fusion are built on the patches, and Bayesian inference is used to extend the popular local weighted voting method. When calculating the label prior, the image background is analyzed in the label fusion procedure and regarded as an isolated label, and the Kronecker delta function is employed as the model of the label prior.

4.8.2 Method

Generally, we assume that similar patches share the same label. That is, if we extract patches from the target image and the training scans and find that they have similar image intensities, we deem that they have the same label at the corresponding voxel locations. Based on this assumption, a probabilistic model between the target patch and the training atlas patches is established, and the labels from the training data are then propagated to the target image by the local weighted voting label fusion algorithm.

For each voxel in the test image, its intensity patch is taken from the w × w × w neighborhood. Its patch dictionary can be adaptively built from all N aligned atlases as follows. First, let N_n(x) be the neighborhood of voxel x in the nth atlas, with neighborhood size wp × wp × wp (called the patch area here). Then, for each voxel in N_n(x), we obtain its corresponding patch from the nth atlas, i.e., a (w × w × w)-dimensional column vector. By gathering all these patches from the wp × wp × wp neighborhoods of all N training atlases, we build a patch dictionary containing M training patches from the N aligned atlases, where M = (wp)³ × N.

In the label fusion step, the local weighted voting strategy is used in this chapter to segment the target image. We build probabilistic models for the intensity prior and the label prior and use Bayesian inference to derive the segmentation algorithm. The specific process of the proposed algorithm is shown in Fig. 4.2.

1. Intensity prior

A voxel of the target image with a certain intensity value can be weighted in the local weighted voting scheme. The intensity prior describes the probability, or likelihood, that a voxel corresponds to a certain training image patch. We adopt a Gaussian distribution of the intensity difference between the patch of the target image and the patches of the dictionary from the training subjects as the intensity prior, which can be written as follows:

p_m(I(x); I_m) = (1 / √(2πσ²)) exp[ −(1 / (2σ²)) (I(x) − I_m(x))² ]             (4.26)

where σ is the standard deviation of the Gaussian distribution, and I(x) and I_m(x), m ∈ {1, 2, ..., M}, respectively represent the intensity patch of the target image and the mth training patch at location x.

2. Label prior

The label prior is a metric describing the possibility that a voxel at a certain location belongs to a certain label. If only the labeled anatomical structures (regions of interest, ROIs) are considered as competitors in the label fusion procedure, we are more likely to assign one of these specific labels even when the voxel should be background. Based on a large number of our experiments, the Kronecker delta function proves to be well suited as the label prior: the function is 1 if the label from the automatic segmentation at a certain location equals the manual label of the training label map at the same position, and 0 otherwise. However, the Kronecker delta function is rarely used as the label prior in most patch- and probabilistic-model-based label fusion segmentation methods. Here, we define the other anatomical regions (not ROIs) to be the background label; that is, while estimating the label prior, the background is taken into account as an isolated label. Its label value is set to 0 so that the background has the same standing as the other labels. The Kronecker delta function is used for the label prior as follows:

Fig. 4.2 The process of the label fusion algorithm via patch based on the probabilistic model and local weighted voting. Patches are extracted from the target image and from the N ANTs-registered training atlases; a probabilistic model (intensity prior, label prior) is built between the target patch and the training atlas patches; and the segmentation result is obtained through label fusion with the local weighted voting strategy

P_m(S(x) = l; L_m(x)) = (1 / Z_m(x)) δ(S(x) = l)                                (4.27)

where S(x) denotes the estimated label at location x, L_m represents the training patch label maps, and l ∈ {0, labels of target}. Z_m(x) = Σ_{l=0}^{ℒ} δ(S(x) = l) is the partition function that keeps the probability p_m(S(x) = l; L_m) between zero and one at location x, where ℒ is the total number of labels including the background label. δ(·) denotes the Kronecker delta: the function is 1 if its arguments are equal, and 0 otherwise

δ(S(x), l) = 0   if S(x) ≠ l
δ(S(x), l) = 1   if S(x) = l                                                    (4.28)

3. Label fusion

Through the patch-based process, we may assume that the voxels are independently and identically distributed, so that the label at every location can be estimated separately. The label fusion scheme then becomes a classical local weighted voting scheme, except for the intensity prior and the minor modification of the label prior. The segmentation is formulated as

Ŝ(x) = arg max_{S(x)} p_m(S(x) | I(x); {L_m, I_m})
     = arg max_{l ∈ {0, labels of target}} Σ_{m=1}^{M} p_m(I(x) | I_m) p_m(S(x) = l; L_m)             (4.29)

Bayes' rule and the marginal distribution are used in Equation (4.29). Then, we make the following modification

Ŝ(x) = arg max_{l ∈ {0, labels of target}} Σ_{m=1}^{M} p_m(I(x); I_m) p_m(S(x) = l; L_m)              (4.30)

where p_m(I(x); I_m) serves as the weight and the label prior values serve as the votes. It is well known that the local weighted voting strategy is voxel-based, and within one patch the neighborhood of voxel x has no impact on the central point. So finally we take the label value at the center location of the labeled patch as the segmentation result and put it at location x in the whole label image. In the program, we unfold the 3D patch matrix into a 1D array, in which position (w × w × w − 1)/2 + 1 corresponds to the center position. It is not difficult to see that when the patch area shrinks to a single 1 × 1 × 1 voxel, the proposed label fusion algorithm reduces to the local weighted voting strategy without patches.
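The voting of Eqs. (4.26)–(4.30) can be condensed into a few lines of NumPy for a single target voxel: Gaussian intensity weights are computed between the target patch and every dictionary patch, each dictionary patch votes for the label at its center, and the label with the largest weighted vote wins. The sketch below is a simplified, hedged illustration of this local weighted voting step (array shapes, σ, and the random data are invented, and the per-voxel Gaussian of Eq. (4.26) is condensed into one patch-level weight), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
w = 7                      # patch size (w x w x w voxels)
M = 54                     # number of dictionary patches, e.g. (wp**3) * N
n_labels = 10              # ROI labels 1..9 plus background label 0
sigma = 5.0                # std of the Gaussian intensity prior, Eq. (4.26)

# Placeholder data: the target patch, the M training patches, and the label
# of each training patch at its center voxel.
target_patch = rng.normal(size=(w, w, w))
dict_patches = rng.normal(size=(M, w, w, w))
center_labels = rng.integers(0, n_labels, size=M)

# Intensity prior: one Gaussian weight per dictionary patch, computed from the
# summed squared intensity differences over the patch.
sq_diff = np.sum((dict_patches - target_patch) ** 2, axis=(1, 2, 3))
weights = np.exp(-sq_diff / (2.0 * sigma ** 2))

# Label prior as a Kronecker delta vote, Eqs. (4.27)-(4.28): each patch votes
# with weight `weights[m]` for its own center label.
votes = np.zeros(n_labels)
np.add.at(votes, center_labels, weights)

# Eq. (4.30): the fused label at this voxel is the arg max of the weighted votes.
fused_label = int(np.argmax(votes))
print("weighted votes:", np.round(votes, 3), "-> label", fused_label)
```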

4.8.3 Experiments and Results

1. Dataset

The experiments employ 20 brain MRI scans and corresponding label maps (as illustrated in Fig. 4.3), which were selected from a large dataset. The 20 brain scans were resampled to 1 mm isotropic resolution, skull-stripped, bias-field-corrected, and intensity-normalized with FreeSurfer. The intensity normalization is necessary because it enables us to compare image intensities directly (even across datasets). The scans were then pairwise registered with the software package ANTs, using the default parameters for the deformation model, the number of iterations (20 × 30 × 50), and the cost function (neighborhood cross-correlation). The MRI images have dimensions 256 × 256 × 256. Each training label map has been labeled for 45 anatomical regions by experts, but the nine anatomical regions of interest in the two hemispheres used here are white matter (WM), cerebral cortex (CT), lateral ventricle (LV), hippocampus (HP), thalamus (TH), caudate (CA), putamen (PU), pallidum (PA), and amygdala (AM). During the experiments, considering both reasonable processing time and segmentation accuracy, we use the union of the ROIs of all training label maps to select the voxels of the ROIs to be segmented in the target image. The remaining 36 anatomical regions are treated as background in the label fusion.

2. Comparison of methods

In order to verify our theoretical analysis, extensive experiments in the context of multi-atlas label fusion are performed to compare the proposed algorithm with a variety of related methods. We use a volume overlap measure known as the Dice score to quantitatively assess the accuracy of the segmentation. The evaluation criterion is of the form

Fig. 4.3 2D slices are shown for visualization. The left is the intensity image of the human brain and the right is the corresponding label map

DS = 2 V(S_man ∩ S_auto) / ( V(S_man) + V(S_auto) )                             (4.31)

where S_auto is the automatic segmentation of the target image, S_man is the manual segmentation, and V denotes the volume of the segmented region. The range of DS is between 0 and 1, with 1 meaning a perfect volume overlap between the two segmentations. In the experiments, all methods were implemented to segment the brain MRI scans automatically. With the exception of FreeSurfer and joint fusion, all of the remaining methods take the image background into account as an isolated label when estimating the label prior with the Kronecker delta function. Details are as follows:

• FreeSurfer: We directly use the FreeSurfer software package (http://www.freesurfer.net/) to segment all anatomical regions of the test images one by one. Unlike the label fusion methods below, it does not rely on the preprocessing step of pairwise registration.
• Joint fusion (JF): In this technique [13], weighted voting is formulated in terms of minimizing the total expectation of labeling error, and the pairwise dependency between atlases is explicitly modeled as the joint probability of two atlases making a segmentation error at a voxel.
• Majority voting (MV): Majority voting can be seen as the most likely labeling in a probabilistic model in which the segmentation S(x) is sampled randomly from one of the N atlases.
• Local weighted voting (LWV): This label fusion strategy takes advantage of the image intensities of the deformed atlases, which can improve the segmentation quality. Compared with the proposed algorithm, the difference is that the processing is not performed at the patch level.
• Majority voting based on patch (MVP): This method does not consider the image intensity, but it computes the label prior at the patch level, which is an improvement of MV.
• Local weighted voting based on patch (LWVP): As described in Sect. 4.8.2, the probabilistic models of the intensity prior and the label prior are established on patches. We assume that similar patches share the same label and build a patch dictionary between the target image and the training data; label fusion is then performed over the M training patches instead of the N training subjects.

We also investigated another representation of the label prior using the LogOdds model based on the SDM [14]. However, it is a compromise between segmentation accuracy and processing time: it greatly reduces the running time but yields lower accuracy because the background is not taken into account. The Kronecker delta function is therefore used to build the label prior, with the background analysis, in this chapter.
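As a small worked example of Eq. (4.31), the following Python snippet computes the Dice score between two binary label volumes; the toy arrays are invented and simply illustrate the overlap computation used to score all of the methods above.

```python
import numpy as np

def dice_score(seg_auto, seg_manual, label=1):
    """Dice score (Eq. 4.31) for one label between two label volumes."""
    auto = (seg_auto == label)
    manual = (seg_manual == label)
    intersection = np.logical_and(auto, manual).sum()
    denom = auto.sum() + manual.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0

# Toy 4 x 4 x 1 volumes: 1 marks the structure, 0 marks background.
manual = np.zeros((4, 4, 1), dtype=int)
manual[1:3, 1:3, 0] = 1                  # 4 voxels labeled manually
auto = np.zeros((4, 4, 1), dtype=int)
auto[1:3, 1:4, 0] = 1                    # 6 voxels labeled automatically

print("Dice =", round(dice_score(auto, manual), 3))   # 2*4 / (6+4) = 0.8
```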

Fig. 4.4 Dice scores of the LWVP scheme according to the patch size (w = 7) and patch area (wp = 1, 3, 5). The segmentation results are obtained with N = 2 in a leave-one-out way. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers span the extreme data points not considered as outliers (which are marked with red crosses)

3. Parameter optimization

The four label fusion methods mentioned above (except FreeSurfer and JF) were used to segment the scans in a leave-one-out manner, where the test subject was left out of the atlas set. To determine the optimal parameter σ of the intensity prior, we tried many values from 1 to 10; in LWV and LWVP we found that σ = 5 gives the best segmentation result. The other parameter settings and their optimization are as follows.

4. Impact of the patch size and patch area

In the MVP and LWVP label fusion schemes, we studied the impact of the patch size and patch area on the segmentation accuracy. Within one patch, the neighborhood of voxel x has no impact on the central point, because both the local weighted voting and the majority voting strategies are voxel-based. During the experiments, the patch size is set to 7 × 7 × 7 voxels (namely w = 7). In the LWVP scheme with N = 2, a patch area of 3 × 3 × 3 voxels (namely wp = 3) gives the best segmentation precision (as illustrated in Fig. 4.4). The optimal patch area seems to reflect the complexity of the anatomical structure. As can be seen from Fig. 4.5, with N = 19 and wp = 3 the segmentation accuracy is likewise at its best.

5. Impact of the number of training scans N

The segmentation accuracy was studied from N = 2 to N = 19. The results of the LWV method are presented in Fig. 4.5, which reports the average Dice score as a function of N.

Fig. 4.5 The average Dice scores for the LWV method as a function of the number of training scans. They are obtained with σ = 5, w = 7, and wp = 3 in a leave-one-out way, and reach 90.8% when all 19 subjects are used

Fig. 4.6 (a) Sagittal slice of a segmentation which only includes the 9 ROIs in the left hemisphere, obtained using the proposed algorithm. (b) 3D rendering of the segmentation

As expected, the segmentation accuracy can be improved by increasing the number of selected training subjects. In other experiments, we find that the other methods show the same trend, so the remaining results reported here are based on N = 19.

6. Results

We segmented the brain MRI images in a leave-one-out cross-validation fashion with the label fusion methods described above. Figure 4.6 shows the automated segmentation of one scan for the nine ROIs in the left hemisphere using LWVP.

Fig. 4.7 Dice scores for the nine ROIs (WM, CT, LV, HP, TH, CA, PU, PA, AM) for the three methods: LWV-S (green), LWV-K (blue), LWVP (red). On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers span the extreme data points that are not considered outliers (which are marked with red crosses)

We first show the sagittal slice of the result in Fig. 4.6a; a 3D rendering of the segmentation is shown in Fig. 4.6b. Some anatomical regions related to Alzheimer's disease, especially the hippocampus, can be seen clearly.

To compare different label priors, we adopt the SDM in LWV (denoted LWV-S) and the Kronecker delta in LWV (denoted LWV-K) and in LWVP; in the latter two methods, the image background is analyzed and taken as an isolated label. Figure 4.7 shows box plots of the Dice scores for the nine ROIs in the left hemisphere. Our patch-based method produces the most accurate results. The mean Dice scores of LWV-K are almost as high as those of LWV-S and are even better for the WM structure, while LWV-K takes less running time than LWV-S. This further shows that the background analysis is necessary and that the Kronecker delta function is well suited as the label prior.

We also used other methods to segment the brain MRI scans, namely FreeSurfer, JF, MV, and MVP. FreeSurfer segments whole-brain anatomical regions, whereas the remaining label fusion algorithms only yield segmentations of the ROIs. Table 4.1 and Fig. 4.8 report the average Dice scores achieved by all algorithms for the nine ROIs in the left hemisphere. To evaluate the performance of the proposed method, we measured the Dice scores between the manual and automated segmentations for each ROI. LWVP, the novel patch-driven level set method, yields the most accurate segmentation in all ROIs but the pallidum (PA). LWV obtains better results than the other methods, including FreeSurfer, which is a state-of-the-art whole-brain segmentation tool. The segmentation of LWV and LWVP clearly benefits from the additional use of local intensity information.

Table 4.1 Average Dice scores for each ROI corresponding to the compared methods ("FS" represents FreeSurfer)

ROI   FS      JF      MV      LWV     MVP     LWVP
WM    0.943   0.940   0.922   0.952   0.902   0.958
CT    0.890   0.863   0.827   0.892   0.806   0.906
LV    0.935   0.921   0.903   0.935   0.899   0.941
HP    0.864   0.874   0.840   0.878   0.835   0.883
TH    0.815   0.924   0.920   0.925   0.916   0.925
CA    0.904   0.896   0.871   0.909   0.863   0.915
PU    0.898   0.910   0.901   0.913   0.898   0.916
PA    0.760   0.874   0.877   0.878   0.873   0.873
AM    0.825   0.851   0.840   0.851   0.838   0.856

The best score for each ROI is achieved by LWVP, except for PA (best: LWV).

Fig. 4.8 Dice scores for the nine ROIs (WM, CT, LV, HP, TH, CA, PU, PA, AM) corresponding to the mentioned methods: FreeSurfer (dark green), JF (purple), MV (light green), MVP (black), LWV (blue), LWVP (red). On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers span the extreme data points that are not considered outliers (which are marked with red crosses)

The difference between FreeSurfer and the other methods above is that the pairwise registration of the latter is based on ANTs, a mature algorithm that improves segmentation accuracy; for this reason, the mean Dice scores of FreeSurfer in the TH, PU, PA, and AM ROIs are lower than those of the other methods. Joint fusion, a label fusion strategy with the Kronecker delta function as the label prior, yields better segmentation accuracy than MV and MVP but worse than LWV and LWVP. Thus the experimental results demonstrate that it is necessary to take the background as an isolated label when the label prior is built, and the superiority of the proposed patch-based algorithm is apparent. Compared with FreeSurfer, joint fusion obtains higher average Dice scores in HP, TH, PU, PA, and AM. In addition, we note that the results of MV and MVP are worse than those of the weighted label fusion methods, because the majority voting strategies ignore the image intensity, which contains a significant amount of relevant information. We also find that MVP performs slightly worse than MV; this might be due to the combination of patch-level processing with the discarding of image intensity information in MVP, which affects the segmentation accuracy.

4.9 Summary

Decision-level fusion is a high-level information fusion and a hot spot in the field of information fusion. Compared with the lower-level fusions, high-level fusion is more complete, has better real-time performance, and can better overcome the shortcomings of each individual sensor, but its disadvantage is a greater loss of information. Before the fusion, each sensor completes its own decision-making task, and the best overall decision is then made according to certain fusion criteria and the credibility of each individual decision. This chapter mainly introduced decision-level information fusion algorithms, including voting, Bayesian inference, evidence theory, fuzzy integral, and other specific methods. Taking SAR and FLIR images as an example, the algorithm process and implementation of decision-level fusion were introduced.

References

1. A.H. Gunatilaka, B.A. Baertlein, Feature-level and decision-level fusion of noncoincidently sampled sensors for land mine detection. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 577–589 (2001)
2. J.A. Benediktsson, I. Kanellopoulos, Classification of multisource and hyperspectral data based on decision fusion. IEEE Trans. Geosci. Remote Sens. 37(3), 1367–1377 (1999)
3. G. Kiremidjian, Issues in image registration. IEEE Proc. SPIE Image Understand. Man Mach. Interface 758, 80–87 (1987)
4. G. Liu, Study of Multisensor Image Fusion Methods. PhD Thesis, Xidian University
5. R.C. Gonzalez, P. Wintz, Digital Image Processing (Addison-Wesley, Reading, MA, 1977)
6. M.E. Ulug, C.L. McCullough, Feature and data level fusion of infrared and visual images. Proc. SPIE 3719, 312–318 (1999)
7. J. Kittler, Multi-Sensor Integration and Decision Level Fusion (The Institution of Electrical Engineers, London)
8. B. Jeon, D.A. Landgrebe, Decision fusion approach for multitemporal classification. IEEE Trans. Geosci. Remote Sens. 37(3), 1227–1233 (1999)
9. L.O. Jimenez, A. Morales-Morell, A. Cresus, Classification of hyperdimensional data based on feature and decision fusion approaches using projection pursuit, majority voting, and neural networks. IEEE Trans. Geosci. Remote Sens. 37(3), 1360–1366 (1999)
10. M. Petrakos, J.A. Benediktsson, I. Kanellopoulos, The effect of classifier agreement on the accuracy of the combined classifier in decision level fusion. IEEE Trans. Geosci. Remote Sens. 39(11), 2539–2546 (2001)
11. W.K. Pratt, Digital Image Processing (Wiley, New York, 1978)
12. E.L. Hall, Computer Image Processing and Recognition (Academic Press, New York, 1979)
13. H. Wang, J. Suh, S. Das, J. Pluta, C. Craige, P. Yushkevich, Multi-atlas segmentation with joint label fusion. IEEE Trans. Pattern Anal. Mach. Intell. 35, 611–623 (2012). https://doi.org/10.1109/TPAMI.2012.143
14. M.R. Sabuncu, B.T.T. Yeo, K. Van Leemput, B. Fischl, P. Golland, A generative model for image segmentation based on label fusion. IEEE Trans. Med. Imaging 29(10), 1714–1729 (2010)

Chapter 5

Multi-sensor Dynamic Image Fusion

Abstract Image fusion can be performed on dynamic image sequences as well as on static images. Here, static refers to images captured from a static platform; for example, multi-focus and remote sensing image fusion assume that the source images are static. In real-time surveillance applications, however, dynamic image sequences have to be fused, where dynamic refers to images captured from a moving platform. In this chapter, a detailed discussion of dynamic image fusion is presented. In addition, new dynamic image fusion algorithms are introduced: an image fusion algorithm based on region target detection, an improved dynamic image fusion algorithm, a new FOD recognition algorithm based on multi-source information fusion, a multi-cue mean-shift target tracking approach based on fuzzified region dynamic image fusion, and a new tracking approach based on tracking-before-fusion.

5.1 Introduction

Previous chapters have studied the fusion of multi-source still images. However, in practical applications such as target detection and identification in safety monitoring and battlefield environments, it is often necessary to fuse moving images (sequence images) from multiple sensors. This chapter studies the fusion of multi-sensor dynamic images based on multi-scale decomposition methods and proposes a dynamic image fusion system in which sequence image super-resolution restoration and moving object detection are introduced into the multi-sensor dynamic image fusion. Multi-focus images and remote sensing multi-source images are not included in the scope of multi-source dynamic image fusion because they are usually static images in practical applications.

The fusion of static images has been widely studied, but there is little research on dynamic image fusion algorithms. If the sequence images obtained by multiple sensors are fused directly, frame by frame, by a static image fusion method, the motion information of the sequence images along the time axis cannot be used to guide the fusion process. Oliver R. et al. proposed a multi-sensor moving image fusion algorithm based on the discrete wavelet frame transform, but the algorithm still processes the sequence images obtained by the multiple sensors according to a static image fusion method. Exploiting the motion information in multi-sensor sequence images therefore remains a difficult task. In this chapter, we make a bold attempt at multi-sensor dynamic image fusion and propose a multi-sensor dynamic image fusion system based on multi-scale decomposition. In this system, we apply sequence image super-resolution restoration and moving target detection to multi-sensor dynamic image fusion, so as to make better use of the motion information of the sequence images along the time axis.

5.2 Multi-sensor Dynamic Image Fusion System

The fusion system of multi-sensor dynamic images proposed in this chapter is shown in Fig. 5.1. The fusion of the sequence images of two sensors consists of four processes: intra-sequence image fusion (sequence-image super-resolution restoration), moving object detection, multi-scale image decomposition, and inter-sequence image fusion. Each process is described in detail below.

Fig. 5.1 Multi-sensor dynamic image fusion system (the sequences of sensors A and B are super-resolved, moving targets are detected, each frame is decomposed at multiple scales, target and non-target areas are fused separately, and the fused sequence images are produced)

The first step, intra-sequence image fusion (sequence-image super-resolution restoration): super-resolution restoration (or reconstruction) recovers a high-resolution image sequence from a low-resolution one. Many imaging systems, such as forward-looking infrared imagers and visible-light cameras, are limited by the inherent density of their sensor arrays when rapidly acquiring wide-field images, so the image resolution cannot be very high. In practical applications, image transmission speed, storage capacity, and other factors further limit the achievable resolution; at the same time, under-sampling in the imaging process (discretization of a continuous image) causes spectral aliasing and degrades image quality. Increasing the density of the sensor array to raise the image resolution can be prohibitively expensive or technically infeasible. Super-resolution restoration instead estimates high-resolution sequence images from low-resolution ones while suppressing additive noise and the blurring caused by the limited sensor array density and the point spread function of the optical imaging process. Applying super-resolution restoration to the two sensor sequences yields two enhanced image sequences, which are processed in the subsequent steps.

The second step, moving target detection: the purpose of moving target detection is to locate the moving target areas in the two sensor sequences. The target area often contains the important information that users need, such as intruding targets in surveillance, or tanks and fighters on the battlefield. To preserve the complete information of the target area during fusion, a special fusion strategy is adopted for it. If a target area is detected at a certain position in the sensor A sequence but not at the corresponding position in the sensor B sequence, the information of the sensor A sequence in that area is retained during fusion in order to preserve the integrity of the target information, and vice versa. If a target is detected at the same position in both sensor sequences, similarity matching is performed on the gray-level information of the two sequences in the target area: if the similarity is higher than a certain threshold, a weighted-average strategy is used to fuse the target area; otherwise, the target area with more detail information is selected as the fused target area. The specific fusion process is given in the following two steps.

The third step, multi-scale image decomposition: the purpose of multi-scale decomposition is to transform each frame of the two sensor sequences to obtain its multi-resolution representation. In this chapter, the directional pyramid frame transform or the directional non-separable wavelet frame transform is adopted for multi-scale decomposition. This step is the basis of the inter-sequence image fusion in the next step.

The fourth step, inter-sequence image fusion: inter-sequence fusion combines the multi-scale decomposition coefficients of each frame of the two sensor sequences according to the target-area information obtained by moving target detection. The target areas and the non-target areas are fused separately. For the target areas, if a target area is detected in the sensor A sequence but not in the sensor B sequence, the multi-scale decomposition coefficients of the sensor A sequence are selected as the fused coefficients in that area, and vice versa.
If a target area is detected in both sensor sequences, similarity matching is first performed on the target area: when the similarity is greater than a certain threshold, the multi-scale decomposition coefficients of the two sequences in the target area are averaged with weights; otherwise, the fused coefficients are taken from the sensor sequence with the larger energy in the target area. For the fusion of non-target regions, to improve the efficiency of the algorithm, a window-based strategy is adopted for the high-frequency coefficients after multi-scale decomposition: the coefficient similarity is computed in a local window; if the similarity is large, the coefficients are averaged with weights; otherwise, the coefficient with the larger local-window energy is selected. For the low-frequency coefficients after multi-scale decomposition, simple averaging is adopted.
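As a concrete illustration of the window-based rule for the high-frequency coefficients and the averaging rule for the low-frequency coefficients, the following sketch operates on two coefficient arrays of the same sub-band. It is a minimal example rather than the book's implementation: the window size, the similarity threshold, and the exact weight formula are assumptions chosen for illustration.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_highpass(cA, cB, win=3, thresh=0.75):
    """Window-based fusion of two high-frequency sub-bands (illustrative).

    Local energy and a local match measure are computed in a sliding window;
    similar windows are averaged with weights, dissimilar ones are resolved
    by selecting the coefficient with the larger local energy.
    """
    eA = uniform_filter(cA * cA, win)          # local energy of sub-band A
    eB = uniform_filter(cB * cB, win)          # local energy of sub-band B
    match = 2.0 * uniform_filter(cA * cB, win) / (eA + eB + 1e-12)

    # Assumed weight formula: the closer the match is to 1, the closer the
    # weights are to a plain average.
    w_min = np.clip(0.5 - 0.5 * (1.0 - match) / (1.0 - thresh), 0.0, 0.5)
    w_max = 1.0 - w_min

    a_stronger = eA >= eB
    weighted = np.where(a_stronger, w_max * cA + w_min * cB,
                                    w_min * cA + w_max * cB)
    selected = np.where(a_stronger, cA, cB)
    return np.where(match > thresh, weighted, selected)

def fuse_lowpass(cA, cB):
    """Average the low-frequency (approximation) coefficients."""
    return 0.5 * (cA + cB)
```

In the dynamic fusion system these rules would only be applied to coefficients lying in non-target areas; the target areas are handled by the dedicated rules described above.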

5.3 Improved Dynamic Image Fusion Scheme for Infrared and Visible Sequences Based on Image Fusion System

This section introduces an improved dynamic image fusion scheme for infrared and visible sequences based on the image fusion system described above. As the first step, a target detection technique is employed to segment the source images into target and background regions, and different fusion rules are adopted in the two kinds of regions. Two quantitative performance indexes are fed back to determine the optimum weight coefficients of the fusion rules; this feedback of performance information improves the fusion quality and effectiveness in the target regions. Fusion experiments on real-world image sequences indicate that the improved method is effective and efficient, achieving better performance than fusion methods without feedback.

5.3.1 Introduction

Multi-source image fusion techniques originated in the military field, which also remains their main driving force. Battlefield detection technology, with multi-source image fusion at its core, has become one of the most important advanced military technologies, covering target detection, tracking, recognition, and scene awareness. Image fusion is a specialization of the more general topic of data fusion dealing with image and video data [1]. It is the process by which multi-modality sensor images of the same scene are intelligently combined into a single view of the scene with extended information content. Image fusion has important applications in the military, medical imaging, remote sensing, and security and surveillance fields. Its benefits include improved spatial awareness, increased accuracy in target detection and recognition, reduced operator workload, and increased system reliability [2].

Image fusion processing must satisfy the following requirements, as described in [3]: preserve (as far as possible) all salient information in the source images; introduce no artifacts or inconsistencies; be shift invariant; and be temporally stable and consistent. The last two points are especially important in dynamic image fusion (image sequence fusion), as the human visual system is highly sensitive to the moving artifacts introduced by a shift-dependent fusion process [3]. The fusion process can be performed at different levels of information representation, sorted in ascending order of abstraction: signal, pixel, feature, and symbol levels [4]. Pixel-based fusion methods, from the simplest weighted pixel averaging to more sophisticated multi-resolution (MR) methods (including pyramidal and wavelet schemes), have been well researched [3, 5–10]. Recently, feature-level fusion with region-based fusion schemes has been reported to yield both qualitative and quantitative improvements over pixel-based methods, since more intelligent semantic fusion rules can be applied to actual image features [11–16]. The remainder of this section presents an improved dynamic image fusion scheme for infrared and visible sequences based on feedback of optimum weight coefficients.

5.3.2 Generic Pixel-Based Image Fusion Scheme

The generic pixel-based fusion scheme is briefly reviewed here; more details can be found in [3, 5–16]. Figure 5.2 illustrates the generic wavelet fusion scheme, which can be divided into three steps. First, all source images are decomposed by a multi-resolution method, which can be the pyramid transform (PT) [5, 6], the discrete wavelet transform (DWT) [7–9], discrete wavelet frames (DWF) [3], or the dual-tree complex wavelet transform [10]. Then the decomposition coefficients are fused by applying a fusion rule, which can be a point-based maximum selection (MS) rule or a more sophisticated area-based rule [6, 7]. Finally, the fused image is reconstructed by applying the corresponding inverse transform to the fused coefficients.

Fig. 5.2 Generic pixel-based image fusion scheme (images A and B are pre-processed, transformed by an MR transform, fused, and reconstructed by the inverse transform to give the fused image)

The source sensors of most image fusion systems have different fields of view, resolutions, lens distortions, and frame rates. It is therefore vital to align the input images properly with each other, both spatially and temporally, a problem addressed by image registration [13]. To minimize this problem, the imaging sensors in many practical systems are rigidly mounted side by side and physically aligned as closely as possible. In more complex systems where the sensors move relative to each other, however, the registration of the input images becomes a very challenging problem, to some extent harder than the fusion algorithm itself.

Most applications of a fusion scheme are interested in features within the image, not in the actual pixels. Therefore, it seems reasonable to incorporate feature information into the fusion process [14]. A number of region-based fusion schemes have been proposed [11–16]. However, most of them are designed for still image fusion, and in the image sequence case every frame of each source sequence is processed individually. Such methods do not take full advantage of the wealth of inter-frame information within the source sequences.
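The three-step pipeline above (MR decomposition, coefficient fusion, inverse transform) can be sketched with a standard decimated DWT as the MR method. This is an illustrative minimal example using the PyWavelets library, with a simple average for the approximation band and maximum selection (MS) for the detail bands; the wavelet, the decomposition level, and the fusion rules are assumptions, not the scheme adopted later in this section.

```python
import numpy as np
import pywt

def dwt_fuse(img_a, img_b, wavelet="db2", levels=3):
    """Generic pixel-based fusion: decompose, fuse coefficients, reconstruct."""
    ca = pywt.wavedec2(img_a, wavelet, level=levels)
    cb = pywt.wavedec2(img_b, wavelet, level=levels)

    fused = [0.5 * (ca[0] + cb[0])]           # average the approximation band
    for da, db in zip(ca[1:], cb[1:]):        # detail bands: (cH, cV, cD) per level
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip(da, db)))
    return pywt.waverec2(fused, wavelet)

# Example with two registered source images of the same size.
rng = np.random.default_rng(0)
a = rng.random((128, 128))
b = rng.random((128, 128))
fused_image = dwt_fuse(a, b)
```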

5.3.3 Improved Dynamic Image Fusion Scheme Based on Region-Based Target Detection

In pixel-based approaches, each MR decomposition coefficient is treated independently (MS rule) or filtered by a small fixed window (area-based rule). As noted above, however, most applications of a fusion scheme are interested in image features rather than individual pixels, and most existing region-based schemes [11–16] are designed for still images, processing every frame of each source sequence individually and ignoring the wealth of inter-frame information. The novel region-based fusion scheme proposed here for the fusion of visible and infrared (IR) image sequences is shown in Fig. 5.3, where target detection (TD) techniques are introduced to segment the target regions intelligently. For convenience, we assume that both source sequences are well registered before fusion. First, both the visible and IR sequences are enhanced by a pre-processing operator. Then each frame of the source sequences is transformed by an MR method (the LR DWT is adopted; see Sect. 5.3.3.1). Simultaneously, the frames are segmented into target and background regions by a TD method, and different fusion rules are adopted in the two kinds of regions. Finally, the fused coefficients belonging to each region are combined, and the fused frames are reconstructed by the corresponding inverse transform. A minimal sketch of this per-frame pipeline is given after Fig. 5.3.

Fig. 5.3 Region-based IR and visible dynamic image fusion scheme (the IR and visible sequences are pre-processed, transformed by an MR transform, segmented by TD, fused with separate target-region and background-region rules, and reconstructed by the inverse transform to give the fused sequence)
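The following sketch illustrates the per-frame pipeline of Fig. 5.3 for one registered IR/visible frame pair. It is illustrative only: the undecimated stationary wavelet transform from PyWavelets is used as a stand-in for the LR DWT of Sect. 5.3.3.1 (which is not available as a library routine), the boolean target mask is assumed to come from the TD step, and the region rules are simplified to coefficient selection inside the target region and maximum selection outside.

```python
import numpy as np
import pywt

def fuse_frame(ir, vis, target_mask, wavelet="db2", levels=2):
    """Region-based fusion of one registered IR/visible frame pair (sketch).

    target_mask: boolean array of the same size as the frames, marking the
    target regions found by the TD step (here assumed to be IR detections).
    Frame dimensions must be divisible by 2**levels for the SWT.
    """
    cir = pywt.swt2(ir, wavelet, level=levels)   # undecimated: every sub-band
    cvs = pywt.swt2(vis, wavelet, level=levels)  # keeps the full frame size

    fused = []
    for (a_lo, a_hi), (b_lo, b_hi) in zip(cir, cvs):
        # Target region: keep the IR information; elsewhere: average / max-abs.
        lo = np.where(target_mask, a_lo, 0.5 * (a_lo + b_lo))
        hi = tuple(np.where(target_mask | (np.abs(a) >= np.abs(b)), a, b)
                   for a, b in zip(a_hi, b_hi))
        fused.append((lo, hi))
    return pywt.iswt2(fused, wavelet)
```

Because the undecimated transform keeps all sub-bands at the original frame size, the target mask can be applied directly without down-sampling it to each decomposition level.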

5.3.3.1 The Limited Redundancy Discrete Wavelet Transform

It is well known that the standard DWT produces a shift-dependent [17, 18] signal representation because of the down-sampling operations in every sub-band, which in turn results in a shift-dependent fusion scheme, as described by Rockinger [3]. To overcome this problem, Rockinger presents a perfectly shift-invariant wavelet fusion scheme based on the DWF. However, this method is computationally expensive due to the high redundancy of the representation (2^m · n : 1 for an m-D signal and n-level decomposition). Bull et al. [10] further develop the wavelet fusion method by introducing the DT CWT, which provides approximate shift invariance with limited redundancy (2^m : 1 for an m-D signal at any decomposition level). However, the DT CWT employs two filter banks, which must be designed rigorously to achieve appropriate delays while satisfying the perfect reconstruction (PR) conditions. Moreover, the decomposition coefficients of the two trees have to be regarded as the real and imaginary parts of complex coefficients, which complicates the subsequent processing in the fusion rule. A new implementation of the DWT is therefore introduced in this chapter, which provides approximate shift invariance and nearly perfect reconstruction while preserving the attractive properties of the DWT: computational efficiency and easy implementation. Figure 5.4 shows the decomposition and reconstruction scheme of the new transform in cascading form, which can be extended easily to 2-D by separable filtering along rows and then columns. Because of its limited redundancy (not more than 3:1 in 1-D), the new transform is referred to as the LR DWT (limited-redundancy discrete wavelet transform) to distinguish it from the DWT and DWF.

Fig. 5.4 The LR DWT and inverse transform (decomposition and reconstruction in cascading form)
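The shift-dependence problem that motivates the LR DWT can be observed with a small experiment. The sketch below is illustrative only: it uses PyWavelets and contrasts the decimated DWT with the fully redundant stationary (undecimated) wavelet transform rather than with the LR DWT itself, whose filters are not reproduced here.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
x = rng.standard_normal(64)   # length divisible by 2 for a 1-level SWT
xs = np.roll(x, 1)            # the same signal, circularly shifted by one sample

# Decimated DWT: a one-sample shift of the input does not correspond to any
# simple shift of the detail coefficients, so the residual below is large.
_, d = pywt.dwt(x, "db4", mode="periodization")
_, ds = pywt.dwt(xs, "db4", mode="periodization")
print("DWT residual:", np.linalg.norm(ds - np.roll(d, 1)))

# Undecimated SWT: shifting the input simply shifts the coefficients,
# so the residual is (numerically) zero -- the representation is shift invariant.
(_, cd), = pywt.swt(x, "db4", level=1)
(_, cds), = pywt.swt(xs, "db4", level=1)
print("SWT residual:", np.linalg.norm(cds - np.roll(cd, 1)))
```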

5.3.3.2 Region Segmentation Algorithm

The target detection (TD) operator aims to segment both source frames into target regions, which contain the significant information such as moving people and vehicles, and background regions. A novel target detection method based on the characteristics of IR imaging is proposed in this chapter. First, a region-merging method [19] is adopted to segment the initial IR frame. In the segmented IR frame it is easy to find the target regions, which have high contrast with the neighboring background, and a confidence measure [20] is computed for each candidate region. Since it would be very inefficient to compute the confidence measure for every candidate in every frame, a model-matching method is adopted to find the target regions in the subsequent frames: a target model is built from the intensity information of the target region in the previous frame, and only a small region of the next frame that corresponds to (and is slightly larger than) that target region is matched, rather than the whole frame. The initial detection operator based on segmentation and the confidence measure is repeated whenever no target is detected over a certain number of successive frames. Target detection in the visible sequence is performed similarly to the IR sequence.
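A drastically simplified stand-in for this detection step is sketched below: it replaces the region-merging segmentation and the confidence measure of [19, 20] with a plain intensity threshold and connected-component analysis, which is often a workable first approximation for bright (hot) targets in IR imagery. The threshold rule and the minimum region size are assumptions, and the model-matching step for subsequent frames is omitted.

```python
import numpy as np
from scipy import ndimage

def detect_ir_targets(ir_frame, k=2.5, min_area=25):
    """Very simple hot-target detector for one IR frame (illustrative only).

    Pixels brighter than mean + k * std are grouped into connected regions;
    small regions are discarded. Returns a boolean target mask and the
    bounding slices of each accepted region.
    """
    thresh = ir_frame.mean() + k * ir_frame.std()
    labels, _ = ndimage.label(ir_frame > thresh)

    mask = np.zeros(ir_frame.shape, dtype=bool)
    boxes = []
    for i, region in enumerate(ndimage.find_objects(labels), start=1):
        patch = labels[region] == i
        if patch.sum() >= min_area:       # drop tiny, noise-like regions
            mask[region] |= patch
            boxes.append(region)
    return mask, boxes
```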

5.3.3.3 Fusion Rules in the Target Region

To preserve as much of the information in the target region as possible, a special fusion rule should be employed there. Assume that target detection gives $M$ target region maps $T_{IR} = \{t_{IR}^1, t_{IR}^2, \dots, t_{IR}^M\}$ in the IR frame and $N$ target region maps $T_V = \{t_V^1, t_V^2, \dots, t_V^N\}$ in the corresponding visible frame. Each target map is down-sampled by $2^m$ (according to the resolution of the decomposition coefficients) to give a decimated target map at each level. The target maps of both source frames are analyzed jointly, $T_J = T_{IR} \cup T_V$, and the frame is segmented into three sets: single target regions, overlapped target regions, and the background region. Overlapped target regions are defined as $T_O = T_{IR} \cap T_V$, single target regions are the target regions with no overlap, $T_S = T_J \setminus T_O$ (so that $T_J = T_S \cup T_O$), and background regions are defined as $B = \overline{T_J}$. In the single target regions, the fusion rule can be written as

$$
c_f(x, y) =
\begin{cases}
c_{ir}(x, y), & \text{if } (x, y) \in T_{IR} \\
c_v(x, y), & \text{if } (x, y) \in T_V
\end{cases}
\tag{5.1}
$$

In a connected overlapped target region $t \in T_O$, a similarity measure between the two sources is defined as

$$
M(t) = \frac{2 \sum_{(x, y) \in t} I_{ir}(x, y)\, I_v(x, y)}
{\sum_{(x, y) \in t} \left[ I_{ir}(x, y) \right]^2 + \sum_{(x, y) \in t} \left[ I_v(x, y) \right]^2}
\tag{5.2}
$$

where $I_{ir}$ and $I_v$ denote the IR and visible frames, respectively. Then an energy index of the coefficients within the overlapped region is computed separately for the IR and visible frames:

$$
S_i(t) = \sum_{(x, y) \in t} c_i(x, y)^2
\tag{5.3}
$$

where $t \in T_O$ and $i = ir, v$ indicates the IR or visible frame, respectively. A similarity threshold $\alpha$ is introduced, where $\alpha \in [0, 1]$ and normally $\alpha = 0.85$ is appropriate. In case $M(t) < \alpha$, the fusion rule in the overlapped target region can be written as

$$
c_f(x, y) =
\begin{cases}
c_{ir}(x, y), & \text{if } S_{ir}(t) \geq S_v(t) \\
c_v(x, y), & \text{otherwise}
\end{cases}
\tag{5.4}
$$

In case $M(t) \geq \alpha$, a weighted-average method is adopted:

$$
c_f(x, y) =
\begin{cases}
\varpi_{\max}(t)\, c_{ir}(x, y) + \varpi_{\min}(t)\, c_v(x, y), & \text{if } S_{ir}(t) \geq S_v(t) \\
\varpi_{\min}(t)\, c_{ir}(x, y) + \varpi_{\max}(t)\, c_v(x, y), & \text{if } S_{ir}(t) < S_v(t)
\end{cases}
\tag{5.5}
$$

where the weights $\varpi_{\min}(t)$ and $\varpi_{\max}(t)$ are obtained from

$$
\varpi_{\min}(t) = \frac{1}{2}\left( 1 - \frac{1 - M(t)}{1 - \alpha} \right), \qquad
\varpi_{\max}(t) = 1 - \varpi_{\min}(t).
$$
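The rules of Eqs. (5.1)–(5.5) translate almost directly into code. The sketch below is a minimal NumPy rendition for a single connected overlapped region t, given the two source frames and the corresponding sub-band coefficient arrays; the function names and the representation of t as a boolean mask at the resolution of the coefficients are assumptions made for illustration.

```python
import numpy as np

ALPHA = 0.85  # similarity threshold suggested in the text

def similarity(I_ir, I_v, t):
    """Eq. (5.2): match measure of the two sources over region mask t."""
    num = 2.0 * np.sum(I_ir[t] * I_v[t])
    den = np.sum(I_ir[t] ** 2) + np.sum(I_v[t] ** 2)
    return num / den if den > 0 else 1.0

def energy(c, t):
    """Eq. (5.3): coefficient energy of one source inside region mask t."""
    return np.sum(c[t] ** 2)

def fuse_overlapped_region(c_f, c_ir, c_v, I_ir, I_v, t, alpha=ALPHA):
    """Eqs. (5.4)-(5.5): fill the fused sub-band c_f inside overlapped region t."""
    m = similarity(I_ir, I_v, t)
    s_ir, s_v = energy(c_ir, t), energy(c_v, t)
    if m < alpha:
        # Eq. (5.4): select the source with the larger regional energy.
        c_f[t] = c_ir[t] if s_ir >= s_v else c_v[t]
    else:
        # Eq. (5.5): weighted average, larger weight to the higher-energy source.
        w_min = 0.5 * (1.0 - (1.0 - m) / (1.0 - alpha))
        w_max = 1.0 - w_min
        if s_ir >= s_v:
            c_f[t] = w_max * c_ir[t] + w_min * c_v[t]
        else:
            c_f[t] = w_min * c_ir[t] + w_max * c_v[t]

def fuse_single_regions(c_f, c_ir, c_v, t_ir_only, t_v_only):
    """Eq. (5.1): in single target regions keep the source containing the target."""
    c_f[t_ir_only] = c_ir[t_ir_only]
    c_f[t_v_only] = c_v[t_v_only]
```

In a complete fusion pass, these functions would be called for every sub-band of every frame, after which the background coefficients are fused with the window-based rule described earlier and the fused frame is reconstructed by the inverse transform.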