English Pages 208 Year 2023
Advanced and Intelligent Manufacturing in China
Chuan He Changhua Hu
Parallel Operator Splitting Algorithms with Application to Imaging Inverse Problems
Advanced and Intelligent Manufacturing in China Series Editor Jie Chen, Tongji University, Shanghai, Shanghai, China
This series is a set of high-level, original academic monographs. It focuses on two fields, intelligent manufacturing and equipment, and control and information technology, and covers core technologies such as the Internet of Things, 3D printing, robotics, and intelligent equipment, epitomizing the achievements of technological development in China’s manufacturing sector. With Prof. Jie Chen, a member of the Chinese Academy of Engineering and a control engineering expert in China, as Editor-in-Chief, the series is organized and written by more than 30 young experts and scholars from more than 10 universities and institutes. It aims to promote research, development, and innovation in advanced intelligent manufacturing technologies, and to support the technological transformation and upgrading of the equipment manufacturing industry.
Chuan He High-Tech Institute of Xi’an Xi’an, Shaanxi, China
Changhua Hu High-Tech Institute of Xi’an Xi’an, Shaanxi, China
ISSN 2731-5983 ISSN 2731-5991 (electronic) Advanced and Intelligent Manufacturing in China ISBN 978-981-99-3749-3 ISBN 978-981-99-3750-9 (eBook) https://doi.org/10.1007/978-981-99-3750-9 Jointly published with Chemical Industry Press The print edition is not for sale in China (Mainland). Customers from China (Mainland) please order the print book from: Chemical Industry Press. ISBN of the Co-Publisher’s edition: 978-7-122-31507-6 The translation was done with the help of artificial intelligence (machine translation by the service DeepL.com). A subsequent human revision was done primarily in terms of content. © Chemical Industry Press 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publishers, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publishers nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publishers remain neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. 
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
Owing to factors related to equipment, environment, and human operation, degradation inevitably occurs during image acquisition, transformation, and transmission. Significant image degradation seriously affects subsequent applications, so the degraded image must be restored to improve its quality. Image compressed sensing performs low-rate image sampling and compression simultaneously, and under certain conditions the original image can be accurately reconstructed from the sampled data. If the acquisition of degraded images or compressively sampled data is regarded as a forward problem, then image restoration problems such as image denoising, deblurring, inpainting, super-resolution, and compressed sensing reconstruction belong to the same class of imaging inverse problems: all of them seek to recover the original signal as accurately as possible from degraded results or incomplete observations. Problems of this kind have both important theoretical research value and a wide range of engineering applications. The biggest challenge in solving them is the highly ill-conditioned nature of the degradation process; that is, the inverse operation is highly sensitive to noise, or may not exist at all. The key to successful image restoration is to construct a regularization model that reasonably reflects the prior information of the image and to design an accurate, concise, and fast algorithm to solve it. In recent years, the operator splitting methods emerging in the field of signal processing have made it possible to decompose a nonsmooth image restoration optimization problem into several subproblems that are easy to solve. At the same time, the arrival of the era of image big data has placed higher demands on the quality and efficiency of image restoration.
The development of highly automated parallel operator splitting methods suitable for large-scale distributed computing has become a fundamental problem in the field of image restoration in the era of big data. This book summarizes the author’s research work in the field of image restoration and systematically presents research results published in authoritative journals in recent years. It focuses on several problems in image restoration, such as adaptive estimation of the regularization parameter, compound regularization strategies, and parallel solution of the objective function. Although the
methods studied in the book take image denoising, deblurring, inpainting, and compressed sensing reconstruction as examples, they can also be extended to image processing problems such as image segmentation, hyperspectral decomposition, and image compression.

This book is divided into six chapters, whose main contents are as follows. Chapter 1 is the introduction. After describing the research background, it briefly describes the mechanism and modeling of image degradation, and then discusses in detail the research status and development trends of regularization methods and nonlinear objective function algorithms for image restoration. Chapter 2 introduces some basic theory, including convolution, the discrete Fourier transform, and fixed-point theory in Hilbert space. Chapter 3 takes image deblurring as an example and reveals the root causes of ill-conditioning in image degradation, and the factors that influence it, from the perspective of eigenvalue analysis and the image inverse filter. It also demonstrates the necessity of regularization in image restoration and the effectiveness of total generalized variation and shearlet regularization in preserving image details. Chapter 4 studies the adaptive estimation of the regularization parameter in the image restoration objective function, which balances the prior regularization term against the data fidelity term. A fast algorithm is proposed that estimates the regularization parameter and restores the image at the same time. Adaptive estimation of the regularization parameter is an important basis for automatic image restoration. Experimental results show that, compared with several well-known algorithms, the proposed algorithm has a simpler structure, more accurate parameter estimation, and faster convergence. Chapter 5 proposes a parallel alternating direction method of multipliers for compound regularized image restoration.
Its convergence is proved, and its worst-case convergence rate is established. A single type of regularization tends to make the restoration result emphasize one image property while suppressing others, whereas compound regularization, which combines multiple image prior models, makes the objective function difficult to solve. Experiments show that the proposed method provides a feasible way to solve compound regularized image restoration problems and is suitable for distributed computing. As solutions to inverse problems, most image restoration algorithms involve the inversion of a linear operator. When multichannel (for example, multispectral) images are processed, this inversion is not easy to compute, which significantly reduces the computational efficiency of the algorithm. Chapter 6 proposes a parallel primal-dual splitting method that eliminates operator inversion from image restoration methods. Its convergence is proved, its convergence conditions are given, and its convergence rate is established. Its inclusion relation with the parallel linearized alternating direction method of multipliers is proved, and the method is extended to optimization problems with Lipschitz continuous gradient terms. Experimental results show that, compared with the parallel alternating direction method of multipliers, this method has higher per-iteration efficiency under additional convergence conditions and is more suitable for multichannel image processing.
In carrying out the related research and writing this book, the author has had the honor of receiving help and support from the following people: Prof. Licheng Jiao of Xidian University, Research Fellow Weiming Hu of the National Laboratory of Pattern Recognition of the Chinese Academy of Sciences, Prof. Xuelong Li of Northwestern Polytechnical University, Prof. Nong Sang of Huazhong University of Science and Technology, and Prof. Xiangyu Kong and Prof. Xiaosheng Si of the High-Tech Institute of Xi’an. In writing this book, we have mainly referred to Prof. Mouyan Zou’s works on deconvolution and signal restoration, as well as the works of Prof. Heinz H. Bauschke and Prof. Patrick L. Combettes on convex analysis and monotone operator theory in Hilbert spaces. To ensure a systematic and complete knowledge structure, we have incorporated some of their results as mathematical foundations in Chap. 2. We hereby express our heartfelt thanks to them. Our research was funded by the Natural Science Foundation of Shaanxi Province under Grant 2021KJXX-22, the Post Doctoral Science Foundation of China under Grant 2019M663635, the Special Support Plan for High-level Talents in Shaanxi Province under Grant TZ0328, and the National Natural Science Foundation of China under Grants 61773389 and 61025014. We also thank Hui Song, Editor at Chemical Industry Press, for support and help. Owing to the limits of the author’s knowledge, omissions and inadequacies in the book are inevitable, and criticism and corrections from readers are sincerely welcome.
Xi’an, Shaanxi, China
Chuan He
Contents
1 Introduction
  1.1 Implications for Image Restoration
  1.2 Regularization Methods for Image Restoration
    1.2.1 Image Degradation Mechanisms and Degradation Modeling
    1.2.2 Regularization Methods Based on Variational Partial Differential Equations
    1.2.3 Regularization Methods Based on Wavelet Frame Theory
    1.2.4 Regularization Methods Based on Sparse Representation of Images
    1.2.5 Random Field-Based Regularization Methods
  1.3 Nonlinear Iterative Algorithm for Image Restoration
    1.3.1 Traditional Methods
    1.3.2 Operator Splitting Methods
    1.3.3 Convergence Analysis of the Splitting Algorithms
    1.3.4 Adaptive Estimation of the Regularization Parameter
  References
2 Mathematical Fundamentals
  2.1 Summarize
  2.2 Convolution
    2.2.1 One-Dimensional Discrete Convolution
    2.2.2 Two-Dimensional Discrete Convolution
  2.3 Fourier Transform and Discrete Fourier Transform
  2.4 Theory and Methods of Fixed-Points in Hilbert Spaces
    2.4.1 Hilbert Space
    2.4.2 Non-expansive Operators with Fixed-Point Iterations
    2.4.3 Maximally Monotone Operator
    2.4.4 Solution of the l1-ball Projection Problem
  Reference
3 Ill-Posedness of Imaging Inverse Problems and Regularization for Detail Preservation
  3.1 Summarize
  3.2 Typical Types of Image Blur
  3.3 The Ill-Posed Nature of Image Deblurring
    3.3.1 Discretization of Convolution Equations and Ill-Posed Analysis of Blur Matrices
    3.3.2 Image Restoration Based on Inverse Filter
  3.4 Tikhonov Image Regularization
    3.4.1 Tikhonov Regularization Idea
    3.4.2 Wiener Filtering
    3.4.3 Constrained Least Square Filtering
  3.5 Detail-Preserving Regularization for Image
    3.5.1 Total Generalized Variational Regularization Model
    3.5.2 Shearlet Regularization Model
  3.6 Image Quality Evaluation
  References
4 Fast Parameter Estimation in TV-Based Image Restoration
  4.1 Summarize
  4.2 Overview of Adaptive Parameter Estimation Methods in TV Image Restoration
  4.3 Fast Adaptive Parameter Estimation Based on ADMM and Discrepancy Principle
    4.3.1 Augmented Lagrangian Model for TV Regularized Problem
    4.3.2 Algorithm Derivation
    4.3.3 Convergence Analysis
    4.3.4 Parameter Settings
  4.4 Extension of Fast Adaptive Parameter Estimation Algorithm
    4.4.1 Equivalent Splitting Bregman Algorithm
    4.4.2 Interval Constrained TV Image Restoration with Fast Adaptive Parameter Estimation
  4.5 Experimental Results
    4.5.1 Experiment 1: Implications for Significance Regularization Parameter Estimation
    4.5.2 Experiment 2: Comparison with Other Adaptive Algorithms
    4.5.3 Experiment 3: Comparison of Denoising Experiments
  References
5 Parallel Alternating Direction Method of Multipliers with Application to Image Restoration
  5.1 Summarize
  5.2 Parallel Alternating Direction Method of Multipliers
    5.2.1 A General Description of the Regularized Image Restoration Objective Function
    5.2.2 Augmented Lagrangian Function with Saddle Point Condition
    5.2.3 Algorithm Derivation
  5.3 Convergence Analysis
    5.3.1 Convergence Proof
    5.3.2 Convergence Rate Analysis
  5.4 Application of PADMM to TGV/Shearlet Compound Regularized Image Restoration
  5.5 Experimental Results
    5.5.1 Grayscale Image Deblurring Experiment
    5.5.2 RGB Image Deblurring Experiment
    5.5.3 MRI Reconstruction Experiment
  References
6 Parallel Primal-dual Method with Application to Image Restoration
  6.1 Summarize
  6.2 Parallel Primal-dual Splitting Method
    6.2.1 A General Description of the Objective Function for Image Restoration with Proximity Splitting Terms
    6.2.2 Variational Conditions for Optimization of the Objective Function
    6.2.3 Algorithm Derivation
  6.3 Convergence Analysis
    6.3.1 Convergence Proof
    6.3.2 Convergence Rate Analysis
  6.4 Further Discussion and Extension of the Primal-Dual Splitting Method
    6.4.1 Relation to Parallel Linear Alternating Direction Method of Multipliers
    6.4.2 Further Extensions of the Parallel Primal-Dual Splitting Method
  6.5 Application of PPDS to TGV/Shearlet Compound Regularized Image Restoration
  6.6 Experimental Results
    6.6.1 Image Deblurring Experiment
    6.6.2 Image Inpainting Experiment
    6.6.3 Image Compressed Sensing Experiments
    6.6.4 Experiments on the Validity of Pixel Interval Constraints
  References
Appendix A
Appendix B
Uncited References
Index
About the Authors
Chuan He is an associate professor at the High-Tech Institute of Xi’an. He has long been engaged in teaching and research in image processing and navigation guidance, and has carried out in-depth research in the field of image restoration. He has presided over 10 scientific research projects, published 20 papers, won the Excellent Doctoral Dissertation Award of Shaanxi Province, and won 3 provincial scientific research awards. He was selected for the Special Support Program for top young talents of Shaanxi Province and named a technology star of Shaanxi Province. He is also a reviewer for more than ten international journals, including IEEE TIP, TNNLS, and TMM.

Changhua Hu received the B.Eng. and M.Eng. degrees from the High-Tech Institute of Xi’an, Xi’an, China, in 1987 and 1990, respectively, and the Ph.D. degree from Northwestern Polytechnical University, Xi’an, China, in 1996. He is currently a Cheung Kong Professor with the High-Tech Institute of Xi’an, Shaanxi, China. He was a visiting scholar with the University of Duisburg, Duisburg, Germany (September 2008–December 2008). He has authored or coauthored two books and about 100 articles. His research interests include signal processing, fault diagnosis and prediction, life prognosis, and fault-tolerant control.
Chapter 1
Introduction
1.1 Implications for Image Restoration

Since the end of the twentieth century, with the rapid progress of computer technology and the continuous improvement of discrete mathematics, digital image processing has developed rapidly and has been widely used in many fields. In the military field, digital imaging and image processing provide indispensable technical means for tasks such as target detection, weapon guidance, and strike evaluation. In high-tech wars, visible-light, infrared, synthetic aperture radar (SAR), and other imaging technologies have been used throughout; their application has greatly raised the level of informatization of military equipment and fundamentally changed traditional styles and concepts of combat. It can be said that modern “information warfare” bears the deep imprint of digital image processing technology. In the civilian field, image processing has penetrated all aspects of human society, such as astronomical observation, remote sensing of the Earth, biomedicine, social communication, film production, and video surveillance. Society has entered the era of image big data, in which images and videos provide people with countless sources of information.

However, in the process of image acquisition, conversion, and transmission, image degradation [1] inevitably arises from human operation, imaging system defects, and uncertainties in the external environment. Some degradation is introduced deliberately: image compression drastically reduces the storage space and transmission time of image data, and image compressed sensing (CS) [2] relaxes image sampling conditions and drastically reduces the storage, transmission, and processing costs of massive data. Most types of degradation, however, are undesirable, such as degradation caused by noise and blurring.
Image degradation reduces resolution, which in turn severely affects subsequent processing such as analysis and interpretation, feature extraction, and pattern recognition. For example, in an infrared-guided supersonic cruise weapon, the complex turbulent flow field and gas density variations caused by the violent interaction between the optical guidance head and the atmosphere subject the optical imaging system to thermal radiation interference and image transmission interference. The resulting aero-optical degradation effects, such as pixel shifting and blurring of the acquired image, can seriously impair the ability of the guidance head to detect, identify, and track targets and reduce the hit accuracy of the weapon.

© Chemical Industry Press 2023 C. He and C. Hu, Parallel Operator Splitting Algorithms with Application to Imaging Inverse Problems, Advanced and Intelligent Manufacturing in China, https://doi.org/10.1007/978-981-99-3750-9_1

To obtain more realistic and reliable information, operations such as distortion correction, denoising, deblurring, inpainting, super-resolution reconstruction, and compressed sensing reconstruction are required before higher-level processing of images. Image restoration is an effective way to suppress noise, eliminate blurring, improve image resolution, and reconstruct images. As one of the most fundamental research topics in image processing, it has received extensive attention from researchers in computer vision, signal processing, and applied mathematics. Image restoration can be achieved in two ways. One is to use hardware, such as higher-quality imaging equipment. This approach is fast and effective, but its high cost and lack of flexibility often limit it to specific applications. The other is the software approach, in which an algorithm enhances the resolution of degraded images or reconstructs images. This approach is inexpensive, convenient, and flexible, and has shown strong vitality since it was proposed.

Image degradation usually implies the loss of some important elements, or a reduction in the dimension of the observed data relative to the original data. Therefore, image restoration, as its inverse operation, is often an ill-posed inverse problem.
The ill-posed nature of the inverse problem manifests itself in the discontinuous dependence of the solution on the observed data; in other words, even when the degradation mechanism is completely known, slight noise in the observed data and small perturbations in the computation can lead to large variations in the solution. The key to solving an ill-posed problem lies in regularization [3], i.e., using prior information about the solution to construct additional constraints, thereby converting the ill-posed problem into a well-posed problem with a stable solution [4]. The basic approach to image restoration is to construct and minimize an objective function, which should be understood as an objective functional when the image is treated as a continuous function. Two central problems in the field of image restoration arise from this process.

(1) Construction of the image regularization model

Tikhonov, a pioneer in the study of inverse problems, introduced the idea of regularization in 1963 and subsequently proposed the classical Tikhonov regularization model [3] based on the l2-norm. Tikhonov regularization is overly strong: it restricts the solution to be smooth, which is usually not desired when recovering image signals. Edges and textures constitute important detail features of an image, and the difficulty in image regularization is to strike a balance between noise suppression and detail preservation. Image details and high-frequency noise are mixed in the frequency domain, so overly strong regularization suppresses detail information while denoising. Subsequent regularization methods invariably use
an image prior model to preserve image details. Constructing regularization models that better preserve image detail has therefore become one of the current research hotspots in the field of imaging inverse problems.

(2) Solution of the nonlinear regularization function

A major advantage of the traditional Tikhonov regularization method is that a closed-form analytic solution can be obtained by linear filtering, but this solution is known to be oversmoothed. Subsequent edge-preserving image restoration methods have mostly used nonlinear regularization models, such as the total variation (TV) model and the wavelet model. However, for nonlinear regularization functions a closed-form solution is difficult to find or does not exist. While improving the results, nonlinear regularization models introduce problems such as nonlinearity, nonsmoothness, and even nonconvexity. These problems, together with the high dimensionality of image data and the non-sparsity of the operator that models the degradation process, make the iterative solution of the nonlinear regularization function an extremely challenging task. Exploiting the structural features of regularized functions to construct accurate, concise, fast, and parallel solution algorithms has become a focus in several research areas, including applied mathematics, computer vision, and signal processing.

Image restoration problems are a class of scientific problems with important theoretical significance and extensive engineering applications. The key to solving them lies in constructing regularization functions that reasonably reflect prior models of images and in designing accurate, concise, fast, and parallel algorithms for minimizing these functions. Operator splitting [5] is an effective method developed in recent years for solving nonlinear functions accurately.
The theory of operator splitting can be used to derive efficient algorithms that facilitate distributed computing, which provides a better solution to the image restoration problem in the era of big data. Focusing on the accurate and fast solution of image restoration problems, this book presents a systematic and in-depth study of the construction of compound image regularization models, the parallel implementation of operator splitting methods, and their application to image restoration inverse problems.
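The sensitivity to noise described in this section, and the stabilizing effect of Tikhonov regularization with its closed-form linear filter, can be illustrated with a small one-dimensional sketch. This is not an algorithm from the book. The signal `u`, the blur kernel `k`, the perturbation `noise`, and the parameter `lam` are hypothetical values chosen for illustration, and periodic boundary conditions are assumed so that the blur is diagonal in the Fourier domain. With an identity regularizer, the Tikhonov solution in the frequency domain is conj(K) F / (|K|^2 + lam), and setting `lam = 0` gives the plain inverse filter.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * m * j / n) for j in range(n))
            for m in range(n)]

def idft(X):
    """Naive inverse DFT; keeps the real part because all signals here are real."""
    n = len(X)
    return [(sum(X[m] * cmath.exp(2j * math.pi * m * j / n) for m in range(n)) / n).real
            for j in range(n)]

def restore(f, k, lam):
    """Tikhonov filter conj(K) F / (|K|^2 + lam); lam = 0 is the plain inverse filter."""
    K, F = dft(k), dft(f)
    U = [Km.conjugate() * Fm / (abs(Km) ** 2 + lam) for Km, Fm in zip(K, F)]
    return idft(U)

def rms_error(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

# Hypothetical test data (assumptions, not taken from the book).
N = 16
u = [1.0 + math.cos(2 * math.pi * j / N) for j in range(N)]  # smooth original signal
k = [0.0] * N
k[0], k[1], k[-1] = 0.52, 0.24, 0.24          # periodic blur kernel, sums to 1
noise = [0.01 * (-1) ** j for j in range(N)]  # small high-frequency perturbation

# Observation: periodic convolution of k and u, plus the perturbation.
blurred = idft([Km * Um for Km, Um in zip(dft(k), dft(u))])
f = [b + e for b, e in zip(blurred, noise)]

err_inverse = rms_error(restore(f, k, 0.0), u)    # unregularized inverse filter
err_tikhonov = rms_error(restore(f, k, 0.01), u)  # Tikhonov-regularized filter
print(err_inverse, err_tikhonov)
```

In this construction the perturbation sits at the frequency where the kernel response is smallest (0.52 - 0.48 = 0.04), so the inverse filter amplifies it by a factor of 25, while the regularized filter trades a small bias in the smooth components for a much smaller overall error.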
1.2 Regularization Methods for Image Restoration

In the past decades, academic research on image restoration has developed rapidly. UCLA [6–8], Rice University [9], Northwestern University [10], [11–14], Instituto Superior Técnico [15], Centre National de la Recherche Scientifique (CNRS) [16], National University of Singapore [17], and other research institutes around the world have carried out distinctive and fruitful research. The IEEE Computer Society, the IEEE Signal Processing Society, the Society for Industrial and Applied Mathematics, and other academic organizations hold conferences on image and video processing every year to discuss research progress in image restoration and to promote the development of the field. Many famous journals,
such as “IEEE Transactions on Pattern Analysis and Machine Intelligence”, “International Journal of Computer Vision”, “IEEE Transactions on Image Processing” and “SIAM Journal on Imaging Sciences”, publish a large number of academic papers on the topic of image restoration, discussing its key technologies, specific applications and development trends. In China, research work on image restoration is developing rapidly, and research institutes such as the Chinese Academy of Sciences [18, 19], Tsinghua University [20], Peking University [21], Zhejiang University [22], National University of Defense Technology [23], Nanjing University [24], Xidian University [25], [26], [27], and the Chinese University of Hong Kong [28–30] are actively carrying out research work on image restoration.
1.2.1 Image Degradation Mechanisms and Degradation Modeling
Image blur is the most typical class of image degradation phenomena, so we take image blur as an example to illustrate the degradation mechanism and its modeling process. Many factors can blur an image, such as an imperfect imaging system, inaccurate focusing, relative motion between the imaging equipment and the scene, and atmospheric disturbance; in addition, the interference of various kinds of noise is inevitable. Image blurring causes a significant decrease in image resolution, because each point in the image is a mixture of several points in the imaging scene. The process can be described by a two-dimensional convolution as follows
$$f(x, y) = S\left( \iint_{\Omega} k(x, y; a, b)\, u(a, b)\, \mathrm{d}a\, \mathrm{d}b \right) + n(x, y), \tag{1.1}$$
where Ω is a bounded region in the two-dimensional plane; (x, y) and (a, b) denote the coordinates of points in the image and object planes, respectively; the point spread function (PSF) k(x, y; a, b) characterizes the point spread nature of the imaging process and is also known as the blur kernel or blur function; S is a point-by-point nonlinear operation; n(x, y) is the additive noise in the observation process. The PSF k(x, y; a, b) generally depends on the spatial location of points in the imaging scene, i.e., it is spatially varying, but for a large class of image degradation processes it can be considered spatially invariant. The effect of nonlinearity in the imaging process can usually be ignored, because the human eye is more sensitive to abrupt information such as edges than to slowly changing gray intensity, and in most cases nonlinearities in the imaging process do not significantly corrupt the edge information of the image. Neglecting the nonlinear and spatially varying factors in Eq. (1.1) yields the more commonly used linear shift-invariant degradation model shown in Fig. 1.1.
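Fig. 1.1 Linear shift-invariant degradation model for images (block diagram: the original image u(x, y) passes through the blur kernel k(x, y), and the noise n(x, y) is added to give the observed image f(x, y))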
$$f(x, y) = \iint_{\Omega} k(x - a, y - b)\, u(a, b)\, \mathrm{d}a\, \mathrm{d}b + n(x, y). \tag{1.2}$$
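The degradation model of Eq. (1.2) can be simulated numerically. The following is a minimal sketch, assuming periodic (circular) boundary conditions so that the convolution can be evaluated with the FFT, a Gaussian PSF, and additive Gaussian white noise; the function names `gaussian_psf` and `degrade`, the image size, and the parameter values are illustrative choices rather than anything prescribed by the text.

```python
import numpy as np

def gaussian_psf(shape, sigma):
    """Periodic Gaussian PSF centred at pixel (0, 0), normalised to sum to 1."""
    m, n = shape
    # Wrapped (circular) distances from the origin along each axis.
    dy = np.minimum(np.arange(m), m - np.arange(m))[:, None]
    dx = np.minimum(np.arange(n), n - np.arange(n))[None, :]
    k = np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))
    return k / k.sum()

def degrade(u, k, noise_std, rng):
    """Return f = k (*) u + n: circular convolution plus Gaussian white noise."""
    blurred = np.real(np.fft.ifft2(np.fft.fft2(k) * np.fft.fft2(u)))
    return blurred + noise_std * rng.standard_normal(u.shape)

rng = np.random.default_rng(0)
u = np.zeros((32, 32))
u[8:24, 8:24] = 1.0                       # simple piecewise-constant test scene
k = gaussian_psf(u.shape, sigma=1.5)      # spatially invariant blur kernel
f = degrade(u, k, noise_std=0.01, rng=rng)
print(f.shape, float(np.abs(f - u).max()))
```

Because the kernel sums to one, the blur preserves the mean intensity of the scene; only the noise term changes it.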
Common types of blur functions [31] include linear motion blur, out-of-focus blur, and Gaussian blur, and common types of noise [31] include Gaussian noise, Poisson noise, impulse (salt-and-pepper) noise, and multiplicative Gamma noise. The task of image restoration is to derive an estimate of the original scene u(x, y) starting from the noise-contaminated observed image f(x, y). Image restoration is a regular deconvolution problem if the PSF of the imaging system is known, and conversely, it is a blind deconvolution problem if the PSF is unknown. Intuitively, the deconvolution can be achieved by an inverse filter. Assuming that the additive noise is Gaussian white noise, the inverse filter is expressed as
$$U(\mu, \nu) = \frac{F(\mu, \nu)}{K(\mu, \nu)}. \tag{1.3}$$
Its frequency domain representation in the least square form is given by
$$U(\mu, \nu) = \frac{K^{*}(\mu, \nu)\, F(\mu, \nu)}{|K(\mu, \nu)|^{2}}, \tag{1.4}$$
where U(μ, ν), F(μ, ν), and K(μ, ν) are the two-dimensional Fourier transforms of u(x, y), f(x, y), and k(x, y), respectively, and K*(μ, ν) is the complex conjugate of K(μ, ν). However, since the eigenvalues of the linear convolution operator associated with K(μ, ν) tend to zero, the inverse filter still amplifies high-frequency noise even in the least square form, which makes the results unusable. This is analyzed in depth in [32] in terms of compact self-adjoint operators. In practical applications, Eq. (1.2) can be discretized as
$$f = K u + n, \tag{1.5}$$
where $u, f \in \mathbb{R}^{mn}$ denote the original and observed images, respectively, both of size m × n; K is the blur (convolution) matrix, the construction of which is described in detail in Chapter 2 of this book; and $n \in \mathbb{R}^{mn}$ is the additive noise. In this book, all images are written in vector form by lexicographic ordering, whereby the (i, j)th element of the m × n image matrix is the ((i − 1)n + j)th element of the image vector. In Eq. (1.5), if K is known, the corresponding inverse problem
is a regular deconvolution problem, and if K is unknown, the corresponding inverse problem is a blind deconvolution problem. When K changes form, Eq. (1.5) can also be used to model other image degradation processes. For example, if K = P, where P is a selection matrix, i.e., a diagonal matrix whose elements take only the values 0 or 1, then Eq. (1.5) describes the loss of image data, and its corresponding inverse problem is image inpainting. If K = PF, where P is the selection matrix and F is the Fourier transform matrix, then Eq. (1.5) can be used to model the magnetic resonance imaging (MRI) process, and its corresponding inverse problem is the MRI reconstruction problem, which is a typical example of a compressed sensing application. The key to solving the ill-posed image restoration problem is to regularize it, i.e., to incorporate some prior knowledge about the original image into the solution of the image inverse problem, so as to suppress the noise and obtain a solution with some regularity (smoothness). The prior knowledge of the image is expressed by the prior model of the image; however, there is no consensus about the image model in academia, owing to the different nature and uses of images. Galatsanos and Katsaggelos [33] used mean square error (MSE) analysis to demonstrate that regularization can effectively improve the quality of image restoration. Image restoration problems with regularization typically involve a minimization problem of the following form
$$\min_{u} J(u) \quad \text{s.t.} \quad D(Ku, f) \le c. \tag{1.6}$$
By Lagrange's principle, its equivalent unconstrained form is
$$\min_{u} J(u) + \lambda D(Ku, f), \tag{1.7}$$
where D(K u, f ) is the fidelity term that reflects the accuracy of the observed data, and its specific form depends on the type of noise in the observed image; obviously, if there is no noise, there should be a constraint K u = f ; J (u) is the regularization term that incorporates the prior knowledge of the image, which plays the role of noise suppression, result smoothing, and numerical stabilization; the upper bound c is a constant that depends on the noise level; λ is the regularization parameter, which plays a key role in balancing the regularization term with the fidelity term. The solution is optimal only when λ takes the optimal value. If λ is too large, the noise in the image cannot be effectively suppressed; on the contrary, if λ is too small, the final result cannot fully reflect the valid information in the observed data. Compared with the unconstrained optimization problem Eq. (1.7), the constrained optimization problem Eq. (1.6) is more difficult to solve; therefore, most of the current literature takes Eq. (1.7) as the optimization objective. Currently, image models based on variational partial differential equations (PDEs), wavelet frame theory, sparsity theory, and random field theory are mostly
used in the field of image processing, all of which have their own advantages, disadvantages, and areas of applicability. In the following, the regularization methods based on each of these models are discussed.
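To make the contrast between the inverse filter of Eq. (1.4) and a regularized solution of Eq. (1.7) concrete, the sketch below compares the two on a synthetic example. It assumes a periodic Gaussian blur (so that K is diagonalized by the FFT) and, purely for illustration, the simplest quadratic choices $J(u) = \frac{1}{2}\|u\|_2^2$ and $D(Ku, f) = \frac{1}{2}\|Ku - f\|_2^2$; under these assumptions the minimizer of Eq. (1.7) satisfies $(I + \lambda K^{T}K)u = \lambda K^{T} f$, i.e., $U = \lambda K^{*}F / (1 + \lambda |K|^{2})$ in the frequency domain. The image, kernel width, noise level, and the value of λ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 32
u = np.zeros((m, n))
u[8:24, 8:24] = 1.0                        # test scene

# Periodic Gaussian PSF centred at pixel (0, 0), normalised to sum to 1.
dy = np.minimum(np.arange(m), m - np.arange(m))[:, None]
dx = np.minimum(np.arange(n), n - np.arange(n))[None, :]
k = np.exp(-(dx**2 + dy**2) / (2.0 * 1.5**2))
k /= k.sum()

K = np.fft.fft2(k)                         # transfer function of the blur
f = np.real(np.fft.ifft2(K * np.fft.fft2(u))) + 0.01 * rng.standard_normal((m, n))
F = np.fft.fft2(f)

# Least-squares inverse filter, Eq. (1.4): divides by |K|^2, which is tiny
# at high frequencies, so the noise there is amplified enormously.
U_inv = np.conj(K) * F / np.abs(K) ** 2

# Quadratic (Tikhonov-type) regularisation, an assumed special case of Eq. (1.7).
lam = 100.0
U_reg = lam * np.conj(K) * F / (1.0 + lam * np.abs(K) ** 2)

err_inv = np.linalg.norm(np.real(np.fft.ifft2(U_inv)) - u)
err_reg = np.linalg.norm(np.real(np.fft.ifft2(U_reg)) - u)
print(f"inverse filter error: {err_inv:.3e}, regularised error: {err_reg:.3e}")
```

The regularized filter never has a gain larger than $\sqrt{\lambda}/2$, which is why its error stays bounded while the inverse filter error explodes; the price is the oversmoothing discussed in the text.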
1.2.2 Regularization Methods Based on Variational Partial Differential Equations
Regularization methods based on the variational principle are built on classical functional analysis and variational methods, in which the image is considered as a deterministic two- or multi-dimensional function. Most of the early image regularizations of this kind were based on the Tikhonov regularization theory. Tikhonov proposed to restrict the solution of the deconvolution to the Sobolev space $H^{n}$ or $W^{n,2}$, in which the function itself and its derivatives or partial derivatives up to the nth order belong to $L^{2}$, i.e., are square-integrable. Following this theory, a weighted sum of squares of certain partial derivatives of the image, from order 0 up to order l, is used as the regularization functional J(u) in image restoration, which has the following form
$$J(u) = \iint_{\Omega} \sum_{r=0}^{l} q_{r} \left[ \left( \frac{\partial^{r} u}{\partial x^{r}} \right)^{2} + \left( \frac{\partial^{r} u}{\partial y^{r}} \right)^{2} \right] \mathrm{d}x\, \mathrm{d}y, \tag{1.8}$$
where the weight $q_r$ is a given non-negative constant or continuous function. The classical Wiener filtering and constrained least square filtering can be seen as two special cases of Tikhonov regularization methods. Although Tikhonov regularization can make the image restoration problem well-posed (the solution depends continuously on the observations), its excessive smoothness (regularity) likewise leads to loss of detailed information such as the edges of the image. Tikhonov regularization theory is more applicable to one-dimensional signals than to two-dimensional or high-dimensional signals such as images. In response to the shortcomings of Tikhonov regularization, non-quadratic regularization functionals have been introduced into image restoration, mainly Green’s method [34], Besag’s method [35], and Geman and Yang’s half-quadratic regularization method [36]. However, these regularization methods are strongly nonlinear or even nonconvex and are much more complicated to solve than Tikhonov regularization methods, so their practical applications are greatly limited. The classical total variation (TV) model [6], also known as the ROF model in some literature, was introduced by Rudin et al. in 1992 and attracted great attention in the academic community. The model remains one of the most popular regularization models today, and much work has been devoted to the study [37–40] of the regularization properties of TV. The bounded variation (BV) space induced by the TV norm is a broader class of spaces than the Sobolev space. Suppose Ω is a bounded open set in the two-dimensional plane (usually assumed to be a Lipschitz
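domain) and a two-dimensional function $u(x, y) \in L^{1}(\Omega)$, then its isotropic total variation is defined as
$$\mathrm{TV}(u) = \iint_{\Omega} |\nabla u|\, \mathrm{d}x\, \mathrm{d}y, \qquad |\nabla u| = \sqrt{\left( \frac{\partial u}{\partial x} \right)^{2} + \left( \frac{\partial u}{\partial y} \right)^{2}}. \tag{1.9}$$
If TV(u) is bounded, then u is said to be a function of bounded variation, and BV(Ω) denotes the space of functions of bounded variation in $L^{1}(\Omega)$. The BV norm is defined as follows
$$\|u\|_{\mathrm{BV}} = \iint_{\Omega} |u|\, \mathrm{d}x\, \mathrm{d}y + \mathrm{TV}(u). \tag{1.10}$$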
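A discrete counterpart of Eq. (1.9) helps to see what the TV semi-norm measures. The sketch below assumes forward differences with the difference set to zero on the far border (one common discretization, not necessarily the one used later in this book); the helper name `total_variation` and the test images are illustrative.

```python
import numpy as np

def total_variation(u):
    """Discrete isotropic TV: sum over pixels of sqrt(dx^2 + dy^2)."""
    u = np.asarray(u, dtype=float)
    dx = np.zeros_like(u)
    dy = np.zeros_like(u)
    dx[:, :-1] = u[:, 1:] - u[:, :-1]      # horizontal forward difference
    dy[:-1, :] = u[1:, :] - u[:-1, :]      # vertical forward difference
    return np.sqrt(dx**2 + dy**2).sum()

step = np.zeros((16, 16))
step[:, 8:] = 1.0                                    # sharp vertical edge
ramp = np.tile(np.linspace(0.0, 1.0, 16), (16, 1))   # smooth transition 0 -> 1
rng = np.random.default_rng(0)
noisy = step + 0.1 * rng.standard_normal(step.shape)

print(total_variation(step), total_variation(ramp), total_variation(noisy))
```

The sharp edge and the smooth ramp have the same total variation (the jump height times the number of rows), which illustrates why TV does not penalize edges more than gradual transitions, whereas oscillatory noise increases TV markedly.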
It can be shown that BV(Ω) is a complete normed linear space under the BV norm, and that the BV norm is stronger than the $L^{1}$-norm. More analysis of the BV space can be found in [37]. TV-based image restoration usually uses only TV(u) rather than the BV norm as the regularization term, and thus TV(u) is also called the TV semi-norm or TV norm on many occasions. Compared to Tikhonov regularization, TV regularization has good edge-preserving ability, and therefore, it is widely used. However, TV regularization introduces two major difficulties while achieving edge preservation. On the one hand, the TV norm is not differentiable where ∇u = (0, 0), so traditional gradient methods cannot be used to minimize the TV functional; on the other hand, it has been proved that TV regularization is optimal only when the image function is piecewise constant. Most natural images do not satisfy this harsh condition, and the staircasing effects in TV regularization results can be very serious when the signal-to-noise ratio is low. The staircasing effects drive the smooth regions of the image toward piecewise constants, and the resulting pseudo-edges can seriously affect the visual quality of the image [37]. In fact, minimization of the $\ell_1$-norm usually leads to sparsity of the solution, and this sparsity has a very wide range of applications, e.g., compressed sensing and non-negative matrix factorization, but here it causes the first-order partial derivatives of the image to converge to zero. To address the problem that TV regularization is prone to staircasing effects, many regularization methods [41–51] based on higher-order variational methods have been proposed, which suppress staircasing effects by introducing higher-order derivatives of image functions. In 2010, Bredies et al. [46] proposed total generalized variation (TGV), which further generalized the concept of total variation, and they also proved several excellent properties of TGV compared to TV. Unlike TV, TGV introduces higher-order partial derivatives of image functions up to order n (n is a finite positive integer). Bredies et al. demonstrated through theoretical analysis and simulation experiments that TGV regularization makes the image converge to piecewise bivariate polynomial functions of degree n − 1 during the recovery process, which effectively suppresses the staircasing effects of the TV model. Of course, there is a cost for any introduction of higher order partial derivatives to eliminate
or mitigate the staircasing effects, namely that the minimization functional becomes more complicated to solve. Hu et al. [49, 50] recently proposed a higher degree total variation (HDTV) regularization model, which adopts a similar idea as TGV and achieves similar results. Partial differential equation (PDE)-based image restoration is a natural extension of variational-based image restoration, which stems from the fact that extremum problems of functionals often correspond to the solution of partial differential equations, and many partial differential equations also correspond to the minimization of some functional [52] based on the variational principle. Since the end of the twentieth century, PDE-based image processing has attracted attention and developed rapidly. The initial studies were based on isotropic diffusion PDEs, but the results of these methods were prone to image oversmoothing. Subsequently, Perona and Malik [53] proposed the classical edge-preserving anisotropic P-M diffusion model, which is still adopted by much of the literature [54–56]. Weickert studied anisotropic nonlinear diffusion theory [57] and proposed a semi-implicit additive iterative algorithm based on operator splitting, which improved the efficiency of the PDE solution. Currently, PDEs have been successfully applied as an effective tool in image filtering, smoothing, restoration, segmentation, etc. The PDE methods have many advantages [26, 58], such as a solid theoretical basis, strong self-adaptability, strong detail-preserving ability, and flexible algorithm implementation. There are still many open problems in PDE-based image restoration, such as the existence and uniqueness of solutions of higher-order PDEs. It is because of the excellent characteristics of PDEs and the many key problems that have not yet been solved that PDE-based image processing will remain a research hotspot in the future.
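As a rough illustration of the edge-preserving diffusion idea behind the P-M model, the following sketch implements an explicit four-neighbour scheme for $u_t = \operatorname{div}(g(|\nabla u|)\nabla u)$. Several choices here are assumptions made for illustration only: the diffusivity $g(s) = 1/(1 + (s/\kappa)^2)$, periodic boundaries via `np.roll`, and the values of `kappa`, `dt`, and `n_iter`; the time step is kept at or below 0.25 so that each update is a convex combination of neighbouring pixels.

```python
import numpy as np

def perona_malik(u, n_iter=20, kappa=0.1, dt=0.2):
    """Explicit nonlinear diffusion; kappa sets which jumps count as edges."""
    u = np.asarray(u, dtype=float).copy()
    for _ in range(n_iter):
        # Differences to the four nearest neighbours (periodic boundary).
        diffs = [np.roll(u, s, axis=a) - u for a in (0, 1) for s in (1, -1)]
        # Small differences (noise) diffuse freely, large ones (edges) barely.
        u = u + dt * sum(d / (1.0 + (d / kappa) ** 2) for d in diffs)
    return u

rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[:, 16:] = 1.0                                   # one strong edge
noisy = img + 0.02 * rng.standard_normal(img.shape)
smoothed = perona_malik(noisy)

flat = (slice(None), slice(2, 14))                  # region away from the edge
print(noisy[flat].std(), smoothed[flat].std())
```

With `kappa` well above the noise level but well below the edge height, the flat regions are smoothed while the jump at the edge is largely retained.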
1.2.3 Regularization Methods Based on Wavelet Frame Theory
The ability to efficiently discriminate between different object patterns is a general requirement for image and visual analysis, and wavelets [59] and their related techniques fit this requirement [37] precisely. As an important means of image representation, wavelets provide a concise mathematical description of image information, and the existence of fast wavelet transforms makes wavelet frame theory promising for image processing. Regularizing image restoration by means of the wavelet frame representation of an image is clearly feasible, and a large body of literature [60–70] has investigated this topic. Usually, there are three forms of minimization functions for wavelet-frame based image restoration problems, namely analysis-based methods, synthesis-based methods, and balanced regularization methods [67]. The discrete balanced regularization method has the following form
$$\min_{x} \|x\|_{1} + \frac{\gamma}{2} \left\| \left( I - W^{T} W \right) x \right\|_{2}^{2} + \frac{\lambda}{2} \|K W x - f\|_{2}^{2}, \tag{1.11}$$
where W is the standard tight frame, i.e., $W W^{T} = I$; u = W x denotes an estimate of the image. The $\ell_1$-norm constraint on x is to ensure the sparsity of the coefficients, and this sparsity constraint is actually obtained by performing a convex relaxation from the $\ell_0$-norm. In Eq. (1.11), if γ = 0, it is called a synthesis-based regularization method; if γ = +∞, it implies that the second term must be zero to make the minimization function meaningful, which indicates that $x = W^{T} u$ holds for some u. Then Eq. (1.11) can again be written as
$$\min_{x \in \mathrm{Range}(W^{T})} \|x\|_{1} + \frac{\lambda}{2} \|K W x - f\|_{2}^{2} = \min_{u} \left\| W^{T} u \right\|_{1} + \frac{\lambda}{2} \|K u - f\|_{2}^{2}. \tag{1.12}$$
This is known as the analysis-based regularization method. It is important to note that classical wavelet theory has limitations when applied to image processing, especially when the image is rich in detail information. Although the wavelet transform can optimally characterize the class of functions with “point singularities”, it cannot optimally approximate high-dimensional data with “line singularities”. Unlike the “point singularities” of one-dimensional signals, natural images often have “line singularities”, such as edge information in the image. This “line singularity” is an important feature for subsequent image processing. The limited directionality of conventional wavelets is not compatible with the variable direction of “line singularities” in high-dimensional signals. The limitations of classical wavelets for two-dimensional or high-dimensional signal processing have led to the development of so-called “post-wavelet” theory, i.e., multiscale geometric analysis, including ridgelets [71], curvelets [72], brushlets [73], beamlets [74], wedgelets [75], contourlets [76], bandelets [77], and shearlets [78–83]. They handle curved singularities better than the classical 2D wavelets and are therefore better able to model edge and detail information in images. Some of them, such as curvelets and shearlets, have fast transforms, which makes them easy to apply in image processing. A summary of the nature of these transform theories and the image features that each prefers is given in [25]. Recently, several image restoration methods have adopted these theories for regularizing [8, 84] the objective function. In fact, both image modeling with variational theory and image modeling with wavelet frames are based on classical functional analysis. Both belong to the deterministic image modeling methods, and there is an intrinsic connection between them, which is discussed and proved in detail in [85].
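For intuition about Eqs. (1.11) and (1.12), consider the simplest special case, assumed here only for illustration: K = I (denoising) and W a square orthonormal matrix, so that $W^{T}W = WW^{T} = I$ and the synthesis, analysis, and balanced formulations coincide. The problem $\min_x \|x\|_1 + \frac{\lambda}{2}\|Wx - f\|_2^2$ then has the closed-form solution $x = \operatorname{soft}(W^{T}f, 1/\lambda)$, i.e., soft thresholding of the transform coefficients. The sketch uses a one-level orthonormal Haar transform on a 1D signal; the helper names and parameter values are hypothetical.

```python
import numpy as np

def haar_matrix(n):
    """One-level orthonormal Haar analysis matrix (n even): averages, then details."""
    h = np.zeros((n, n))
    for i in range(n // 2):
        h[i, 2 * i] = h[i, 2 * i + 1] = 1.0 / np.sqrt(2.0)
        h[n // 2 + i, 2 * i] = 1.0 / np.sqrt(2.0)
        h[n // 2 + i, 2 * i + 1] = -1.0 / np.sqrt(2.0)
    return h

def soft_threshold(c, t):
    """Shrink every coefficient toward zero by t (minimiser of |x| + (x - c)^2 / (2t))."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def objective(x, W, f, lam):
    """Eq. (1.11) with K = I and gamma irrelevant because W is orthonormal."""
    return np.abs(x).sum() + 0.5 * lam * np.sum((W @ x - f) ** 2)

n, lam = 8, 4.0
Wt = haar_matrix(n)          # analysis operator W^T
W = Wt.T                     # synthesis operator W, so that W @ W.T = I
rng = np.random.default_rng(0)
f = np.array([1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]) + 0.1 * rng.standard_normal(n)

x = soft_threshold(Wt @ f, 1.0 / lam)    # sparse coefficient vector
u = W @ x                                # estimate of the signal, u = W x
print(np.count_nonzero(x), objective(x, W, f, lam))
```

For a redundant tight frame or a non-trivial K, no such closed form exists in general, which is where the iterative algorithms discussed in this chapter come in.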
1.2.4 Regularization Methods Based on Sparse Representation of Images
The human eye can quickly interpret an image by its geometric features such as edges and textures, which reveals that there is much less “feature” data in an image than in
the raw data. The sparse representation, currently very popular in signal processing and machine learning [86, 87], takes advantage of the sparsity of data. If a signal is sparse, it can be efficiently approximated by a few elements in a set of overcomplete bases or dictionaries. Let $W \in \mathbb{R}^{n_1 \times n_2}$ ($n_1 < n_2$) be an overcomplete dictionary and y be the useful signal to be represented. If $x^{*}$ is the sparsest representation of y under W, then there should be
$$x^{*} = \arg\min_{x} \|x\|_{0} \quad \text{s.t.} \quad y = W x, \tag{1.13}$$
where $\|x\|_{0}$ denotes the number of non-zero elements in x (usually there will be $\|x^{*}\|_{0} = m$). The overcomplete basis for performing the sparse representation can be deterministic, as described in the previous subsection for the wavelet frame. In this sense, image regularization based on the wavelet frame can be seen as a special case of sparse representation-based regularization. The overcomplete basis can also be obtained by machine learning, as in image regularization based on learned dictionaries [88–90]. In some practical applications, such as video processing, the data may be better represented as a matrix or even a tensor. So, is it possible to regularize a matrix or tensor by measuring its sparsity? The low-rank decomposition [91–99], which has recently been a very active topic in machine learning, provides a good idea for matrix regularization. In fact, rank is a natural measure of the sparsity of matrix data [96]. In recent years, regularization based on low-rank decomposition has been widely used in image inverse problems, such as image restoration [97], image segmentation [98], and medical image reconstruction [99]. Taking image denoising as an example, there are two commonly used low-rank decomposition models [96, 97]: robust principal component analysis (RPCA) and Go Decomposition (GoDec). RPCA-based removal of sparse, large-magnitude noise has the minimization function
$$\min_{A, E} \operatorname{rank}(A) + \lambda \|E\|_{0} \quad \text{s.t.} \quad D = A + E, \tag{1.14}$$
where D denotes the observed image, A denotes the low-rank image to be recovered, and E is used to model sparse, large-magnitude noise. The model is better suited for image restoration under non-Gaussian sparse noise conditions. For the removal of dense Gaussian noise, where every image matrix element may be noisy, the model is no longer applicable. For non-sparse noise, the GoDec method achieves noise reduction by adding a decomposition term representing the noise, i.e., assuming D = A + E + G, where G represents the non-sparse noise. All of the above low-rank models contain rank or zero-norm minimization, which is a typical NP-hard optimization problem. To simplify the computation, the objective function is usually relaxed to some convex function. To get as close as possible to the solution of the original problem, the convex function is usually chosen to be the convex envelope of the objective function, i.e., the maximum convex function that
does not exceed the objective function. It has been shown that the nuclear norm of the matrix $\|\cdot\|_{*}$, i.e., the sum of singular values, is the convex envelope of the rank function on the unit ball of the matrix spectral norm. The $\ell_1$-norm of a vector, i.e., the sum of the absolute values of its elements, is the convex envelope [96] of its $\ell_0$-norm on the unit ball of the $\ell_\infty$-norm. After a convex relaxation using these two conclusions, the minimization function (1.14) can be written as
$$\min_{A, E} \|A\|_{*} + \lambda \|E\|_{1} \quad \text{s.t.} \quad D = A + E. \tag{1.15}$$
Minimization functions of this type can be easily solved by some of the operator splitting methods described below.
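Splitting methods typically handle the two terms of Eq. (1.15) through their proximal mappings, both of which have simple closed forms: entrywise soft thresholding for $\|\cdot\|_1$ and singular value thresholding for $\|\cdot\|_{*}$. The sketch below shows only these two building blocks, not a complete RPCA solver; the function names are illustrative.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal mapping of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def singular_value_threshold(m, t):
    """Proximal mapping of t * ||.||_*: soft-threshold the singular values."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u * soft_threshold(s, t)) @ vt

def nuclear_norm(m):
    """Sum of singular values, the convex surrogate of rank used in Eq. (1.15)."""
    return np.linalg.svd(m, compute_uv=False).sum()

rng = np.random.default_rng(0)
A = np.outer(rng.standard_normal(20), rng.standard_normal(15))   # rank-1 matrix
D = A + 0.01 * rng.standard_normal(A.shape)                      # small dense perturbation

A_hat = singular_value_threshold(D, t=0.5)
print(np.linalg.matrix_rank(D), np.linalg.matrix_rank(A_hat))
print(nuclear_norm(D), nuclear_norm(A_hat))
```

Thresholding removes the small singular values introduced by the perturbation, so the result has lower rank and smaller nuclear norm than the input.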
1.2.5 Random Field-Based Regularization Methods
In contrast to noise, image detail information, especially the texture details of an image, is usually strongly correlated. Thus, for an image contaminated by noise, the human eye can still roughly distinguish between the two. Modeling an image as a random field allows the statistical parameters of the image probability distribution model to be estimated according to general strategies in probability and statistics, such as the maximum a posteriori, maximum likelihood, or Bayesian principle. The Gaussian model was the first stochastic model used to model images. In fact, this model does not distinguish between the statistical characteristics of images and noise. Using it as a prior model, the maximum likelihood estimate for image restoration happens to be a least square inverse filter estimate, which cannot effectively suppress noise [32]. Building image models that reflect the imaging mechanism is more helpful for tasks such as image restoration, segmentation, and recognition, and is also a development direction of image modeling. More reasonable image probability distribution models should be built depending on the object of study. When some kind of particle event exists in the imaging process, as in medical CT imaging, the image grayscale value usually follows a Poisson distribution. In this case, the image is commonly modeled by a Poisson random field, or the noise is treated as Poisson noise [100, 101]. The Markov random field (MRF) model of images, equivalent to the Gibbs random field [102, 103], is a very widely used stochastic modeling method that provides a Bayesian framework for image estimation. It can carefully reflect the local neighborhood statistical characteristics of images, and it can be used in cases where the point spread function is spatially varying or the noise is non-smooth. 
Compared with deterministic modeling methods, random field modeling of images, especially neighborhood-based modeling, is a more refined modeling method, and it has better adaptability to different image types. Thus, it has a broad application prospect in the field of image processing. However, at the same time,
this fine-grained modeling method makes the model more complex than the deterministic model, which places higher requirements on the solution algorithm and computer performance, and the parameter estimation of the model becomes a new problem. The most widely used Markov random field is the Gauss-Markov random field. Combined with the Bayesian method, this model has achieved good results in blind image restoration [104] and super-resolution reconstruction [105] of hyperspectral images. However, the Gaussian assumption of this model leads to oversmoothing of the images in many cases. Some literature in recent years has used non-Gaussian distributions [106] such as the Student's t distribution to characterize the statistical characteristics of the images, but the relatively complex model leaves the posterior distribution of the Bayesian estimate without a closed form. This makes the traditional EM algorithm impossible to apply and causes great computational difficulties. The variational Bayesian approach, which combines the MRF model with a variational prior assumption on images, is a relatively new research hotspot in the field of image restoration. Katsaggelos’s team has conducted numerous studies [107–112] on conventional and blind image restoration under this framework, and produced more satisfactory results. This idea largely overcomes the problem that random field modeling is overly fine-grained, and provides a basis for image restoration research based on random field regularization. Compound regularization is a hot topic of current research. By organically combining the advantages of different prior knowledge, this strategy may yield better results [113] for image restoration. The blind recovery problem is more ill-posed and can be divided into two classes. 
For the first class, the blur kernel is estimated in advance and then the original image [114] is recovered by conventional methods; the other class simultaneously estimates both the blur kernel and the original image [104]. The objective function of the second class is usually non-convex, in which case more prior knowledge of the image is required to make the problem solvable, and the function minimization problems involved are usually compound regularization problems.
1.3 Nonlinear Iterative Algorithm for Image Restoration
Image restoration methods based on the inverse filter and Tikhonov regularization are linear and have closed-form solutions. However, both of them have drawbacks: the solution of the inverse filter is unstable, and the solution of the Tikhonov regularization method is too smooth. Nonlinear regularized restoration methods based on total variation or higher order variation, wavelet frame theory, sparsity theory, and random fields are more likely to give good restoration results. However, these methods often do not have closed-form solutions, and their solutions need to be computed with numerical iterative algorithms. In fact, the iterative solution approach is more conducive to
incorporating prior knowledge about the solution into the solution process and to “monitoring” the recovery process [115]. Despite the different regularization methods, image restoration usually amounts to building a minimization function containing regularization and fidelity terms, and then finding the minimum point of the function, which is treated as the result of image restoration. The convexity of the objective function is extremely important for both the fast implementation and the stability of the solution process. If the objective function is non-convex, the result can hardly be guaranteed to be the global optimal solution of the objective function. In sparsity-based regularization methods, the NP-hard $\ell_0$ optimization problem is usually relaxed to an $\ell_1$ convex optimization problem. It can be shown that, under fairly mild conditions, the solution of the $\ell_1$ convex optimization problem coincides with the solution of the corresponding $\ell_0$ nonconvex optimization problem [96]. Therefore, $\ell_1$-norm based regularization is the most widely used in current image inverse problems.
1.3.1 Traditional Methods
The TV model is the most representative $\ell_1$-regularizer. Therefore, the development of algorithms for solving nonlinear functions in image inverse problems is illustrated here through the solution of the TV model. In fact, some of the early methods were designed specifically for particular regularization models. When the noise is Gaussian white noise, the TV-based image restoration has the following objective function
$$\min_{u} \|\nabla u\|_{1} + \frac{\lambda}{2} \|K u - f\|_{2}^{2}, \tag{1.16}$$
where ∇ is the first-order difference operator and $\mathrm{TV}(u) = \|\nabla u\|_{1}$ is a convex $\ell_1$-type function. Thus the above equation is a typical $\ell_1$-$\ell_2$ minimization problem. Solving it is not easy because of the non-differentiability of the TV norm and the nonlinearity of Eq. (1.16). Although the TV model was introduced into image processing more than two decades ago, the minimization function Eq. (1.16) is still a touchstone for testing many new algorithms. The earliest method used to solve TV denoising (K = I) is the time-marching algorithm [6] proposed by Rudin et al. This method introduces a time variable into the Euler–Lagrange equation of function Eq. (1.16); it is not only slow but also unsatisfactory in terms of computational accuracy. Subsequently, the lagged diffusivity fixed point method for solving the TV denoising model was proposed [116], which overcomes the shortcomings of the time-marching method to some extent. This method requires the introduction of a small constant in the denominator of the equation when dealing with the nondifferentiable point of TV, a strategy still visible [28] in some current literature. In 2004, Chambolle [117] proposed a classical gradient descent method based on the dual model of the TV model, and rigorously
proved the convergence of the algorithm and the required convergence conditions. The general idea of the method is to first transform the model Eq. (1.16) into the following primal–dual form
$$\max_{v} \min_{u} \left\{ \langle u, \operatorname{div} v \rangle + \frac{\lambda}{2} \|K u - f\|_{2}^{2} \right\} \quad \text{s.t.} \quad |v_{i,j}| = \sqrt{v_{i,j,1}^{2} + v_{i,j,2}^{2}} \le 1. \tag{1.17}$$
In Eq. (1.17), K = I and div is the divergence operator with a Hilbert adjoint operator −∇. Setting the gradient of the “min” objective function to zero yields $u = f - \lambda^{-1} \operatorname{div} v$, which is substituted into Eq. (1.17) to obtain the dual model of Eq. (1.16).
$$\min_{v} \left\{ \|\operatorname{div} v - \lambda f\|_{2}^{2} \right\} \quad \text{s.t.} \quad |v_{i,j}| = \sqrt{v_{i,j,1}^{2} + v_{i,j,2}^{2}} \le 1. \tag{1.18}$$
Its optimality condition is
$$\left( \nabla \left( \lambda^{-1} \operatorname{div} v - f \right) \right)_{i,j} - \left| \left( \nabla \left( \lambda^{-1} \operatorname{div} v - f \right) \right)_{i,j} \right| v_{i,j} = 0. \tag{1.19}$$
It is then iteratively solved by using the following semi-implicit gradient descent method |( ( ) (( ( (( (( || | k . (1.20) v i,k+1 ∇ λ−1 divv k − f i, j − | ∇ λ−1 divv k − f i, j |v i,k+1 j = v i, j + τ j Finally, the original variable, which is the final denoised image, is solved according to the relationship between the original variable and the dual variable described above. This method cleverly circumvents the nondifferentiable problem of the TV model by solving the Fenchel dual[5] model of the TV denoising model Eq. (1.16), and becomes one of the most efficient image denoising methods to date. In addition, it also often appears as a nested algorithm in some image deconvolution algorithms [15]. However, it is difficult to directly generalize this dual idea to other image inverse problems because K −1 will appear in the dual model of Eq. (1.16) when the degenerate matrix K is not a unit array. Unfortunately, K may be singular, i.e., its inverse matrix may not exist. In addition, other methods, include the second-order cone method [118], orthogonal projection method [119], interior-point method [120], and precondition method [121], are used for solving nonlinear regularized models. Although these traditional methods are able to give reasonable solutions for a particular problem, they also suffer from such inherent defects as approximate solutions, inability to fully exploit the structure of the problem itself, and unfavorable to massively parallel computation, which limit their application in the context of current image big data.
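The semi-implicit dual iteration of Eq. (1.20) can be sketched in a few lines for a one-dimensional signal. This is only an illustrative sketch under stated assumptions: the signal is 1D (so $|v_i|$ is an absolute value), the helper names `grad`, `div`, and `tv_denoise_dual` are hypothetical, and the default step `tau = 0.2 * lam` is an assumed choice that, after accounting for the $\lambda^{-1}$ scaling in Eq. (1.20), stays below the bound 1/4 that Chambolle-type analysis suggests for the 1D difference operator.

```python
def grad(u):
    """Forward differences: (grad u)[i] = u[i+1] - u[i]."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)]


def div(v):
    """Discrete divergence, defined so that <grad u, v> = -<u, div v>."""
    out = [0.0] * (len(v) + 1)
    for i, vi in enumerate(v):
        out[i] += vi
        out[i + 1] -= vi
    return out


def tv_denoise_dual(f, lam, iters=300, tau=None):
    """Semi-implicit dual iteration in the style of Eq. (1.20), 1D case."""
    if tau is None:
        tau = 0.2 * lam  # assumed step size, see the note above
    v = [0.0] * (len(f) - 1)  # one dual variable per difference
    for _ in range(iters):
        dv = div(v)
        g = grad([d / lam - fi for d, fi in zip(dv, f)])
        # Solving Eq. (1.20) for v^{k+1} gives this quotient, which keeps |v_i| <= 1.
        v = [(vi + tau * gi) / (1.0 + tau * abs(gi)) for vi, gi in zip(v, g)]
    # Recover the primal variable: u = f - div(v) / lam.
    return [fi - d / lam for fi, d in zip(f, div(v))]


u = tv_denoise_dual([0.0, 4.0], lam=1.0)
print(u)  # approaches [1.0, 3.0], the minimizer of |u1 - u0| + 0.5 * ||u - f||^2
```

Because the divergence sums to zero, the mean of the signal is preserved by construction, which is a convenient sanity check for any implementation.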
1.3.2 Operator Splitting Methods

In recent years, a class of powerful, generic, flexible, and parallel operator splitting methods has been introduced to the field of image inverse problems to better cope with the high-dimensional, massive, and high-quality requirements of image big data processing, with the basic idea of "dividing and simplifying". Their common mathematical foundation is the modern convex analysis laid down by pioneers such as Fenchel (1905–1988), Moreau (1923–2014), and Rockafellar (1935–). Concepts such as the subdifferential, proximal mapping, and infimal convolution are frequently used. Such methods can better cope with the nonlinearity and nonsmoothness of the objective function and with various tedious constraints. Usually, operator splitting methods only need first-order information about the objective function, so their computational implementation is simple. Many signal processing problems, such as image restoration problems, can often be modeled as the following minimization model

\[
\min_{x\in\mathbb{R}^N}\ f_1(x) + \cdots + f_m(x), \tag{1.21}
\]

where $f_1, \ldots, f_m$ are convex functions mapping $\mathbb{R}^N$ to $(-\infty, +\infty]$. A common challenge in solving this model is that some function terms are not differentiable, which renders some traditional smooth optimization techniques useless. The operator splitting approach derives a feasible algorithm by "splitting" these function terms and handling them separately. A common assumption is that a non-smooth function $f_i$ is "proximable", i.e., that its proximity operator has a closed-form solution or can be easily computed. Proximal splitting is the basis for operator splitting. Practical applications show that this assumption is sufficiently mild. Although the proximity algorithm was introduced into the field of image processing only recently, it has spread rapidly.

Before introducing several commonly used operator splitting methods, a few concepts of convex analysis are given below. Let $\mathbb{R}^N$ be an N-dimensional Euclidean space, let $\langle\cdot,\cdot\rangle$ denote the inner product, and let the domain of a convex function $f : \mathbb{R}^N \to (-\infty, +\infty]$ be $\operatorname{dom} f = \{x \in \mathbb{R}^N \mid f(x) < +\infty\}$. Denote the set of all proper, convex, and lower semicontinuous [5] functions from $\mathbb{R}^N$ to $(-\infty, +\infty]$ as $\Gamma_0(\mathbb{R}^N)$, where proper means that the domain of the function is non-empty. The Fenchel conjugate of $f \in \Gamma_0(\mathbb{R}^N)$, $f^* \in \Gamma_0(\mathbb{R}^N)$, is defined as

\[
f^* : \mathbb{R}^N \to (-\infty, +\infty] : x \mapsto \sup_{x'\in\mathbb{R}^N}\ \langle x', x\rangle - f(x'). \tag{1.22}
\]

The subdifferential of f is the set-valued map

\[
\partial f : \mathbb{R}^N \to 2^{\mathbb{R}^N} : x \mapsto \left\{ b \in \mathbb{R}^N \mid (\forall x' \in \mathbb{R}^N)\ \langle x' - x, b\rangle + f(x) \le f(x') \right\}. \tag{1.23}
\]
The subdifferential is a set, while a subgradient is one of its elements; the concept of subgradient clearly generalizes the gradient of a smooth function. The use of subgradients leads to Fermat's rule [5] for minimizing nonsmooth convex functions. Fermat's rule: if $f \in \Gamma_0(\mathbb{R}^N)$, then

\[
\arg\min f = \operatorname{zer} \partial f = \left\{ x \in \mathbb{R}^N \mid 0 \in \partial f(x) \right\}. \tag{1.24}
\]
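As a small worked illustration of Fermat's rule (an added scalar example, not taken from the original text), consider $f(x) = |x| + \tfrac12 (x-a)^2$ for a given $a \in \mathbb{R}$. It relies on the standard fact that the subdifferential of the sum of a convex function and a differentiable convex function is the sum of the subdifferential and the gradient.

```latex
\partial f(x) = \partial |x| + (x - a), \qquad
\partial |x| =
\begin{cases}
\{\operatorname{sign}(x)\}, & x \neq 0,\\
[-1, 1], & x = 0.
\end{cases}

% Fermat's rule: 0 \in \partial f(x)
x \neq 0:\quad x = a - \operatorname{sign}(x)
  \;\Longrightarrow\; x = a - 1 \ (a > 1) \ \text{ or } \ x = a + 1 \ (a < -1),
\qquad
x = 0:\quad a \in [-1, 1].

\arg\min f = \operatorname{sign}(a)\,\max(|a| - 1,\, 0).
```

The minimizer is the soft-thresholding of a with threshold 1, which is the scalar prototype of the shrinkage steps that appear in the splitting methods of this section.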
An important property of the subdifferential is its maximal monotonicity [5] (see Sect. 2.4.3 for an introduction to maximally monotone operators), i.e., it satisfies

\[
\langle x - x', \partial f(x) - \partial f(x') \rangle \ge 0, \tag{1.25}
\]

and the range of the operator $I + \partial f$ is $\mathbb{R}^N$. This property is often used in convergence proofs for splitting algorithms. The Bregman distance of f between the points $x'$ and $x$ is defined as

\[
D_f^{b}(x', x) \triangleq f(x') - f(x) - \langle x' - x, b\rangle \ge 0, \qquad b \in \partial f(x). \tag{1.26}
\]

The Bregman distance is a generalized distance; it is in general not symmetric, and its nonnegativity follows from the definition of the subdifferential. Let Ω be a non-empty set in $\mathbb{R}^N$, whose indicator function is defined by

\[
\iota_\Omega : x \mapsto
\begin{cases}
0, & x \in \Omega;\\
+\infty, & x \notin \Omega.
\end{cases} \tag{1.27}
\]

The Fenchel conjugate of the indicator function is the support function, which is defined as

\[
\sigma_\Omega = \iota_\Omega^* : \mathbb{R}^N \to (-\infty, +\infty] : x \mapsto \sup_{x'\in\Omega}\ \langle x', x\rangle. \tag{1.28}
\]
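A short worked instance of Eq. (1.28), added here for illustration: taking Ω to be the unit ball of the $\ell_\infty$ norm recovers the $\ell_1$ norm as a support function. This is the same mechanism by which a norm in the primal problem becomes a pointwise bound on the dual variable, as in the constraint of Eq. (1.17).

```latex
\Omega = \{ x' \in \mathbb{R}^N : \|x'\|_\infty \le 1 \}
\quad\Longrightarrow\quad
\sigma_\Omega(x)
= \sup_{|x'_i| \le 1} \sum_{i=1}^{N} x'_i x_i
= \sum_{i=1}^{N} |x_i|
= \|x\|_1,
\qquad \text{attained at } x'_i = \operatorname{sign}(x_i).
```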
It is easy to verify that if Ω is a nonempty convex set, then $\sigma_\Omega \in \Gamma_0(\mathbb{R}^N)$. By introducing the indicator function, a constrained optimization problem $\min_{x\in\Omega} f(x)$ can be transformed into the equivalent unconstrained problem $\min_x f(x) + \iota_\Omega(x)$, which makes it easier to apply operator splitting methods. Let the proximity operator of $f \in \Gamma_0(\mathbb{R}^N)$ be

\[
\operatorname{prox}_f : \mathbb{R}^N \to \mathbb{R}^N, \qquad
x \mapsto \arg\min_{x'\in\mathbb{R}^N}\ f(x') + \frac{1}{2}\|x - x'\|_2^2. \tag{1.29}
\]
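Two standard closed-form instances of Eq. (1.29) may help fix ideas. The sketch below is illustrative only: the function names are hypothetical, the soft-thresholding formula for the $\ell_1$ norm is a well-known result rather than something derived in this section, and the box is just one convenient convex set whose indicator has a projection as its proximity operator.

```python
def prox_l1(x, beta):
    """prox of beta*||.||_1 at x: componentwise soft thresholding."""
    return [(1.0 if xi > 0 else -1.0) * max(abs(xi) - beta, 0.0) for xi in x]


def prox_box(x, lo, hi):
    """prox of the indicator of the box [lo, hi]^N: projection onto the box."""
    return [min(max(xi, lo), hi) for xi in x]


def prox_objective(xp, x, beta):
    """Objective of Eq. (1.29) for f = beta*||.||_1, evaluated at candidate xp."""
    return beta * sum(abs(t) for t in xp) + 0.5 * sum((a - b) ** 2 for a, b in zip(x, xp))


x = [3.0, -0.5]
p = prox_l1(x, 1.0)
print(p)                              # large entries shrink by beta, small ones go to zero
print(prox_box([-2.0, 0.3, 5.0], 0.0, 1.0))
# The prox point should not be beaten by nearby candidates:
print(prox_objective(p, x, 1.0) <= prox_objective([p[0] + 0.1, p[1]], x, 1.0))
```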
The proximity operator is the resolvent of the subdifferential operator (see Chap. 2 for the definition), i.e., $\operatorname{prox}_f = (I + \partial f)^{-1}$, and $\operatorname{prox}_f$ is a single-valued mapping, which is firmly nonexpansive (see Chap. 2 for the definition), i.e.,

\[
\|\operatorname{prox}_f x - \operatorname{prox}_f x'\|^2 + \|(I - \operatorname{prox}_f)x - (I - \operatorname{prox}_f)x'\|^2 \le \|x - x'\|^2. \tag{1.30}
\]

In addition, the reflection operator, or reflected resolvent, $2\operatorname{prox}_f - I$ of the subdifferential is also nonexpansive [5] (see Chap. 2 for the definition). The firm nonexpansiveness of $\operatorname{prox}_f$ can usually be used to transform a complex function minimization problem into a fixed-point problem, and its single-valuedness is of crucial importance for the stability of the corresponding algorithm. It is easy to verify that the proximal operator of the indicator function $\iota_\Omega$ of a convex set Ω is the projection onto Ω, and thus the proximal operator is regarded as a generalization of the projection operator [122]. Indeed, if $\operatorname{prox}_f$ exists, then the minimizer of $\min_x f(x)$ can be found by the following proximity fixed-point iteration

\[
x^{k+1} = \arg\min_{x}\ f(x) + \frac{1}{2\beta}\|x - x^{k}\|_2^2. \tag{1.31}
\]
The above equation can in turn be written as $x^{k+1} = \operatorname{prox}_{\beta f}(x^k)$ or $x^{k+1} \in x^k - \beta\,\partial f(x^{k+1})$. This is known as the proximal point algorithm (PPA). Next, several currently popular operator splitting methods based on the proximal point method are briefly described.

(1) Bregman iterative and linearized Bregman iterative methods

Osher et al. introduced the Bregman iterative method to the field of image processing in 2005 and applied it to TV-based image denoising and deblurring [7]. The method has better generality and higher computational efficiency than the previous methods. The Bregman iterative method can be used to solve image inverse problems of the following type

\[
\min_{x}\ f(x) \quad \text{s.t.}\quad \varphi(x) = b, \tag{1.32}
\]

where the operator φ can be nonlinear. The iterative rule for the Bregman iterative method is

\[
\begin{cases}
x^{k+1} = \arg\min_{x}\ D_f^{p^k}(x, x^k) + \dfrac{\beta}{2}\|\varphi(x) - b\|_2^2,\\[4pt]
p^{k+1} = p^k - \beta(\nabla\varphi)^T\left(\varphi(x^{k+1}) - b\right) \in \partial f(x^{k+1}).
\end{cases} \tag{1.33}
\]

In Eq. (1.33), $p^k$ is a subgradient of f at $x^k$, $\beta \in (0, +\infty)$ is the penalty parameter, and $D_f^{p^k}(x, x^k)$ is the Bregman distance. If φ = A is linear, which is more common in image inverse problems, the iteration Eq. (1.33) can be transformed into the following more compact form [123]:

\[
\begin{cases}
x^{k+1} = \arg\min_{x}\ f(x) + \dfrac{\beta}{2}\|Ax - b^k\|_2^2,\\[4pt]
b^{k+1} = b^k + b - Ax^{k+1}.
\end{cases} \tag{1.34}
\]
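The "add the residual back" structure of Eq. (1.34) is easiest to see in a toy case. The sketch below assumes $f = \|\cdot\|_1$ and A = I, so that the x-update is a plain soft-thresholding; the choice A = I, the initialization $b^0 = b$, and the function names are assumptions made purely for illustration.

```python
def soft(x, t):
    """Componentwise soft thresholding with threshold t."""
    return [(1.0 if xi > 0 else -1.0) * max(abs(xi) - t, 0.0) for xi in x]


def bregman_l1_identity(b, beta, iters):
    """Bregman iteration in the form of Eq. (1.34) for f = ||.||_1 and A = I."""
    bk = list(b)              # assumed start b^0 = b
    x = [0.0] * len(b)
    for _ in range(iters):
        # x-update: argmin ||x||_1 + beta/2 * ||x - b^k||^2 = soft(b^k, 1/beta)
        x = soft(bk, 1.0 / beta)
        # Add the current residual b - A x back to the data term.
        bk = [bki + bi - xi for bki, bi, xi in zip(bk, b, x)]
    return x


b = [3.0, -0.5]
print(bregman_l1_identity(b, beta=1.0, iters=1))  # a single penalized solve is biased toward zero
print(bregman_l1_identity(b, beta=1.0, iters=3))  # the constraint A x = b is met after a few passes
```

In this toy case the first pass returns the shrunken vector, and the later passes progressively restore the signal until the equality constraint of problem (1.32) holds.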
For the same type of problem, Yin et al. then improved the Bregman iterative method to obtain the linearized Bregman iterative method [123, 124] and applied it to the basis pursuit problem ($f(x) = \|x\|_1$). Let φ = A in Eq. (1.33) be linear; the basic idea is to make a Taylor expansion of the quadratic term around $x^k$:

\[
\|Ax - b\|_2^2 \approx \|Ax^k - b\|_2^2 + 2\left\langle x - x^k, A^T\left(Ax^k - b\right)\right\rangle + \frac{1}{\delta}\|x - x^k\|_2^2. \tag{1.35}
\]

The iterative rule of the linearized Bregman iterative method for problem (1.32) (φ = A) is as follows

\[
\begin{cases}
x^{k+1} = \arg\min_{x}\ D_f^{p^k}(x, x^k) + \dfrac{\beta}{2\delta}\left\|x - \left(x^k - \delta A^T\left(Ax^k - b\right)\right)\right\|_2^2,\\[4pt]
0 < \delta < 1/\|A^TA\|_2,\\[4pt]
p^{k+1} = p^k - \dfrac{\beta}{\delta}\left(x^{k+1} - x^k\right) - \beta A^T\left(Ax^k - b\right).
\end{cases} \tag{1.36}
\]
It should be noted that the linearized Bregman iterative method converges only when δ satisfies the condition given above. This method usually runs more efficiently than the basic Bregman iteration, as it avoids the matrix inversion commonly required in inverse problems. The problem (1.32) is clearly formulated for "clean" data. When the observations contain noise, the two Bregman methods use $\|Ax - b\|_2^2 \le c$ as a stopping criterion, which requires a value of c estimated in advance from the noise level. The convergence proofs for the Bregman iterative method and the linearized Bregman method are provided in [7] and [124], respectively. The Bregman methods are stretched when dealing with more complex problems, but they provide a solid theoretical foundation for the subsequent splitting Bregman methods.

(2) Splitting Bregman method

In 2009, Goldstein and Osher [125] proposed the splitting Bregman algorithm (SBA) based on variable splitting (VS) [126] and the Bregman iterative method, and applied it to the $\ell_1$-regularized inverse problem. The method can be used to solve problems of the following type

\[
\min_{x}\ f_1(x) + f_2(\varphi(x)). \tag{1.37}
\]

By introducing the auxiliary variable d, the above problem can be transformed into the following equivalent constrained optimization problem

\[
\min_{x,d}\ f_1(x) + f_2(d) \quad \text{s.t.}\quad \varphi(x) = d. \tag{1.38}
\]
Let $E(x, d) = f_1(x) + f_2(d)$ be the convex objective; the following iterative form can then be obtained by following the idea of the Bregman iteration:

\[
\begin{cases}
\left(x^{k+1}, d^{k+1}\right) = \arg\min_{x,d}\ D_E^{p^k}\left(x, d, x^k, d^k\right) + \dfrac{\beta}{2}\|\varphi(x) - d\|_2^2,\\[4pt]
p_x^{k+1} = p_x^k - \beta(\nabla\varphi)^T\left(\varphi(x^{k+1}) - d^{k+1}\right),\\[4pt]
p_d^{k+1} = p_d^k - \beta\left(d^{k+1} - \varphi(x^{k+1})\right).
\end{cases} \tag{1.39}
\]
This is known as the splitting Bregman method. The solutions of $x^{k+1}$ and $d^{k+1}$ in the first step can be computed alternately. It may seem that the first step requires a nested iteration, but in reality it does not need to be solved exactly, since the accuracy of an exact solution would be wasted by the subsequent updates of the variables. Experiments show that $x^{k+1}$ and $d^{k+1}$ only need to be solved alternately once, and convergence in that case can still be rigorously proven [127, 128]. Unlike in the Bregman iteration, here both sides of the equality constraint are variables. When φ = A is a linear operator, the iteration Eq. (1.39) can be simplified to

\[
\begin{cases}
x^{k+1} = \arg\min_{x}\ f_1(x) + \dfrac{\beta}{2}\|Ax - d^k + b^k\|_2^2,\\[4pt]
d^{k+1} = \arg\min_{d}\ f_2(d) + \dfrac{\beta}{2}\|d - Ax^{k+1} - b^k\|_2^2 = \operatorname{prox}_{f_2/\beta}\left(Ax^{k+1} + b^k\right),\\[4pt]
b^{k+1} = b^k + Ax^{k+1} - d^{k+1}.
\end{cases} \tag{1.40}
\]

In the above equation, the solutions of $x^{k+1}$ and $d^{k+1}$ are alternated only once. The introduction of the auxiliary variable is important, since it allows the two convex function terms to be handled separately, and the update of the auxiliary variable d is a proximal point problem. Using the idea of the linearized Bregman method, the splitting Bregman method is easily converted into the linearized splitting Bregman method [129].

(3) Alternating direction method of multipliers and linearized alternating direction method of multipliers

Similar to the splitting Bregman method, the alternating direction method of multipliers (ADMM) [15, 130], also known as the alternating direction method (ADM), incorporates the idea of variable splitting. It solves Eq. (1.37) by finding the saddle point of the augmented Lagrangian function of Eq. (1.38) (φ = A). The augmented Lagrangian function of Eq. (1.38) is

\[
L_A(x, d; b) \triangleq f_1(x) + f_2(d) + \langle b, Ax - d\rangle + \frac{\beta}{2}\|Ax - d\|_2^2, \tag{1.41}
\]

where b is the Lagrangian dual variable, also known as the Lagrange multiplier, and $\beta \in (0, +\infty)$ is the penalty parameter. The augmented Lagrangian method (ALM) [15] can find the saddle point of Eq. (1.41) by the following iterative rule:

\[
\begin{cases}
\left(x^{k+1}, d^{k+1}\right) = \arg\min_{x,d}\ f_1(x) + f_2(d) + \dfrac{\beta}{2}\left\|Ax - d + b^k/\beta\right\|_2^2,\\[4pt]
b^{k+1} = b^k + \beta\left(Ax^{k+1} - d^{k+1}\right).
\end{cases} \tag{1.42}
\]
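When the joint minimization in Eq. (1.42) is replaced by a single alternating pass over x and d, one obtains the alternating direction scheme discussed in this subsection. The following sketch applies it to 1D TV denoising, with $f_1(x) = \tfrac12\|x - y\|_2^2$, $f_2 = \mu\|\cdot\|_1$, and A the forward-difference operator D. All names are hypothetical, the problem choice is an assumption made for illustration, and the x-update uses a plain Thomas solver because $I + \beta D^T D$ is tridiagonal.

```python
def soft(x, t):
    """Componentwise soft thresholding with threshold t."""
    return [(1.0 if xi > 0 else -1.0) * max(abs(xi) - t, 0.0) for xi in x]


def D(x):
    """Forward differences, (D x)[i] = x[i+1] - x[i]."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def Dt(w):
    """Transpose of D."""
    out = [0.0] * (len(w) + 1)
    for i, wi in enumerate(w):
        out[i] -= wi
        out[i + 1] += wi
    return out


def solve_tridiag(off, diag, rhs):
    """Thomas algorithm for a symmetric tridiagonal system with constant
    off-diagonal entry `off` (no pivoting; adequate for diagonally dominant systems)."""
    n = len(diag)
    c, r = [0.0] * n, [0.0] * n
    c[0], r[0] = off / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - off * c[i - 1]
        c[i] = off / m
        r[i] = (rhs[i] - off * r[i - 1]) / m
    x = [0.0] * n
    x[-1] = r[-1]
    for i in range(n - 2, -1, -1):
        x[i] = r[i] - c[i] * x[i + 1]
    return x


def admm_tv1d(y, mu, beta=1.0, iters=300):
    """One alternating pass per iteration on the augmented Lagrangian (1.41)."""
    n = len(y)
    diag = [1.0 + 2.0 * beta] * n          # diagonal of I + beta * D^T D
    diag[0] = diag[-1] = 1.0 + beta
    d = [0.0] * (n - 1)
    b = [0.0] * (n - 1)
    x = list(y)
    for _ in range(iters):
        # x-update: (I + beta D^T D) x = y + D^T (beta d - b)
        shift = Dt([beta * di - bi for di, bi in zip(d, b)])
        x = solve_tridiag(-beta, diag, [yi + si for yi, si in zip(y, shift)])
        Dx = D(x)
        # d-update: prox of f2/beta at D x + b/beta, i.e. soft thresholding
        d = soft([g + bi / beta for g, bi in zip(Dx, b)], mu / beta)
        # multiplier update
        b = [bi + beta * (g - di) for bi, g, di in zip(b, Dx, d)]
    return x


print(admm_tv1d([0.0, 4.0], mu=1.0))  # approaches [1.0, 3.0]
```

The x-update needs a linear solve with $I + \beta D^T D$; this is the step that linearized variants replace with a gradient-type update.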
Similar to the splitting Bregman iterative method, the first step does not require an exact solution. If $x^{k+1}$ and $d^{k+1}$ are solved alternately only once, the following iterative rule for ADMM is obtained:

\[
\begin{cases}
x^{k+1} = \arg\min_{x}\ f_1(x) + \dfrac{\beta}{2}\left\|Ax - d^k + b^k/\beta\right\|_2^2,\\[4pt]
d^{k+1} = \arg\min_{d}\ f_2(d) + \dfrac{\beta}{2}\left\|d - Ax^{k+1} - b^k/\beta\right\|_2^2 = \operatorname{prox}_{f_2/\beta}\left(Ax^{k+1} + b^k/\beta\right),\\[4pt]
b^{k+1} = b^k + \beta\left(Ax^{k+1} - d^{k+1}\right).
\end{cases} \tag{1.43}
\]

Equation (1.40) differs from Eq. (1.43) only by a constant scaling of b, which can be eliminated by variable substitution, indicating that the splitting Bregman method and the alternating direction method are fully equivalent under linear constraints. If the quadratic term in the first step of Eq. (1.43) is replaced by its Taylor expansion around $x^k$, then the linearized alternating direction method of multipliers (LADMM), which has a wider range of applicability [93, 131–135], can be derived as follows:

\[
\begin{cases}
x^{k+1} = \operatorname{prox}_{\delta f_1}\left(x^k - \delta\beta A^T\left(Ax^k - d^k + b^k/\beta\right)\right), \quad 0 < \delta < 1/\left(\beta\|A^TA\|_2\right),\\[4pt]
d^{k+1} = \operatorname{prox}_{f_2/\beta}\left(Ax^{k+1} + b^k/\beta\right),\\[4pt]
b^{k+1} = b^k + \beta\left(Ax^{k+1} - d^{k+1}\right).
\end{cases} \tag{1.44}
\]

For a history of the development of ADMM-like algorithms, see the review literature [136] and [137].

(4) Forward–backward splitting method

The forward–backward splitting (FBS) method [138] is used to solve problems of the following type

\[
\min_{x\in\mathbb{R}^N}\ f_1(x) + f_2(x), \tag{1.45}
\]
where $f_1$ is "proximable" in the sense that its proximity operator exists in closed form or can be easily computed, and $f_2$ is differentiable with a Lipschitz continuous gradient, i.e.,

\[
\|\nabla f_2(x) - \nabla f_2(x')\| \le \gamma\|x - x'\|, \qquad \forall (x, x') \in \mathbb{R}^N \times \mathbb{R}^N. \tag{1.46}
\]

Taking $\varepsilon \in (0, \min\{1, 1/\gamma\})$, Eq. (1.45) can be solved by the following iterative rule

\[
\begin{cases}
y^k = x^k - \beta\nabla f_2(x^k), & \beta \in [\varepsilon, 2/\gamma - \varepsilon],\\[4pt]
x^{k+1} = x^k + \theta^k\left(\operatorname{prox}_{\beta f_1} y^k - x^k\right), & \theta^k \in [\varepsilon, 1].
\end{cases} \tag{1.47}
\]
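A minimal sketch of the iteration (1.47), assuming the common test problem $f_1 = \mu\|\cdot\|_1$ and $f_2 = \tfrac12\|Ax - y\|_2^2$ with a small dense matrix stored as a list of rows. The function names are hypothetical, and the Lipschitz constant is replaced by the squared Frobenius norm of A, which is a convenient upper bound on $\|A^TA\|_2$ rather than its exact value.

```python
def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft(x, t):
    """Componentwise soft thresholding with threshold t."""
    return [(1.0 if xi > 0 else -1.0) * max(abs(xi) - t, 0.0) for xi in x]


def forward_backward(A, y, mu, iters=500, theta=1.0):
    """Iteration (1.47) for min mu*||x||_1 + 0.5*||A x - y||^2."""
    gamma = sum(a * a for row in A for a in row)  # upper bound on the Lipschitz constant
    beta = 1.0 / gamma                            # lies inside (0, 2/gamma)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [ax - yi for ax, yi in zip(matvec(A, x), y)]
        g = rmatvec(A, residual)                          # gradient of f2 at x
        fwd = [xi - beta * gi for xi, gi in zip(x, g)]    # forward (gradient) step
        p = soft(fwd, beta * mu)                          # backward (proximal) step
        x = [xi + theta * (pi - xi) for xi, pi in zip(x, p)]
    return x


A = [[1.0, 0.0], [0.0, 2.0]]
print(forward_backward(A, [3.0, 1.0], mu=1.0))  # approaches [2.0, 0.25]
```

With a diagonal A the problem separates by coordinate, so the limit can be checked by hand: each coordinate solves a scalar soft-thresholding problem.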
In Eq. (1.47), the first step, which involves gradient descent, is referred to as the forward step, while the second step, which involves the proximity operator, is referred to as the backward step. The derivation of Eq. (1.47) employs the nonexpansiveness [5] of the proximity operator:

\[
\operatorname{Fix}\left(\operatorname{prox}_{\beta f_1}(I - \beta\nabla f_2)\right) = \operatorname{zer}(\partial f_1 + \nabla f_2), \tag{1.48}
\]

where Fix denotes the set of fixed points. In fact, the FBS algorithm can be seen as a generalization of PPA, with the gradient (subgradient) form of FBS being

\[
x^{k+1} \in x^k - \beta\left(\partial f_1(x^{k+1}) + \nabla f_2(x^k)\right). \tag{1.49}
\]

To separate $f_1$ from $f_2$, $\nabla f_2(x^{k+1})$ in PPA is replaced with $\nabla f_2(x^k)$. When $f_1 = \iota_\Omega$, let $P_\Omega$ denote the projection onto the convex set Ω. Equation (1.49) then becomes

\[
x^{k+1} = P_\Omega\left(x^k - \beta^k\nabla f_2(x^k)\right), \tag{1.50}
\]

that is, the classical gradient projection algorithm. A classical example of the forward–backward splitting method is Beck and Teboulle's fast iterative shrinkage/thresholding algorithm (FISTA) [139], whose iterative rule is as follows

\[
\begin{cases}
x^{k+1} = \operatorname{prox}_{\gamma^{-1} f_1}\left(z^k - \gamma^{-1}\nabla f_2(z^k)\right),\\[4pt]
t_{k+1} = \dfrac{1 + \sqrt{4t_k^2 + 1}}{2} \quad (t_0 = 1),\\[4pt]
z^{k+1} = x^{k+1} + \dfrac{t_k - 1}{t_{k+1}}\left(x^{k+1} - x^{k}\right).
\end{cases} \tag{1.51}
\]
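The FISTA recursion can be sketched on the same kind of toy problem, $\min_x \mu\|x\|_1 + \tfrac12\|Ax - y\|_2^2$. The sketch below is an illustration under assumptions: the function names are hypothetical, the squared Frobenius norm of A stands in for the Lipschitz constant γ, and the extrapolated point z is formed from the two most recent iterates, as in Beck and Teboulle's scheme.

```python
import math


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft(x, t):
    """Componentwise soft thresholding with threshold t."""
    return [(1.0 if xi > 0 else -1.0) * max(abs(xi) - t, 0.0) for xi in x]


def fista(A, y, mu, iters=500):
    """FISTA for min mu*||x||_1 + 0.5*||A x - y||^2."""
    gamma = sum(a * a for row in A for a in row)  # upper bound on the Lipschitz constant
    n = len(A[0])
    x_prev = [0.0] * n
    z = [0.0] * n
    t = 1.0
    for _ in range(iters):
        residual = [az - yi for az, yi in zip(matvec(A, z), y)]
        g = rmatvec(A, residual)                  # gradient of f2 at the extrapolated point
        x = soft([zi - gi / gamma for zi, gi in zip(z, g)], mu / gamma)
        t_next = (1.0 + math.sqrt(4.0 * t * t + 1.0)) / 2.0
        # Extrapolate using the two most recent iterates.
        z = [xi + (t - 1.0) / t_next * (xi - xp) for xi, xp in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev


A = [[1.0, 0.0], [0.0, 2.0]]
print(fista(A, [3.0, 1.0], mu=1.0))  # approaches [2.0, 0.25]
```

Compared with the plain forward–backward iteration, the only extra cost is one stored iterate and the scalar sequence $t_k$.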
In [139], Beck and Teboulle applied FISTA to the dual problem of the TV denoising problem (1.16). When FISTA is applied to TV deblurring, however, the algorithm must be used in a nested fashion, because the dual problem of TV deblurring involves the inverse of the blur matrix.

(5) Douglas–Rachford splitting method and Peaceman–Rachford splitting method

The FBS algorithm above requires one of the two functions in Eq. (1.45) to be differentiable, which is too restrictive for many practical applications. The Douglas–Rachford splitting (DRS) method [140] requires only the existence of proximity operators for the two functions in Eq. (1.45). It decouples the two function terms by the following iterations:

\[
\begin{cases}
x^k = \operatorname{prox}_{\beta f_2} y^k, \quad \beta > 0,\\[4pt]
y^{k+1} = y^k + \operatorname{prox}_{\beta f_1}\left(2x^k - y^k\right) - x^k.
\end{cases} \tag{1.52}
\]
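The iteration (1.52) can be tried on a problem in which neither term is differentiable. The sketch below assumes $f_1(x) = \|x - a\|_1$ and $f_2 = \iota_{[0,1]^N}$, whose solution is simply a clipped to the box; the problem, the function names, and the relaxation parameter `theta` (with `theta = 1.0` reproducing (1.52) exactly) are illustrative assumptions.

```python
def soft(x, t):
    """Componentwise soft thresholding with threshold t."""
    return [(1.0 if xi > 0 else -1.0) * max(abs(xi) - t, 0.0) for xi in x]


def clip01(x):
    """Projection onto the box [0, 1]^N, i.e. the prox of its indicator."""
    return [min(max(xi, 0.0), 1.0) for xi in x]


def douglas_rachford(a, beta=0.5, theta=1.0, iters=100):
    """DRS for min ||x - a||_1 + indicator of [0, 1]^N."""
    y = [0.0] * len(a)
    for _ in range(iters):
        x = clip01(y)                                   # x^k = prox_{beta f2}(y^k)
        w = [2.0 * xi - yi for xi, yi in zip(x, y)]     # reflected point 2 x^k - y^k
        # prox of beta*||. - a||_1 at w is a + soft(w - a, beta)
        s = soft([wi - ai for wi, ai in zip(w, a)], beta)
        p = [ai + si for ai, si in zip(a, s)]
        y = [yi + theta * (pi - xi) for yi, pi, xi in zip(y, p, x)]
    return clip01(y)


print(douglas_rachford([2.0, -1.0, 0.5]))  # expected to settle at the clipped vector
```

Note that the quantity that converges to the minimizer is $x^k = \operatorname{prox}_{\beta f_2} y^k$, not the auxiliary sequence $y^k$ itself.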
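Write $T_{PRS} = \left(2\operatorname{prox}_{\beta f_1} - I\right) \circ \left(2\operatorname{prox}_{\beta f_2} - I\right)$; then the subgradient form of DRS is

\[
y^{k+1} = \frac{1}{2}\left(I + T_{PRS}\right)\left(y^k\right) \in y^k - \beta\left(\partial f_1\left(\operatorname{prox}_{\beta f_1}\left(2x^k - y^k\right)\right) + \partial f_2\left(x^k\right)\right), \qquad x^k = \operatorname{prox}_{\beta f_2} y^k, \tag{1.53}
\]

i.e., DRS is the fixed-point iteration

\[
y^{k+1} = \frac{1}{2}\left(I + T_{PRS}\right)\left(y^k\right), \quad \text{whose limit } y \text{ satisfies } y \in \operatorname{Fix}\left(\frac{1}{2}\left(I + T_{PRS}\right)\right). \tag{1.54}
\]

By the nonexpansiveness of the proximal operator and of the reflected resolvent operator (see Sect. 2.4.3), $T_{PRS}$ is nonexpansive, so $(I + T_{PRS})/2$ is a 1/2-averaged operator, which is firmly nonexpansive (see Sect. 2.4.2). Relaxing the 1/2-averaging in Eq. (1.54) yields the following Peaceman–Rachford splitting (PRS) algorithm [141, 142]

\[
\begin{cases}
x^k = \operatorname{prox}_{\beta f_2} y^k, \quad \beta > 0,\\[4pt]
y^{k+1} = \left(1 - \dfrac{\theta^k}{2}\right) y^k + \dfrac{\theta^k}{2} T_{PRS}\left(y^k\right) = y^k + \theta^k\left(\operatorname{prox}_{\beta f_1}\left(2x^k - y^k\right) - x^k\right),\\[4pt]
\theta^k \in (0, 2].
\end{cases} \tag{1.55}
\]

Its fixed points are related to the minimizers by

\[
\operatorname{prox}_{\beta f_2}\left(\operatorname{Fix} T_{PRS}\right) = \operatorname{zer}(\partial f_1 + \partial f_2). \tag{1.56}
\]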
(6) Primal–dual splitting method

The primal–dual splitting (PDS) method [143] solves the primal problem and the dual problem simultaneously by finding the saddle point of the Lagrangian problem. Compared with the aforementioned splitting methods, primal–dual splitting is more flexible and can solve a broader range of problems, and it has therefore gradually become a new research hotspot. In 2011, Chambolle and Pock [144] proposed a primal–dual algorithm for solving

\[
\min_{x\in X}\ f_1(x) + f_2(Ax), \tag{1.57}
\]

setting off a boom in the application of primal–dual theory to image inverse problems. The Lagrangian function of problem (1.57) can be obtained using the Fenchel conjugate of $f_2(Ax)$:

\[
\min_{x\in X}\max_{y\in V}\ f_1(x) + \langle Ax, y\rangle - f_2^*(y). \tag{1.58}
\]

Applying the Fenchel conjugate of $f_1(x)$ again yields the Fenchel–Rockafellar dual problem of Eq. (1.57):

\[
\max_{y\in V}\ -\left(f_1^*\left(-A^*y\right) + f_2^*(y)\right). \tag{1.59}
\]

Here $A^*$ is the adjoint operator of A, with $\langle Ax, y\rangle = \langle x, A^*y\rangle$ in Hilbert space. In Euclidean space, $A^*$ is $A^T$. Chambolle and Pock's method computes the saddle point of Eq. (1.58). When there is no duality gap, the objective function value of Eq. (1.57) is equal to that of Eq. (1.59). The iterative rule for the method is

\[
\begin{cases}
x^{k+1} = \arg\min_{x}\ f_1(x) + \left\langle x, A^*y^k\right\rangle + \dfrac{1}{2s}\|x - x^k\|_2^2 = \operatorname{prox}_{s f_1}\left(x^k - sA^*y^k\right),\\[4pt]
y^{k+1} = \arg\min_{y}\ f_2^*(y) - \left\langle Ax^{k+1}, y\right\rangle + \dfrac{1}{2t}\|y - y^k\|_2^2 = \operatorname{prox}_{t f_2^*}\left(y^k + tAx^{k+1}\right).
\end{cases} \tag{1.60}
\]

The improved format is

\[
\begin{cases}
y^{k+1} = \operatorname{prox}_{t f_2^*}\left(y^k + tA\tilde{x}^k\right),\\[4pt]
x^{k+1} = \operatorname{prox}_{s f_1}\left(x^k - sA^*y^{k+1}\right),\\[4pt]
\tilde{x}^{k+1} = x^{k+1} + \theta\left(x^{k+1} - x^k\right), \quad \theta \in [0, 1].
\end{cases} \tag{1.61}
\]
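A minimal sketch of the extrapolated primal–dual iteration, applied to 1D TV denoising with $f_1(x) = \tfrac12\|x - g\|_2^2$, $f_2 = \mu\|\cdot\|_1$, and A the forward-difference operator D. The names are hypothetical; the step sizes `s = t = 0.45` are an assumed choice satisfying the usual requirement $s\,t\,\|A\|^2 < 1$ from the primal–dual literature (for this D, $\|D\|^2 \le 4$), a condition that is not stated in the passage above. For this $f_2$, the conjugate $f_2^*$ is the indicator of $\{y : |y_i| \le \mu\}$, so its proximity operator is a clipping.

```python
def D(x):
    """Forward differences, (D x)[i] = x[i+1] - x[i]."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def Dt(w):
    """Transpose (adjoint) of D."""
    out = [0.0] * (len(w) + 1)
    for i, wi in enumerate(w):
        out[i] -= wi
        out[i + 1] += wi
    return out


def primal_dual_tv1d(g, mu, s=0.45, t=0.45, theta=1.0, iters=500):
    """Extrapolated primal-dual iteration for min 0.5*||x - g||^2 + mu*||D x||_1."""
    n = len(g)
    x = [0.0] * n
    x_bar = [0.0] * n
    y = [0.0] * (n - 1)
    for _ in range(iters):
        # Dual step: prox of t*f2^* is the projection onto [-mu, mu].
        y = [min(max(yi + t * di, -mu), mu) for yi, di in zip(y, D(x_bar))]
        # Primal step: prox of s*f1 at w is (w + s*g) / (1 + s).
        aty = Dt(y)
        x_new = [(xi - s * ai + s * gi) / (1.0 + s) for xi, ai, gi in zip(x, aty, g)]
        # Extrapolation from the newest primal iterate.
        x_bar = [xn + theta * (xn - xi) for xn, xi in zip(x_new, x)]
        x = x_new
    return x


print(primal_dual_tv1d([0.0, 4.0], mu=1.0))  # approaches [1.0, 3.0]
```

Unlike the ADMM-type scheme, no linear system is solved here: each iteration only applies D and its transpose, which is what makes this family attractive for large images.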
In addition, Chambolle and Pock have improved Eq. (1.61) for different situations. In fact, in 2008, Zhu and Chan had already applied the special case [145] of Eq. (1.61) with θ = 0, which was named the primal–dual hybrid gradient (PDHG) method, to the TV-based image restoration model Eq. (1.16). PDHG was used to handle the nondifferentiability of the TV model and the non-uniqueness of the solution of the dual problem (TV denoising). However, at that time the authors did not generalize the algorithm to other applications and did not rigorously analyze its convergence. Currently, primal–dual splitting methods are mostly constructed on the theory of maximally monotone operators and the theory of nonexpansive operators, rather than by simply applying some properties of subgradients, which can lead to algorithms [146–150] with more powerful performance and wider applicability.

The operator splitting methods discussed above can be equivalent in some specific cases. For example, the splitting Bregman algorithm (SBA), the alternating direction method of multipliers (ADMM), and the Douglas–Rachford splitting algorithm (DRSA) are equivalent [151], as shown in Fig. 1.2, when applied to the primal problem (1.57) (P) and the dual problem (1.59) (D).

[Fig. 1.2 Equivalence of several operator splitting methods: SBA (P), ADMM (P), DRSA (D)]

The above operator splitting methods also have some common shortcomings. For example, most basic algorithms only handle the case where the objective function contains two function terms, and many splitting methods introduce auxiliary variables in the design process, which makes these methods relatively tedious and complicated [152] to solve. Developing parallel operator splitting methods [153–157] and combining them with distributed computing to cope with image big data problems will be a research hotspot in the future. In addition, the connection between primal–dual algorithms and other methods is also an academic issue worthy of in-depth study [152].
1.3.3 Convergence Analysis of the Splitting Algorithms

An important research hotspot, and difficulty, concerning operator splitting methods is the analysis of their asymptotic convergence behavior. A very important reason for the rapid development and wide application of operator splitting methods is their solid theoretical foundation, which is reflected both in the design of the algorithms and in the analysis of their convergence. Usually, the convergence analysis of an algorithm covers two aspects: one is the convergence proof, i.e., whether the algorithm finds a solution of the optimization problem; the other is the convergence rate analysis, i.e., how fast the algorithm approaches the optimal solution. The convergence rate can be described [142, 158] by the fixed-point residual (FPR, i.e., the Euclidean distance between the results of two consecutive iterations), the deviation of the objective function value (objective error), and the primal–dual gap (duality gap).

There are currently two main ways to analyze the convergence of operator splitting methods: the variational inequality (VI) approach and the nonexpansive operator approach [5]. The widely applied Bregman-like algorithms and ADMM algorithms mostly rely on variational inequalities for contractivity and convergence analysis. In contrast, the forward–backward splitting method, the Douglas–Rachford splitting method, and the more recent primal–dual splitting methods mostly perform convergence analysis through the contractivity of nonexpansive operators and fixed-point iterations, because their derivations are directly related to maximally monotone operators and nonexpansive operators. Compared with the convergence analysis based on the contractivity of nonexpansive operators, the analysis based on variational inequalities is more complicated but requires less theoretical background.

Professor Bingsheng He, a well-known scholar at Nanjing University in China, introduces the foundations of variational inequalities in detail in his textbook "Contraction Algorithms for Convex Optimization and Monotone Variational Inequalities" (personal website).
Yin et al. [159–162] (UCLA, USA) and the team of He [127, 128] have performed systematic convergence analyses of Bregman-like and ADMM-like algorithms. Yin et al. pointed out in [142] and [158] that, for many existing operator splitting methods, the convergence rate $O(1/k)$ can usually be proved without additional conditions. In contrast, it is pointed out in [160] that the ADMM algorithm can attain a convergence rate of $O(1/c^k)$ if a term in Eq. (1.57) is strongly convex and has a Lipschitz continuous gradient, where c is some constant greater than 1. He et al. proved the ergodic and non-ergodic $O(1/k)$ convergence rates for DRS/ADMM in [127] and [128], respectively. In addition, Goldstein et al. [163] obtained $O(1/k^2)$ convergence rates for ADMM by employing the Nesterov acceleration method [164] (also employed by FISTA [139]). Chambolle and Pock, on the other hand, proved an $O(1/k)$ convergence rate for their primal–dual splitting algorithm [144], an $O(1/k^2)$ convergence rate when employing Nesterov acceleration, and an $O(1/c^k)$ convergence rate under the assumption that both function terms are strongly convex.
1.3.4 Adaptive Estimation of the Regularization Parameter

Although image restoration technology has been developed for more than 50 years, real-time image restoration is still rarely reported. The reasons are, first, the large computational load of image restoration and its high demands on hardware; second, the automated implementation of the algorithms is still not well solved. A key problem in the automated implementation of image restoration algorithms is the adaptive selection of the regularization parameter. The regularization parameter balances the fidelity term and the regularization term. In the regularized function (1.7), if the regularization parameter is chosen too small, the recovery result will easily deviate too much from the observed data, resulting in an over-smoothed result; if the regularization parameter is chosen too large, the recovery result will contain noise and will not be smooth enough. The simplest way to select the regularization parameter is to choose it manually before solving the objective function, which is the current practice [85, 125, 126, 165, 166] in most of the literature. However, manual selection is not only time-consuming but also fails to meet the demand for fully automated image restoration. Moreover, many factors, such as the image noise level, the type and size of the blur function, and the image type, affect [167] the selection of the regularization parameter, which is no small challenge for the practitioner. Currently, adaptive image restoration algorithms with automatic updating of the regularization parameter are receiving increasing attention and have become a hotspot in image processing.
According to existing research, various means can be used to achieve adaptive selection of the regularization parameter, including Morozov's discrepancy principle [15, 30, 167–171] (which requires prior knowledge of the noise level), generalized cross-validation (GCV) [172, 173] (which does not require the noise level as input), the L-curve method [174, 175], the unbiased predictive risk estimator (UPRE) method [176], the variational Bayesian method [107], and the parameter descent method [177].

The GCV method can be conveniently applied when the regularization term has a quadratic form; however, this is not easily satisfied in practice, e.g., in TV-regularized image restoration. The minimum of the GCV function is not easy to find. In addition, the GCV method tends to lead to less smooth results [30]. The L-curve method determines the regularization parameter by finding the corner point of the logarithmic curve of the regularization and fidelity terms. But if the curve itself is smooth, the corner point is difficult to determine, and the method is computationally intensive. In the variational Bayesian method, the regularization parameter can be expressed as a function of the hyperparameters of the image and noise models, and it is obtained naturally in the process of parameter estimation. But solving the variational Bayesian model is itself a difficult problem. The parameter descent method first selects a large regularization parameter to solve the objective function, and then uses some strategy to reduce the regularization parameter until a certain stopping criterion is satisfied. The final parameter is the required regularization parameter. However, choosing the parameter descent criterion and the algorithm stopping criterion is difficult, which also makes it difficult to guarantee the convergence of the algorithm.

Morozov's discrepancy principle is a feasible way to achieve adaptive selection of the regularization parameter when the noise level of the observed image can be estimated. The principle determines the regularization parameter by matching the residual to a certain upper limit. According to this principle, the feasible domain of the restoration results is

\[
\Psi \triangleq \{u : D(Ku, f) \le c\}, \tag{1.62}
\]

where c is a noise-dependent constant. If the observation noise is Gaussian, then $c = \tau m n \sigma^2$, where τ is a predetermined noise-dependent constant (the inequality takes a different form when the noise is not Gaussian white noise). The essence of the method is to solve the constrained optimization problem Eq. (1.6) directly and to estimate the regularization parameter at the same time. Currently, the main problem with adaptive image restoration based on the discrepancy principle is the need to introduce inner iterations [15, 30, 169–171] into the basic iterative algorithm in order to implement the adaptive selection of the regularization parameter. Moreover, the convergence of adaptive image restoration algorithms is not directly guaranteed by the convergence of the non-adaptive algorithms and requires a rigorous theoretical proof, which is not addressed in most of the literature.

In the past decade, operator splitting research has made remarkable progress in both theory and practical applications. In recent years, scholars have achieved fruitful results in solving image inverse problems based on operator splitting. Currently, in the face of high-dimensional and massive image big data processing problems, the development of a class of generic, flexible, parallel, and fast operator splitting methods has become inevitable.
References

1. Campisi P, Egiazarian K (2007) Blind image deconvolution: theory and applications. CRC Press, Boca Raton
2. Candès EJ, Wakin MB (2008) An introduction to compressive sampling. IEEE Signal Process Mag 25(2):21–30
3. Tikhonov A, Arsenin V (1977) Solution of ill-posed problems. Winston and Sons, Washington
4. Vogel CR (2002) Computational methods for inverse problems. SIAM, Philadelphia, PA
5. Bauschke HH, Combettes PL (2011) Convex analysis and monotone operator theory in Hilbert spaces. Springer, New York
6. Rudin L, Osher S, Fatemi E (1992) Nonlinear total variation based noise removal algorithms. Phys D 60(1–4):259–268
7. Osher S, Burger M, Goldfarb D et al (2005) An iterative regularization method for total variation-based image restoration. Multiscale Model Simul 4:460–489
8. Guo W, Qin J, Yin W (2014) A new detail-preserving regularization scheme. SIAM J Imag Sci 7(2):1309–1334
9. Li C, Yin W, Jiang H, Zhang Y (2013) An efficient augmented Lagrangian method with applications to total variation minimization. Comput Optim Appl 56(3):507–530
10. Chen Z, Molina R, Katsaggelos AK (2014) Automated recovery of compressedly observed sparse signals from smooth background. IEEE Signal Process Lett 21(8):1012–1016
11. Babacan SD, Molina R, Do MN, Katsaggelos AK (2012) Blind deconvolution with general sparse image priors. In: European conference on computer vision (ECCV), October 7–13, 2012
12. Vega M, Mateos J, Molina R, Katsaggelos AK (2012) Astronomical image restoration using variational methods and model combination. Stat Methodol 9(1–2):19–31
13. Villena S, Vega M, Babacan SD, Molina R, Katsaggelos AK (2013) Bayesian combination of sparse and non sparse priors in image super resolution. Digital Signal Process 23:530–541
14. Amizic B, Spinoulas L, Molina R, Katsaggelos AK (2013) Compressive blind image deconvolution. IEEE Trans Image Process 22(10):3994–4006
15. Afonso MV, Bioucas-Dias JM, Figueiredo MAT (2011) An augmented Lagrangian approach to the constrained optimization formulation of imaging inverse problems. IEEE Trans Image Process 20(3):681–695
16. Fehrenbach J, Weiss P, Lorenzo C (2012) Variational algorithms to remove stationary noise: application to microscopy imaging. IEEE Trans Image Process 21(10):4420–4430
17. Dong B, Li J, Shen Z (2013) X-ray CT image reconstruction via wavelet frame based regularization and radon domain inpainting. J Sci Comput 54(2–3):333–349
18. Yang LH (2012) Research on image restoration of space camera with wide field of view. Chinese Academy of Sciences, Beijing
19. Zhang XS, Jiang J, Peng SL (2012) Blind super-resolution reconstruction algorithm under affine motion model. Pattern Recogn Artif Intell 25(4):648–655
20. Fang S, Ying K, Zhao L, Cheng JP (2010) Coherence regularization for SENSE reconstruction using a nonlocal operator (CORNOL). Magn Reson Med 64(5):1414–1426
21. Zuo W, Lin Z (2011) A generalized accelerated proximal gradient approach for total variation-based image restoration. IEEE Trans Image Process 20(10):2748–2759
22. Dong FF (2010) New models and fast algorithms in image restoration and segmentation. Zhejiang University, Hangzhou
23. Yao W (2010) Research on image quality improvement based on partial differential equations and calculus of variation. National University of Defense Technology, Changsha
24. Zhang WX (2012) Augmented Lagrangian type algorithms and their applications in image processing. Nanjing University, Nanjing
25. Jiao LC, Hou B, Wang S, Liu F (2008) Image multiscale geometric analysis: theory and applications-beyond wavelets. Xidian University Press, Xi’an
26. Feng XC, Wang WW (2009) Variational and partial differential equation methods in image processing. Science Press, Beijing
27. Zhang WJ, Feng XC, Wang XD (2012) Mumford-Shah model based on weighted total generalized variation. Acta Automatica Sinica 38(12):1913–1922
28. Chan RH, Ma J (2012) A multiplicative iterative algorithm for box-constrained penalized likelihood image restoration. IEEE Trans Image Process 21(7):3168–3181
29. Chan RH, Tao M, Yuan X (2013) Constrained total variational deblurring models and fast algorithms based on alternating direction method of multipliers. SIAM J Imag Sci 6(1):680–697
30. Wen Y, Chan RH (2012) Parameter selection for total-variation-based image restoration using discrepancy principle. IEEE Trans Image Process 21(4):1770–1781
31. Gonzalez RC, Woods RE, Eddins SL (2004) Digital image processing using MATLAB. Pearson Prentice Hall
32. Zou MY (2001) Deconvolution and signal recovery. National Defense Industry Press, Beijing
33. Galatsanos NP, Katsaggelos AK (1992) Methods for choosing the regularization parameter and estimating the noise variance in image restoration and their relation. IEEE Trans Image Process 1(3):322–336
34. Green PJ (1990) Bayesian reconstructions from emission tomography data using a modified EM algorithm. IEEE Trans Med Imaging 9(1):84–93
35. Besag J (1993) Toward Bayesian image analysis. J Appl Stat 16(3):395–407
36. Geman D, Yang C (1995) Nonlinear image recovery with half-quadratic regularization. IEEE Trans Image Process 4(7):932–946
37. Chan T, Shen J (2005) Image processing and analysis: variational, PDE, wavelet, and stochastic methods. SIAM, Philadelphia
38. Allard WK (2009) Total variation regularization for image denoising, III. Examples. SIAM J Imag Sci 2(2):532–568
39. Chambolle A, Levine SE, Lucier BJ (2011) An upwind finite-difference method for total variation-based image smoothing. SIAM J Imag Sci 4(1):277–299
40. Getreuer P (2011) Contour stencils: total variation along curves for adaptive image interpolation. SIAM J Imag Sci 4(3):954–979
41. Chambolle A, Lions PL (1997) Image recovery via total variation minimization and related problems. Numer Math 76(2):167–188
42. Chan T, Marquina A, Mulet P (2000) Higher order total variation-based image restoration. SIAM J Sci Comput 22(2):503–516
43. Chan T, Esedoglu S, Park FE (2005) A fourth order dual method for staircase reduction in texture extraction and image restoration problems. UCLA CAM Report 05-28, UCLA, Los Angeles
44. Maso GD, Fonseca I, Leoni G, Morini M (2009) A higher order model for image restoration: the one-dimensional case. SIAM J Math Anal 40(6):2351–2391
45. Stefan W, Renaut RA, Gelb A (2010) Improved total variation-type regularization using higher order edge detectors. SIAM J Imag Sci 3(2):232–251
46. Bredies K, Kunisch K, Pock T (2010) Total generalized variation. SIAM J Imag Sci 3(3):492–526
47. Bredies K, Dong Y, Hintermüller M (2013) Spatially dependent regularization parameter selection in total generalized variation models for image restoration. Int J Comput Math 90(1):109–123
48. Yang Z, Jacob M (2013) Nonlocal regularization of inverse problems: a unified variational framework. IEEE Trans Image Process 22(8):3192–3203
49. Hu Y, Jacob M (2012) Higher degree total variation (HDTV) regularization for image recovery. IEEE Trans Image Process 21(5):2559–2571
50. Hu Y, Ongie G, Ramani S, Jacob M (2014) Generalized higher degree total variation (HDTV) regularization. IEEE Trans Image Process 23(6):2423–2435
51. Lefkimmiatis S, Ward JP, Unser M (2013) Hessian Schatten-norm regularization for linear inverse problems. IEEE Trans Image Process 22(5):1873–1888
52. Lao DZ (2014) Fundamentals of the calculus of variations, 3rd edn. National Defense Industry Press, Beijing
53. Perona P, Malik J (1990) Scale-space and edge detection using anisotropic diffusion. IEEE Trans Pattern Anal Mach Intell 12(7):629–639
54. Li W, Wang Z, Deng Y (2012) Efficient algorithm for nonconvex minimization and its application to PM regularization. IEEE Trans Image Process 21(10):4322–4333
55. Guo Z, Sun J, Zhang D, Wu B (2012) Adaptive Perona-Malik model based on the variable exponent for image denoising. IEEE Trans Image Process 21(3):958–967
56. Hajiaboli M, Ahmad M, Wang C (2012) An edge-adapting Laplacian kernel for nonlinear diffusion filters. IEEE Trans Image Process 21(4):1561–1572
57. Weickert J (1995) Multiscale texture enhancement. In: Computer analysis of images and patterns. Lecture notes in computer science. Springer, pp 230–237
58. Caselles V, Morel J (1998) Introduction to the special issue on partial differential equations and geometry-driven diffusion in image processing and analysis. IEEE Trans Image Process 7(3):269–273
59. Mallat S (1989) A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell 11(7):674–693
60. Figueiredo MAT, Nowak RD (2003) An EM algorithm for wavelet-based image restoration. IEEE Trans Image Process 12(8):906–916
61. Neelamani R, Choi H, Baraniuk R (2004) ForWaRD: Fourier-wavelet regularized deconvolution for ill-conditioned systems. IEEE Trans Signal Process 52(2):418–433
62. Chai A, Shen Z (2007) Deconvolution: a wavelet frame approach. Numer Math 106(4):529–587
63. Kadri-Harouna S, Dérian P, Héas P, Mémin E (2013) Divergence-free wavelets and high order regularization. Int J Comput Vis 103(1):80–99
64. Cai J, Shen Z (2010) Framelet based deconvolution. J Comput Math 28(3):289–308
65. Shen Z, Toh K, Yun S (2011) An accelerated proximal gradient algorithm for frame-based image restoration via the balanced approach. SIAM J Imag Sci 4(2):573–596
66. Fornasier M, Kim Y, Langer A, Schönlieb CB (2011) Wavelet decomposition method for L2/TV-image deblurring. SIAM J Imag Sci 5(3):857–885
67. Xie S, Rahardja S (2012) Alternating direction method for balanced image restoration. IEEE Trans Image Process 21(11):4557–4567
68. Xue F, Luisier F, Blu T (2013) Multi-wiener SURE-LET deconvolution. IEEE Trans Image Process 22(5):1954–1968
69. Ho J, Hwang W (2013) Wavelet Bayesian network image denoising. IEEE Trans Image Process 22(4):1277–1290
70. Zhang Y, Kingsbury N (2013) Improved bounds for subband-adaptive iterative shrinkage/thresholding algorithms. IEEE Trans Image Process 22(4):1373–1381
71. Candès EJ (1998) Ridgelets: theory and applications. Department of Statistics, Stanford University
72. Candès EJ (1999) Curvelets. Tech. Report, Department of Statistics, Stanford University
73. Meyer FG, Coifman RR (1997) Brushlets: a tool for directional image analysis and image compression. Appl Comput Harmonic Anal 5:147–187
74. Donoho DL, Huo XM (2001) Beamlets and multiscale image analysis. Tech. Report, Stanford University
75. Donoho DL (1997) Wedgelets: nearly minimax estimation of edges. Tech. Report, Stanford University
76. Welland G (2003) Beyond wavelets. Academic Press, Waltham
77. Pennec EL, Mallat S (2003) Non linear image approximation with bandelets. Tech. Report, CMAP Ecole Polytechnique
78. Labate D, Lim WQ, Kutyniok G, Weiss G (2005) Sparse multidimensional representation using shearlets. Proceedings of SPIE, Bellingham, WA
79. Han B, Kutyniok G, Shen Z (2011) Adaptive multiresolution analysis structures and shearlet systems. SIAM J Numer Anal 49(5):1921–1946
80. Kutyniok G, Shahram M, Zhuang X (2012) Shearlab: a rational design of a digital parabolic scaling algorithm. SIAM J Imag Sci 5(4):1291–1332
81. Kutyniok G, Labate D (2012) Shearlets: multiscale analysis for multivariate data. Springer, Dordrecht
82. Häuser S, Steidl G (2014) Fast finite shearlet transform. Preprint, arXiv:1202.1773
83. He C, Hu C, Zhang W (2014) Adaptive shearlet-regularized image deblurring via alternating direction method. In: IEEE conference on multimedia and expo, Chengdu, Sichuan, China
84. He C, Hu C, Zhang W (2014) Adaptive shearlet-regularized image deblurring via alternating direction method. In: IEEE conference on multimedia and expo, Chengdu, Sichuan
85. Cai J, Dong B, Osher S, Shen Z (2012) Image restoration: total variation, wavelet frames, and beyond. J Am Math Soc 25(4):1033–1089
86. Hu W, Li W, Zhang X, Maybank S (2015) Single and multiple object tracking using a multifeature joint sparse representation. IEEE Trans Pattern Anal Mach Intell 37(4):816–833
87. He R, Zheng W, Tan T, Su Z (2014) Half-quadratic based iterative minimization for robust sparse representation. IEEE Trans Pattern Anal Mach Intell 36(2):261–275
88. Xu Y, Yin W (2013) A fast patch-dictionary method for whole-image recovery. UCLA CAM Report 13-38, UCLA, Los Angeles
89. Bhujle H, Chaudhuri S (2014) Novel speed-up strategies for non-local means denoising with patch and edge patch based dictionaries. IEEE Trans Image Process 23(1):356–365
90. Jia K, Wang X, Tang X (2013) Image transformation based on learning dictionaries across image spaces. IEEE Trans Pattern Anal Mach Intell 35(2):367–380
91. Xu Y, Hao R, Yin W, Su Z (2013) Parallel matrix factorization for low-rank tensor completion. UCLA CAM Report 13-77, UCLA, Los Angeles
92. Liu G, Lin Z, Yan S, Sun J, Ma Y (2013) Robust recovery of subspace structures by low-rank representation. IEEE Trans Pattern Anal Mach Intell 35(1):171–184
93. Ren X, Lin Z (2013) Linearized alternating direction method with adaptive penalty and warm starts for fast solving transform invariant low-rank textures. Int J Comput Vis 104:1–14
94. Ono S, Miyata T, Yamada I (2014) Cartoon-texture image decomposition using blockwise low-rank texture characterization. IEEE Trans Image Process 23(3):1128–1142
95. Deng Y, Dai Q, Liu R, Zhang Z, Hu S (2013) Low-rank structure learning via nonconvex heuristic recovery. IEEE Trans Neural Netw Learn Syst 24(3):383–396
96. Lin ZC (2013) Rank minimization: theory, algorithms and application. In: Zhang CS, Yang Q (eds) Machine learning and its application. Tsinghua University Press, Beijing, pp 149–169
97. Gou S, Wang Y, Wang Z, Peng Y, Zhang X, Jiao L, Wu J (2013) CT image sequence restoration based on sparse and low-rank decomposition. PLoS ONE 8(9):1–10
98. Cheng B, Liu G, Wang J, Huang Z, Yan S (2011) Multi-task low-rank affinity pursuit for image segmentation. In: Proceedings of international conference on computer vision (ICCV)
99. Gao H, Cai J, Shen Z, Zhao H (2011) Robust principal component analysis-based four-dimensional computed tomography. Phys Med Biol 56:3181–3198
100. Bardsley JM, Goldes J (2009) Regularization parameter selection methods for ill-posed Poisson maximum likelihood estimation. Inverse Prob 25(9):095005
101. Carlavan M, Blanc-Feraud L (2012) Sparse Poisson noisy image deblurring. IEEE Trans Image Process 21(4):1834–1846
102. Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6(6):721–741
103. Zhu S, Mumford D (1997) Prior learning and Gibbs reaction-diffusion. IEEE Trans Pattern Anal Mach Intell 19(11):1236–1250
104. Molina R, Mateos J, Katsaggelos AK (2006) Blind deconvolution using a variational approach to parameter, image, and blur estimation. IEEE Trans Image Process 15(12):3715–3727
105. Molina R, Vega M, Mateos J, Katsaggelos AK (2008) Variational posterior distribution approximation in Bayesian super resolution reconstruction of multispectral images. Appl Comput Harmonic Anal 24(2):251–267
106. Welling M, Hinton G, Osindero S (2003) Learning sparse topographic representation with products of Student-t distribution. NIPS 15:1359–1366
107. Babacan SD, Molina R, Katsaggelos AK (2008) Parameter estimation in TV image restoration using variational distribution approximation. IEEE Trans Image Process 17(3):326–339
108. Babacan SD, Molina R, Katsaggelos AK (2008) Generalized Gaussian Markov field image restoration using variational distribution approximation. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP’08), Las Vegas, Nevada
109. Babacan SD, Molina R, Katsaggelos AK (2009) Variational Bayesian blind deconvolution using a total variation prior. IEEE Trans Image Process 18(1):12–26
110. Babacan SD, Wang J, Molina R, Katsaggelos AK (2010) Bayesian blind deconvolution from differently exposed image pairs. IEEE Trans Image Process 19(11):2874–2888
111. Amizic B, Molina R, Katsaggelos AK (2012) Sparse Bayesian blind image deconvolution with parameter estimation. EURASIP J Image Video Process 2012(20):15
112. Chen Z, Babacan SD, Molina R, Katsaggelos AK (2014) Variational Bayesian methods for multimedia problems. IEEE Trans Multim 16(4):1000–1017
113. Bioucas-Dias JM, Figueiredo MAT (2008) An iterative algorithm for linear inverse problems with compound regularizers. In: Proceedings of IEEE international conference image processing (ICIP), San Diego, CA, USA
114. Lee D, Jeong S, Lee Y, Song B (2013) Video deblurring algorithm using accurate blur kernel estimation and residual deconvolution. IEEE Trans Image Process 22(3):926–940
115. Katsaggelos AK (1999) Iterative image restoration algorithms. In: Madisetti VK, Williams DB (eds) Digital signal processing handbook. CRC Press LLC, Boca Raton
116. Chan T, Mulet P (1999) On the convergence of the lagged diffusivity fixed point method in total variation image restoration. SIAM J Numer Anal 36:354–367
117. Chambolle A (2004) An algorithm for total variation minimization and applications. J Math Imaging Vis 20(1–2):89–97
118. Goldfarb D, Yin W (2005) Second-order cone programming methods for total variation-based image restoration. SIAM J Sci Comput 27(2):622–645
119. Figueiredo M, Nowak R, Wright S (2007) Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J Selected Topics Signal Process 1(4):586–597
120. Koh K, Kim S, Boyd S (2007) An interior-point method for large-scale l1-regularized logistic regression. J Mach Learn Res 8(8):1519–1555
121. Bertaccini D, Sgallari F (2010) Updating preconditioners for nonlinear deblurring and denoising image restoration. Appl Numer Math 60(10):994–1006
122. Combettes PL, Pesquet J (2010) Proximal splitting methods in signal processing. In: Bauschke HH et al (eds) Fixed-point algorithms for inverse problems in science and engineering. Springer, New York
123. Yin W, Osher S, Goldfarb D et al (2008) Bregman iterative algorithms for l1-minimization with applications to compressed sensing. SIAM J Imag Sci 1(1):143–168
124. Cai J, Osher S, Shen Z (2009) Convergence of the linearized Bregman iteration for L1-norm minimization. Math Comp 78(268):2127–2136
125. Goldstein T, Osher S (2009) The split Bregman method for L1-regularized problems. SIAM J Imag Sci 2(2):323–343
126. Wang Y, Yang J, Yin W, Zhang Y (2008) A new alternating minimization algorithm for total variation image reconstruction. SIAM J Imag Sci 1(3):248–272
127. He B, Yuan X (2012) On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM J Numer Anal 50(2):700–709
128. He B, Yuan X (2012) On non-ergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. http://www.optimization-online.org/DB_HTML/2012/01/3318.html
129. Zhang X, Burger M, Bresson X, Osher S (2010) Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. SIAM J Imag Sci 3(3):253–276
130. Matakos A, Ramani S, Fessler J (2013) Accelerated edge-preserving image restoration without boundary artifacts. IEEE Trans Image Process 22(5):2019–2029
131. Chen D (2014) Regularized generalized inverse accelerating linearized alternating minimization algorithm for frame-based poissonian image deblurring. SIAM J Imag Sci 7(2):716–739
132. Woo H, Yun S (2013) Proximal linearized alternating direction method for multiplicative denoising. SIAM J Sci Comput 35(2):336–358
133. Yang J, Yuan X (2013) Linearized augmented Lagrange and alternating direction methods for nuclear norm minimization. Math Comput 82(281):301–329
134. Ng MK, Wang F, Yuan X (2011) Inexact alternating direction methods for image recovery. SIAM J Sci Comput 33(4):1643–1668
135. Jeong T, Woo H, Yun S (2013) Frame-based Poisson image restoration using a proximal linearized alternating direction method. Inverse Prob 29(7):075007
136. Cai X, Gu G, He B, Yuan X (2013) A proximal point algorithm revisit on the alternating direction method of multipliers. Sci China Math 56(10):2179–2186
137. Glowinski R (2014) On alternating direction methods of multipliers: a historical perspective. In: Fitzgibbon W et al (eds) Modeling, simulation and optimization for science and technology. Springer, Dordrecht, pp 59–82
138. Combettes PL, Wajs VR (2005) Signal recovery by proximal forward-backward splitting. Multiscale Model Simul 4(4):1168–1200
139. Beck A, Teboulle M (2009) Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans Image Process 18(11):2419–2434
140. Combettes PL, Pesquet J (2007) A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE J Selected Topics Signal Process 1(4):564–574
141. He B, Liu H, Wang Z, Yuan X (2014) A strictly contractive Peaceman-Rachford splitting method for convex programming. SIAM J Optim 24(3):1011–1040
142. Davis D, Yin W (2014) Convergence rate analysis of several splitting schemes. UCLA CAM Report 14-51, UCLA, Los Angeles
143. Combettes PL, Condat L, Pesquet J-C et al (2014) A forward-backward view of some primal-dual optimization methods in image recovery. In: Proceedings of the IEEE international conference on image processing. Paris, France
144. Chambolle A, Pock T (2011) A first-order primal-dual algorithm for convex problems with applications to imaging. J Math Imag Vis 40(1):120–145
145. Zhu M, Chan T (2008) An efficient primal-dual hybrid gradient algorithm for total variation image restoration. UCLA CAM Report 08-34, UCLA, Los Angeles
146. Fix A, Wang C, Zabih R (2014) A primal-dual method for higher-order multilabel Markov random fields. In: Proceedings of IEEE conference computer vision and pattern recognition
147. Alghamdi MA, Alotaibi A, Combettes PL, Shahzad N (2014) A primal-dual method of partial inverses for composite inclusions. Optim Lett 8(8):2271–2284
148. Condat L (2013) A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J Optim Theory Appl 158(2):460–479
149. Chen P, Huang J, Zhang X (2013) A primal-dual fixed point algorithm for convex separable minimization with applications to image restoration. Inverse Prob 29:025011
150. Combettes PL, Pesquet J (2012) Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators. Set Valued Variat Anal 20(2):307–330
151. Setzer S (2011) Operator splittings, Bregman methods and frame shrinkage in image processing. Int J Comput Vis 92(3):265–280
152. Yan M, Yin W (2014) Self equivalence of the alternating direction method of multipliers. UCLA CAM Report 14-59, UCLA, Los Angeles
153. He B, Hou L, Yuan X (2013) On full Jacobian decomposition of the augmented Lagrange method for separable convex programming. http://www.optimization-online.org/DB_HTML/2013/09/4059.html
154. Liu R, Lin Z, Su Z (2013) Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning. Mach Learn 99:287–325
155. Becker SR, Combettes PL (2014) An algorithm for splitting parallel sums of linearly composed monotone operators, with applications to signal recovery. J Nonlinear Convex Anal 15(1):137–159
156. Eckstein J, Mátyásfalvi G (2015) Object-parallel infrastructure for implementing first-order methods with an example application to LASSO. http://www.optimization-online.org/DB_HTML/2015/01/4748.html
157. He B, Liu H, Lu J, Yuan X (2014) Application to the strictly contractive Peaceman-Rachford splitting method to multi-block separable convex optimization. http://www.optimization-online.org/DB_HTML/2014/05/4358.html
158. Davis D (2014) Convergence rate analysis of primal-dual splitting schemes. UCLA CAM Report 14-63, UCLA, Los Angeles
159. Davis D, Yin W Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions. Optim Control (submitted)
160. Deng W, Yin W (2012) On the global and linear convergence of the generalized alternating direction method of multipliers. UCLA CAM Report 12-52, UCLA, Los Angeles
161. Lin T, Ma S, Zhang S (2014) On the global linear convergence of the ADMM with multi-block variables. UCLA CAM Report 14-92, UCLA, Los Angeles
162. Shi W, Ling Q, Yuan K, Wu G, Yin W (2014) On the linear convergence of the ADMM in decentralized consensus optimization. IEEE Trans Signal Process 62(7):1750–1761
163. Goldstein T, O’Donoghue B, Setzer S (2012) Fast alternating direction optimization methods. UCLA CAM Report 12-35, UCLA, Los Angeles
164. Nesterov Y (1983) A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Math Doklady 27(2):372–376
165. Yang J, Yin W, Zhang Y, Wang Y (2009) A fast algorithm for edge-preserving variational multichannel image restoration. SIAM J Imag Sci 2(2):569–592
166. Wu C, Tai X (2010) Augmented Lagrange method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models. SIAM J Imag Sci 3(3):300–339
167. He C, Hu C, Zhang W, Shi B (2014) A fast adaptive parameter estimation for total variation image restoration. IEEE Trans Image Process 23(12):4954–4967
168. Morozov VA (1984) Methods for solving incorrectly posed problems. Springer-Verlag, New York (translated from the Russian by Aries A B, translation edited by Nashed Z)
169. Wen Y, Yip AM (2009) Adaptive parameter selection for total variation image deconvolution. Numer Math Theory Methods Appl 2(4):427–438
170. Afonso MV, Bioucas-Dias JM, Figueiredo M (2010) Fast image recovery using variable splitting and constrained optimization. IEEE Trans Image Process 19(9):2345–2356
171. Ng M, Weiss P, Yuan X (2010) Solving constrained total-variation image restoration and reconstruction problems via alternating direction methods. SIAM J Sci Comput 32(5):2710–2736
172. Golub GH, Heath M, Wahba G (1979) Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21(2):215–223
173. Liao H, Li F, Ng M (2009) Selection of regularization parameter in total variation image restoration. J Opt Soc Am A 26(11):2311–2320
174. Hansen PC (1992) Analysis of discrete ill-posed problems by means of the L-curve. SIAM Rev 34(4):561–580
175. Engl H, Grever W (1994) Using the L-curve for determining optimal regularization parameters. Numer Math 69(1):25–31
176. Lin Y, Wohlberg B, Guo H (2010) UPRE method for total variation parameter selection. Signal Process 90(8):2546–2551
177. Montefusco LB, Lazzaro D (2012) An iterative l1-based image restoration algorithm with an adaptive parameter estimation. IEEE Trans Image Process 21(4):1676–1686
Chapter 2
Mathematical Fundamentals
2.1 Overview
Traditionally, image processing has been regarded as a branch of signal processing built on Fourier analysis and spectral analysis. Over the last few decades, many new methods and tools have been introduced into the field to process images better, such as variational methods linked to the geometric regularity of images, applied harmonic analysis (e.g., wavelets), stochastic methods based on random field theory and Bayesian inference, artificial intelligence and machine learning methods, and Hilbert space theory together with the associated methods for solving optimization problems. It is clearly impossible to cover all of these mathematical fundamentals in one book. This chapter focuses on the basics of convolution, the Fourier transform, and Hilbert spaces, which form the theoretical foundation for the subsequent chapters of the book.
2.2 Convolution
2.2.1 One-Dimensional Discrete Convolution
Let the impulse response of a linear time-invariant system be $h(t)$. Its output signal $y(t)$ can then be expressed as the convolution of the input signal $x(t)$ with $h(t)$:

$$y(t) = \int_{-\infty}^{+\infty} h(\tau)\,x(t-\tau)\,d\tau \tag{2.1}$$
© Chemical Industry Press 2023 C. He and C. Hu, Parallel Operator Splitting Algorithms with Application to Imaging Inverse Problems, Advanced and Intelligent Manufacturing in China, https://doi.org/10.1007/978-981-99-3750-9_2
Systems can be either causal or non-causal. A system is said to be stable if it is bounded-input bounded-output (BIBO) stable, i.e., if

$$\int_{-\infty}^{+\infty} |h(\tau)|\,d\tau < \infty \tag{2.2}$$
It is easy to show that $h(t)$ and $x(t)$ in Eq. (2.1) are interchangeable, i.e., we have

$$y(t) = \int_{-\infty}^{+\infty} x(\tau)\,h(t-\tau)\,d\tau \tag{2.3}$$
For simplicity, Eqs. (2.1) and (2.3) are usually written as

$$y(t) = h(t) * x(t) = x(t) * h(t) \tag{2.4}$$
where $*$ is the convolution operator. The operational properties of convolution can be found in textbooks on signals and systems and are omitted here. To perform signal processing on a computer, the continuous signal and the system response must be digitized. Digitization consists of two steps: sampling and quantization. For analytical convenience, analog sampling sequences, i.e., sequences that are sampled in time but not quantized in amplitude, are usually discussed. Denoting the analog sampling sequences of $x(t)$, $h(t)$, and $y(t)$ by $x(n)$, $h(n)$, and $y(n)$, respectively, the discrete form corresponding to Eq. (2.1) is

$$y(n) = \sum_{k=-\infty}^{+\infty} h(n-k)\,x(k) = \sum_{k=-\infty}^{+\infty} h(k)\,x(n-k) \tag{2.5}$$
If the system is causal, then $h(n) = 0$ for $n < 0$, so the first sum in Eq. (2.5) only needs to run up to $k = n$, while the second sum only needs to start from $k = 0$. A computer can only handle finite discrete convolutions. Given the sequences $x(n)$, $n = 0, 1, 2, \ldots, N-1$, and $h(n)$, $n = 0, 1, 2, \ldots, M-1$, where $M$ and $N$ are positive integers and both sequences are taken to be zero outside these index ranges, Eq. (2.5) becomes

$$y(n) = \sum_{k=0}^{N-1} x(k)\,h(n-k) = \sum_{k=0}^{M-1} h(k)\,x(n-k), \quad n = 0, 1, 2, \ldots, L-1 \tag{2.6}$$
where $L = M + N - 1$ is the length of the sequence $y(n)$. To facilitate analysis, the discrete convolution formula can be written in vector form. Let

$$\mathbf{x} = [x_0, x_1, x_2, \ldots, x_{N-1}]^T \tag{2.7}$$
$$\mathbf{h} = [h_0, h_1, h_2, \ldots, h_{M-1}]^T \tag{2.8}$$

$$\mathbf{y} = [y_0, y_1, y_2, \ldots, y_{L-1}]^T \tag{2.9}$$
It is easy to verify that the convolution in Eq. (2.6) can be written as

$$\mathbf{y} = \mathbf{F}_{(h,N)}\,\mathbf{x} = \mathbf{F}_{(x,M)}\,\mathbf{h} \tag{2.10}$$
where $\mathbf{F}_{(h,N)}$ is a matrix with $N$ columns built from the elements of $\mathbf{h}$, of the following form

$$
\mathbf{F}_{(h,N)} =
\begin{bmatrix}
h_0 & & & \\
h_1 & h_0 & & \\
\vdots & h_1 & \ddots & \\
h_{M-1} & \vdots & \ddots & h_0 \\
 & h_{M-1} & & h_1 \\
 & & \ddots & \vdots \\
 & & & h_{M-1}
\end{bmatrix}
\tag{2.11}
$$
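The equivalence between the summation in Eq. (2.6) and the matrix forms in Eqs. (2.10) and (2.11) can be checked numerically. The following is a minimal NumPy sketch, not code from the book: the helper names `conv_matrix` and `conv_direct` and the test sequences are illustrative assumptions.

```python
import numpy as np

def conv_matrix(h, n_cols):
    """Build the (len(h) + n_cols - 1) x n_cols convolution kernel matrix F_(h, n_cols)."""
    h = np.asarray(h, dtype=float)
    m = h.size
    F = np.zeros((m + n_cols - 1, n_cols))
    for j in range(n_cols):
        # Column j holds h shifted down by j rows (the banded structure of Eq. (2.11)).
        F[j:j + m, j] = h
    return F

def conv_direct(x, h):
    """Evaluate Eq. (2.6) term by term: y(n) = sum_k x(k) h(n - k)."""
    n_x, n_h = len(x), len(h)
    y = np.zeros(n_x + n_h - 1)
    for n in range(y.size):
        for k in range(n_x):
            if 0 <= n - k < n_h:      # h is zero outside 0..M-1
                y[n] += x[k] * h[n - k]
    return y

# Hypothetical test sequences (not from the book).
x = np.array([1.0, 2.0, 3.0, 4.0])    # N = 4
h = np.array([1.0, -1.0, 0.5])        # M = 3

y_sum = conv_direct(x, h)             # Eq. (2.6)
y_Fh = conv_matrix(h, x.size) @ x     # y = F_(h,N) x
y_Fx = conv_matrix(x, h.size) @ h     # y = F_(x,M) h
print(y_sum, y_Fh, y_Fx)
```

All three results have length L = M + N − 1 = 6 and should coincide, which illustrates that either factor can play the role of the kernel in full convolution.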
$\mathbf{F}_{(x,M)}$ is constructed in the same way as $\mathbf{F}_{(h,N)}$: it is a matrix with $M$ columns built from the elements of $\mathbf{x}$. Equation (2.10) shows that the two convolution factors are interchangeable, and both matrices are called convolution kernel matrices. Consider a common situation. Suppose that the discrete system has a finite impulse response sequence $h(n)$, and that the input sequence $x(n)$ can be regarded as an infinite sequence extending from the past to the future, but only a portion of the samples of the output sequence $y(n)$ can be observed. From the convolution operation, it is clear that obtaining $L$ samples of $y(n)$ requires all $M$ samples of $h(n)$ and $M + L - 1$ samples of $x(n)$. However, the full convolution of $h(n)$ of length $M$ with $x(n)$ of length $M + L - 1$ is a sequence of length $2M + L - 2$. The $L$ observed samples of $y(n)$ are only a part of it, so this operation is called partial convolution, while the full convolution described previously is also called regular convolution. In vector-matrix form, the partial convolution can be written as
$$
\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_{L-1} \end{bmatrix}
=
\begin{bmatrix}
h_{M-1} & h_{M-2} & \cdots & h_0 & & & \\
 & h_{M-1} & h_{M-2} & \cdots & h_0 & & \\
 & & \ddots & \ddots & & \ddots & \\
 & & & h_{M-1} & \cdots & h_1 & h_0
\end{bmatrix}
\begin{bmatrix} x_{-M+1} \\ \vdots \\ x_{-1} \\ x_0 \\ \vdots \\ x_{L-1} \end{bmatrix}
\tag{2.12}
$$
Partial convolution can be written as

$$\mathbf{y}_p = \mathbf{F}_{\tilde h}^{T}\,\mathbf{x} \tag{2.13}$$

where $\mathbf{x}$ has length $N$ and $\mathbf{F}_{\tilde h}$, of size $N \times (N - M + 1)$, is the convolution kernel matrix derived from $\tilde h(n)$, the reversed arrangement of the sequence $h(n)$. It is easy to find the following relationship between one-dimensional partial convolution and full convolution:

$$\mathbf{h} \circledast \mathbf{x} = D_{(M:N)}[\mathbf{h} * \mathbf{x}] \tag{2.14}$$

where $\circledast$ is the partial convolution operator and $D_{(M:N)}[\cdot]$ is the range-limiting operator; that is, the result of partial convolution can be viewed as the segment of the full convolution running from its $M$-th to its $N$-th sample (counting from one). It should be noted that partial convolution is not commutative: the lengths of $\mathbf{x}$ and $\mathbf{h}$ determine their roles in the convolution, and the shorter sequence is usually used as the convolution kernel. Partial convolution arises in many practical situations.
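The three descriptions of partial convolution given above (a segment of the full convolution, the transposed kernel matrix of the reversed sequence, and the fully overlapped samples) can be compared numerically. This is a minimal sketch under assumed test data; the variable names are illustrative and `np.convolve(..., mode="valid")` is NumPy's name for the fully overlapped part of the convolution.

```python
import numpy as np

# Hypothetical test sequences (not from the book).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # N = 6
h = np.array([1.0, -1.0, 0.5])                 # M = 3
N, M = x.size, h.size

# Full convolution has length M + N - 1; keep samples M..N (1-based),
# i.e. indices M-1..N-1 in 0-based Python indexing.
y_full = np.convolve(x, h)
y_part_slice = y_full[M - 1:N]

# Matrix form of Eq. (2.13): F is the N x (N - M + 1) convolution kernel
# matrix built from the reversed kernel, and y_p = F^T x.
h_rev = h[::-1]
F = np.zeros((N, N - M + 1))
for j in range(N - M + 1):
    F[j:j + M, j] = h_rev
y_part_matrix = F.T @ x

# NumPy's "valid" mode returns the same fully overlapped samples.
y_part_valid = np.convolve(x, h, mode="valid")
print(y_part_slice, y_part_matrix, y_part_valid)
```

Each result has N − M + 1 = 4 samples, matching the number of columns of the kernel matrix in Eq. (2.13).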
2.2.2 Two-Dimensional Discrete Convolution
Assume that the two-dimensional finite sequences $x(m, n)$ and $h(m, n)$ are defined on the finite sets of grid points $Z_x = \{(m, n) \in \mathbb{Z}^2 \mid 0 \le m \le N_1 - 1,\ 0 \le n \le N_2 - 1\}$ and $Z_h = \{(m, n) \in \mathbb{Z}^2 \mid 0 \le m \le M_1 - 1,\ 0 \le n \le M_2 - 1\}$, respectively, where $\mathbb{Z}^2$ is the set of integer grid points in the two-dimensional plane. Their two-dimensional convolution can be written as

$$
\begin{aligned}
y(m, n) &= x(m, n) * h(m, n) = \sum_{k=0}^{N_1-1} \sum_{l=0}^{N_2-1} x(k, l)\,h(m-k, n-l) \\
&= \sum_{k=0}^{M_1-1} \sum_{l=0}^{M_2-1} h(k, l)\,x(m-k, n-l), \\
&\qquad m = 0, 1, \ldots, M_1 + N_1 - 2;\quad n = 0, 1, \ldots, M_2 + N_2 - 2
\end{aligned}
\tag{2.15}
$$
Clearly, $y(m, n)$ is meaningful only on the set of $(M_1 + N_1 - 1) \times (M_2 + N_2 - 1)$ grid points, i.e., its support is also finite. As with the one-dimensional finite discrete convolution, the two-dimensional finite discrete convolution can also be expressed in vector-matrix form. The rows of each two-dimensional sequence are transposed and stacked one after another, starting from the first row, into a single column vector, i.e., a lexicographic ordering is used. Thus, Eq. (2.15) can be written as

$$\mathbf{y} = \mathbf{F}_h\,\mathbf{x} = \mathbf{F}_x\,\mathbf{h} \tag{2.16}$$
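The vector-matrix form in Eq. (2.16) can be verified against a direct evaluation of Eq. (2.15). The sketch below is illustrative only: it assumes row-major (lexicographic) stacking as described above and assembles the large matrix from one-dimensional convolution kernel matrices of the rows of h, placed in a banded block pattern. The function names and the random test data are assumptions, not the book's code.

```python
import numpy as np

def conv2d_direct(x, h):
    """Evaluate Eq. (2.15): y(m, n) = sum_k sum_l h(k, l) x(m - k, n - l)."""
    N1, N2 = x.shape
    M1, M2 = h.shape
    y = np.zeros((M1 + N1 - 1, M2 + N2 - 1))
    for k in range(M1):
        for l in range(M2):
            # Each kernel sample contributes a shifted, scaled copy of x.
            y[k:k + N1, l:l + N2] += h[k, l] * x
    return y

def conv_matrix_1d(h_row, n_cols):
    """One-dimensional convolution kernel matrix of a single row of h."""
    M = h_row.size
    F = np.zeros((M + n_cols - 1, n_cols))
    for j in range(n_cols):
        F[j:j + M, j] = h_row
    return F

def conv_matrix_2d(h, N1, N2):
    """Assemble F_h block by block: block (i + j, j) is the matrix of row i of h."""
    M1, M2 = h.shape
    rows, cols = M2 + N2 - 1, N2
    F = np.zeros(((M1 + N1 - 1) * rows, N1 * cols))
    for i in range(M1):
        block = conv_matrix_1d(h[i], N2)
        for j in range(N1):
            r = (i + j) * rows
            F[r:r + rows, j * cols:(j + 1) * cols] = block
    return F

# Hypothetical random test data (not from the book).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))   # N1 x N2
h = rng.standard_normal((2, 3))   # M1 x M2

y = conv2d_direct(x, h)
F_h = conv_matrix_2d(h, *x.shape)
y_vec = F_h @ x.ravel()           # row-major ravel = lexicographic ordering
print(F_h.shape, np.allclose(y_vec, y.ravel()))
```

For these sizes the matrix has (M1 + N1 − 1)(M2 + N2 − 1) = 35 rows and N1 N2 = 20 columns, which shows how quickly the explicit matrix grows with image size; in practice the matrix is rarely formed explicitly.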
The expansion of $\mathbf{y} = \mathbf{F}_h\,\mathbf{x}$ is

$$
\begin{bmatrix}
\mathbf{y}_0 \\ \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_{M_1-1} \\ \vdots \\ \mathbf{y}_{N_1-1} \\ \vdots \\ \mathbf{y}_{M_1+N_1-2}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{F}_{(h_0,N_2)} & & & & \\
\mathbf{F}_{(h_1,N_2)} & \mathbf{F}_{(h_0,N_2)} & & & \\
\vdots & & \ddots & & \\
\mathbf{F}_{(h_{M_1-1},N_2)} & \cdots & \mathbf{F}_{(h_0,N_2)} & & \\
 & \ddots & & \ddots & \\
 & & \mathbf{F}_{(h_{M_1-1},N_2)} & \cdots & \mathbf{F}_{(h_0,N_2)} \\
 & & & \ddots & \vdots \\
 & & & & \mathbf{F}_{(h_{M_1-1},N_2)}
\end{bmatrix}
\begin{bmatrix}
\mathbf{x}_0 \\ \mathbf{x}_1 \\ \vdots \\ \mathbf{x}_{N_1-1}
\end{bmatrix}
\tag{2.17}
$$

$$
\mathbf{F}_{(h_i,N_2)} =
\begin{bmatrix}
h_{(i,0)} & & & & \\
h_{(i,1)} & h_{(i,0)} & & & \\
\vdots & & \ddots & & \\
h_{(i,M_2-1)} & \cdots & h_{(i,0)} & & \\
 & \ddots & & \ddots & \\
 & & h_{(i,M_2-1)} & \cdots & h_{(i,0)} \\
 & & & \ddots & \vdots \\
 & & & & h_{(i,M_2-1)}
\end{bmatrix}
\tag{2.18}
$$

where $\mathbf{y}_i$ and $\mathbf{x}_i$ denote the transpose of the $(i+1)$st row of $y$ and $x$, respectively, and $\mathbf{F}_{(h_i,N_2)}$ denotes the convolution kernel matrix formed by the elements of the $(i+1)$st row of $h$. Its size is $(M_2 + N_2 - 1) \times N_2$, and the large block matrix consists of $(M_1 + N_1 - 1) \times N_1$ such blocks, so the size of the whole matrix is $(M_1 + N_1 - 1)(M_2 + N_2 - 1) \times N_1 N_2$. Two-dimensional partial convolution has a similar meaning to one-dimensional partial convolution. In many cases, we can observe and process only a small piece of an image that extends over a very wide area. If the image being observed is a blurred
40
2 Mathematical Fundamentals
image caused by some convolution factor, the observed image is a partial convolution of the original image and the convolution factor. Assuming a point spread function of size M1 × M2 , expressing a partial convolution of L 1 × L 2 involves a portion of the original image of size (L 1 + M1 − 1) × (L 2 + M2 − 1). Let the finite sequence x(m, n) and h(m, n) be of dimensions N1 × N2 and M1 × M2 , respectively, with N1 ≥ M1 and N2 ≥ M2 . The partial convolution expressions are y P = F Th x
(2.19)
where F h is the convolution kernel matrix of size N1 N2 × (N1 − M1 + 1)(N2 − M2 + 1) derived from h (m, n), which is the inverse arrangement of h(m, n). The relationship between the two-dimensional partial convolution and the full convolution is (2.20) That is, rows M1 to N1 are retained in the larger block matrix, while rows M2 to N2 are retained in the smaller matrix.
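The relationship between the full and the partial two-dimensional convolution can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation; the function names `conv2_full` and `conv2_partial` and the small test arrays are assumptions made for illustration. The partial result is obtained by keeping rows $M_1$ to $N_1$ and columns $M_2$ to $N_2$ (1-based) of the full result.

```python
def conv2_full(x, h):
    """Full 2-D convolution, Eq. (2.15): output size (N1+M1-1) x (N2+M2-1)."""
    n1, n2 = len(x), len(x[0])
    m1, m2 = len(h), len(h[0])
    y = [[0.0] * (n2 + m2 - 1) for _ in range(n1 + m1 - 1)]
    for k in range(n1):
        for l in range(n2):
            for p in range(m1):
                for q in range(m2):
                    # x(k, l) * h(p, q) contributes to y(k + p, l + q)
                    y[k + p][l + q] += x[k][l] * h[p][q]
    return y


def conv2_partial(x, h):
    """Partial convolution: only outputs that use no samples outside x."""
    n1, n2 = len(x), len(x[0])
    m1, m2 = len(h), len(h[0])
    full = conv2_full(x, h)
    # 0-based rows M1-1 .. N1-1 and columns M2-1 .. N2-1 of the full result
    return [row[m2 - 1:n2] for row in full[m1 - 1:n1]]


# Hypothetical example data: a 3 x 3 "image" and a 2 x 2 kernel.
x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
h = [[1, 0],
     [0, -1]]

y_full = conv2_full(x, h)      # 4 x 4
y_part = conv2_partial(x, h)   # 2 x 2
```

For these inputs the full result has $(3 + 2 - 1) \times (3 + 2 - 1) = 4 \times 4$ entries and the partial result has $(3 - 2 + 1) \times (3 - 2 + 1) = 2 \times 2$ entries, in agreement with the sizes stated above.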
2.3 Fourier Transform and Discrete Fourier Transform
Let the continuous-time signal $x(t)$ be absolutely integrable on $(-\infty, +\infty)$, i.e.,
$$\int_{-\infty}^{\infty} |x(t)|\,dt < \infty. \tag{2.21}$$
Then $x(t)$ has the Fourier transform
$$X(j\Omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\Omega t}\,dt, \tag{2.22}$$
where the angular frequency is $\Omega = 2\pi f$ and $f$ is the frequency variable. The inverse transform is
$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(j\Omega)\, e^{j\Omega t}\,d\Omega. \tag{2.23}$$
The Fourier transform and its inverse allow us to understand a signal from both the time and frequency domains.
Consider the δ-sampled sequence
$$x_s(t) = x(t) \left[ \sum_{n=-\infty}^{\infty} \delta(t - nT) \right], \tag{2.24}$$
where $T$ is the sampling period and $\delta(t - nT)$ is the Dirac δ function. For any continuous function $y(t)$, we have
$$\int_{-\infty}^{\infty} y(t)\,\delta(t - nT)\,dt = y(nT). \tag{2.25}$$
The Fourier transform of $x_s(t)$ is
$$X_s(j\Omega) = \sum_{n=-\infty}^{\infty} x(nT)\, e^{-j\Omega nT}. \tag{2.26}$$
It is easy to verify that $X_s(j\Omega)$ is a periodic function with period $2\pi/T$, which means that $X_s(j\Omega)$ over a single period carries all the frequency-domain information of $x_s(t)$. If the sampled sequence $x(nT)$ is used instead of the δ-sampled sequence $x_s(t)$, then $x(nT)$ should be recoverable from $X_s(j\Omega)$ over one period. A direct calculation shows that the inverse transform corresponding to Eq. (2.26) is
$$x(nT) = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} X_s(j\Omega)\, e^{j\Omega nT}\,d\Omega. \tag{2.27}$$
For simplicity, let $w = \Omega T$ and abbreviate $x(nT)$ to $x(n)$, i.e., normalize the sampling period $T$. Then Eqs. (2.26) and (2.27) become
$$X_s(e^{jw}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-jwn}, \tag{2.28}$$
$$x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X_s(e^{jw})\, e^{jwn}\,dw. \tag{2.29}$$
Equations (2.28) and (2.29) are called the discrete-time Fourier transform (DTFT) and the inverse discrete-time Fourier transform (IDTFT), respectively, where $X_s(e^{jw})$ is a continuous periodic function with period $2\pi$. The IDTFT involves the integral of a complex-valued function, which is inconvenient to evaluate numerically. The situation is much better if the sequence $x(n)$ is a
periodic sequence. In practice, a computer can handle only discrete sequences of finite length, which can be extended periodically for computational convenience while preserving their main spectral information. On the one hand, a periodic $x(n)$ can be represented by a Fourier series, so its spectrum is discrete; on the other hand, $x(n)$ is a sampled sequence, so its Fourier transform is periodic with period $w_s$. Suppose $x(n)$ (i.e., $x(nT)$) is a discrete sequence with period $NT$. According to Fourier series theory, the spacing of the spectral lines of $X_s(j\Omega)$ in the frequency domain is $2\pi/(NT)$, so there are exactly $N$ spectral lines in one period $w_s = 2\pi/T$. With the sampling period $T$ normalized, computing the DTFT of a discrete sequence $x(n)$ of period $N$ therefore reduces to computing $N$ discrete spectral values. The discrete Fourier transform (DFT) of a discrete periodic sequence $x(n)$ can then be defined as
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi nk/N}. \tag{2.30}$$
The corresponding inverse discrete Fourier transform (IDFT) is
$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi nk/N}. \tag{2.31}$$
An important advantage of the IDFT is that it avoids the evaluation of integrals. However, it should be emphasized that using the DFT implicitly extends the finite sequence into a periodic one. An important reason why the DFT is so widely used is that it has fast algorithms (the FFT). Owing to the symmetry between the time and frequency domains, the DFT has the following pair of dual properties. The time-domain circular convolution property is
$$\mathrm{DFT}(x(n) * y(n)) = X(k)\, Y(k), \tag{2.32}$$
and the time-domain product property is
$$\mathrm{DFT}(x(n)\, y(n)) = \frac{1}{N}\, X(k) * Y(k), \tag{2.33}$$
where $*$ denotes circular convolution of length $N$.
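The DFT pair and the two properties above can be verified numerically on a short sequence. The sketch below uses a naive $O(N^2)$ DFT written with the standard library rather than an FFT; the helper names `dft`, `idft` and `circular_convolve` and the test sequences are assumptions made for this illustration. For the unnormalized forward transform of Eq. (2.30), the product property carries a factor $1/N$.

```python
import cmath


def dft(x):
    """Forward DFT, Eq. (2.30)."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_len) for n in range(n_len))
            for k in range(n_len)]


def idft(spectrum):
    """Inverse DFT, Eq. (2.31), including the 1/N factor."""
    n_len = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * n * k / n_len)
                for k in range(n_len)) / n_len
            for n in range(n_len)]


def circular_convolve(a, b):
    """Length-N circular convolution: indices wrap around modulo N."""
    n_len = len(a)
    return [sum(a[m] * b[(n - m) % n_len] for m in range(n_len)) for n in range(n_len)]


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, -1.0, 0.5, 0.0]
X, Y = dft(x), dft(y)

# Eq. (2.32): DFT of the circular convolution equals the product of the DFTs.
conv_err = max(abs(p - q) for p, q in zip(dft(circular_convolve(x, y)),
                                          [a * b for a, b in zip(X, Y)]))

# Product property: DFT of the pointwise product equals (1/N) times the
# circular convolution of the two spectra.
prod_err = max(abs(p - q / len(x)) for p, q in zip(dft([a * b for a, b in zip(x, y)]),
                                                   circular_convolve(X, Y)))

# Round trip: IDFT(DFT(x)) recovers x up to rounding error.
roundtrip_err = max(abs(p - q) for p, q in zip(idft(X), x))
```

All three error values should be at the level of floating-point rounding. In practice an FFT routine would replace the naive sums; the identities being checked are the same.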
It should be emphasized that the sequences $x(n)$ and $y(n)$ must have the same length; if they do not, they can be made equal in length by zero-padding. Other properties of the DFT can be found in signal processing textbooks and are not repeated here. Because Eq. (2.32) holds, the DFT can be used to compute linear convolution and correlation. The length of the sequence resulting from these calculations is the sum of the lengths of the two sequences involved minus 1. To use the DFT while ensuring that no aliasing (wrap-around) occurs, the sequences involved are usually zero-padded so that the DFT length is sufficient. Denote the lengths of $x(n)$ and $y(n)$ by $N_x$ and $N_y$, respectively, and denote the two zero-padded sequences (with the convention that the padded zeros always follow the original sequence) by $x_e(n)$ and $y_e(n)$, both of length $N$. For the convolution or cross-correlation of two sequences, $N \ge N_x + N_y - 1$ is required, and for the autocorrelation of $x(n)$, $N \ge 2N_x - 1$ is required. Let $X_e(k) = \mathrm{DFT}(x_e(n))$ and $Y_e(k) = \mathrm{DFT}(y_e(n))$, and let $\bar{x}(n)$ denote the reversed arrangement of $x(n)$. Then the linear convolution is
$$x(n) * y(n) = D_{(1:N_x+N_y-1)}\left[\mathrm{IDFT}(X_e(k)\, Y_e(k))\right], \tag{2.34}$$
where $D_{(1:L)}[\cdot]$ keeps the first $L$ elements of its argument.
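The zero-padding recipe for linear convolution can be sketched as follows. This is an illustrative example only: it uses a naive DFT from the standard library, assumes real-valued inputs, and chooses the smallest admissible length $N = N_x + N_y - 1$, so that the truncation step keeps every output sample. The function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive DFT (Eq. 2.30) or, with inverse=True, IDFT (Eq. 2.31)."""
    n_len = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / n_len) for n in range(n_len))
           for k in range(n_len)]
    return [v / n_len for v in out] if inverse else out


def linear_convolve_dft(x, y):
    """Linear convolution of two real sequences via zero-padded DFTs."""
    n_out = len(x) + len(y) - 1              # N = Nx + Ny - 1 avoids wrap-around
    xe = list(x) + [0.0] * (n_out - len(x))  # padded zeros follow the samples
    ye = list(y) + [0.0] * (n_out - len(y))
    product = [a * b for a, b in zip(dft(xe), dft(ye))]
    # The inputs are real, so the imaginary parts are rounding noise only.
    return [v.real for v in dft(product, inverse=True)]


result = linear_convolve_dft([1.0, 2.0, 3.0], [1.0, 1.0])
```

For these inputs the direct linear convolution is 1, 3, 5, 3, and `result` agrees with it up to rounding error. With a shorter DFT length the tail of the result would wrap around and add to the first samples.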
The cross-correlation is
$$
\begin{aligned}
r_{xy}(n) &= \sum_{m=0}^{N-1} x(m)\, y^*(m-n) = x(n) * \bar{y}^*(n) \\
&= D_{(1:N_x+N_y-1)}\left[\mathrm{IDFT}\left(\exp(-j2\pi k(N_y-1)/N)\, X_e(k)\, Y_e^*(k)\right)\right].
\end{aligned}
\tag{2.35}
$$
The autocorrelation is
$$r_x(n) = x(n) * \bar{x}^*(n) = D_{(1:2N_x-1)}\left[\mathrm{IDFT}\left(\exp(-j2\pi k(N_x-1)/N)\, |X_e(k)|^2\right)\right]. \tag{2.36}$$
The above formulas for cross-correlation and autocorrelation involve a complex exponential factor, which can be avoided by reordering the IDFT output instead. The cross-correlation is then computed as
$$
\begin{aligned}
\tilde{r}_{xy} &= \mathrm{IDFT}\left(X_e(k)\, Y_e^*(k)\right); \\
r_{xy}(n) &= x(n) * \bar{y}^*(n) = \left\{\tilde{r}_{xy}(N - N_y + 2 : N),\ \tilde{r}_{xy}(1 : N_x)\right\}.
\end{aligned}
\tag{2.37}
$$
Based on Eq. (2.37), the formula for the autocorrelation follows directly:
$$
\begin{aligned}
\tilde{r}_x &= \mathrm{IDFT}\left(|X_e(k)|^2\right); \\
r_x(n) &= x(n) * \bar{x}^*(n) = \left\{\tilde{r}_x(N - N_x + 2 : N),\ \tilde{r}_x(1 : N_x)\right\}.
\end{aligned}
\tag{2.38}
$$
Here the index ranges are 1-based and inclusive, and the braces denote concatenation.
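The reordering idea can be sketched as follows, under the definition $r_{xy}(n) = \sum_m x(m)\, y^*(m - n)$, for which the circular correlation has the spectrum $X_e(k) Y_e^*(k)$. The sketch is an illustration with hypothetical function names and a naive DFT, not the authors' code. Negative lags appear at the end of the circular result and non-negative lags at the beginning, so no complex exponential shift is needed.

```python
import cmath


def dft(x, inverse=False):
    """Naive DFT (Eq. 2.30) or, with inverse=True, IDFT (Eq. 2.31)."""
    n_len = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / n_len) for n in range(n_len))
           for k in range(n_len)]
    return [v / n_len for v in out] if inverse else out


def cross_correlate_dft(x, y):
    """r_xy(n) = sum_m x(m) conj(y(m - n)) for lags n = -(Ny-1), ..., Nx-1."""
    nx, ny = len(x), len(y)
    n_len = nx + ny - 1                       # smallest admissible DFT length
    xe = list(x) + [0.0] * (n_len - nx)
    ye = list(y) + [0.0] * (n_len - ny)
    spectrum = [a * b.conjugate() for a, b in zip(dft(xe), dft(ye))]
    r_tilde = dft(spectrum, inverse=True)     # circular correlation, lag 0 first
    # 0-based version of {r(N-Ny+2 : N), r(1 : Nx)}: negative lags, then the rest
    return r_tilde[n_len - ny + 1:] + r_tilde[:nx]


def cross_correlate_direct(x, y):
    """Reference implementation straight from the defining sum."""
    nx, ny = len(x), len(y)
    out = []
    for lag in range(-(ny - 1), nx):
        total = 0j
        for m in range(nx):
            if 0 <= m - lag < ny:
                total += x[m] * complex(y[m - lag]).conjugate()
        out.append(total)
    return out


x = [1.0, 2.0, 3.0]
y = [1.0, 1.0]
via_dft = cross_correlate_dft(x, y)
direct = cross_correlate_direct(x, y)
max_err = max(abs(a - b) for a, b in zip(via_dft, direct))
```

Setting `y = x` in `cross_correlate_dft` gives the autocorrelation, since the spectrum then reduces to $|X_e(k)|^2$.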
2.4 Theory and Methods of Fixed-Points in Hilbert Spaces
The basic theory and methods of Hilbert spaces are discussed in detail in many books; only the basics needed later are presented here.
2.4.1 Hilbert Space
Hilbert spaces are a class of complete linear normed spaces, so the concept of a complete linear normed space is introduced first.
Definition 2.1 (Linear normed space) Let $X$ be a linear space (closed under addition and scalar multiplication) over the real field $\mathbb{R}$ (or the complex field), and define the map $X \to \mathbb{R}: x \mapsto \|x\|$. Suppose that for all $x, y \in X$ and $\alpha \in \mathbb{R}$ the following hold:
i. Positive definiteness: $\|x\| \ge 0$, and $\|x\| = 0 \Leftrightarrow x = 0$;
ii. Homogeneity: $\|\alpha x\| = |\alpha|\,\|x\|$;
iii. Triangle inequality: $\|x + y\| \le \|x\| + \|y\|$.
Then $\|x\|$ is called the norm of $x$ and $(X, \|\cdot\|)$ is called a linear normed space, abbreviated as $X$. The three conditions are usually referred to as the norm axioms. Here $x$ and $y$ can be discrete or continuous.
A distance can be defined in a linear normed space by $d(x, y) = \|x - y\|$. It is easy to verify that $d(x, y)$ satisfies non-negativity, symmetry, and the triangle inequality $\|x - y\| \le \|x - z\| + \|z - y\|$, so that $X$ is a metric space under the distance induced by the norm.
Definition 2.2 (Banach space) Let $X$ be a linear normed space. If $X$ is complete under the distance $d(x, y) = \|x - y\|$, i.e., every fundamental (Cauchy) sequence in $X$ converges to a point of $X$, then $X$ is said to be a Banach space.
The completeness of a metric space involves the concept of fundamental sequences, which is defined as follows.
Definition 2.3 (Fundamental sequence) Let $(X, d)$ be a metric space, where $d$ is a distance defined on the space, and let $\{x_n\}$ be a sequence of points in $X$. If for every $\varepsilon > 0$ there exists $N$ such that for all $n, m > N$
$$d(x_m, x_n) < \varepsilon, \tag{2.39}$$
then $\{x_n\}$ is said to be a fundamental sequence.
The concept of Hilbert space is based on the concept of inner product. The inner product and the inner product space are defined as follows.
Definition 2.4 (Inner product and inner product space) Let $X$ be a linear space over a number field $\mathbb{K}$ (the real or complex field) and define the map $\langle \cdot, \cdot \rangle: X \times X \to \mathbb{K}$ such that, for any $x, y, z \in X$ and $\alpha \in \mathbb{K}$,
(i) $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$,
(ii) $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$,
(iii) $\langle x, y \rangle = \langle y, x \rangle^*$,
(iv) $\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0 \Leftrightarrow x = 0$.
Then $\langle x, y \rangle$ is said to be the inner product of $x$ and $y$. A linear space $X$ on which an inner product is defined is called an inner product space, and if we let $\|x\| = \langle x, x \rangle^{1/2}$, then $X$ is a linear normed space.
Definition 2.5 (Hilbert space) Let $X$ be an inner product space. If $X$ is a Banach space under the norm induced by the inner product, then $X$ is said to be a Hilbert space, denoted $\mathcal{H}$.
For example, in $l_2 = \{x \mid x = (x_1, x_2, \ldots, x_k, \ldots),\ \sum_{k=1}^{\infty} |x_k|^2 < +\infty\}$ (each $x_k$ is a complex number), define the inner product
$$\langle x, y \rangle = \sum_{i=1}^{\infty} x_i\, \bar{y}_i. \tag{2.40}$$
Then $l_2$ is an inner product space. Similarly, let $L_2[a, b]$ denote the set of all square Lebesgue-integrable complex-valued functions on $[a, b]$. For all $x, y \in L_2[a, b]$, defining
$$\langle x, y \rangle = \int_a^b x(t)\, \overline{y(t)}\,dt, \tag{2.41}$$
it can be verified that $L_2[a, b]$ is an inner product space.
In a linear normed space, a linear functional can be defined.
Definition 2.6 (Linear functional) Let $X$ be a linear normed space over the real field $\mathbb{R}$ (or the complex field) and let $D$ be a linear subspace of $X$. If $f: D \to \mathbb{R}$ satisfies, for all $\alpha, \beta \in \mathbb{R}$ and $x, y \in D$,
$$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y), \tag{2.42}$$
then $f$ is a linear functional on $D$. $D$ is called the domain of $f$, and $f(D) = \{f(x) \mid x \in D\}$ is the range of $f$. In particular, if there exists $M > 0$ such that $|f(x)| \le M\|x\|$ for all $x \in D$, then $f$ is said to be a bounded (equivalently, continuous) linear functional on $D$.
The space formed by all bounded linear functionals on $X$ is called the conjugate (dual) space of $X$, denoted $X^*$. Define the norm of a bounded linear functional as
$$\|f\| = \sup_{x \ne 0} \frac{|f(x)|}{\|x\|}. \tag{2.43}$$
It is easy to verify that $X^*$ is a Banach space. Convergence of sequences in linear normed spaces comes in two forms, strong and weak; unless otherwise specified, convergence means strong convergence. Both are defined as follows.
Definition 2.7 (Strong convergence, weak convergence) Let $X$ be a linear normed space and $x_n \in X$.
i. If there exists $x \in X$ such that $\|x_n - x\| \to 0$, then the sequence $\{x_n\}$ is said to converge strongly to $x$.
ii. If there exists $x \in X$ such that $|f(x_n) - f(x)| \to 0$ for every $f \in X^*$, then $\{x_n\}$ is said to converge weakly to $x$.
It can be shown that strong convergence implies weak convergence, whereas the converse is not true in general; moreover, a weakly convergent sequence is bounded and its weak limit is unique. It is important to emphasize that strong and weak convergence are equivalent in finite-dimensional Hilbert spaces, and most practical applications satisfy this condition.
2.4.2 Non-expansive Operators and Fixed-Point Iterations
For the proofs of the theorems in Sects. 2.4.2 and 2.4.3, please refer to reference [1].
Definition 2.8 (Convex set) Let $X$ be a linear space and $C$ be a subset of $X$. If for all $x, y \in C$ it holds that
$$\{\lambda x + (1 - \lambda) y \mid 0 \le \lambda \le 1\} \subseteq C, \tag{2.44}$$
then $C$ is said to be a convex subset (or convex set) of $X$.
Definition 2.9 (Fixed point) Let $X$ be a set and $T: X \to X$ be an operator. Then the set of fixed points of $T$, denoted $\mathrm{Fix}\,T$, is defined as
$$\mathrm{Fix}\,T = \{x \in X \mid Tx = x\}. \tag{2.45}$$
Let $C$ be a nonempty closed convex set in a Hilbert space $\mathcal{H}$, and let $P_C$ be the projection operator from $\mathcal{H}$ onto $C$. Then $\mathrm{Fix}\,P_C = C$.
Definition 2.10 (Non-expansive operators) Let $D$ be a nonempty set in a Hilbert space $\mathcal{H}$ and let $T: D \to \mathcal{H}$ (possibly a nonlinear operator). Then, for all $x, y \in D$, $T$ is
i. firmly non-expansive, if
$$\|Tx - Ty\|^2 + \|(I - T)x - (I - T)y\|^2 \le \|x - y\|^2; \tag{2.46}$$
ii. non-expansive, if $T$ is 1-Lipschitz continuous, i.e.,
$$\|Tx - Ty\|^2 \le \|x - y\|^2. \tag{2.47}$$
It is easy to show that the projection operator onto a nonempty closed convex set is firmly non-expansive, and that $\mathrm{Fix}\,T$ is a closed convex set if the domain of the non-expansive operator $T$ is a closed convex set.
Definition 2.11 (α-averaged operator) Let $D$ be a nonempty set in a Hilbert space $\mathcal{H}$, let $T: D \to \mathcal{H}$ be a non-expansive operator, and let $\alpha \in (0, 1)$. Then $T$ is said to be α-averaged if there exists a non-expansive operator $R: D \to \mathcal{H}$ such that
$$T = (1 - \alpha) I + \alpha R. \tag{2.48}$$
Let $D$ be a nonempty set in a Hilbert space $\mathcal{H}$ and let $T: D \to \mathcal{H}$. Then:
i. if $T$ is α-averaged, then $T$ is non-expansive;
ii. $T$ is 1/2-averaged if and only if $T$ is firmly non-expansive.
Definition 2.12 (Fejér monotonicity) Let $D$ be a nonempty set in a Hilbert space $\mathcal{H}$ and let $\{x^k\}$ be a sequence in $\mathcal{H}$. Then $\{x^k\}$ is said to be Fejér monotone with respect to $D$ if
$$(\forall x^* \in D)\,(\forall k \in \mathbb{N}) \quad \|x^{k+1} - x^*\|^2 \le \|x^k - x^*\|^2. \tag{2.49}$$
Theorem 2.1 (Krasnosel'skiĭ–Mann algorithm) Let $D$ be a nonempty closed convex set in a Hilbert space $\mathcal{H}$. Let $T: D \to D$ be a non-expansive operator with $\mathrm{Fix}\,T \ne \emptyset$, and let $(\lambda_k)_{k \in \mathbb{N}}$ be a sequence in $(0, 1)$ such that $\sum_{k \in \mathbb{N}} \lambda_k (1 - \lambda_k) = +\infty$. Taking $x^0 \in D$, the iterative sequence
$$(\forall k \in \mathbb{N}) \quad x^{k+1} = x^k + \lambda_k \left(T x^k - x^k\right) \tag{2.50}$$
has the following properties:
i. $\{x^k\}$ is Fejér monotone with respect to $\mathrm{Fix}\,T$;
ii. $\{T x^k - x^k\}$ converges strongly to 0;
iii. $\{x^k\}$ converges weakly to a point in $\mathrm{Fix}\,T$.
Theorem 2.2 Let $T: \mathcal{H} \to \mathcal{H}$ be a firmly non-expansive operator on a Hilbert space $\mathcal{H}$ with $\mathrm{Fix}\,T \ne \emptyset$, and let $(\lambda_k)_{k \in \mathbb{N}}$ be a sequence in $[0, 2]$ such that $\sum_{k \in \mathbb{N}} \lambda_k (2 - \lambda_k) = +\infty$. Taking $x^0 \in \mathcal{H}$ and setting $(\forall k \in \mathbb{N})\ x^{k+1} = x^k + \lambda_k (T x^k - x^k)$, the following conclusions hold:
i. $\{x^k\}$ is Fejér monotone with respect to $\mathrm{Fix}\,T$;
ii. $\{T x^k - x^k\}$ converges strongly to 0;
iii. $\{x^k\}$ converges weakly to a point in $\mathrm{Fix}\,T$.
A special case of Theorem 2.2 is $\lambda_k \equiv 1$, for which the iteration becomes $x^{k+1} = T x^k$.
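The Krasnosel'skiĭ–Mann iteration can be illustrated with a small example in the plane. The operator below, the composition of the projections onto two lines through the origin, is an assumption chosen for illustration: it is non-expansive as a composition of non-expansive maps, and its only fixed point is the intersection of the two lines, i.e., the origin. The constant relaxation $\lambda_k = 0.5$ satisfies $\sum_k \lambda_k (1 - \lambda_k) = +\infty$. All function names are hypothetical.

```python
import math


def project_x_axis(p):
    """Projection onto the line {(a, 0)}."""
    return (p[0], 0.0)


def project_diagonal(p):
    """Projection onto the line {(a, a)}."""
    mean = 0.5 * (p[0] + p[1])
    return (mean, mean)


def operator_t(p):
    """Non-expansive operator T = P_axis o P_diagonal with Fix T = {(0, 0)}."""
    return project_x_axis(project_diagonal(p))


def km_iterate(op, x0, lam, n_iter):
    """x^{k+1} = x^k + lam * (T x^k - x^k), Eq. (2.50), with constant lam."""
    x = x0
    history = [x0]
    for _ in range(n_iter):
        tx = op(x)
        x = (x[0] + lam * (tx[0] - x[0]), x[1] + lam * (tx[1] - x[1]))
        history.append(x)
    return history


history = km_iterate(operator_t, (4.0, 2.0), 0.5, 200)
# Distance of every iterate to the fixed point (0, 0)
distances = [math.hypot(p[0], p[1]) for p in history]
# Fixed-point residual ||T x^k - x^k|| at the last iterate
last = history[-1]
t_last = operator_t(last)
residual = math.hypot(t_last[0] - last[0], t_last[1] - last[1])
```

In this example the distances to the fixed point never increase (Fejér monotonicity), and both the final distance and the residual are close to zero. In finite dimensions weak and strong convergence coincide, so the iterates converge to the fixed point itself.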
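2.4.3 Maximally Monotone Operator
Definition 2.13 (Monotone operator and graph) Let $M: \mathcal{H} \to 2^{\mathcal{H}}$ be a point-to-set mapping on a Hilbert space $\mathcal{H}$. Then $M$ is monotone if
$$(\forall (x, u) \in \mathrm{gra}\,M)\,(\forall (y, v) \in \mathrm{gra}\,M) \quad \langle x - y, u - v \rangle \ge 0, \tag{2.51}$$
where $\mathrm{gra}\,M$ is the graph of $M$, i.e., $(x, u) \in \mathrm{gra}\,M \Leftrightarrow u \in Mx$.
Definition 2.14 (Maximally monotone) Let $M: \mathcal{H} \to 2^{\mathcal{H}}$ be a monotone operator. $M$ is called maximally monotone if there does not exist a monotone operator $M': \mathcal{H} \to 2^{\mathcal{H}}$ such that $\mathrm{gra}\,M'$ properly contains $\mathrm{gra}\,M$, i.e., for any $(x, u) \in \mathcal{H} \times \mathcal{H}$, we have
$$(x, u) \in \mathrm{gra}\,M \iff (\forall (y, v) \in \mathrm{gra}\,M) \quad \langle x - y, u - v \rangle \ge 0. \tag{2.52}$$
Theorem 2.3 Let $M: \mathcal{H} \to 2^{\mathcal{H}}$ be a monotone operator. Then $M$ is maximally monotone if and only if $\mathrm{ran}(I + M) = \mathcal{H}$, where $\mathrm{ran}$ denotes the range of an operator.
Definition 2.15 (Resolvent operator) Let $M: \mathcal{H} \to 2^{\mathcal{H}}$ and let $\beta > 0$. Then the resolvent operator of $M$ is defined as $J_{\beta M} = (I + \beta M)^{-1}$.
Theorem 2.4 Let $f \in \Gamma_0(\mathcal{H})$ (the class of proper lower semicontinuous convex functions). Then $\partial f$ is maximally monotone and $J_{\beta \partial f} = \mathrm{prox}_{\beta f}$.
Theorem 2.5 Let $M: \mathcal{H} \to 2^{\mathcal{H}}$ be a maximally monotone operator and let $\beta > 0$. Then the following conclusions hold:
i. $J_{\beta M}: \mathcal{H} \to \mathcal{H}$ and $I - J_{\beta M}: \mathcal{H} \to \mathcal{H}$ are firmly non-expansive operators and are maximally monotone;
ii. the reflected resolvent operator (reflection operator)
$$R_{\beta M}: \mathcal{H} \to \mathcal{H}: x \mapsto 2 J_{\beta M} x - x, \quad \text{i.e., } R_{\beta M} = 2 J_{\beta M} - I, \tag{2.53}$$
is a non-expansive operator.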
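A one-dimensional example makes the resolvent concrete. For $f(x) = |x|$ on the real line, and assuming the standard definition $\mathrm{prox}_{\beta f}(x) = \arg\min_y \{\beta |y| + \tfrac{1}{2}(y - x)^2\}$, the resolvent $J_{\beta \partial f}$ is the soft-thresholding function with threshold $\beta$. The sketch below checks inequality (2.46) for the resolvent and non-expansiveness of the reflected resolvent on a grid of sample points; the function names and the grid are assumptions made for this illustration.

```python
def resolvent_abs(x, beta):
    """J_{beta * d|.|}(x) = prox_{beta |.|}(x): soft thresholding with threshold beta."""
    if x > beta:
        return x - beta
    if x < -beta:
        return x + beta
    return 0.0


def reflected_resolvent_abs(x, beta):
    """R = 2J - I, Eq. (2.53)."""
    return 2.0 * resolvent_abs(x, beta) - x


beta = 1.0
points = [0.25 * i for i in range(-20, 21)]   # sample points in [-5, 5]
tol = 1e-12

# Inequality (2.46) for J on all sample pairs
firmly_nonexpansive = all(
    (resolvent_abs(a, beta) - resolvent_abs(b, beta)) ** 2
    + ((a - resolvent_abs(a, beta)) - (b - resolvent_abs(b, beta))) ** 2
    <= (a - b) ** 2 + tol
    for a in points for b in points
)

# 1-Lipschitz property of R on all sample pairs
reflection_nonexpansive = all(
    abs(reflected_resolvent_abs(a, beta) - reflected_resolvent_abs(b, beta))
    <= abs(a - b) + tol
    for a in points for b in points
)
```

Both flags evaluate to `True` on this grid, as Theorem 2.5 predicts. A finite grid check is of course only a sanity test, not a proof.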
2.4.4 Solution of the $l_1$-ball Projection Problem
The projection operator onto a convex set is a classical example of a non-expansive operator. Projecting onto the $l_2$-ball is easy, whereas projecting onto the $l_1$-ball is considerably more involved. A brief description of its solution is given below, and the final result is used in subsequent sections. The problem of projecting onto the $l_1$-ball can be stated as
$$P_c(x) = \arg\min_{y \in \mathbb{R}^n,\ |y|_1 \le c} \|x - y\|_2^2, \tag{2.54}$$
where $c > 0$ is the upper bound on the $l_1$ norm (the radius of the ball). If $|x|_1 \le c$, then clearly $y = x$. Otherwise, by the strict convexity of $\|x - y\|_2^2$ and the convexity of the constraint set $\{y : |y|_1 \le c\}$, the problem has a unique optimal solution, and there exists $\mu \in (0, +\infty)$ such that the problem is equivalent to the following Lagrangian problem:
$$P_c(x) = \arg\min_{y \in \mathbb{R}^n} \|x - y\|_2^2 + \mu |y|_1. \tag{2.55}$$
The optimal solution of Eq. (2.55) has the closed form
$$y_i(\mu) = \begin{cases} x_i - \mathrm{sgn}(x_i)\,\dfrac{\mu}{2}, & |x_i| \ge \dfrac{\mu}{2}; \\ 0, & \text{otherwise}. \end{cases} \tag{2.56}$$
Let $\varphi(\mu) = |y(\mu)|_1$. The aim is to find $\mu^*$ such that $\varphi(\mu^*) = |y(\mu^*)|_1 = c$, where $\varphi$ is a monotonically decreasing, continuous, convex function. Furthermore, $\varphi(0) = |x|_1$ and $\lim_{\mu \to \infty} \varphi(\mu) = 0$. By the intermediate value theorem, for any $c \in (0, |x|_1)$ there exists $\mu^*$ such that $\varphi(\mu^*) = c$. Explicitly,
$$\varphi(\mu) = \sum_{i=1}^{n} |y_i(\mu)| = \sum_{i:\, |x_i| \ge \mu/2} \left(|x_i| - \frac{\mu}{2}\right) = \sum_{i:\, z_i \ge \mu} \left(|x_i| - \frac{\mu}{2}\right), \tag{2.57}$$
where $z_i = 2|x_i|$. It follows that $\varphi(\mu)$ is a piecewise linear decreasing function whose slope can change only at $\mu = z_i$. Therefore, $\mu^*$ can be found by the following algorithm.
i. Compute $z_i = 2|x_i|$, $i = 1, \ldots, n$;
ii. find, by sorting, a permutation $j$ such that $k \mapsto z_{j(k)}$ is increasing;
iii. compute the partial sums $\varphi(z_{j(k)}) = E(k) = \sum_{i=k}^{n} \left(|x_{j(i)}| - z_{j(k)}/2\right)$; $E$ is decreasing in $k$;
iv. if $E(1) < c$, let $a_1 = 0$, $b_1 = |x|_1$, $a_2 = z_{j(1)}$, $b_2 = E(1)$; otherwise, find $k^*$ such that $E(k^*) \ge c$ and $E(k^* + 1) < c$, and let $a_1 = z_{j(k^*)}$, $b_1 = E(k^*)$, $a_2 = z_{j(k^*+1)}$, $b_2 = E(k^* + 1)$;
v. let
$$\mu^* = \frac{(a_2 - a_1)c + b_2 a_1 - b_1 a_2}{b_2 - b_1}; \tag{2.58}$$
vi. obtain $y^* = y(\mu^*)$ from Eq. (2.56).
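The six steps above can be sketched in a few lines of Python. This is a minimal illustration that follows the listed steps directly, with hypothetical function names; it is not presented as the authors' implementation, and it assumes $c > 0$.

```python
def soft_threshold(x, t):
    """Eq. (2.56) with t = mu / 2: shrink every entry towards zero by t."""
    return [(abs(v) - t) * (1.0 if v > 0 else -1.0) if abs(v) >= t else 0.0 for v in x]


def project_l1_ball(x, c):
    """Euclidean projection of x onto {y : |y|_1 <= c}, c > 0, following steps i to vi."""
    l1_norm = sum(abs(v) for v in x)
    if l1_norm <= c:                        # x already lies inside the ball
        return list(x)
    z = sorted(2.0 * abs(v) for v in x)     # steps i and ii: z_{j(k)} in increasing order
    n = len(z)
    suffix = [0.0] * (n + 1)                # suffix[k] = sum over i >= k of |x_{j(i)}|
    for k in range(n - 1, -1, -1):
        suffix[k] = suffix[k + 1] + z[k] / 2.0
    # step iii: e[k] = phi(z_{j(k)}), stored with 0-based k
    e = [suffix[k] - (n - k) * z[k] / 2.0 for k in range(n)]
    if e[0] < c:                            # step iv, first case
        a1, b1, a2, b2 = 0.0, l1_norm, z[0], e[0]
    else:                                   # step iv, second case
        # e is decreasing and e[n-1] = 0 < c, so k_star + 1 is a valid index
        k_star = max(k for k in range(n) if e[k] >= c)
        a1, b1, a2, b2 = z[k_star], e[k_star], z[k_star + 1], e[k_star + 1]
    mu = ((a2 - a1) * c + b2 * a1 - b1 * a2) / (b2 - b1)   # step v, Eq. (2.58)
    return soft_threshold(x, mu / 2.0)      # step vi


x = [3.0, -1.0, 0.5]
y_a = project_l1_ball(x, 2.5)   # second case of step iv
y_b = project_l1_ball(x, 4.0)   # first case of step iv
y_c = project_l1_ball(x, 5.0)   # x is inside the ball and is returned unchanged
```

For `x = [3, -1, 0.5]` and `c = 2.5`, the sorted breakpoints are $z = (1, 2, 6)$ with $E = (3, 2, 0)$, so $k^* = 1$, $\mu^* = 1.5$, and the projection is $(2.25, -0.25, 0)$, whose $l_1$ norm is exactly 2.5. Sorting dominates the cost, so the procedure runs in $O(n \log n)$ time.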
Reference 1. Bauschke HH, Combettes PL (2011) Convex analysis and monotone operator theory in Hilbert spaces. Springer, New York
Chapter 3
Ill-Poseness of Imaging Inverse Problems and Regularization for Detail Preservation
3.1 Summary
The purpose of regularization in solving image restoration problems is twofold: first, to stabilize the solution process, suppress noise effectively, and obtain results with a certain degree of smoothness; second, to incorporate prior knowledge about the image into the solution process, in order to obtain results that better approximate the original image. Typically, linear regularization methods, such as Wiener filtering and constrained least squares filtering, both of which can be considered special cases of Tikhonov regularization, can satisfy the first requirement and obtain results with some smoothness. However, the results obtained by linear methods usually suffer from more severe artifacts and oversmoothing because they lack more reasonable prior knowledge of the image. Nonlinear regularization, on the other hand, can better balance these two requirements. Image deblurring (deconvolution) is the most representative class of image restoration problems. In this chapter, we take image deblurring as an example, study the ill-posed mechanism of image restoration in depth from the two perspectives of operator eigenvalue analysis and inverse filtering, discuss the necessity of regularization in image restoration, and demonstrate the effectiveness of total generalized variation (TGV) and shearlet regularization in preserving image details through both theoretical analysis and simulation experiments. In addition, the corresponding discrete implementations of these two regularizations are given. This chapter is structured as follows: Sect. 3.2 briefly introduces several typical image blurring models and the degradation mechanism. Section 3.3 details the ill-posed mechanism of image deblurring from the perspectives of both eigenvalue analysis of compact operators and inverse filtering, and examines the underlying reasons why the inverse filter cannot be used for deconvolution. Section 3.4 introduces the classical Tikhonov regularization. Section 3.5 then gives two nonlinear regularization methods that preserve image details, namely the TGV model and the shearlet transform. Section 3.6 describes several image quality evaluation methods used in this book.
© Chemical Industry Press 2023 C. He and C. Hu, Parallel Operator Splitting Algorithms with Application to Imaging Inverse Problems, Advanced and Intelligent Manufacturing in China, https://doi.org/10.1007/978-981-99-3750-9_3
3.2 Typical Types of Image Blur
Mathematically speaking, the image blurring process can be regarded as the convolution of the original clear image with the point spread function (PSF) in the space domain. Therefore, image deblurring is also known as image deconvolution. The PSF, also known as the blur kernel, embodies the resolution capability of the imaging system for point sources. Depending on the cause of the blur, the blur kernel usually corresponds to one of the following typical mathematical models [1]: motion blur, out-of-focus blur, and Gaussian blur. These common blur models are briefly described below.
(i) Motion blur model. Motion blur arises when there is relative motion between the imaging target and the imaging system. Depending on what is moving, it can be divided into global motion blur (consistent blurring of the whole image, usually caused by relative motion between the scene and the imaging system) and local motion blur (blurring of only a part of the observed image, usually caused by the motion of an object in the scene). If the relative motion is uniform linear motion (the model can also be used for non-uniform linear motion when the camera exposure time is short), the PSF can be expressed as
$$h(x, y) = \begin{cases} \dfrac{1}{d}, & y = x \tan\theta,\ 0 \le x \le d \cos\theta, \\ 0, & \text{otherwise}, \end{cases} \tag{3.1}$$
where $d$ is the motion distance and $\theta$ is the direction of motion (counterclockwise angle to the horizontal direction). Figures 3.1a and 3.2a show a motion blur kernel in the space and frequency domains, respectively.
(ii) Out-of-focus (average) blur model. Out-of-focus blur originates from improper focusing of the optical imaging system. Its point spread function is a uniformly distributed circular spot, which can be expressed as
$$h(x, y) = \begin{cases} \dfrac{1}{\pi R^2}, & x^2 + y^2 \le R^2, \\ 0, & \text{otherwise}, \end{cases} \tag{3.2}$$
where $R$ is the radius of the circular spot. Figures 3.1b and 3.2b show an out-of-focus blur kernel in the space and frequency domains, respectively.
Fig. 3.1 PSF in space domain: (a) Motion blur, (b) Out-of-focus blur, (c) Gaussian blur
Fig. 3.2 PSF in frequency domain: (a) Motion blur, (b) Out-of-focus blur, (c) Gaussian blur
(iii) Gaussian blur model. When numerous factors contribute to image blurring (e.g., atmospheric turbulence and diffraction in the optical system) and no single factor dominates, their combined effect causes the PSF to tend to the following Gaussian form
$$h(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right), \tag{3.3}$$
where the degree of blurring increases with the standard deviation $\sigma$. If $\sigma$ is large relative to the kernel size, the truncated Gaussian kernel becomes nearly flat and thus approaches the average blur. Figures 3.1c and 3.2c show a Gaussian blur kernel in the space and frequency domains, respectively. For simplicity, in the rest of the book, the average (out-of-focus) blur of size $s_1 \times s_2$ is denoted by $A(s_1, s_2)$ (or $A(s)$ if the size is $s \times s$); the Gaussian blur of size $s$ with standard deviation $\delta$ is denoted by $G(s, \delta)$; and the motion blur of length $d$ with counterclockwise angle $\theta$ is denoted by $M(d, \theta)$. All of these can be generated by the MATLAB function "fspecial" [2].
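For readers without MATLAB, discrete versions of two of these kernels can be sketched in plain Python. The functions below are illustrative assumptions, not a reimplementation of "fspecial": `average_kernel` builds the uniform kernel $A(s_1, s_2)$, `gaussian_kernel` samples Eq. (3.3) on an $s \times s$ grid centered at the origin, and `disk_kernel` samples Eq. (3.2) with a simple inside/outside test for each pixel. Each kernel is normalized to unit sum so that blurring preserves the mean intensity, which also makes the constant factor in Eqs. (3.2) and (3.3) irrelevant here.

```python
import math


def average_kernel(s1, s2=None):
    """Uniform s1 x s2 kernel (s1 x s1 if s2 is omitted) with unit sum."""
    s2 = s1 if s2 is None else s2
    return [[1.0 / (s1 * s2)] * s2 for _ in range(s1)]


def gaussian_kernel(s, sigma):
    """s x s samples of exp(-(x^2 + y^2) / (2 sigma^2)), normalized to unit sum."""
    center = (s - 1) / 2.0
    raw = [[math.exp(-((i - center) ** 2 + (j - center) ** 2) / (2.0 * sigma ** 2))
            for j in range(s)] for i in range(s)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def disk_kernel(radius):
    """Binary disk x^2 + y^2 <= R^2 on the integer grid, normalized to unit sum."""
    r = int(math.floor(radius))
    raw = [[1.0 if i * i + j * j <= radius * radius else 0.0
            for j in range(-r, r + 1)] for i in range(-r, r + 1)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


a9 = average_kernel(9)          # analogous to A(9)
g9 = gaussian_kernel(9, 3.0)    # analogous to G(9, 3)
d2 = disk_kernel(2.0)           # out-of-focus spot of radius 2
```

The motion blur kernel is omitted because a faithful discretization of Eq. (3.1) for arbitrary angles requires interpolation along the motion path, which is beyond the scope of this sketch.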
3.3 The Ill-Posed Nature of Image Deblurring
Image deblurring can be divided into two main categories, non-blind deblurring (regular deconvolution) and blind deblurring (blind deconvolution), depending on whether the blur kernel is known. This book focuses on non-blind deblurring, but several of the proposed methods can be conveniently generalized to blind deblurring as well. As the discussion in this chapter shows, even regular deconvolution problems with known blur kernels are usually severely ill-posed, and this ill-posedness is intrinsic and cannot be eliminated without regularization. The ill-posed nature of image deblurring (deconvolution) can be explained from two mathematical perspectives.
(i) The deblurring process is the inversion of a compact operator. From the point of view of functional analysis, a blurring process can be modeled by a compact operator. A compact operator maps bounded sets in a Hilbert space to relatively compact sets; in doing so, it introduces a coherent mixing of spatial information and may compress some dimensions of the data space. This is possible because the eigenvalues of a compact operator converge to zero. Inverting the compact operator amounts to removing this coherence and reconstructing the suppressed dimensions of information, which is usually extremely unstable [1].
(ii) The deblurring process is the inversion of a low-pass filter. The frequency-domain representation of the blurring PSF is usually a low-pass filter, which suppresses the high-frequency detail information in the image. Image deblurring in the frequency domain is then the inversion of this low-pass filter, which is unstable with respect to noise and other high-frequency perturbations in the image data.
3.3.1 Discretization of Convolution Equations and Ill-Posed Analysis of Blur Matrices
The linear degradation process of the image can be modeled by Eq. (1.5). First, some properties of compact operators are used to analyze its ill-posed nature. In deblurring, the convolution operator (matrix) is a compact operator. Denote the Hilbert adjoint of the operator $K$ by $K^*$. Then $K^* K$ is a self-adjoint compact operator (if $K$ is a matrix, then $K^* K$ is $K^H K$, where $K^H$ denotes the Hermitian transpose of $K$) and all its eigenvalues are non-negative real numbers. Arrange the eigenvalues of $K^* K$ in descending order as $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$, with corresponding orthonormal eigenvectors $v_1, v_2, \ldots$. Eigenvectors corresponding to different eigenvalues of $K^* K$ are necessarily orthogonal, and if one eigenvalue has several linearly independent eigenvectors, they can be orthonormalized by Gram–Schmidt orthogonalization. Define $\mu_i = 1/\sqrt{\lambda_i}$ and $w_i = \mu_i K v_i$, $i = 1, 2, \ldots$ (for $\lambda_i > 0$); then the minimal-norm least squares solution of Eq. (1.5) [3] is
$$K^+ f = \sum_{i \in \mathbb{N}} \mu_i \langle f, w_i \rangle v_i, \tag{3.4}$$
where $K^+$ is the pseudo-inverse operator of $K$. From the above equation, it is clear that although the minimal-norm least squares solution of Eq. (1.5) is unique, when $K$ is not finite dimensional we have $\lambda_i \to 0$ and $\mu_i \to +\infty$. The noise in the observed data is then amplified, so that the minimal-norm least squares solution does not depend continuously on the observed data. In practice, the ill-posedness of the blur matrix $K$ is influenced by the blur kernel, the image size (or convolution length), and the structure of $K$. In most cases, deconvolution uses a discrete circular convolution model. For simplicity, the construction of a circular convolution matrix is illustrated by a one-dimensional deconvolution example. Assume that the lengths of the convolution kernel $h(n)$ and the observed data $f(n)$ are $M$ and $N$, respectively. Usually, the observation process is a partial convolution, in which case the length of the input data $u(n)$ is $M + N - 1$, and the discrete convolution equation can be written as
$$
\begin{bmatrix} f_0 \\ f_1 \\ f_2 \\ \vdots \\ f_{N-1} \end{bmatrix}
=
\begin{bmatrix}
h_{M-1} & h_{M-2} & \cdots & h_0 & & & \\
 & h_{M-1} & h_{M-2} & \cdots & h_0 & & \\
 & & \ddots & \ddots & & \ddots & \\
 & & & h_{M-1} & \cdots & h_1 & h_0
\end{bmatrix}
\begin{bmatrix} u_{-M+1} \\ \vdots \\ u_{-1} \\ u_0 \\ \vdots \\ u_{N-1} \end{bmatrix}.
\tag{3.5}
$$
Although Eq. (3.5) is a reasonable approximation to the continuous convolution process, it is almost unusable in practical deconvolution problems because it is underdetermined: the number of unknowns, $M + N - 1$, is usually larger than the number of equations, $N$. Therefore, a reasonable approximation to the convolution model (3.5) is necessary. Because it is compatible with the fast Fourier transform (FFT), the circular convolution model is the most commonly used approximation of Eq. (3.5); its structure significantly reduces both the computational effort and the data storage. Under the circular convolution assumption, Eq. (3.5) can be approximated as
$$
\begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ \\ \\ \vdots \\ f_{N-1} \end{bmatrix}
=
\begin{bmatrix}
h_0 & 0 & \cdots & 0 & h_{M-1} & \cdots & h_1 \\
h_1 & h_0 & 0 & \cdots & 0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & & \ddots & h_{M-1} \\
h_{M-1} & & \ddots & \ddots & \ddots & & 0 \\
0 & \ddots & & \ddots & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & & \ddots & h_0 & 0 \\
0 & \cdots & 0 & h_{M-1} & \cdots & h_1 & h_0
\end{bmatrix}
\begin{bmatrix} u_0 \\ u_1 \\ \vdots \\ \\ \\ \vdots \\ u_{N-1} \end{bmatrix}.
\tag{3.6}
$$
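The structure of the circular model can be made concrete with a short sketch. It builds the $N \times N$ circulant matrix of Eq. (3.6) from a kernel, checks that multiplying by it agrees with circular convolution computed through the DFT, and inspects the DFT of the zero-padded kernel, whose entries are the eigenvalues of a circulant matrix. The kernel, the signal, and the function names are assumptions made for this example, and a naive DFT stands in for the FFT.

```python
import cmath


def dft(x, inverse=False):
    """Naive DFT or, with inverse=True, inverse DFT including the 1/N factor."""
    n_len = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / n_len) for n in range(n_len))
           for k in range(n_len)]
    return [v / n_len for v in out] if inverse else out


def circulant_blur_matrix(h, n):
    """N x N matrix of Eq. (3.6): first column (h_0, ..., h_{M-1}, 0, ..., 0)."""
    column = list(h) + [0.0] * (n - len(h))
    return [[column[(i - j) % n] for j in range(n)] for i in range(n)]


n = 8
h = [0.25, 0.25, 0.25, 0.25]        # hypothetical averaging kernel, M = 4
u = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 2.0]

K = circulant_blur_matrix(h, n)
f_matrix = [sum(K[i][j] * u[j] for j in range(n)) for i in range(n)]

# Same result through the DFT: f = IDFT(DFT(h_e) * DFT(u))
H = dft(list(h) + [0.0] * (n - len(h)))
U = dft(u)
f_dft = [v.real for v in dft([a * b for a, b in zip(H, U)], inverse=True)]

max_err = max(abs(a - b) for a, b in zip(f_matrix, f_dft))
# Eigenvalue magnitudes of K; the smallest one indicates how ill-conditioned K is
gains = [abs(v) for v in H]
min_gain = min(gains)
```

For this averaging kernel several DFT coefficients vanish (up to rounding), so the circulant matrix is singular: the corresponding frequency components of $u$ are removed completely by the blur and cannot be recovered by inverse filtering. This is the discrete counterpart of the low-pass interpretation given in Sect. 3.3.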
An important premise for Eq. (3.6) to hold is that M 0 are used to adjust the weights of the three functions, and the constants C 1 , C 2 , and C 3 serve to prevent the denominator from being zero. A more common form of SSIM is to take α = β = γ = 1 and C 3 = C 2 / 2 on the basis of Eq. (3.37), i.e.,
$$\mathrm{SSIM}(x, y) = \frac{\left(2\mu_x \mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}{\left(\mu_x^2 + \mu_y^2 + C_1\right)\left(\sigma_x^2 + \sigma_y^2 + C_2\right)}. \tag{3.38}$$
Additional specific details about SSIM can be found in [16]. Based on SSIM, Wang et al. developed many more quality evaluation metrics for different applications. For improvements and applications of SSIM, please refer to Wang’s personal web site.
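Equation (3.38) can be evaluated with a few lines of code. The sketch below is a simplified illustration: it computes the statistics once over the whole signal, whereas SSIM as described in [16] is computed over local windows and then averaged. The constants $C_1 = (0.01 L)^2$ and $C_2 = (0.03 L)^2$ with dynamic range $L = 255$ follow the values commonly used with [16]; they and the unbiased $(n - 1)$ normalization are assumptions of this example, as are the function name and the sample data.

```python
def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Eq. (3.38) evaluated once over two equal-length flat lists of pixel values."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Unbiased sample variances and covariance
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / (n - 1)
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (k1 * dynamic_range) ** 2      # stabilizes the luminance term
    c2 = (k2 * dynamic_range) ** 2      # stabilizes the contrast/structure term
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 8-pixel "images"
reference = [52.0, 55.0, 61.0, 59.0, 79.0, 61.0, 76.0, 61.0]
degraded = [50.0, 58.0, 60.0, 63.0, 70.0, 66.0, 71.0, 65.0]

score_same = ssim_global(reference, reference)
score_degraded = ssim_global(reference, degraded)
```

The score equals 1 when the two inputs are identical and is strictly smaller otherwise, and it is symmetric in its two arguments.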
References
1. Chan T, Shen J (2005) Image processing and analysis: variational, PDE, wavelet, and stochastic methods. SIAM, Philadelphia
2. Gonzalez RC, Woods RE, Eddins SL (2004) Digital image processing using MATLAB. Pearson Prentice Hall
3. Zou MY (2001) Deconvolution and signal recovery. National Defense Industry Press, Beijing
4. Zou M, Unbehauen R (1995) On the computational model of a kind of deconvolution problems. IEEE Trans Image Process 4(10):1464–1467
5. Tikhonov A, Arsenin V (1977) Solution of Ill-posed problems. Winston and Sons, Washington
6. Bredies K, Kunisch K, Pock T (2010) Total generalized variation. SIAM J Imag Sci 3(3):492–526
7. Wu C, Tai X (2010) Augmented lagrange method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models. SIAM J Imag Sci 3(3):300–339
8. Labate D, Lim W Q, Kutyniok G, Weiss G (2005) Sparse multidimensional representation using shearlets. In: Proceedings of SPIE. Bellingham, WA
9. Han B, Kutyniok G, Shen Z (2011) Adaptive multiresolution analysis structures and shearlet systems. SIAM J Numer Anal 49(5):1921–1946
10. Kutyniok G, Shahram M, Zhuang X (2012) Shearlab: a rational design of a digital parabolic scaling algorithm. SIAM J Imag Sci 5(4):1291–1332
11. Kutyniok G, Labate D (2012) Shearlets: multiscale analysis for multivariate data. Springer, Dordrecht
12. Häuser S, Steidl G (2014) Fast finite shearlet transform. Preprint, arXiv: 1202.1773
13. He C, Hu C, Zhang W (2014) Adaptive shearlet-regularized image deblurring via alternating direction method. In: IEEE conference on multimedia and expo. Chengdu, Sichuan, China
14. Easley G, Labate D, Lim W (2008) Sparse directional image representation using the discrete shearlet transform. Appl Comput Harmonic Anal 25:25–46
15. Guo K, Labate D (2013) The construction of smooth Parseval frames of shearlets. Math Model Nat Phenomena 8(1):82–105
16. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Proc 13(4):600–612
17. Jiao LC, Hou B, Wang S, Liu F (2008) Image multiscale geometric analysis: theory and applications-beyond wavelets. Xidian University Press, Xi’an
Chapter 4
Fast Parameter Estimation in TV-Based Image Restoration
4.1 Summary
Chapter 3 showed that nonlinear regularized image restoration methods balance noise suppression and detail preservation better than linear methods. The total variation (TV) model is one of the most commonly used regularization models in image restoration, and because the model is non-differentiable, solving the nonlinear TV-regularized restoration problem has been an active research topic. In regularized image restoration, accurately estimating the regularization parameter, which balances the data fidelity and regularization terms, is the key to solving the ill-posed problem successfully, and adaptive estimation of this parameter is a prerequisite for automating image restoration. However, most existing TV-regularized restoration algorithms rely on a manually predetermined regularization parameter [1–5]. When the noise level of the observed image can be estimated, Morozov’s discrepancy principle is a good way to estimate the regularization parameter adaptively. Image noise level estimation is itself an active research topic, and a large number of results have been published. At present, the main problem of adaptive image restoration based on the discrepancy principle is that inner iterations [6–10] must be introduced into the basic iterative algorithm in order to select the regularization parameter adaptively. This complicates the algorithm structure, reduces the efficiency of image restoration, and makes the convergence and the final result sensitive to the accuracy of the inner iteration.
To achieve adaptive estimation of the regularization parameter for image restoration, the following two aspects need to be considered simultaneously: (i) choosing a suitable parameter estimation strategy, thus making the parameter estimation more accurate and the structure of the algorithm more concise; (ii) rigorously proving the
© Chemical Industry Press 2023 C. He and C. Hu, Parallel Operator Splitting Algorithms with Application to Imaging Inverse Problems, Advanced and Intelligent Manufacturing in China, https://doi.org/10.1007/978-981-99-3750-9_4
convergence of the algorithm, thus making the algorithm have a solid theoretical foundation and better generalizability. In this chapter, a new algorithm capable of simultaneous regularization parameter estimation and image restoration is proposed based on the classical TV model and the alternating direction method of multipliers (ADMM). By applying variable splitting to both the regularization and fidelity terms, the nondifferentiability of the TV model is overcome, fast iterative updating of the regularization parameters with a closed-form is achieved, and the restoration results are guaranteed to satisfy the Morozov’s discrepancy principle. This chapter proves the global convergence of the proposed algorithm subject to parameter variation. Further, an equivalent splitting Bregman form of the proposed algorithm is given and the idea of adaptive estimation of parameters is extended and applied to the interval-constrained TV image restoration problem. Experimental results show that the proposed algorithm has a significant advantage in speed and is more competitive in accuracy compared with existing TV image restoration algorithms. This chapter is structured as follows: Sect. 4.2 outlines the existing TV regularization adaptive parameter estimation methods based on the Morozov’s discrepancy principle and analyzes their advantages and disadvantages. Section 4.3 proposes the adaptive parameter estimation for ADMM (APE-ADMM) that can perform both regularization parameter estimation and image restoration, and details its saddle point condition, derivation procedure and convergence analysis. Section 4.4 gives the adaptive parameter estimation for splitting Bregman algorithm (APE-SBA), which is equivalent to APE-ADMM, and extends the idea of adaptive parameter estimation to the case of image restoration with interval constraints, resulting in the APE for box-constrained ADMM (APE-BCADMM). 
Section 4.5 verifies the effectiveness of the proposed algorithm in adaptive regularization parameter estimation and image restoration and its superiority over existing algorithms through three comparative experiments.
4.2 Overview of Adaptive Parameter Estimation Methods in TV Image Restoration
The adaptive estimation of the regularization parameter in TV image restoration has been an active topic in image processing. Under Morozov’s discrepancy principle and the Gaussian noise assumption, the problem is essentially to solve
$$\min_{u} \mathrm{TV}(u) \quad \text{s.t.} \quad u \in \Psi \triangleq \left\{u : \|Ku - f\|_2^2 \le c\right\}. \tag{4.1}$$
According to Lagrange theory, the constrained problem (4.1) can be transformed into the unconstrained problem (1.16), i.e., $\min_u \mathrm{TV}(u) + \frac{\lambda}{2}\|Ku - f\|_2^2$. If $u$ is a solution to problem (4.1), then it is also a solution to problem (1.16) for some particular $\lambda \ge 0$. Given $\lambda$, denote by $u^*(\lambda)$ the optimal solution to problem (1.16). Then, with respect to problem (4.1), either $\lambda = 0$ and $u^*(0) \in \Psi$, or $\lambda > 0$ and
$$\|Ku^*(\lambda) - f\|_2^2 = c. \tag{4.2}$$
In fact, minimizing problem (1.16) with λ = 0 is equivalent to minimizing TV(u), whose solution is a constant-valued image and is of no practical use. Therefore, the aim of Morozov’s discrepancy principle is to find a λ > 0 such that the solution of problem (4.1) is a non-constant image. Since problem (1.16) has no closed-form solution, it is difficult to confirm directly whether its solution lies in the feasible set Ψ. To make solvers for problem (1.16) usable for problem (4.1), Blomgren and Chan proposed a standard method [11] for updating λ. Although an approximate solution can be found, the method is too time-consuming because problem (1.16) must be solved repeatedly for a series of values of λ. Ng et al. proposed an ADMM-based algorithm [10] for solving problem (4.1). In each iteration, a least squares problem is solved and the current estimate of the original image is then projected onto the feasible set Ψ. Under circular or Neumann boundary conditions, the blur matrix K can be diagonalized by a fast transform (the FFT or, for Neumann boundaries, the DCT), so the least squares problems involved are easy to solve. However, the method needs Newton’s method to update λ automatically. Afonso et al. proposed another ADMM-based algorithm for problem (4.1) [6]. An auxiliary variable replacing Ku is introduced by variable splitting, so that the TV restoration problem is decomposed into a Moreau proximal denoising problem and an inverse filtering problem. In each iteration, the proximal denoising problem is solved by the Chambolle denoising algorithm [12]. Based on the primal–dual model [13] of TV, Wen and Chan derived an efficient method [7] for solving problem (4.1). The procedure first transforms problem (4.1) into the saddle point problem of the corresponding primal–dual formulation and then applies the primal–dual hybrid gradient (PDHG) algorithm to find the solution. To ensure that the solution is in the feasible set, Newton’s method is used as a nested algorithm. All of the above methods for problem (4.1) introduce an inner iteration in order to guarantee a feasible solution, and only [7] provides a convergence proof for the corresponding algorithm.
4.3 Fast Adaptive Parameter Estimation Based on ADMM and Discrepancy Principle
This section proposes an ADMM-based algorithm for solving the constrained TV restoration problem (4.1). It makes three contributions. First, unlike restoration algorithms [1–5] that work with a fixed regularization parameter, the proposed algorithm solves the constrained problem (4.1) and finds the optimal λ adaptively, without human intervention. Second, unlike existing algorithms [6–10] for problem (4.1), the proposed algorithm has a more compact structure and avoids nested iterations. Two auxiliary variables are introduced by variable splitting to replace Ku and the argument of the TV norm, which decomposes problem (4.1) into several simple subproblems that can be solved by ADMM. As a result, λ can be updated in closed form at each iteration, and applying Morozov’s discrepancy principle ensures that the solution always lies in the feasible set Ψ. Finally, the convergence of the algorithm is proved by means of variational inequalities. Because λ varies during the iterations, the proof differs considerably from the existing ADMM proofs in [5, 14, 15]. Moreover, the proposed parameter estimation idea generalizes naturally to interval-constrained TV image restoration. Experiments show that the proposed algorithm finds the optimal λ and outperforms some well-known existing algorithms in speed and accuracy. Owing to the equivalence of ADMM, the splitting Bregman algorithm, and the Douglas–Rachford algorithm [16], the proposed algorithm can also be regarded as an application of the latter two.
Without loss of generality, denote the Euclidean space $\mathbb{R}^{mn}$ by $V$ and define $Q \triangleq V \times V$. For $u \in V$, $u_{i,j} \in \mathbb{R}$ denotes the $((i-1)n + j)$-th element of $u$, and for $y \in Q$, $y_{i,j} = (y_{i,j,1}, y_{i,j,2}) \in \mathbb{R}^2$ denotes the $((i-1)n + j)$-th element of $y$. The inner products and norms in the Euclidean spaces $V$ and $Q$ are defined as
$$\langle u, v\rangle_V = \sum_{i,j}^{m,n} u_{i,j} v_{i,j}, \quad \|u\|_2 = \sqrt{\langle u, u\rangle_V}, \qquad \langle y, q\rangle_Q = \sum_{i,j}^{m,n} \sum_{k=1}^{2} y_{i,j,k}\, q_{i,j,k}, \quad \|y\|_2 = \sqrt{\langle y, y\rangle_Q}. \tag{4.3}$$
In addition, for pixel $(i, j)$, write $\|y_{i,j}\|_2 = \sqrt{y_{i,j,1}^2 + y_{i,j,2}^2}$, and for $y \in Q$, write $\|y\|_1 = \sum_{i,j}^{m,n} \|y_{i,j}\|_2$. Let the two-dimensional first-order difference operator be the mapping $\nabla : V \to Q$, where $\nabla u \in Q$ is given by $(\nabla u)_{i,j} = \left((\nabla_1 u)_{i,j}, (\nabla_2 u)_{i,j}\right)$. The isotropic total variation is then $\mathrm{TV}(u) = \|\nabla u\|_1$. Using the inner products in $V$ and $Q$, one can derive the adjoint operator of $-\nabla$ as the divergence operator $\mathrm{div} : Q \to V$, i.e., $\mathrm{div} = -\nabla^T$, so that $\langle -\mathrm{div}\, y, u\rangle_V = \langle y, \nabla u\rangle_Q$ for any $u \in V$ and $y \in Q$. The specific definitions of the gradient and divergence operators under circulant boundary conditions can be found in [5].
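The operators above can be illustrated with a short NumPy sketch. It assumes periodic forward differences as the discretization of ∇ (the exact stencil is defined in [5] and may differ), and the names `grad`, `div`, and `tv` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def grad(u):
    """Forward differences with periodic wrap-around (assumed stencil); shape (2, m, n)."""
    d1 = np.roll(u, -1, axis=0) - u   # difference along the first index
    d2 = np.roll(u, -1, axis=1) - u   # difference along the second index
    return np.stack((d1, d2))

def div(y):
    """Discrete divergence, built so that div = -grad^T under the inner products (4.3)."""
    return (y[0] - np.roll(y[0], 1, axis=0)) + (y[1] - np.roll(y[1], 1, axis=1))

def tv(u):
    """Isotropic total variation: sum over pixels of the 2-norm of the gradient."""
    g = grad(u)
    return np.sum(np.sqrt(g[0] ** 2 + g[1] ** 2))
```

With these definitions the adjoint relation can be checked numerically on random arrays, and a constant image has zero total variation, which matches the statement that the null space of ∇ is the set of constant-valued vectors.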
4.3.1 Augmented Lagrangian Model for TV Regularized Problem
The blur matrix K is generated by the PSF and satisfies K1 = 1 [7, 13], where 1 is the all-ones vector, so the null space of K contains no constant-valued vector other than 0. In contrast, the null space of ∇ is the set of constant-valued vectors. Thus 0 is the only common element of the null spaces of K and ∇, i.e., zer ∇ ∩ zer K = {0}. Under this condition, the objective function of problem (1.16) is proper and convex, and a solution exists. By Fermat’s rule, the following lemma holds.
Lemma 4.1 [5, 13] Problem (1.16) has at least one solution $u^*$, which satisfies
$$0 \in \lambda K^T (K u^* - f) - \mathrm{div}\, \partial \|\nabla u^*\|_1, \tag{4.4}$$
where $\partial \|\nabla u^*\|_1$ denotes the subdifferential of $\|\cdot\|_1$ at $\nabla u^*$. Next, the operator splitting technique [3] is used to free $\nabla u$ from the non-differentiable $\ell_1$-norm and to simplify the update of the regularization parameter λ. Specifically, an auxiliary variable $y \in Q$ is introduced to replace $\nabla u$ (i.e., $y_{i,j} \in \mathbb{R}^2$ replaces $(\nabla u)_{i,j}$), and another auxiliary variable $x \in V$ is introduced to replace $Ku$, which transforms problem (1.16) into the following linearly constrained form
$$\min_{u, x, y} \|y\|_1 + \frac{\lambda}{2} \|x - f\|_2^2 \quad \text{s.t.} \quad Ku = x, \; y = \nabla u. \tag{4.5}$$
The augmented Lagrangian (AL) function of problem (4.5) is defined as
$$\mathcal{L}_A(u, x, y; \mu, \xi; \lambda) \triangleq \frac{\lambda}{2}\|x - f\|_2^2 - \mu^T (x - Ku) + \frac{\beta_1}{2}\|x - Ku\|_2^2 + \|y\|_1 - \xi^T (y - \nabla u) + \frac{\beta_2}{2}\|y - \nabla u\|_2^2, \tag{4.6}$$
where $\mu \in V$ and $\xi \in Q$ are Lagrangian multipliers (or dual variables), while $\beta_1$ and $\beta_2$ are positive penalty parameters. For the AL function (4.6), consider the following saddle point problem: find $(u^*, x^*, y^*; \mu^*, \xi^*) \in V \times V \times Q \times V \times Q$ such that
$$\mathcal{L}_A(u^*, x^*, y^*; \mu, \xi; \lambda) \le \mathcal{L}_A(u^*, x^*, y^*; \mu^*, \xi^*; \lambda) \le \mathcal{L}_A(u, x, y; \mu^*, \xi^*; \lambda) \quad \forall (u, x, y; \mu, \xi) \in V \times V \times Q \times V \times Q. \tag{4.7}$$
Theorem 4.1 describes the relationship between the saddle points of problem (4.7) and the solutions of problem (1.16). Lemma 4.2 is given first, since it is crucial to the proof of Theorem 4.1.
Lemma 4.2 [17] Let $F = F_1 + F_2$, where $F_1$ and $F_2$ are lower semicontinuous convex functions from $\mathbb{R}^N$ to $\mathbb{R}$, and $F_1$ is differentiable with gradient $F_1'$. Let $p^* \in \mathbb{R}^N$. Then the following two conditions are equivalent:
(i) $p^*$ is a solution of $\inf_{p \in \mathbb{R}^N} F(p)$;
(ii) $\langle F_1'(p^*), p - p^* \rangle + F_2(p) - F_2(p^*) \ge 0 \quad \forall p \in \mathbb{R}^N$.
Theorem 4.1 $u^* \in V$ is a solution of problem (1.16) if and only if there exist $x^*, \mu^* \in V$ and $y^*, \xi^* \in Q$ such that $(u^*, x^*, y^*; \mu^*, \xi^*)$ is a saddle point of problem (4.7).
Proof Suppose $(u^*, x^*, y^*; \mu^*, \xi^*)$ satisfies the saddle point condition. The first inequality of Eq. (4.7) gives
$$\mu^T (x^* - Ku^*) + \xi^T (y^* - \nabla u^*) \ge (\mu^*)^T (x^* - Ku^*) + (\xi^*)^T (y^* - \nabla u^*) \quad \forall \mu \in V, \; \xi \in Q. \tag{4.8}$$
Letting $\xi = \xi^*$ in Eq. (4.8), we have
$$\mu^T (x^* - Ku^*) \ge (\mu^*)^T (x^* - Ku^*) \quad \forall \mu \in V. \tag{4.9}$$
Since inequality (4.9) holds for every $\mu \in V$, it follows that $x^* - Ku^* = 0$, and similarly $y^* - \nabla u^* = 0$. Thus
$$x^* - Ku^* = 0, \quad y^* - \nabla u^* = 0. \tag{4.10}$$
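Considering Eq. (4.10) and the second inequality in Eq. (4.7) yields
$$\|\nabla u^*\|_1 + \frac{\lambda}{2}\|Ku^* - f\|_2^2 \le \frac{\lambda}{2}\|x - f\|_2^2 - (\mu^*)^T (x - Ku) + \frac{\beta_1}{2}\|x - Ku\|_2^2 + \|y\|_1 - (\xi^*)^T (y - \nabla u) + \frac{\beta_2}{2}\|y - \nabla u\|_2^2 \quad \forall (u, x, y) \in V \times V \times Q. \tag{4.11}$$
Substituting $x = Ku$ and $y = \nabla u$ into Eq. (4.11), we get
$$\|\nabla u^*\|_1 + \frac{\lambda}{2}\|Ku^* - f\|_2^2 \le \|\nabla u\|_1 + \frac{\lambda}{2}\|Ku - f\|_2^2. \tag{4.12}$$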
Inequality (4.12) shows that $u^*$ is a solution to problem (1.16). Conversely, let $u^* \in V$ be a solution to problem (1.16). Let $x^* = Ku^*$ and $y^* = \nabla u^*$. By Lemma 4.1, there exist $\mu^*$ and $\xi^*$ such that $\mu^* = \lambda (Ku^* - f)$, $\xi^* \in \partial \|\nabla u^*\|_1$, and $\nabla^T \xi^* = -\lambda K^T (Ku^* - f)$. Next, we prove that $(u^*, x^*, y^*; \mu^*, \xi^*)$ is a saddle point of Eq. (4.7). Since $x^* = Ku^*$ and $y^* = \nabla u^*$, the first inequality in Eq. (4.7) holds. It remains to prove that
$$\mathcal{L}_A(u^*, x^*, y^*; \mu^*, \xi^*; \lambda) \le \mathcal{L}_A(u, x, y; \mu^*, \xi^*; \lambda) \quad \forall (u, x, y) \in V \times V \times Q. \tag{4.13}$$
From the definition of $\mathcal{L}_A$ in Eq. (4.6), $\mathcal{L}_A(u, x, y; \mu^*, \xi^*; \lambda)$ is a proper convex function of each of $u$, $x$, and $y$ when the other two are fixed. Therefore, by Lemma 4.2, $(\bar{u}, \bar{x}, \bar{y}) \in V \times V \times Q$ is a minimum point if and only if
$$\langle K^T \mu^* + \nabla^T \xi^*, u - \bar{u} \rangle + \beta_1 \langle K^T (K\bar{u} - \bar{x}), u - \bar{u} \rangle + \beta_2 \langle \nabla^T (\nabla \bar{u} - \bar{y}), u - \bar{u} \rangle \ge 0 \quad \forall u \in V, \tag{4.14}$$
$$\|y\|_1 - \|\bar{y}\|_1 - \langle \xi^*, y - \bar{y} \rangle + \beta_2 \langle \bar{y} - \nabla \bar{u}, y - \bar{y} \rangle \ge 0 \quad \forall y \in Q, \tag{4.15}$$
$$\frac{\lambda}{2}\|x - f\|_2^2 - \frac{\lambda}{2}\|\bar{x} - f\|_2^2 - \langle \mu^*, x - \bar{x} \rangle + \beta_1 \langle \bar{x} - K\bar{u}, x - \bar{x} \rangle \ge 0 \quad \forall x \in V. \tag{4.16}$$
On the one hand, substituting $\bar{u} = u^*$, $\bar{x} = x^*$, and $\bar{y} = y^*$ into Eq. (4.14) shows that $(u^*, x^*, y^*)$ satisfies Eq. (4.14), by the above choice of $\mu^*$ and $\xi^*$. On the other hand, by Lemma 4.1, $0 \in \lambda K^T (Ku^* - f) - \mathrm{div}\, \partial \|y^*\|_1$ (with $y^* = \nabla u^*$). Substituting $\bar{u} = u^*$ and $\bar{y} = y^*$ into Eq. (4.15), the last term of Eq. (4.15) vanishes. By $\mathrm{div}\, \xi^* = \lambda K^T (Ku^* - f)$ and the non-negativity of the Bregman distance, inequality (4.15) is equivalent to
$$\|y\|_1 - \|y^*\|_1 - \langle \partial \|y^*\|_1, y - y^* \rangle \ge 0 \quad \forall y \in Q. \tag{4.17}$$
That is, $(u^*, x^*, y^*)$ satisfies Eq. (4.15). Similarly, taking $\bar{u} = u^*$ and $\bar{x} = x^*$, inequality (4.16) holds. Therefore, $(u^*, x^*, y^*)$ satisfies Eqs. (4.14), (4.15), and (4.16) simultaneously, so the second inequality in Eq. (4.7) holds. Theorem 4.1 is proved.
Lemma 4.1 together with Theorem 4.1 shows that problem (4.7) has at least one saddle point and that each $u^*$ is a minimizer of problem (1.16). The augmented Lagrangian method (ALM) [18] can solve the saddle point problem (4.7) by the following iterative framework.
$$\begin{cases} (u^{k+1}, x^{k+1}, y^{k+1}) = \arg\min_{u, x, y} \mathcal{L}_A(u, x, y; \mu^k, \xi^k; \lambda), \\ \mu^{k+1} = \mu^k - \beta_1 (x^{k+1} - Ku^{k+1}), \\ \xi^{k+1} = \xi^k - \beta_2 (y^{k+1} - \nabla u^{k+1}). \end{cases} \tag{4.18}$$
The problem of solving uk+1 , x k+1 , yk+1 precisely is not straightforward, which requires the introduction of an inner iteration in the framework Eq. (4.18). In this book, ADMM is used to solve this problem, where only one solution is required for each of the three variables in each iteration step, and the convergence analysis in Sect. 4.3.3 justifies it.
4.3.2 Algorithm Derivation
This subsection solves the TV regularization problem (4.1) by ADMM, where the regularization parameter λ is updated in closed form and eventually converges to the optimal value determined by the discrepancy principle. The iterative ADMM scheme used to solve problem (4.1) is
$$u^{k+1} = \arg\min_{u} \mathcal{L}_A(u, x^k, y^k; \mu^k, \xi^k; \lambda^k); \tag{4.19}$$
$$y^{k+1} = \arg\min_{y} \mathcal{L}_A(u^{k+1}, x^k, y; \mu^k, \xi^k; \lambda^k); \tag{4.20}$$
$$x^{k+1} = \arg\min_{x} \mathcal{L}_A(u^{k+1}, x, y^{k+1}; \mu^k, \xi^k; \lambda^{k+1}); \tag{4.21}$$
$$\mu^{k+1} = \mu^k - \beta_1 (x^{k+1} - Ku^{k+1}); \tag{4.22}$$
$$\xi^{k+1} = \xi^k - \beta_2 (y^{k+1} - \nabla u^{k+1}). \tag{4.23}$$
In Eq. (4.21), $\lambda^{k+1}$ is the regularization parameter obtained by the update at step $k + 1$ according to the discrepancy principle. From the definition of $\mathcal{L}_A$ in Eq. (4.6) and the five iteration steps above, it is clear that only the variable $x$, not the variable $u$, is involved in the update of λ. In other words, only $x$ is restricted by the discrepancy principle. The following details how to solve the subproblems (4.19)–(4.21). The subproblem on $u$ has the quadratic minimization form
$$u^{k+1} = \arg\min_{u} \; (\mu^k)^T Ku + \frac{\beta_1}{2}\|x^k - Ku\|_2^2 + (\xi^k)^T \nabla u + \frac{\beta_2}{2}\|y^k - \nabla u\|_2^2. \tag{4.24}$$
Setting the gradient of this objective to zero and using $\nabla^T = -\mathrm{div}$ gives
$$u^{k+1} = \left(\beta_1 K^T K - \beta_2 \Delta\right)^{-1} \left[ K^T (\beta_1 x^k - \mu^k) - \mathrm{div}(\beta_2 y^k - \xi^k) \right], \tag{4.25}$$
where $\Delta = \mathrm{div} \cdot \nabla$ denotes the Laplace operator. Under the circulant boundary condition, the operators $K$ and $\nabla$ are circulant matrices that can be diagonalized by the fast Fourier transform (FFT). Thus, Eq. (4.25) can be solved [3] with two forward FFTs and one inverse FFT, at a cost of $O(mn \log(mn))$ multiplications for an image of size $m \times n$. If the Neumann boundary condition is assumed instead, the FFT should be replaced by the discrete cosine transform (DCT). From Eq. (4.20), the subproblem with respect to $y$ is
$$y_{i,j}^{k+1} = \arg\min_{y_{i,j}} \; \|y_{i,j}\|_2 + \frac{\beta_2}{2} \left\| y_{i,j} - (\nabla u^{k+1})_{i,j} - \frac{\xi_{i,j}^k}{\beta_2} \right\|_2^2. \tag{4.26}$$
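The minimization problem (4.26) is a proximal minimization problem whose solution is given by the following two-dimensional shrinkage operation [3]
$$y_{i,j}^{k+1} = \max\left\{ \left\| (\nabla u^{k+1})_{i,j} + \frac{\xi_{i,j}^k}{\beta_2} \right\|_2 - \frac{1}{\beta_2}, \; 0 \right\} \frac{(\nabla u^{k+1})_{i,j} + \xi_{i,j}^k / \beta_2}{\left\| (\nabla u^{k+1})_{i,j} + \xi_{i,j}^k / \beta_2 \right\|_2}. \tag{4.27}$$
Here the convention $0 \times (0/0) = 0$ is adopted to avoid division by zero. The computational complexity of the operator (4.27) is linear in $mn$. According to Eq. (4.21), the subproblem with respect to $x$ can be written as
$$x^{k+1} = \arg\min_{x} \; \frac{\lambda^{k+1}}{2}\|x - f\|_2^2 + \frac{\beta_1}{2}\|x - a^{k+1}\|_2^2, \tag{4.28}$$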
where $a^{k+1} = Ku^{k+1} + \mu^k / \beta_1$. The minimization problem (4.28) shows clearly that $x$ is coupled with λ. From Eq. (4.5), $x$ plays the role of $Ku$. It therefore suffices to check whether $x$ satisfies the discrepancy principle, i.e., whether $\|x - f\|_2^2 \le c$ holds. Problem (4.28) is quadratic in $x$, and its closed-form solution is
$$x^{k+1} = \frac{\lambda^{k+1} f + \beta_1 a^{k+1}}{\lambda^{k+1} + \beta_1}. \tag{4.29}$$
In each iteration, depending on the value of $a^{k+1}$, λ is chosen in one of two ways. On the one hand, if
$$\|a^{k+1} - f\|_2^2 \le c, \tag{4.30}$$
then one can set $\lambda^{k+1} = 0$ and $x^{k+1} = a^{k+1}$, and $x^{k+1}$ clearly satisfies the discrepancy principle. On the other hand, if $\|a^{k+1} - f\|_2^2 > c$, then according to the discrepancy principle $x^{k+1}$ should be determined by solving the following equation
$$\|x^{k+1} - f\|_2^2 = c. \tag{4.31}$$
Substituting Eq. (4.29) for $x^{k+1}$ in Eq. (4.31), we have
$$\lambda^{k+1} = \frac{\beta_1 \|f - a^{k+1}\|_2}{\sqrt{c}} - \beta_1. \tag{4.32}$$
From the above discussion, it can be seen that introducing the auxiliary variable $x$ frees $Ku$ from the discrepancy principle. As a result, a closed-form solution for λ is obtained at each iteration without any additional condition, which is the biggest difference between the proposed algorithm and the algorithm in [10]. In [10], Ng et al. also solved the constrained problem (4.1) based on ADMM, but introduced only one auxiliary variable, to replace the TV regularizer. Unlike Eq. (4.32), Eq. (4.2) admits no closed-form solution for λ, so Newton’s method must be introduced to solve for λ. Additional conditions must then be imposed on Eq. (4.2) to ensure the existence and uniqueness [7] of λ. In contrast, the proposed algorithm is not subject to such conditions.
Algorithm 4.1 Adaptive Parameter Estimation for the Alternating Direction Method of Multipliers (APE-ADMM)
Step 1: Input $f$, $K$, $c$.
Step 2: Initialize $u^0, x^0, y^0, \mu^0, \xi^0 = 0$, $k = 0$, $\beta_1, \beta_2 > 0$.
Step 3: Check whether the termination condition is met; if not, perform the following steps.
Step 4: Calculate $u^{k+1}$ by Eq. (4.25).
Step 5: Calculate $y^{k+1}$ by Eq. (4.27).
Step 6: If Eq. (4.30) holds, set $\lambda^{k+1} = 0$ and $x^{k+1} = a^{k+1}$; otherwise update $\lambda^{k+1}$ and $x^{k+1}$ by Eqs. (4.32) and (4.29), respectively.
Step 7: Update $\mu^{k+1}$ and $\xi^{k+1}$ by Eqs. (4.22) and (4.23).
Step 8: $k = k + 1$.
Step 9: End the loop and output $u^{k+1}$ and $\lambda^{k+1}$.
Algorithm 4.1 (APE-ADMM) summarizes the derived algorithm. The most computationally intensive part is the $u$-subproblem (Step 4), which requires three FFTs/inverse FFTs in each iteration. So if APE-ADMM runs for $L$ iterations, its cost is approximately that of $3L$ two-dimensional FFTs. In addition, some variables in Algorithm 4.1 can be updated in parallel (e.g., $y$ and $x$), so Algorithm 4.1 can be accelerated on a parallel computing device such as a GPU.
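The steps of Algorithm 4.1 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes periodic forward differences for ∇, a blur given through the 2-D FFT of its PSF (`psf_fft`, with K1 = 1), and a fixed iteration count in place of the unspecified termination condition. The names `ape_admm`, `grad`, `div`, and `psf_fft` are hypothetical.

```python
import numpy as np

def grad(u):
    # Periodic forward differences (assumed discretization of the gradient).
    return np.stack((np.roll(u, -1, axis=0) - u, np.roll(u, -1, axis=1) - u))

def div(y):
    # Divergence chosen so that div = -grad^T.
    return (y[0] - np.roll(y[0], 1, axis=0)) + (y[1] - np.roll(y[1], 1, axis=1))

def ape_admm(f, psf_fft, c, beta1=1.0, beta2=1.0, n_iter=300):
    """Sketch of APE-ADMM; returns the restored image and the last lambda."""
    f = np.asarray(f, dtype=float)
    m, n = f.shape
    F, iF = np.fft.fft2, np.fft.ifft2
    K = lambda v: np.real(iF(psf_fft * F(v)))             # K v
    Kt = lambda v: np.real(iF(np.conj(psf_fft) * F(v)))   # K^T v
    # Fourier symbols of the two periodic forward-difference operators.
    w1 = np.exp(2j * np.pi * np.arange(m) / m)[:, None] - 1.0
    w2 = np.exp(2j * np.pi * np.arange(n) / n)[None, :] - 1.0
    # Eigenvalues of beta1 K^T K - beta2 Laplacian.
    denom = beta1 * np.abs(psf_fft) ** 2 + beta2 * (np.abs(w1) ** 2 + np.abs(w2) ** 2)
    u, x, mu = np.zeros_like(f), np.zeros_like(f), np.zeros_like(f)
    y, xi = np.zeros((2, m, n)), np.zeros((2, m, n))
    lam = 0.0
    for _ in range(n_iter):
        # Step 4: u-update, Eq. (4.25), solved in the Fourier domain.
        rhs = Kt(beta1 * x - mu) - div(beta2 * y - xi)
        u = np.real(iF(F(rhs) / denom))
        # Step 5: y-update, pixelwise shrinkage of Eq. (4.27).
        g = grad(u)
        w = g + xi / beta2
        nrm = np.sqrt(w[0] ** 2 + w[1] ** 2)
        y = np.maximum(nrm - 1.0 / beta2, 0.0) / np.maximum(nrm, 1e-12) * w
        # Step 6: x- and lambda-update, Eqs. (4.29)-(4.32).
        Ku = K(u)
        a = Ku + mu / beta1
        r = np.linalg.norm(a - f)
        if r ** 2 <= c:
            lam, x = 0.0, a
        else:
            lam = beta1 * r / np.sqrt(c) - beta1
            x = (lam * f + beta1 * a) / (lam + beta1)
        # Step 7: multiplier updates, Eqs. (4.22) and (4.23).
        mu = mu - beta1 * (x - Ku)
        xi = xi - beta2 * (y - g)
    return u, lam
```

For pure denoising, `psf_fft` can be an array of ones (K is then the identity). The small constant `1e-12` implements the convention 0 × (0/0) = 0 used in Eq. (4.27).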
4.3.3 Convergence Analysis
This subsection shows that the sequence $\{u^k\}$ generated by APE-ADMM converges to a minimizer of the constrained TV regularization problem (4.1), and that $\{\lambda^k\}$ converges to the optimal regularization parameter $\lambda^*$ corresponding to the constraint $u \in \Psi$, i.e., the solution of problem (4.1) is simultaneously the solution of problem (1.16) with $\lambda = \lambda^*$. Step 6 of APE-ADMM shows that the sequence $\{\lambda^k\}$ converges to some non-negative $\lambda^\dagger$ as $k \to +\infty$, with $\lambda^\dagger = 0$ if $u \in \Psi$ and $\lambda^\dagger > 0$ if $\|Ku - f\|_2^2 = c$. In fact, for a natural image only the latter case is possible. The saddle point condition stated in Eq. (4.7) still holds for $\mathcal{L}_A(u, x, y; \mu, \xi; \lambda^\dagger)$, and in the discussion that follows $(u^*, x^*, y^*; \mu^*, \xi^*)$ denotes its saddle point. Lemma 4.3 and Lemma 4.5 establish the contraction and convergence of the sequence produced by APE-ADMM.
Lemma 4.3 Let $\{u^k, x^k, y^k; \mu^k, \xi^k\}$ be the sequence generated by APE-ADMM. Then $\{x^k\}$, $\{y^k\}$, $\{\mu^k\}$, and $\{\xi^k\}$ are bounded and the following holds
$$\begin{cases} \lim_{k \to +\infty} \|x^{k+1} - x^k\|_2 = 0, & \lim_{k \to +\infty} \|y^{k+1} - y^k\|_2 = 0; \\ \lim_{k \to +\infty} \|x^k - Ku^k\|_2 = 0, & \lim_{k \to +\infty} \|y^k - \nabla u^k\|_2 = 0; \\ \lim_{k \to +\infty} \|\mu^{k+1} - \mu^k\|_2 = 0, & \lim_{k \to +\infty} \|\xi^{k+1} - \xi^k\|_2 = 0. \end{cases} \tag{4.33}$$
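Moreover, writing $v = \left(\sqrt{\beta_1}\, x, \sqrt{\beta_2}\, y, \mu / \sqrt{\beta_1}, \xi / \sqrt{\beta_2}\right)$, one has $\|v^{k+1} - v^*\|_2^2 \le \|v^k - v^*\|_2^2$, i.e., $\{v^k\}$ is Fejér monotone with respect to the set of all $v^*$.
Proof Let $(u^*, x^*, y^*; \mu^*, \xi^*)$ be a saddle point of $\mathcal{L}_A(u, x, y; \mu, \xi; \lambda^\dagger)$, define $\hat{u}^{k+1} \triangleq u^{k+1} - u^*$, and define $\hat{x}^{k+1}$, $\hat{y}^{k+1}$, $\hat{\mu}^{k+1}$, and $\hat{\xi}^{k+1}$ in the same way. Taking $\lambda = \lambda^\dagger$, the first inequality in Eq. (4.7) yields $x^* = Ku^*$ and $y^* = \nabla u^*$. Combining this with Eqs. (4.22) and (4.23) yields
$$\begin{cases} \hat{\mu}^{k+1} = \hat{\mu}^k - \beta_1 (\hat{x}^{k+1} - K\hat{u}^{k+1}), \\ \hat{\xi}^{k+1} = \hat{\xi}^k - \beta_2 (\hat{y}^{k+1} - \nabla \hat{u}^{k+1}). \end{cases} \tag{4.34}$$
Therefore,
$$\begin{cases} \dfrac{\|\hat{\mu}^k\|_2^2 - \|\hat{\mu}^{k+1}\|_2^2}{\beta_1} = 2 \langle \hat{\mu}^k, \hat{x}^{k+1} - K\hat{u}^{k+1} \rangle - \beta_1 \|\hat{x}^{k+1} - K\hat{u}^{k+1}\|_2^2, \\ \dfrac{\|\hat{\xi}^k\|_2^2 - \|\hat{\xi}^{k+1}\|_2^2}{\beta_2} = 2 \langle \hat{\xi}^k, \hat{y}^{k+1} - \nabla \hat{u}^{k+1} \rangle - \beta_2 \|\hat{y}^{k+1} - \nabla \hat{u}^{k+1}\|_2^2. \end{cases} \tag{4.35}$$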
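On the one hand, from the proof of Theorem 4.1 and the second inequality in Eq. (4.7) (with $\lambda = \lambda^\dagger$), it follows that
$$\langle K^T \mu^* + \nabla^T \xi^*, u - u^* \rangle + \beta_1 \langle K^T (Ku^* - x^*), u - u^* \rangle + \beta_2 \langle \nabla^T (\nabla u^* - y^*), u - u^* \rangle \ge 0 \quad \forall u \in V, \tag{4.36}$$
$$\|y\|_1 - \|y^*\|_1 - \langle \xi^*, y - y^* \rangle + \beta_2 \langle y^* - \nabla u^*, y - y^* \rangle \ge 0 \quad \forall y \in Q, \tag{4.37}$$
$$\frac{\lambda^\dagger}{2}\|x - f\|_2^2 - \frac{\lambda^\dagger}{2}\|x^* - f\|_2^2 - \langle \mu^*, x - x^* \rangle + \beta_1 \langle x^* - Ku^*, x - x^* \rangle \ge 0 \quad \forall x \in V. \tag{4.38}$$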
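On the other hand, since $u^{k+1}$, $x^{k+1}$, and $y^{k+1}$ are solutions of the corresponding subproblems, Lemma 4.2 gives
$$\langle K^T \mu^k + \nabla^T \xi^k, u - u^{k+1} \rangle + \beta_1 \langle K^T (Ku^{k+1} - x^k), u - u^{k+1} \rangle + \beta_2 \langle \nabla^T (\nabla u^{k+1} - y^k), u - u^{k+1} \rangle \ge 0 \quad \forall u \in V, \tag{4.39}$$
$$\|y\|_1 - \|y^{k+1}\|_1 - \langle \xi^k, y - y^{k+1} \rangle + \beta_2 \langle y^{k+1} - \nabla u^{k+1}, y - y^{k+1} \rangle \ge 0 \quad \forall y \in Q, \tag{4.40}$$
$$\frac{\lambda^{k+1}}{2}\|x - f\|_2^2 - \frac{\lambda^{k+1}}{2}\|x^{k+1} - f\|_2^2 - \langle \mu^k, x - x^{k+1} \rangle + \beta_1 \langle x^{k+1} - Ku^{k+1}, x - x^{k+1} \rangle \ge 0 \quad \forall x \in V. \tag{4.41}$$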
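Substituting $u = u^{k+1}$ and $u = u^*$ into Eqs. (4.36) and (4.39), respectively, and adding them together yields
$$\langle \hat{\mu}^k, K\hat{u}^{k+1} \rangle + \langle \hat{\xi}^k, \nabla \hat{u}^{k+1} \rangle + \beta_1 \langle K\hat{u}^{k+1} - \hat{x}^k, K\hat{u}^{k+1} \rangle + \beta_2 \langle \nabla \hat{u}^{k+1} - \hat{y}^k, \nabla \hat{u}^{k+1} \rangle \le 0. \tag{4.42}$$
Similarly, substituting $y = y^{k+1}$ and $y = y^*$ into Eqs. (4.37) and (4.40), respectively, and adding them together gives
$$\langle \hat{\xi}^k, -\hat{y}^{k+1} \rangle + \beta_2 \langle \nabla \hat{u}^{k+1} - \hat{y}^{k+1}, -\hat{y}^{k+1} \rangle \le 0. \tag{4.43}$$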
From Step 6 of APE-ADMM, either (i) $\|a^{k+1} - f\|_2^2 \le c$ and $\lambda^{k+1} = 0$, or (ii) $\|a^{k+1} - f\|_2^2 > c$, $\lambda^{k+1} > 0$, and $\|x^{k+1} - f\|_2^2 = c$. Since $\|x^* - f\|_2^2 = c$, in both case (i) and case (ii) we have
$$\langle \hat{\mu}^k, -\hat{x}^{k+1} \rangle + \beta_1 \langle K\hat{u}^{k+1} - \hat{x}^{k+1}, -\hat{x}^{k+1} \rangle \le 0. \tag{4.44}$$
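Adding Eqs. (4.42)–(4.44), we get
$$\langle \hat{\mu}^k, \hat{x}^{k+1} - K\hat{u}^{k+1} \rangle + \langle \hat{\xi}^k, \hat{y}^{k+1} - \nabla \hat{u}^{k+1} \rangle \ge \beta_1 \|\hat{x}^{k+1} - K\hat{u}^{k+1}\|_2^2 + \beta_2 \|\hat{y}^{k+1} - \nabla \hat{u}^{k+1}\|_2^2 + \beta_1 \langle \hat{x}^{k+1} - \hat{x}^k, K\hat{u}^{k+1} \rangle + \beta_2 \langle \hat{y}^{k+1} - \hat{y}^k, \nabla \hat{u}^{k+1} \rangle. \tag{4.45}$$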
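Combining Eqs. (4.35) and (4.45) yields
$$\frac{\|\hat{\mu}^k\|_2^2 - \|\hat{\mu}^{k+1}\|_2^2}{\beta_1} + \frac{\|\hat{\xi}^k\|_2^2 - \|\hat{\xi}^{k+1}\|_2^2}{\beta_2} \ge 2\beta_1 \langle \hat{x}^{k+1} - \hat{x}^k, K\hat{u}^{k+1} \rangle + 2\beta_2 \langle \hat{y}^{k+1} - \hat{y}^k, \nabla \hat{u}^{k+1} \rangle + \beta_1 \|\hat{x}^{k+1} - K\hat{u}^{k+1}\|_2^2 + \beta_2 \|\hat{y}^{k+1} - \nabla \hat{u}^{k+1}\|_2^2. \tag{4.46}$$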
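Next, we bound the first two terms on the right-hand side of inequality (4.46). From the updates of $x^k$ and $y^k$ it follows that
$$\|y\|_1 - \|y^k\|_1 - \langle \xi^{k-1}, y - y^k \rangle + \beta_2 \langle y^k - \nabla u^k, y - y^k \rangle \ge 0, \tag{4.47}$$
$$\frac{\lambda^k}{2}\|x - f\|_2^2 - \frac{\lambda^k}{2}\|x^k - f\|_2^2 - \langle \mu^{k-1}, x - x^k \rangle + \beta_1 \langle x^k - Ku^k, x - x^k \rangle \ge 0. \tag{4.48}$$
Substituting $y = y^k$ and $y = y^{k+1}$ into Eqs. (4.40) and (4.47), respectively, and adding them together gives
$$\beta_2 \|\hat{y}^{k+1} - \hat{y}^k\|_2^2 - \langle \hat{\xi}^k - \hat{\xi}^{k-1}, \hat{y}^{k+1} - \hat{y}^k \rangle - \beta_2 \langle \nabla \hat{u}^{k+1} - \nabla \hat{u}^k, \hat{y}^{k+1} - \hat{y}^k \rangle \le 0. \tag{4.49}$$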
According to Step 6 of APE-ADMM,
$$\lambda^k \|x^k - f\|_2^2 \ge \lambda^k \|x^{k+1} - f\|_2^2, \qquad \lambda^{k+1} \|x^{k+1} - f\|_2^2 \ge \lambda^{k+1} \|x^k - f\|_2^2. \tag{4.50}$$
Substituting $x = x^k$ and $x = x^{k+1}$ into Eqs. (4.41) and (4.48), respectively, and adding them together yields
$$\beta_1 \|\hat{x}^{k+1} - \hat{x}^k\|_2^2 - \langle \hat{\mu}^k - \hat{\mu}^{k-1}, \hat{x}^{k+1} - \hat{x}^k \rangle - \beta_1 \langle K\hat{u}^{k+1} - K\hat{u}^k, \hat{x}^{k+1} - \hat{x}^k \rangle \le 0. \tag{4.51}$$
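Combining Eq. (4.34) (with $k$ replaced by $k - 1$), Eq. (4.49), and Eq. (4.51) gives
$$\begin{cases} \langle \hat{x}^{k+1} - \hat{x}^k, K\hat{u}^{k+1} - \hat{x}^k \rangle \ge \|\hat{x}^{k+1} - \hat{x}^k\|_2^2, \\ \langle \hat{y}^{k+1} - \hat{y}^k, \nabla \hat{u}^{k+1} - \hat{y}^k \rangle \ge \|\hat{y}^{k+1} - \hat{y}^k\|_2^2. \end{cases} \tag{4.52}$$
Combining Eq. (4.52) with the identities
$$\begin{cases} \langle \hat{y}^{k+1} - \hat{y}^k, \hat{y}^k \rangle = \frac{1}{2}\left( \|\hat{y}^{k+1}\|_2^2 - \|\hat{y}^k\|_2^2 - \|\hat{y}^{k+1} - \hat{y}^k\|_2^2 \right), \\ \langle \hat{x}^{k+1} - \hat{x}^k, \hat{x}^k \rangle = \frac{1}{2}\left( \|\hat{x}^{k+1}\|_2^2 - \|\hat{x}^k\|_2^2 - \|\hat{x}^{k+1} - \hat{x}^k\|_2^2 \right), \end{cases} \tag{4.53}$$
we obtain
$$\begin{cases} \langle \hat{y}^{k+1} - \hat{y}^k, \nabla \hat{u}^{k+1} \rangle \ge \frac{1}{2}\left( \|\hat{y}^{k+1}\|_2^2 - \|\hat{y}^k\|_2^2 + \|\hat{y}^{k+1} - \hat{y}^k\|_2^2 \right), \\ \langle \hat{x}^{k+1} - \hat{x}^k, K\hat{u}^{k+1} \rangle \ge \frac{1}{2}\left( \|\hat{x}^{k+1}\|_2^2 - \|\hat{x}^k\|_2^2 + \|\hat{x}^{k+1} - \hat{x}^k\|_2^2 \right). \end{cases} \tag{4.54}$$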
Combining Eqs. (4.46) and (4.54) yields
$$\frac{\|\hat{\mu}^k\|_2^2 - \|\hat{\mu}^{k+1}\|_2^2}{\beta_1} + \frac{\|\hat{\xi}^k\|_2^2 - \|\hat{\xi}^{k+1}\|_2^2}{\beta_2} + \beta_1 \left( \|\hat{x}^k\|_2^2 - \|\hat{x}^{k+1}\|_2^2 \right) + \beta_2 \left( \|\hat{y}^k\|_2^2 - \|\hat{y}^{k+1}\|_2^2 \right) \ge \beta_1 \|\hat{x}^{k+1} - K\hat{u}^{k+1}\|_2^2 + \beta_2 \|\hat{y}^{k+1} - \nabla \hat{u}^{k+1}\|_2^2 + \beta_1 \|\hat{x}^{k+1} - \hat{x}^k\|_2^2 + \beta_2 \|\hat{y}^{k+1} - \hat{y}^k\|_2^2. \tag{4.55}$$
The above inequality shows that the sequence $\left\{ \|\hat{\mu}^k\|_2^2 / \beta_1 + \|\hat{\xi}^k\|_2^2 / \beta_2 + \beta_1 \|\hat{x}^k\|_2^2 + \beta_2 \|\hat{y}^k\|_2^2 \right\}$ is non-negative, bounded, and non-increasing, so it must have a limit. Therefore, the left-hand side of inequality (4.55) tends to 0 as $k \to +\infty$, which shows that the limit of the right-hand side of inequality (4.55) is also 0. Combining this result with Eqs. (4.22), (4.23), and (4.34) leads to the conclusion of Lemma 4.3. This completes the proof of Lemma 4.3.
Lemma 4.4 [19] Let $\{p^k\}$ be a sequence in the Euclidean space $\mathbb{R}^N$ and $C$ a non-empty subset of $\mathbb{R}^N$. If $\{p^k\}$ is Fejér monotone with respect to $C$, i.e., $\|p^{k+1} - p^*\|_2^2 \le \|p^k - p^*\|_2^2$ $(\forall p^* \in C)$, and all cluster points of $\{p^k\}$ are in $C$, then $\{p^k\}$ converges to a point in $C$.
Lemma 4.5 Let $\{u^k, x^k, y^k; \mu^k, \xi^k\}$ be a sequence generated by APE-ADMM. Then it converges to a saddle point of $\mathcal{L}_A(u, x, y; \mu, \xi; \lambda^\dagger)$. In particular, $\{u^k\}$ converges to a solution of problem (1.16) with $\lambda = \lambda^\dagger$.
Proof Substituting $u = u^*$, $x = x^*$, and $y = y^*$ into Eqs. (4.39)–(4.41), respectively, and adding them together gives
$$\|y^*\|_1 + \frac{\lambda^{k+1}}{2}\|x^* - f\|_2^2 \ge \|y^{k+1}\|_1 + \frac{\lambda^{k+1}}{2}\|x^{k+1} - f\|_2^2 - \langle \mu^k, x^{k+1} - Ku^{k+1} \rangle - \langle \xi^k, y^{k+1} - \nabla u^{k+1} \rangle + \beta_1 \|Ku^{k+1} - x^{k+1}\|_2^2 + \beta_1 \langle x^{k+1} - x^k, Ku^{k+1} - Ku^* \rangle + \beta_2 \|\nabla u^{k+1} - y^{k+1}\|_2^2 + \beta_2 \langle y^{k+1} - y^k, \nabla u^{k+1} - \nabla u^* \rangle. \tag{4.56}$$
By Lemma 4.3 and inequality (4.56), it follows that
$$\|y^*\|_1 + \frac{\lambda^\dagger}{2}\|x^* - f\|_2^2 \ge \lim_{k \to +\infty} \left( \|y^{k+1}\|_1 + \frac{\lambda^\dagger}{2}\|x^{k+1} - f\|_2^2 \right). \tag{4.57}$$
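Since $(u^*, x^*, y^*; \mu^*, \xi^*)$ is a saddle point of $\mathcal{L}_A(u, x, y; \mu, \xi; \lambda^\dagger)$, according to Eq. (4.7) we have
$$\|y^*\|_1 + \frac{\lambda^\dagger}{2}\|x^* - f\|_2^2 \le \lim_{k \to +\infty} \left( \|y^{k+1}\|_1 + \frac{\lambda^\dagger}{2}\|x^{k+1} - f\|_2^2 \right). \tag{4.58}$$
Combining inequality (4.57), inequality (4.58), $x^* = Ku^*$, $y^* = \nabla u^*$, and Lemma 4.3 yields
$$\|\nabla u^*\|_1 + \frac{\lambda^\dagger}{2}\|Ku^* - f\|_2^2 = \|y^*\|_1 + \frac{\lambda^\dagger}{2}\|x^* - f\|_2^2 = \lim_{k \to +\infty} \left( \|y^k\|_1 + \frac{\lambda^\dagger}{2}\|x^k - f\|_2^2 \right) = \lim_{k \to +\infty} \left( \|\nabla u^k\|_1 + \frac{\lambda^\dagger}{2}\|Ku^k - f\|_2^2 \right). \tag{4.59}$$
Further, by Lemma 4.3,
$$\lim_{k \to +\infty} \mathcal{L}_A(u^k, x^k, y^k; \mu^k, \xi^k; \lambda^\dagger) = \lim_{k \to +\infty} \left( \|y^k\|_1 + \frac{\lambda^\dagger}{2}\|x^k - f\|_2^2 \right) = \|y^*\|_1 + \frac{\lambda^\dagger}{2}\|x^* - f\|_2^2 = \mathcal{L}_A(u^*, x^*, y^*; \mu^*, \xi^*; \lambda^\dagger). \tag{4.60}$$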
Equation (4.60) shows that any cluster point of the sequence $\{u^k, x^k, y^k; \mu^k, \xi^k\}$ is a saddle point of $\mathcal{L}_A(u, x, y; \mu, \xi; \lambda^\dagger)$. Rewrite $\mathcal{L}_A(u, x, y; \mu, \xi; \lambda^\dagger)$ as $\mathcal{L}_A(\lambda^\dagger) = \mathcal{L}_A(u, \sqrt{\beta_1}\, x, \sqrt{\beta_2}\, y; \mu / \sqrt{\beta_1}, \xi / \sqrt{\beta_2}; \lambda^\dagger)$. Then $(u^*, \sqrt{\beta_1}\, x^*, \sqrt{\beta_2}\, y^*; \mu^* / \sqrt{\beta_1}, \xi^* / \sqrt{\beta_2})$, or any cluster point of the sequence $\{u^k, \sqrt{\beta_1}\, x^k, \sqrt{\beta_2}\, y^k; \mu^k / \sqrt{\beta_1}, \xi^k / \sqrt{\beta_2}\}$, is a saddle point of $\mathcal{L}_A(\lambda^\dagger)$. By Lemma 4.3 and Lemma 4.4, $\{u^k, \sqrt{\beta_1}\, x^k, \sqrt{\beta_2}\, y^k; \mu^k / \sqrt{\beta_1}, \xi^k / \sqrt{\beta_2}\}$ converges to a point in the saddle point set of $\mathcal{L}_A(\lambda^\dagger)$. According to Eq. (4.25) and the convergence of $\{\sqrt{\beta_1}\, x^k, \sqrt{\beta_2}\, y^k; \mu^k / \sqrt{\beta_1}, \xi^k / \sqrt{\beta_2}\}$, the convergence of $\{u^k\}$ follows. Therefore $\{u^k, x^k, y^k; \mu^k, \xi^k\}$ converges to a saddle point of $\mathcal{L}_A(u, x, y; \mu, \xi; \lambda^\dagger)$. This completes the proof of Lemma 4.5.
Lemma 4.5 shows that $\{u^k\}$ converges to a solution of the unconstrained problem (1.16) with $\lambda = \lambda^\dagger$. On the other hand, $\lambda^\dagger$ is obtained strictly according to the discrepancy principle. Therefore, $\lambda^\dagger$ is the sought $\lambda^*$, which makes $u^*$ a solution to the constrained problem (4.1). Summarizing the above discussion, the following convergence theorem for APE-ADMM is obtained.
Theorem 4.2 Let the sequences $\{u^k\}$ and $\{\lambda^k\}$ be generated by APE-ADMM. Then $\{u^k\}$ converges to a solution of the constrained problem (4.1), while $\{\lambda^k\}$ converges to the optimal regularization parameter $\lambda^*$ corresponding to the constraint $u \in \Psi$.
Note that the solution of problem (4.1) in Lemma 4.5 and Theorem 4.2 may not be unique, because the blur matrix K may be singular. However, because problem (4.1) is convex, every solution attains the same minimum value of problem (4.1). Thus, each solution can be viewed as a reasonable estimate of the original image.
4.3.4 Parameter Settings

The upper bound $c$ in problem (4.1) is noise-dependent [6, 7, 10], and if $c$ is chosen appropriately, a good balance between noise suppression and image restoration can be achieved. In this book, the median criterion [7] based on the wavelet transform is chosen to estimate the noise variance $\sigma^2$. Once the noise variance is determined, $c$ is obtained from $c = \tau mn\sigma^2$. The choice $\tau = 1$ is the traditional one, but studies in [7] show that it leads to an oversmoothed solution at low noise levels, which suggests that the value of $\lambda$ is too small in that case and that $\tau < 1$ should be set. In fact, there is so far no uniform method for selecting $\tau$, and this is an open problem in the field of image inverse problems that deserves further study. For Tikhonov regularization, a feasible way is the equivalent degrees of freedom (EDF) method, which estimates $\tau$ by solving $\|Ku^*(\lambda) - f\|_2^2 = \mathrm{EDF}\cdot\sigma^2$. However, the EDF method is difficult to transfer directly to TV regularization because no closed-form solution exists for $u^*(\lambda)$ [6, 7, 10]. Another practical way to choose $\tau$ is to adjust it according to the blurred signal-to-noise ratio (BSNR), where $\mathrm{BSNR} = 10\log_{10}(\mathrm{var}(f)/\sigma^2)$ and $\mathrm{var}(f)$ is the variance of $f$. In this book, by fitting the experimental results to a straight line, it is suggested to set $\tau = -0.006\,\mathrm{BSNR} + 1.09$ in the deblurring experiments and $\tau = -0.03\,\mathrm{BSNR} + 1.09$ in the denoising experiments. Although the parameters of the fitted straight line need to be adjusted for different types of problems, this strategy is to some extent robust to changes in image type and size. A similar approach can be found in some other TV recovery work [6, 8, 10].

Another issue that needs to be emphasized is the choice of the penalty parameters $\beta_1$ and $\beta_2$. The simple setting $\beta_1 = \beta_2 > 0$ is sufficient to guarantee the convergence of the algorithm. However, it is clear from the definition of $L_A$ in Eq. (4.6) that $\lambda$ penalizes the distance between $x$ and $f$, $\beta_1$ penalizes the distance between $x$ and $Ku$, and $\beta_2$ penalizes the distance between $y$ and $\nabla u$. $Ku$ will be closer to $f$ when the BSNR of the observed image is higher, in which case $\lambda$ should be larger. Thus, to make $Ku$ converge faster to $f$, a larger $\beta_1$ should be chosen. This suggests that a higher BSNR implies a larger $\beta_1$. Many experiments have shown that APE-ADMM converges faster when setting $\beta_1 = (0.1\,\mathrm{BSNR} - 1)$ and $\beta_2 = 1$. By choosing different weights for $\beta_1$ and $\beta_2$, the proposed algorithm becomes more flexible.
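The BSNR-based rule for the bound $c$ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the noise variance `sigma2` is assumed to be already known (the wavelet median estimator of [7] is not reproduced), and `f` is taken to be an m × n array.

```python
import numpy as np

def bsnr_db(f, sigma2):
    """Blurred signal-to-noise ratio: 10*log10(var(f) / sigma^2)."""
    return 10.0 * np.log10(np.var(f) / sigma2)

def tau_from_bsnr(bsnr, task="deblur"):
    """Fitted straight lines quoted in the text (hypothetical helper name)."""
    slope = {"deblur": -0.006, "denoise": -0.03}[task]
    return slope * bsnr + 1.09

def noise_bound(f, sigma2, task="deblur"):
    """Upper bound c = tau * m * n * sigma^2 for an m-by-n observation f."""
    m, n = f.shape
    tau = tau_from_bsnr(bsnr_db(f, sigma2), task)
    return tau * m * n * sigma2

# Synthetic stand-in for an observed image (assumption: values in [0, 255]).
rng = np.random.default_rng(0)
f = rng.uniform(0, 255, size=(64, 64))
sigma2 = 4.0
c = noise_bound(f, sigma2, task="deblur")
```

With the deblurring line, $\tau$ drops below 1 once the BSNR exceeds 15 dB, so for such images $c$ is smaller than the traditional choice $mn\sigma^2$.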
4.4 Extension of Fast Adaptive Parameter Estimation Algorithm

4.4.1 Equivalent Splitting Bregman Algorithm

The splitting Bregman form of APE-ADMM can be easily derived from the equivalence of ADMM and the splitting Bregman algorithm under linear constraints. From the constrained optimization problem (4.5), the Bregman function is defined as

$$J(u, x, y) = \|y\|_1 + \frac{\lambda}{2}\|x - f\|_2^2. \tag{4.61}$$

Define the Bregman distance

$$D_J^{(p_u^k, p_x^k, p_y^k)}\left(u, x, y; u^k, x^k, y^k\right) = J(u, x, y) - J\left(u^k, x^k, y^k\right) - \left\langle p_u^k, u - u^k\right\rangle - \left\langle p_x^k, x - x^k\right\rangle - \left\langle p_y^k, y - y^k\right\rangle. \tag{4.62}$$

According to the iterative rule (4.1–4.39) of the splitting Bregman algorithm (applied simultaneously to the two linear constraints), one obtains

$$\left(u^{k+1}, x^{k+1}, y^{k+1}\right) = \arg\min_{u, x, y}\; D_J^{(p_u^k, p_x^k, p_y^k)}\left(u, x, y; u^k, x^k, y^k\right) + \frac{\beta_1}{2}\|x - Ku\|_2^2 + \frac{\beta_2}{2}\|y - \nabla u\|_2^2; \tag{4.63}$$
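$$p_u^{k+1} = p_u^k + \beta_1 K^T\left(x^{k+1} - Ku^{k+1}\right) + \beta_2 \nabla^T\left(y^{k+1} - \nabla u^{k+1}\right); \tag{4.64}$$

$$p_x^{k+1} = p_x^k + \beta_1\left(Ku^{k+1} - x^{k+1}\right); \tag{4.65}$$

$$p_y^{k+1} = p_y^k + \beta_2\left(\nabla u^{k+1} - y^{k+1}\right). \tag{4.66}$$

Define

$$\begin{cases} p_u^0 = -\beta_1 K^T b^0 - \beta_2 \nabla^T d^0, \\ p_x^0 = \beta_1 b^0, \\ p_y^0 = \beta_2 d^0. \end{cases} \tag{4.67}$$

According to Eqs. (4.64)–(4.66) above, it follows necessarily that

$$\begin{cases} p_u^k = -\beta_1 K^T b^k - \beta_2 \nabla^T d^k, \\ p_x^k = \beta_1 b^k, \\ p_y^k = \beta_2 d^k, \end{cases} \qquad k = 0, 1, \ldots \tag{4.68}$$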
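Therefore, the following iterative rule can be obtained.

$$\begin{cases} \left(u^{k+1}, x^{k+1}, y^{k+1}\right) = \arg\min\limits_{u, x, y}\; \dfrac{\lambda}{2}\|x - f\|_2^2 + \dfrac{\beta_1}{2}\|x - Ku - b^k\|_2^2 + \|y\|_1 + \dfrac{\beta_2}{2}\|y - \nabla u - d^k\|_2^2; \\ b^{k+1} = b^k + Ku^{k+1} - x^{k+1}; \\ d^{k+1} = d^k + \nabla u^{k+1} - y^{k+1}. \end{cases} \tag{4.69}$$

Using an alternating strategy similar to that of ADMM, it follows that

$$u^{k+1} = \arg\min_u\; \frac{\beta_1}{2}\|x^k - Ku - b^k\|_2^2 + \frac{\beta_2}{2}\|y^k - \nabla u - d^k\|_2^2; \tag{4.70}$$

$$y^{k+1} = \arg\min_y\; \|y\|_1 + \frac{\beta_2}{2}\|y - \nabla u^{k+1} - d^k\|_2^2; \tag{4.71}$$

$$x^{k+1} = \arg\min_x\; \frac{\lambda^{k+1}}{2}\|x - f\|_2^2 + \frac{\beta_1}{2}\|x - Ku^{k+1} - b^k\|_2^2; \tag{4.72}$$

$$b^{k+1} = b^k + Ku^{k+1} - x^{k+1}; \tag{4.73}$$

$$d^{k+1} = d^k + \nabla u^{k+1} - y^{k+1}. \tag{4.74}$$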
The updates of $u^{k+1}$, $y^{k+1}$, $x^{k+1}$, and $\lambda^{k+1}$ above are similar to those in APE-ADMM and will not be repeated here. The following parameter-adaptive splitting Bregman algorithm is obtained from the above discussion.

Algorithm 4.2 Adaptive Parameter Estimation for Splitting Bregman Algorithm (APE-SBA)
Step 1: Input $f$, $K$, $c$.
Step 2: Initialize $u^0, x^0, y^0, b^0, d^0 = 0$, $k = 0$, $\beta_1, \beta_2 > 0$.
Step 3: Determine whether the termination condition is met; if not, perform the following steps.
Step 4: Calculate $u^{k+1}$ by means of Eq. (4.70).
Step 5: Calculate $y^{k+1}$ by means of Eq. (4.71).
Step 6: Update $\lambda^{k+1}$ and $x^{k+1}$ by means of Eq. (4.72) (in the same way as APE-ADMM).
Step 7: Update $b^{k+1}$ and $d^{k+1}$ by means of Eqs. (4.73) and (4.74).
Step 8: $k = k + 1$.
Step 9: End the loop and output $u^{k+1}$ and $\lambda^{k+1}$.

From Eqs. (4.70)–(4.74), it is easy to see that APE-ADMM is exactly equivalent to APE-SBA if we set $\mu = \beta_1 b$ and $\xi = \beta_2 d$.
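Steps 5 and 6 of Algorithm 4.2 can be sketched as follows. This is a hedged illustration, not the authors' code. The $y$-update assumes the isotropic TV norm, so Eq. (4.71) reduces to a per-pixel vector shrinkage with threshold $1/\beta_2$. Equations (4.29)–(4.32) are not reproduced in this section, so the $\lambda$/$x$ update below is re-derived from the quadratic subproblem (4.72) together with the discrepancy condition $\|x - f\|_2^2 = c$ and should be read as an assumption about what those equations state. The names `shrink_isotropic` and `update_x_lambda` are hypothetical.

```python
import numpy as np

def shrink_isotropic(v, thresh):
    """Vector soft-thresholding; v has shape (2, m, n) (two gradient components).

    Solves argmin_y ||y||_1 + (1/(2*thresh)) * ||y - v||_2^2 for isotropic TV.
    """
    norm = np.sqrt(np.sum(v**2, axis=0, keepdims=True))
    scale = np.maximum(norm - thresh, 0.0) / np.maximum(norm, 1e-12)
    return scale * v

def update_x_lambda(a, f, c, beta1):
    """Closed-form lambda and x update, with a = K u^{k+1} + b^k and c > 0.

    If a already satisfies ||a - f||^2 <= c, lambda = 0 and x = a.
    Otherwise lambda is chosen so that the minimizer of (4.72),
    x = (lambda*f + beta1*a) / (lambda + beta1), meets ||x - f||^2 = c.
    """
    r = np.linalg.norm(a - f)
    if r**2 <= c:
        return a.copy(), 0.0
    lam = beta1 * (r / np.sqrt(c) - 1.0)
    x = (lam * f + beta1 * a) / (lam + beta1)
    return x, lam

# Small synthetic check: a violates the constraint by a factor of 4.
rng = np.random.default_rng(1)
f = rng.normal(size=(8, 8))
a = f + rng.normal(size=(8, 8))
c = 0.25 * np.sum((a - f) ** 2)
x, lam = update_x_lambda(a, f, c, beta1=2.0)

# y-update of Eq. (4.71) would be: shrink_isotropic(grad_u + d, 1.0 / beta2)
y_demo = shrink_isotropic(np.array([3.0, 4.0]).reshape(2, 1, 1), 1.0)
```

Because no inner iteration is needed to find $\lambda^{k+1}$, each outer step costs about the same as a fixed-$\lambda$ step.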
4.4.2 Interval Constrained TV Image Restoration with Fast Adaptive Parameter Estimation

The adaptive parameter estimation method employed in APE-ADMM can be easily extended to image restoration problems with interval constraints. The interval constraint restricts image pixel values to a given dynamic range ([0, 255] in this book), while some literature considers only the positivity constraint on pixel values [20, 21]. If a large number of pixels take values at the two ends of the dynamic range, such as 0 or 255, the interval constraint can significantly improve the quality of the restored image in terms of both quantitative evaluation and visual effect [22, 23]. Consider the following interval constrained TV regularized image restoration problem

$$\min_u \|\nabla u\|_1 \quad \text{s.t.} \quad u \in \Omega \cap \Psi, \qquad \Omega = \{u : 0 \le u \le 255\}, \quad \Psi = \{u : \|Ku - f\|_2^2 \le c\}. \tag{4.75}$$

Introducing three auxiliary variables $x$, $y$ and $z$ in place of $Ku$, $\nabla u$ and $u$, respectively, yields

$$\min_{u, x, y, z}\; \|y\|_1 + \frac{\lambda}{2}\|x - f\|_2^2 + \iota_\Omega(z) \quad \text{s.t.} \quad Ku = x,\; \nabla u = y,\; u = z. \tag{4.76}$$
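The augmented Lagrangian function for the optimization problem (4.76) is defined as

$$\begin{aligned} L_A(u, x, y, z; \mu, \xi, \eta) = {} & \frac{\lambda}{2}\|x - f\|_2^2 - \mu^T(x - Ku) + \frac{\beta_1}{2}\|x - Ku\|_2^2 \\ & + \|y\|_1 - \xi^T(y - \nabla u) + \frac{\beta_2}{2}\|y - \nabla u\|_2^2 \\ & + \iota_\Omega(z) - \eta^T(z - u) + \frac{\beta_3}{2}\|z - u\|_2^2, \end{aligned} \tag{4.77}$$

where $\Omega = \{u : 0 \le u \le 255\}$ is the interval constraint set. Similar to the derivation of the algorithm APE-ADMM, it is obtained that

$$u^{k+1} = \left(\beta_1 K^T K + \beta_2 \nabla^T \nabla + \beta_3 I\right)^{-1}\left[K^T\left(\beta_1 x^k - \mu^k\right) + \nabla^T\left(\beta_2 y^k - \xi^k\right) + \beta_3 z^k - \eta^k\right]. \tag{4.78}$$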
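The variables $y^{k+1}$, $\lambda^{k+1}$, $x^{k+1}$, $\mu^{k+1}$ and $\xi^{k+1}$ are solved exactly as in the algorithm APE-ADMM. Besides, we have

$$z^{k+1} = \arg\min_z\; \iota_\Omega(z) - (\eta^k)^T z + \frac{\beta_3}{2}\|z - u^{k+1}\|_2^2 = P_\Omega\left(u^{k+1} + \frac{\eta^k}{\beta_3}\right) \tag{4.79}$$

and

$$\eta^{k+1} = \eta^k - \beta_3\left(z^{k+1} - u^{k+1}\right). \tag{4.80}$$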
In Eq. (4.79), $P_\Omega$ is the projection operator onto the convex set $\Omega = \{u : 0 \le u \le 255\}$. For the interval constraint of this book, the projection is implemented by setting pixel values less than 0 to 0 and pixel values greater than 255 to 255, while the other pixel values remain unchanged. Summarizing the above discussion, the following APE-ADMM algorithm with interval constraints can be obtained.

Algorithm 4.3 Adaptive Parameter Estimation for Box/Interval Constrained Alternating Direction Method of Multipliers (APE-BCADMM)
Step 1: Input $f$, $K$, $c$.
Step 2: Initialize $u^0, x^0, y^0, \mu^0, \xi^0, \eta^0 = 0$, $k = 0$, $\beta_1, \beta_2, \beta_3 > 0$.
Step 3: Determine whether the termination condition is met; if not, perform the following steps.
Step 4: Calculate $u^{k+1}$ by means of Eq. (4.78).
Step 5: Calculate $y^{k+1}$ by means of Eq. (4.27).
Step 6: Calculate $z^{k+1}$ by means of Eq. (4.79).
Step 7: If Eq. (4.30) holds, then $\lambda^{k+1} = 0$ and $x^{k+1} = a^{k+1} = Ku^{k+1} + \mu^k/\beta_1$; otherwise update $\lambda^{k+1}$ and $x^{k+1}$ by Eqs. (4.32) and (4.29), respectively.
Step 8: Update $\mu^{k+1}$, $\xi^{k+1}$, and $\eta^{k+1}$ by means of Eqs. (4.22), (4.23), and (4.80).
Step 9: $k = k + 1$.
Step 10: End the loop and output $u^{k+1}$ and $\lambda^{k+1}$.
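The $z$-update of Eq. (4.79) and the multiplier update of Eq. (4.80) amount to a clip followed by a residual correction. The following sketch illustrates this; the function name `update_z_eta` is hypothetical, and the example values are made up for demonstration.

```python
import numpy as np

def update_z_eta(u_new, eta, beta3, lo=0.0, hi=255.0):
    """One z / eta update of APE-BCADMM (sketch).

    z^{k+1}   = P_Omega(u^{k+1} + eta^k / beta3)      -- Eq. (4.79)
    eta^{k+1} = eta^k - beta3 * (z^{k+1} - u^{k+1})   -- Eq. (4.80)
    The projection onto the box [lo, hi] is an element-wise clip.
    """
    z_new = np.clip(u_new + eta / beta3, lo, hi)
    eta_new = eta - beta3 * (z_new - u_new)
    return z_new, eta_new

# Three pixels: below range, inside range, above range.
u = np.array([-10.0, 100.0, 300.0])
eta = np.zeros(3)
z, eta = update_z_eta(u, eta, beta3=1.0)
```

In this example the out-of-range pixels are clipped to 0 and 255, and the multiplier records the clipping residual for the next $u$-update.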
The algorithm APE-BCADMM is not covered in the experimental analysis that follows, and its experimental results are presented in the pixel interval constraint validity experiments in Chapter 6.
4.5 Experimental Results

This section sets up three experiments to verify the effectiveness of the proposed APE-ADMM algorithm, each for a specific purpose:

(i) The first purpose is to reveal the importance of adaptive regularization parameter selection by comparing APE-ADMM with two well-known TV algorithms: the fast total variation deconvolution algorithm (FTVD-v4¹) [3] and the fast iterative shrinkage/thresholding algorithm (FISTA²) [24]. FTVD-v4 combines variable splitting and the ADM method for image restoration, while FISTA is a forward–backward splitting method; their common advantage is speed. In both methods, the regularization parameter is selected manually by trial and error, so that λ is fixed during the execution of the algorithm. Comparative experiments show that, in general, APE-ADMM automatically finds the optimal λ and converges faster than FTVD-v4 and FISTA, whose results are very sensitive to changes in λ.

(ii) The second purpose is to compare APE-ADMM with three other well-known adaptive TV regularization algorithms, namely Wen-Chan [7], C-SALSA³ [6] and Ng-Weiss-Yuan [10]. The experimental results show that the proposed algorithm outperforms the other algorithms in terms of both speed and PSNR. Because images of different sizes are used in the experiments, the variation of the algorithm speed with respect to the image size is also shown.

(iii) The third purpose is to show the competitiveness of APE-ADMM in denoising by comparing it with the well-known Chambolle algorithm [12, 25] and the TV-based adaptive splitting Bregman method [2, 26].

The next three subsections detail the above three experiments. The MATLAB experimental platform is a Windows 7 desktop computer with an Intel Core(TM) i5 CPU (3.20 GHz) and 8 GB RAM. The quality of the observed and restored images is evaluated by PSNR.

The four test images, with dimensions of 256 × 256 (Lena), 512 × 512 (Boat and Barbara), and 1024 × 1024 (Man), are given in Fig. 4.1.
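The PSNR used throughout the experiments is not written out in this section. A minimal sketch, assuming the standard definition with a peak value of 255 for 8-bit images (the function name `psnr_db` is hypothetical), is:

```python
import numpy as np

def psnr_db(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE).

    Assumption: the standard definition with peak = 255; the book does not
    restate its formula in this section.
    """
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak**2 / mse)

# An error of constant magnitude 10 gives MSE = 100.
ref = np.full((4, 4), 100.0)
value = psnr_db(ref, ref + 10.0)
```

Under this definition, an error with mean square 100 gives about 28.13 dB, which is close to the PSNR reported for the images with noise variance 100 in Table 4.4.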
¹ http://www.caam.rice.edu/~optimization/L1/ftvd/v5-1/.
² http://iew4.technion.ac.il/~becka/papers/tv_fista.
³ http://cascais.lx.it.pt/~mafonso/salsa.html.
Fig. 4.1 Test images: Lena, boat, barbara, and man
4.5.1 Experiment 1: Significance of Regularization Parameter Estimation

Experiment 1 first explains why adaptive selection of λ is attractive in image restoration and shows that the proposed algorithm can estimate the optimal parameter adaptively, quickly, and efficiently. Here "optimal" means that the PSNR is maximized. The comparison involves the FTVD-v4 and FISTA methods and two images, Boat and Man. In FISTA, the regularization parameter appears as a multiplier of the regularization term, while in FTVD-v4 and APE-ADMM it appears as a multiplier of the fidelity term. For comparison purposes, the regularization parameter of FISTA used here is actually the inverse of the parameter in the original method. After applying a mean blur A(9) to Boat and a motion blur M(30, 30) to Man, Gaussian noise with variance 4 and 20, respectively, is added to the blurred images to obtain the two degraded images. The PSNRs of the degraded Boat and Man images are 23.30 dB and 21.64 dB, respectively. The termination criterion of the algorithms is set uniformly as

$$\frac{\|u^{k+1} - u^k\|_2}{\|u^k\|_2} \le 10^{-6}, \tag{4.81}$$

or the number of iteration steps reaches 1000, where $u^k$ denotes the recovery result of the $k$th step.

Table 4.1 gives the experimental results of APE-ADMM, FTVD-v4, and FISTA under their respective optimal λ, including PSNR, number of iteration steps, and CPU time. As can be seen from Table 4.1, the final parameter values of APE-ADMM are close to the optimal parameter values of the other two algorithms; the PSNRs of APE-ADMM, FTVD-v4, and FISTA are basically at the same level; and in terms of CPU time, APE-ADMM significantly outperforms the other two algorithms. Figure 4.2 gives the PSNR curves of the three algorithms relative to CPU time. For both the Boat and Man restoration experiments, the PSNR of APE-ADMM rises and converges faster than that of the other two algorithms. Figure 4.3 shows that although λ fluctuates at the initial stage of the iteration, APE-ADMM quickly finds the optimal λ as the number of iteration steps increases. Note that λ may reach zero at intermediate iterations, which is allowed in the proposed algorithm. Unlike an algorithm with fixed λ, the objective function value of the proposed algorithm with respect to the minimization function Eqs. (4.1–4.16) does not decrease monotonically, because it changes with λ.

Table 4.1 Comparison of results for APE-ADMM, FTVD-v4, and FISTA at optimal λ

| Image | Method | Optimal λ | PSNR (dB) | Step | CPU (s) |
|---|---|---|---|---|---|
| Boat | APE-ADMM | 10.19 | 28.62 | 134 | 7.91 |
| Boat | FTVD-v4 | 11.00 | 28.60 | 1000 | 49.26 |
| Boat | FISTA | 10.50 | 28.47 | 1000 | 572.40 |
| Man | APE-ADMM | 3.18 | 26.83 | 166 | 44.89 |
| Man | FTVD-v4 | 3.30 | 26.82 | 1000 | 221.43 |
| Man | FISTA | 3.20 | 26.80 | 1000 | 3272.02 |

Fig. 4.2 PSNR curves of APE-ADMM, FTVD-v4 and FISTA: PSNR (dB) versus CPU time in seconds (log scale) for (a) Boat and (b) Man

Fig. 4.3 Curves of λ and objective function values relative to the number of iteration steps for APE-ADMM (Boat and Man): (a) λ versus iterations; (b) objective value versus iterations
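The termination rule of Eq. (4.81), combined with the cap of 1000 iterations, can be expressed as a small helper. This is only a sketch with a hypothetical function name; the guard against a zero denominator is an added assumption, since the text does not say how an all-zero iterate is handled.

```python
import numpy as np

def should_stop(u_new, u_old, k, tol=1e-6, max_iter=1000):
    """Stop when ||u^{k+1} - u^k||_2 / ||u^k||_2 <= tol or k reaches max_iter.

    For 2-D arrays np.linalg.norm returns the Frobenius norm, i.e. the
    2-norm of the vectorized image.
    """
    denom = max(np.linalg.norm(u_old), np.finfo(float).tiny)
    rel_change = np.linalg.norm(u_new - u_old) / denom
    return bool(rel_change <= tol) or k >= max_iter

u_old = np.ones((4, 4))
```

A relative measure is used so that the same tolerance applies to images of different sizes and dynamic ranges.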
It is worth mentioning that APE-ADMM finds the optimal λ by the above adaptive approach, while FTVD-v4 and FISTA need to obtain their optimal parameters by trial and error. In fact, in this experiment, the optimal parameters of FTVD-v4 and FISTA are obtained with the help of the APE-ADMM algorithm. Since all three algorithms are TV-regularized, their objective functions have the same form, and thus it is reasonable to believe that their optimal values of λ are similar. Using the nearest integer to the APE-ADMM optimal parameter value λ* as the reference point (denoted round(λ*)), 11 points (including the endpoints) are selected at equal intervals in the interval [0.5 round(λ*), 2 round(λ*)] as the candidate optimal λ for FTVD-v4 and FISTA. The curves of their PSNR with respect to λ are then calculated. The curves corresponding to the Boat and Man images are given in Fig. 4.4a and b, respectively. After more refined adjustment, the optimal λ values for FTVD-v4 and FISTA shown in Table 4.1 are obtained. From Fig. 4.4, it can be seen that the PSNRs of both FTVD-v4 and FISTA are sensitive to changes in λ and reach reasonable restoration results only in the vicinity of the optimal λ. Once λ deviates too much from the optimal value, their PSNRs drop rapidly. Figure 4.5 gives the Man recovery images of the three algorithms for different λ. When the optimal λ is used, FTVD-v4 and FISTA obtain results similar to those of APE-ADMM in terms of both PSNR and visual effect. However, when setting λ = 0.5 round(λ*), both FTVD-v4 and FISTA obtain over-smoothed results, and texture structures such as hair in their Man restorations are still blurred. In contrast, when setting λ = 2 round(λ*), both FTVD-v4 and FISTA obtain results that still contain noise. In fact, as the image size increases, the process of obtaining the optimal λ by non-adaptive methods becomes more complicated. Moreover, the optimal λ is sensitive to changes in image size, image type, and blur type.
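The candidate grid described above is easy to reproduce. The sketch below (hypothetical function name) builds the 11 equally spaced values; for the APE-ADMM results of Table 4.1 it gives [5, 20] for Boat (λ* = 10.19) and [1.5, 6] for Man (λ* = 3.18), matching the λ ranges shown in Figs. 4.4 and 4.5.

```python
import numpy as np

def candidate_lambdas(lam_star, num=11):
    """Equally spaced candidates in [0.5*round(lam*), 2*round(lam*)], endpoints included."""
    ref = round(lam_star)
    return np.linspace(0.5 * ref, 2.0 * ref, num)

grid_boat = candidate_lambdas(10.19)
grid_man = candidate_lambdas(3.18)
```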
Fig. 4.4 PSNR (dB) curves for FTVD-v4 and FISTA relative to λ: (a) PSNR relative to λ for Boat; (b) PSNR relative to λ for Man
Fig. 4.5 Man recovery images of APE-ADMM, FTVD-v4, and FISTA at different λ. Panels: degraded image (PSNR = 21.64); APE-ADMM (λ = 3.18, PSNR = 26.83); FTVD-v4 (λ = 3.30, PSNR = 26.82); FISTA (λ = 3.20, PSNR = 26.80); FTVD-v4 (λ = 1.5, PSNR = 26.33); FISTA (λ = 1.5, PSNR = 26.39); FTVD-v4 (λ = 6, PSNR = 26.20); FISTA (λ = 6, PSNR = 25.92)
4.5.2 Experiment 2: Comparison with Other Adaptive Algorithms

In Experiment 2, the proposed algorithm is compared with three other well-known adaptive TV image restoration methods. The comparison covers three aspects: speed, accuracy, and parameter selection. The three methods involved are Wen-Chan [7], based on the primal–dual model, C-SALSA [6], based on ADMM, and Ng-Weiss-Yuan [10], also based on ADMM. These methods have been described in detail in the introduction of this chapter (Sect. 4.1) and are not described again here. Table 4.2 gives the five background problems chosen for this experiment. In problems 3A and 3B, i, j = −7, …, 7. Three images of different sizes, Lena (256 × 256), Boat (512 × 512) and Man (1024 × 1024), are used in the experiments. The stopping criterion of the algorithms is the same as in the previous subsection. The other parameter settings for the three comparison algorithms follow the original references. Table 4.3 gives the comparison results in terms of PSNR, iteration steps, CPU time, and the final regularization parameter λ. The optimal result for each comparison term is marked in bold. "–" indicates that the algorithm does not give a result for that term. Because the results of the Ng-Weiss-Yuan algorithm are all obtained with the image pixel value range set to [0, 1], the image pixel values are divided by 255 when this algorithm is used. In addition, to ensure that the PSNR is comparable with that of the other algorithms, the noise variances in Table 4.2 are transformed accordingly when Ng-Weiss-Yuan is employed. The optimal λ for Ng-Weiss-Yuan is obtained with the pixel values limited to [0, 1] rather than [0, 255], and the values of λ obtained in these two cases do not have a clear correspondence. Thus, only the parameter values of APE-ADMM and Wen-Chan are compared.
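The three blur kernels used in the background problems of Table 4.2 can be constructed as follows. This is a sketch under stated assumptions: A(9) is read as a 9 × 9 uniform (mean) kernel, G(9, 3) as a 9 × 9 Gaussian kernel with standard deviation 3, and every kernel is normalized to unit sum; these readings follow common usage and are not confirmed by this section. The function names are hypothetical.

```python
import numpy as np

def average_kernel(size):
    """Uniform (mean) blur, assumed meaning of A(size)."""
    return np.full((size, size), 1.0 / size**2)

def gaussian_kernel(size, std):
    """Separable Gaussian blur, assumed meaning of G(size, std)."""
    r = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(r**2) / (2.0 * std**2))
    h = np.outer(g, g)
    return h / h.sum()

def rational_kernel(radius=7):
    """h_ij = 1 / (1 + i^2 + j^2) for i, j = -radius, ..., radius (problems 3A, 3B)."""
    idx = np.arange(-radius, radius + 1)
    i, j = np.meshgrid(idx, idx, indexing="ij")
    h = 1.0 / (1.0 + i**2 + j**2)
    return h / h.sum()  # unit-sum normalization is an assumption

h1 = average_kernel(9)
h2 = gaussian_kernel(9, 3.0)
h3 = rational_kernel(7)
```

Normalizing to unit sum keeps the mean intensity of the blurred image equal to that of the original, which is the usual convention in deblurring benchmarks.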
Table 4.2 Adaptive image deblurring experimental settings

| Problem | Blur kernel | Variance of noise (σ²) |
|---|---|---|
| 1 | A(9) | 0.56² |
| 2A | G(9, 3) | 2 |
| 2B | G(9, 3) | 8 |
| 3A | h_ij = 1/(1 + i² + j²) | 2 |
| 3B | h_ij = 1/(1 + i² + j²) | 8 |

Table 4.3 Comparison of different algorithms in terms of λ, PSNR (dB), number of iteration steps, and CPU time consumed (s)

| Problem | Method | Lena 256 × 256: λ | PSNR | Step | Time | Boat 512 × 512: λ | PSNR | Step | Time | Man 1024 × 1024: λ | PSNR | Step | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | APE-ADMM | 46.29 | **30.37** | **126** | **1.07** | 51.88 | **31.22** | **111** | **5.39** | 52.92 | **31.48** | **119** | **26.05** |
| 1 | Wen-Chan | 44.63 | 30.31 | 442 | 6.50 | 53.04 | 31.10 | 321 | 31.15 | 46.82 | 31.29 | 399 | 163.39 |
| 1 | C-SALSA | – | 30.17 | 191 | 4.95 | – | 30.78 | 814 | 210.62 | – | 31.01 | 811 | 895.88 |
| 1 | Ng-Weiss-Yuan | 6150.2 | 29.77 | 856 | 20.06 | 7301.47 | 30.64 | 561 | 109.08 | 6252.58 | 30.81 | 776 | 634.08 |
| 2A | APE-ADMM | 23.48 | **28.08** | **165** | **1.40** | 23.68 | **28.41** | **149** | **7.26** | 27.28 | **29.39** | **159** | **34.91** |
| 2A | Wen-Chan | 13.89 | 27.91 | 1000 | 14.53 | 16.62 | 28.08 | 593 | 57.30 | 11.87 | 29.13 | 877 | 357.90 |
| 2A | C-SALSA | – | 27.64 | 402 | 10.38 | – | 27.69 | 675 | 174.85 | – | 28.99 | 439 | 485.80 |
| 2A | Ng-Weiss-Yuan | 1317.98 | 27.36 | 1000 | 23.80 | 2076.94 | 27.51 | 944 | 174.58 | 1383.43 | 28.72 | 1000 | 882.00 |
| 2B | APE-ADMM | 4.27 | **27.15** | **190** | **1.58** | 6.82 | **27.21** | **165** | **8.04** | 7.55 | **28.55** | **177** | **39.20** |
| 2B | Wen-Chan | 2.66 | 26.82 | 1000 | 14.22 | 4.77 | 26.89 | 853 | 82.03 | 3.10 | 28.22 | 1000 | 405.97 |
| 2B | C-SALSA | – | 26.77 | 372 | 9.58 | – | 26.63 | 729 | 188.83 | – | 28.18 | 601 | 663.14 |
| 2B | Ng-Weiss-Yuan | 400.69 | 26.56 | 1000 | 24.38 | 563.48 | 26.32 | 1000 | 187.79 | 448.73 | 27.95 | 1000 | 1018.88 |
| 3A | APE-ADMM | 6.53 | **31.18** | **134** | **1.12** | 7.86 | **31.06** | **113** | **5.49** | 7.70 | **32.28** | **121** | **26.88** |
| 3A | Wen-Chan | 6.49 | 30.98 | 367 | 5.31 | 8.15 | 30.87 | 311 | 30.03 | 7.19 | 32.02 | 396 | 161.91 |
| 3A | C-SALSA | – | 30.54 | 490 | 13.19 | – | 30.46 | 532 | 137.73 | – | 31.69 | 595 | 660.39 |
| 3A | Ng-Weiss-Yuan | 1215.7 | 30.64 | 700 | 16.25 | 1368.36 | 30.44 | 526 | 100.95 | 1212.92 | 31.66 | 710 | 626.37 |
| 3B | APE-ADMM | 2.41 | **29.44** | **149** | **1.24** | 2.83 | **29.35** | **126** | **6.13** | 2.85 | **30.66** | **136** | **30.12** |
| 3B | Wen-Chan | 2.31 | 29.28 | 515 | 7.41 | 2.78 | 29.16 | 451 | 43.58 | 2.45 | 30.39 | 520 | 211.74 |
| 3B | C-SALSA | – | 29.02 | 187 | 4.80 | – | 28.86 | 293 | 75.73 | – | 30.20 | 233 | 257.13 |
| 3B | Ng-Weiss-Yuan | 433.62 | 28.91 | 956 | 22.75 | 520.75 | 28.84 | 678 | 129.06 | 459.29 | 30.11 | 920 | 933.26 |

Figure 4.6 gives the curves of PSNR and λ with respect to CPU time for the different algorithms, for the Boat image under problem 2B. The corresponding degraded and restored images are given in Fig. 4.7. Several observations can be made from Table 4.3 and Figs. 4.6 and 4.7. First, compared with the other algorithms, APE-ADMM obtains the highest PSNR in the shortest time with the fewest iteration steps. Dividing the CPU time by the number of iteration steps shows that APE-ADMM also has the highest single-step execution efficiency. This result is in line with expectations: by adaptively updating the regularization parameter λ in closed form, APE-ADMM avoids inner iterations and therefore achieves a higher execution efficiency. Moreover, the single-step execution efficiency and accuracy of the other three methods are affected by the setting of their inner-iteration parameters. Thanks to the selection strategy for β1 and β2, the proposed algorithm completes the image restoration task with the fewest iteration steps. Second, as can be seen from Table 4.3, APE-ADMM maintains its advantage over the other algorithms when the background problem and the image change, and the larger the image size, the greater its advantage in speed. The higher PSNR also indicates that APE-ADMM obtains a more reasonable λ. Figure 4.7 shows that the Boat restoration result of APE-ADMM under problem 2B is better than that of Wen-Chan in terms of detail. This is because the λ obtained by Wen-Chan is too small, which means that its restored image is oversmoothed.

Fig. 4.6 Experimental results of the Boat image under problem 2B: (a) degraded Boat; (b) PSNR (dB) versus CPU time in seconds (log scale) for APE-ADMM, Wen-Chan, C-SALSA and Ng-Weiss-Yuan; (c) λ versus CPU time in seconds (log scale) for APE-ADMM and Wen-Chan
Fig. 4.7 Boat restoration images of APE-ADMM, Wen-Chan, C-SALSA and Ng-Weiss-Yuan in the context of problem 2B. Panels: APE-ADMM (PSNR = 27.21); Wen-Chan (PSNR = 26.89); C-SALSA (PSNR = 26.63); Ng-Weiss-Yuan (PSNR = 26.32)
4.5.3 Experiment 3: Comparison of Denoising Experiments

The above experiments show that APE-ADMM is fast and effective in the deblurring problem. This subsection demonstrates the potential of APE-ADMM for image denoising by comparing it with two other well-known adaptive TV denoising algorithms. These two algorithms are the Chambolle projection algorithm [12, 25] and the adaptive splitting Bregman denoising algorithm [2, 26], both of which employ the regularization parameter selection strategy described in [12]. Both algorithms used here are online versions.⁴ In contrast to the proposed algorithm, the parameter selection strategies adopted by these two algorithms are specifically designed for image denoising, and it is not yet certain whether they can be used for image deblurring. The images used in this experiment are Barbara and Boat. First, Gaussian white noise with variance 100, 400, and 900 is added to each of the two images. Then, the three methods are used to remove the noise. Table 4.4 compares the PSNR of the three algorithms. Because both the Chambolle algorithm and the splitting Bregman algorithm are online implementations, this experiment does not include CPU time in the comparison. From Table 4.4, it can be seen that APE-ADMM obtains a higher PSNR than the other two algorithms. The restored Barbara images and the corresponding error images of the different algorithms under a noise variance of 400 are given in Figs. 4.8 and 4.9, respectively. For better visual effect, the error image, i.e., the difference between the restored image and the original image, is affinely mapped onto the interval [0, 255]. The relative errors represented by the error images are 9.58%, 10.04%, and 10.29%, respectively. In addition to the advantage in terms of PSNR, Fig. 4.9 shows that APE-ADMM outperforms the other two algorithms in terms of detail preservation, with less texture loss. The better recovery quality suggests that the proposed algorithm finds a better λ when the regularizers are all TV models.

Table 4.4 PSNR (dB) for image denoising experiment

| Image | σ² | Noised image | APE-ADMM | Chambolle | Split Bregman |
|---|---|---|---|---|---|
| Barbara 512 × 512 | 100 | 28.14 | 30.88 | 30.12 | 29.91 |
| Barbara 512 × 512 | 400 | 22.14 | 26.78 | 26.38 | 26.16 |
| Barbara 512 × 512 | 900 | 18.73 | 24.89 | 24.82 | 24.40 |
| Boat 512 × 512 | 100 | 28.15 | 32.42 | 31.77 | 31.25 |
| Boat 512 × 512 | 400 | 22.17 | 29.03 | 28.51 | 28.11 |
| Boat 512 × 512 | 900 | 18.75 | 26.92 | 26.72 | 26.19 |

⁴ http://www.ipol.im/.
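The error images and relative errors discussed above can be computed along the following lines. This is a sketch under assumptions: the relative error is taken to be $\|u - u_{\text{orig}}\|_2 / \|u_{\text{orig}}\|_2$, which the section does not state explicitly, and the affine mapping sends the smallest error value to 0 and the largest to 255. Both function names are hypothetical.

```python
import numpy as np

def relative_error(original, restored):
    """Assumed definition: ||restored - original||_2 / ||original||_2."""
    return np.linalg.norm(restored - original) / np.linalg.norm(original)

def error_image_for_display(original, restored):
    """Affinely map the error image (restored - original) onto [0, 255]."""
    err = restored - original
    lo, hi = err.min(), err.max()
    if hi == lo:  # constant error: nothing to stretch
        return np.zeros_like(err)
    return 255.0 * (err - lo) / (hi - lo)

original = np.array([[100.0, 200.0]])
restored = np.array([[110.0, 190.0]])
rel = relative_error(original, restored)
disp = error_image_for_display(original, restored)
```

The affine stretch is for display only; it changes the contrast of the error image but not the relative error value.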
Fig. 4.8 Barbara image restorations of APE-ADMM, Chambolle, and splitting Bregman algorithms at σ² = 400. Panels: degraded image (σ² = 400, PSNR = 22.14); APE-ADMM (PSNR = 26.78); Chambolle (PSNR = 26.38); Split Bregman (PSNR = 26.16)

Fig. 4.9 Error images of APE-ADMM, Chambolle, and splitting Bregman algorithms at σ² = 400. Panels: APE-ADMM (relative error = 9.58%); Chambolle (relative error = 10.04%); Split Bregman (relative error = 10.29%)
References

1. Cai J, Dong B, Osher S, Shen Z (2012) Image restoration: total variation, wavelet frames, and beyond. J Am Math Soc 25(4):1033–1089
2. Goldstein T, Osher S (2009) The split Bregman method for L1-regularized problems. SIAM J Imag Sci 2(2):323–343
3. Wang Y, Yang J, Yin W, Zhang Y (2008) A new alternating minimization algorithm for total variation image reconstruction. SIAM J Imag Sci 1(3):248–272
4. Yang J, Yin W, Zhang Y, Wang Y (2009) A fast algorithm for edge-preserving variational multichannel image restoration. SIAM J Imag Sci 2(2):569–592
5. Wu C, Tai X (2010) Augmented Lagrange method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models. SIAM J Imag Sci 3(3):300–339
6. Afonso MV, Bioucas-Dias JM, Figueiredo MAT (2011) An augmented Lagrange approach to the constrained optimization formulation of imaging inverse problems. IEEE Trans Image Process 20(3):681–695
7. Wen Y, Chan RH (2012) Parameter selection for total-variation-based image restoration using discrepancy principle. IEEE Trans Image Process 21(4):1770–1781
8. Wen Y, Yip AM (2009) Adaptive parameter selection for total variation image deconvolution. Numer Math Theory Methods Appl 2(4):427–438
9. Afonso MV, Bioucas-Dias JM, Figueiredo M (2010) Fast image recovery using variable splitting and constrained optimization. IEEE Trans Image Process 19(9):2345–2356
10. Ng M, Weiss P, Yuan X (2010) Solving constrained total-variation image restoration and reconstruction problems via alternating direction methods. SIAM J Sci Comput 32(5):2710–2736
11. Blomgren P, Chan T (2002) Modular solvers for image restoration problems using the discrepancy principle. Numer Linear Algebra Appl 9(5):347–358
12. Chambolle A (2004) An algorithm for total variation minimization and applications. J Math Imaging Vis 20(1–2):89–97
13. Chan T, Shen J (2005) Image processing and analysis: variational, PDE, wavelet, and stochastic methods. SIAM, Philadelphia
14. Deng W, Yin W (2012) On the global and linear convergence of the generalized alternating direction method of multipliers. UCLA CAM Report 12–52, UCLA, Los Angeles
15. Goldstein T, O’Donoghue B, Setzer S (2012) Fast alternating direction optimization methods. UCLA CAM Report 12–35, UCLA, Los Angeles
16. Setzer S (2011) Operator splittings, Bregman methods and frame shrinkage in image processing. Int J Comput Vis 92(3):265–280
17. Ekeland I, Témam R (1999) Convex analysis and variational problems (classics in applied mathematics). SIAM, Philadelphia, PA, USA
18. Nocedal J, Wright SJ (2006) Numerical optimization, 2nd edn. Springer-Verlag, New York, NY
19. Bauschke HH, Combettes PL (2011) Convex analysis and monotone operator theory in Hilbert spaces. Springer, New York
20. Ma J (2010) Positively constrained multiplicative iterative algorithm for maximum penalized likelihood tomographic reconstruction. IEEE Trans Nucl Sci 57(1):181–192
21. Chan RH, Liang H, Ma J (2011) Positively constrained total variation penalized image restoration. Adv Adapt Data Anal 3(1/2):187–201
22. Chan RH, Ma J (2012) A multiplicative iterative algorithm for box-constrained penalized likelihood image restoration. IEEE Trans Image Process 21(7):3168–3181
23. Chan RH, Tao M, Yuan X (2013) Constrained total variational deblurring models and fast algorithms based on alternating direction method of multipliers. SIAM J Imag Sci 6(1):680–697
24. Getreuer P (2011) Contour stencils: total variation along curves for adaptive image interpolation. SIAM J Imag Sci 4(3):954–979
25. Duran J, Coll B, Sbert C (2013) Chambolle’s projection algorithm for total variation denoising. Image Process Line 3:311–331
26. Getreuer P (2012) Rudin–Osher–Fatemi total variation denoising using split Bregman. Image Process Line 2:74–95
Chapter 5
Parallel Alternating Direction Method of Multipliers with Application to Image Restoration

5.1 Summary

The choice of regularization strategy in image restoration problems directly affects both the correctness of the final estimate and the complexity of the solution process, and in most cases these two aspects are difficult to balance. The inverse filter without regularization cannot solve the ill-posed image inverse problem. The early second-order Tikhonov regularization can be solved conveniently and analytically in the frequency domain by Wiener filtering, but its overly strong regularity limits its edge-preserving ability. The most commonly used TV regularization and wavelet regularization based on ℓ1-sparsity preserve image edges better, but they do not have closed-form solutions. Furthermore, it has been shown that TV regularization is optimal only for piecewise constant images [1]; the wavelet transform can sparsely characterize point singularities or isotropic features in an image, but cannot sparsely characterize anisotropic features such as image edges or curves [2]. Therefore, the results of both common methods are visually deficient. Compound regularization methods can combine the advantages of multiple regularizers to obtain superior results, but this often leads to more complex optimization problems. The study in Chap. 4 only considers the solution of the constrained TV regularized image restoration problem, and a single TV regularizer inevitably produces staircasing effects when applied to natural images. Fully exploiting the smoothness [3–15] and sparsity [16–19] of the image itself and constructing a better image regularization strategy is a fundamental way to further improve the quality of the restored image.
© Chemical Industry Press 2023 C. He and C. Hu, Parallel Operator Splitting Algorithms with Application to Imaging Inverse Problems, Advanced and Intelligent Manufacturing in China, https://doi.org/10.1007/978-981-99-3750-9_5
More refined regularization strategies necessarily introduce more complex objective functions. Discovering the common features among different regularization strategies and constructing generic optimization models for image inverse problems are the basis for deriving more general and powerful algorithms for these problems. The ADMM method used in Chap. 4 is a generic optimization algorithm with applications in many fields, and its further parallel generalization is gaining more and more attention [20, 21]. This chapter investigates the problem of solving compound regularized image restoration, establishes a generic optimization model for image inverse problems, and proposes the parallel alternating direction method of multipliers (PADMM) [22] for solving the generic objective function model. The objective function can be decomposed into several individually solvable subproblems, and the algorithm is derived with the help of the Moreau decomposition. The proposed PADMM algorithm is applied to the image restoration problem with compound regularization based on total generalized variation (TGV) and shearlets, and the PADMM for TGV/shearlet image restoration (PADMM-TGVS) is proposed. The effectiveness of the proposed algorithm is verified by single/multi-channel image deblurring experiments and compressed sensing experiments. When applied to image restoration, the PADMM algorithm incorporates the adaptive regularization parameter estimation strategy proposed in Chap. 4. Adaptive image restoration algorithms are often difficult to extend to multi-channel image processing because they cannot handle the tricky inter-channel blur well. With some reasonable modifications, PADMM can be easily extended to other image inverse problems, such as image inpainting and image decompression. The chapter is structured as follows: Sect. 5.2 develops a generic optimization model for image inverse problems and gives its saddle point condition, the derivation of the PADMM for solving this model, and the primal-dual form of the algorithm. Section 5.3 gives a convergence proof and an $O(1/k)$ convergence rate analysis of the proposed algorithm. Section 5.4 then details the strategy of applying the PADMM algorithm to total generalized variation/shearlet regularized image restoration. Section 5.5 gives relevant comparative experimental results.
5.2 Parallel Alternating Direction Method of Multipliers
In order to obtain better results in image restoration, this chapter focuses on the solution of the compound regularized inverse problem. In the following, the generic objective function model describing regularized image restoration is given first, and the PADMM for solving the generic model is then proposed.
5.2.1 A General Description of the Regularized Image Restoration Objective Function
The compound regularizer considered in this chapter is a linear combination of several regularizers. Denote by $\Gamma_0(X)$ the set of all convex, proper, and lower semicontinuous functions from a Hilbert space $X$ to $(-\infty, +\infty]$. The generic convex objective function considered here can then be described as

$$\min_{x \in X} \; e(x) + \sum_{h=1}^{H} f_h(L_h x), \tag{5.1}$$

where $e \in \Gamma_0(X)$, $f_h \in \Gamma_0(V_h)$, and $f_h$ is sufficiently "simple" in the sense that its proximity operator exists in closed form or can be easily computed. $L_h : X \to V_h$ is a bounded linear operator whose Hilbert adjoint operator is denoted by $L_h^*$ and whose norm is $\|L_h\| = \sup\{\|L_h x\|_2 : \|x\|_2 = 1\} < +\infty$. Furthermore, an optimal solution to the problem Eq. (5.1) is assumed to exist. It is important to note that, given a nonempty set $\Omega$ in $X$ and its indicator function $\iota_\Omega$ (see Chap. 1 for its definition), one can transform the constrained optimization problem $\min_x g(x)$ s.t. $x \in \Omega$ into the unconstrained optimization problem $\min_x g(x) + \iota_\Omega(x)$. Thus problem Eq. (5.1) is sufficiently general for inverse problems that can be modeled in the form of Eq. (1.6) or (1.7), such as image deblurring, inpainting, compressed sensing, and segmentation. In practice, the difficulty in solving problem Eq. (5.1) arises from several aspects. First, the image data spaces $X$ and $V_h$ are usually high-dimensional; second, $L_h$ is usually massive and non-sparse; in addition, the function $e$ and the functions $f_h$ coupled with the linear operators may be non-differentiable. As a result, many traditional algorithms, such as the gradient method, cannot be used to solve problem Eq. (5.1). Operator splitting methods provide a feasible way to solve the problem Eq. (5.1); this class of methods usually solves the problem by exploiting the first-order information of the objective function. They decompose a complex problem into several subproblems that are easier to solve, thus leading to practical algorithms.
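To make the notion of a "simple" function concrete, the following minimal sketch evaluates two proximity operators that are available in closed form: soft thresholding for a scaled $l_1$-norm, and the projection onto an $l_2$-ball, which is the proximity operator of the indicator function of that ball. The function names and the test vector are illustrative assumptions and do not come from the book.

```python
import numpy as np

def prox_l1(w, t):
    """Proximity operator of t * ||.||_1: componentwise soft thresholding."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def project_l2_ball(w, radius):
    """Proximity operator of the indicator of {x : ||x||_2 <= radius},
    i.e. the Euclidean projection onto that ball."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

w = np.array([3.0, -0.5, 4.0])   # hypothetical test vector
print(prox_l1(w, 1.0))           # entries shrunk toward zero by 1
print(project_l2_ball(w, 1.0))   # w rescaled to unit l2 norm
```

Both operators cost a single pass over the data, which is the sense in which the corresponding $f_h$ can be handled cheaply inside a splitting method.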
5.2.2 Augmented Lagrangian Function with Saddle Point Condition
According to Fermat's rule, the following lemma about the solution of problem Eq. (5.1) can be obtained.

Lemma 5.1 The solution $x^*$ of the problem Eq. (5.1) satisfies

$$0 \in \partial e(x^*) + \sum_{h=1}^{H} L_h^* \, \partial f_h(L_h x^*), \tag{5.2}$$

where $\partial f_h(L_h x^*)$ denotes the subdifferential of $f_h$ at $L_h x^*$.

By variable splitting, a set of auxiliary variables $a_1 \in V_1, \dots, a_H \in V_H$ can be introduced to replace $L_1 x, \dots, L_H x$ as the arguments of $f_1, \dots, f_H$, thus transforming the problem Eq. (5.1) into the following optimization problem with linear constraints

$$\min_{x \in X} \; e(x) + \sum_{h=1}^{H} f_h(a_h) \quad \text{s.t.} \quad a_1 = L_1 x, \dots, a_H = L_H x. \tag{5.3}$$
The augmented Lagrangian function of problem Eq. (5.3) is defined as

$$\mathcal{L}_A(x, a_1, \dots, a_H; v_1, \dots, v_H) = e(x) + \sum_{h=1}^{H} \left( f_h(a_h) + \langle v_h, L_h x - a_h \rangle + \frac{\beta_h}{2} \|L_h x - a_h\|_2^2 \right), \tag{5.4}$$

where $v_1 \in V_1, \dots, v_H \in V_H$ are dual variables (or Lagrange multipliers) and $\beta_1, \dots, \beta_H$ are positive penalty parameters. In Eq. (5.4), a weighting idea is used, i.e., each linear constraint corresponds to a specific $\beta_h$. Although this approach complicates the derivation of the algorithm, in practice it may significantly improve the actual convergence rate of the algorithm; the reason for this has been discussed in detail in Chap. 4. If $(x^*, a_1^*, \dots, a_H^*; v_1^*, \dots, v_H^*)$ is a saddle point of Eq. (5.4), then

$$\mathcal{L}_A(x^*, a_1^*, \dots, a_H^*; v_1, \dots, v_H) \le \mathcal{L}_A(x^*, a_1^*, \dots, a_H^*; v_1^*, \dots, v_H^*) \le \mathcal{L}_A(x, a_1, \dots, a_H; v_1^*, \dots, v_H^*)$$
$$\forall (x, a_1, \dots, a_H; v_1, \dots, v_H) \in X \times V_1 \times \cdots \times V_H \times V_1 \times \cdots \times V_H. \tag{5.5}$$
The relationship between the solution of problem Eq. (5.1) and the saddle point condition Eq. (5.5) is given by the following theorem, a detailed proof of which is given in this chapter.

Theorem 5.1 $x^* \in X$ is a solution to the problem Eq. (5.1) if and only if there exist $a_h^*, v_h^* \in V_h$, $h = 1, \dots, H$, such that $(x^*, a_1^*, \dots, a_H^*; v_1^*, \dots, v_H^*)$ is a saddle point of the augmented Lagrangian function Eq. (5.4).

Proof Let $(x^*, a_1^*, \dots, a_H^*; v_1^*, \dots, v_H^*)$ satisfy the saddle point condition in Eq. (5.5). Then, by the first inequality in Eq. (5.5),

$$\sum_{h=1}^{H} \langle v_h, L_h x^* - a_h^* \rangle \le \sum_{h=1}^{H} \langle v_h^*, L_h x^* - a_h^* \rangle \quad \forall v_h \in V_h, \; h = 1, \dots, H. \tag{5.6}$$

Setting $v_h = v_h^*$, $h = 2, \dots, H$, in the above inequality yields

$$\langle v_1, L_1 x^* - a_1^* \rangle \le \langle v_1^*, L_1 x^* - a_1^* \rangle \quad \forall v_1 \in V_1. \tag{5.7}$$

Inequality (5.7) shows that $L_1 x^* - a_1^* = 0$ holds, and similarly, it follows that

$$L_h x^* - a_h^* = 0, \quad h = 2, \dots, H. \tag{5.8}$$

This conclusion and the second inequality in Eq. (5.5) show that

$$e(x^*) + \sum_{h=1}^{H} f_h(L_h x^*) \le e(x) + \sum_{h=1}^{H} \left( f_h(a_h) + \langle v_h^*, L_h x - a_h \rangle + \frac{\beta_h}{2} \|L_h x - a_h\|_2^2 \right) \quad \forall x \in X, \; a_h \in V_h, \; h = 1, \dots, H. \tag{5.9}$$

Substituting $L_h x - a_h = 0$, $h = 1, \dots, H$, into the above inequality yields

$$e(x^*) + \sum_{h=1}^{H} f_h(L_h x^*) \le e(x) + \sum_{h=1}^{H} f_h(L_h x). \tag{5.10}$$
Inequality (5.10) shows that $x^*$ is a solution to the problem Eq. (5.1).

Conversely, assume that $x^* \in X$ is a solution to the problem Eq. (5.1) and set $a_h^* = L_h x^*$, $h = 1, \dots, H$. By Lemma 5.1, there exist $v_h^* \in \partial f_h(L_h x^*)$, $h = 1, \dots, H$, such that $-\sum_{h=1}^{H} L_h^* v_h^* \in \partial e(x^*)$. Next we prove that $(x^*, a_1^*, \dots, a_H^*; v_1^*, \dots, v_H^*)$ is a saddle point of Eq. (5.4). Since $a_h^* = L_h x^*$ holds, the first inequality in Eq. (5.5) holds. From the definition of the augmented Lagrangian function Eq. (5.4), it follows that $\mathcal{L}_A(x, a_1, \dots, a_H; v_1^*, \dots, v_H^*)$ is a proper, coercive, and continuous convex function with respect to each of the variables $x, a_1, \dots, a_H$. By Lemma 4.2, it has a minimum point $(\tilde{x}, \tilde{a}_1, \dots, \tilde{a}_H)$ in $X \times V_1 \times \cdots \times V_H$ if and only if

$$e(x) - e(\tilde{x}) + \left\langle \sum_{h=1}^{H} \left( L_h^* v_h^* + \beta_h L_h^* (L_h \tilde{x} - \tilde{a}_h) \right), x - \tilde{x} \right\rangle \ge 0 \quad \forall x \in X, \tag{5.11}$$

$$f_h(a_h) - f_h(\tilde{a}_h) + \langle -v_h^* + \beta_h (\tilde{a}_h - L_h \tilde{x}), a_h - \tilde{a}_h \rangle \ge 0 \quad \forall a_h \in V_h, \; h = 1, \dots, H. \tag{5.12}$$

On the one hand, substituting $\tilde{x} = x^*$ and $\tilde{a}_h = a_h^*$ into Eq. (5.11) and using the above choice of $v_h^*$, we see that $(x^*, a_1^*, \dots, a_H^*)$ satisfies Eq. (5.11). On the other hand, by Lemma 5.1 and $a_h^* = L_h x^*$, it follows that $0 \in \partial e(x^*) + \sum_{h=1}^{H} L_h^* \partial f_h(a_h^*)$. Substituting $\tilde{x} = x^*$ and $\tilde{a}_h = a_h^*$ into Eq. (5.12), we find that Eq. (5.12) is equivalent to

$$f_h(a_h) - f_h(a_h^*) - \langle v_h^*, a_h - a_h^* \rangle \ge 0 \quad \forall a_h \in V_h, \; h = 1, \dots, H, \tag{5.13}$$

i.e.,

$$f_h(a_h) - f_h(a_h^*) - \langle \partial f_h(L_h x^*), a_h - a_h^* \rangle \ge 0 \quad \forall a_h \in V_h, \; h = 1, \dots, H, \tag{5.14}$$

where $\partial f_h(L_h x^*)$ stands for the subgradient $v_h^*$. The left-hand side of the inequality (5.14) is the Bregman distance, which by its definition is non-negative. Therefore $(x^*, a_1^*, \dots, a_H^*)$ also satisfies Eq. (5.12). Hence the second inequality in Eq. (5.5) holds. Theorem 5.1 is proved.
5.2.3 Algorithm Derivation
To simplify the convergence analysis of the proposed algorithm, Eq. (5.4) is rewritten as

$$\mathcal{L}_A'\left(x, \sqrt{\beta_1} a_1, \dots, \sqrt{\beta_H} a_H; \frac{v_1}{\sqrt{\beta_1}}, \dots, \frac{v_H}{\sqrt{\beta_H}}\right) = e(x) + \sum_{h=1}^{H} \left( f_h\left(\sqrt{\beta_h} a_h\right) + \left\langle \frac{v_h}{\sqrt{\beta_h}}, \sqrt{\beta_h} L_h x - \sqrt{\beta_h} a_h \right\rangle + \frac{1}{2} \left\| \sqrt{\beta_h} L_h x - \sqrt{\beta_h} a_h \right\|_2^2 \right). \tag{5.15}$$

Denote

$$a = \left(\sqrt{\beta_1} a_1, \dots, \sqrt{\beta_H} a_H\right) \in V \triangleq V_1 \times \cdots \times V_H, \tag{5.16}$$

$$v = \left(\frac{v_1}{\sqrt{\beta_1}}, \dots, \frac{v_H}{\sqrt{\beta_H}}\right) \in V, \tag{5.17}$$

$$L x = \left(\sqrt{\beta_1} L_1 x, \dots, \sqrt{\beta_H} L_H x\right) \in V, \tag{5.18}$$

$$f(a) = \sum_{h=1}^{H} f_h(a_h) = \sum_{h=1}^{H} f_h\left(\sqrt{\beta_h} a_h\right). \tag{5.19}$$

Here, with a slight abuse of notation, $f_h$ applied to the rescaled block $\sqrt{\beta_h} a_h$ is understood as the original $f_h$ evaluated at $a_h$, so that Eq. (5.15) is Eq. (5.4) written in the rescaled variables. For the adjoint operator $L^*$ of $L$, we have

$$L^* v = L_1^* v_1 + \cdots + L_H^* v_H = \sum_{h=1}^{H} L_h^* v_h. \tag{5.20}$$

With the above notation, the augmented Lagrangian function Eq. (5.4) can be rewritten as

$$\mathcal{L}_A(x, a; v) = e(x) + f(a) + \langle v, L x - a \rangle + \frac{1}{2} \|L x - a\|_2^2. \tag{5.21}$$

The saddle point condition of the augmented Lagrangian function Eq. (5.21) can be described as

$$\mathcal{L}_A(x^*, a^*; v) \le \mathcal{L}_A(x^*, a^*; v^*) \le \mathcal{L}_A(x, a; v^*) \quad \forall (x, a; v) \in X \times V \times V. \tag{5.22}$$

Clearly, this saddle point condition is equivalent to the saddle point condition Eq. (5.5) of the augmented Lagrangian function Eq. (5.4). Applying the ADMM iterative framework to Eq. (5.21) yields

$$\begin{cases} x^{k+1} = \arg\min_x \; e(x) + \frac{1}{2} \left\| L x - a^k + v^k \right\|_2^2; \\ a^{k+1} = \arg\min_a \; f(a) + \frac{1}{2} \left\| L x^{k+1} - a + v^k \right\|_2^2 = \operatorname{prox}_f\left(L x^{k+1} + v^k\right); \\ v^{k+1} = v^k + L x^{k+1} - a^{k+1}. \end{cases} \tag{5.23}$$

Expanding the iterative framework (5.23) in terms of the original variables yields the following iterative rule

$$\begin{cases} x^{k+1} = \arg\min_x \; e(x) + \sum_{h=1}^{H} \frac{\beta_h}{2} \left\| L_h x - a_h^k + \frac{v_h^k}{\beta_h} \right\|_2^2; \\ a_h^{k+1} = \operatorname{prox}_{f_h / \beta_h}\left(L_h x^{k+1} + \frac{v_h^k}{\beta_h}\right), \quad h = 1, \dots, H; \\ v_h^{k+1} = v_h^k + \beta_h \left(L_h x^{k+1} - a_h^{k+1}\right), \quad h = 1, \dots, H. \end{cases} \tag{5.24}$$
Algorithm 5.1 summarizes the first parallel ADMM algorithm based on (5.24).

Algorithm 5.1 Parallel Alternating Direction Method of Multipliers (PADMM1)
Step 1: Initialize $x^0$, $a_h^0$, $v_h^0$ to 0, set $k = 0$ and $\beta_h > 0$, $h = 1, \dots, H$.
Step 2: Determine whether the termination condition is met; if not, perform the following steps.
Step 3: Execute iterative rule (5.24).
Step 4: $k = k + 1$.
Step 5: End the loop and output $x^{k+1}$.

The updates of the auxiliary variables $a_1, \dots, a_H$ in Algorithm 5.1 are independent of each other, and the same holds for the dual variables $v_1, \dots, v_H$, which is why the algorithm is called "parallel". Although Algorithm 5.1 is highly parallel, its structure can be further simplified by eliminating the auxiliary variables. According to the Moreau decomposition, a proper convex function $f$ is related to its Fenchel conjugate $f^*$ as follows

$$v = \operatorname{prox}_{\beta f^*}(v) + \beta \operatorname{prox}_{f / \beta}\left(\frac{v}{\beta}\right). \tag{5.25}$$

In the iterative rule (5.24), if $v_h^k + \beta_h L_h x^{k+1}$ is considered as a whole, then the iterates $v_h^{k+1}$ and $a_h^{k+1}$ constitute a pair in the Moreau decomposition. By placing the update of $x$ after the update of $v_H$ (an operation that does not break the existing order of variable updates if two consecutive iterations are examined), Eq. (5.24) can be transformed into the following iterative rule, in which the auxiliary variables are eliminated:

$$\begin{cases} v_h^{k+1} = \operatorname{prox}_{\beta_h f_h^*}\left(\beta_h L_h x^k + v_h^k\right), \quad h = 1, \dots, H; \\ x^{k+1} = \arg\min_x \; e(x) + \sum_{h=1}^{H} \frac{\beta_h}{2} \left\| L_h x - L_h x^k + \frac{2 v_h^{k+1} - v_h^k}{\beta_h} \right\|_2^2. \end{cases} \tag{5.26}$$
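The Moreau decomposition (5.25) can be checked numerically for a concrete choice of $f$. The sketch below assumes $f = \alpha \|\cdot\|_1$, whose conjugate $f^*$ is the indicator function of the box $[-\alpha, \alpha]^n$; the function names, the random test vector, and the parameter values are illustrative assumptions, not taken from the book.

```python
import numpy as np

def prox_f_over_beta(w, alpha, beta):
    """prox of f/beta for f = alpha*||.||_1: soft thresholding at alpha/beta."""
    return np.sign(w) * np.maximum(np.abs(w) - alpha / beta, 0.0)

def prox_beta_f_conj(v, alpha):
    """prox of beta*f^* for f = alpha*||.||_1: f^* is the indicator of the box
    [-alpha, alpha]^n, so the prox is a clip and does not depend on beta."""
    return np.clip(v, -alpha, alpha)

rng = np.random.default_rng(0)
v = rng.normal(scale=3.0, size=8)    # hypothetical test vector
alpha, beta = 1.5, 0.7               # hypothetical parameters

# Right-hand side of Eq. (5.25)
rhs = prox_beta_f_conj(v, alpha) + beta * prox_f_over_beta(v / beta, alpha, beta)
print(np.max(np.abs(v - rhs)))       # should be at round-off level
```

In this example the dual update is a plain clip, which illustrates why working with $\operatorname{prox}_{\beta_h f_h^*}$ directly can be cheaper than computing $a_h^{k+1}$ and then $v_h^{k+1}$.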
According to the iterative rule (5.26), the following PADMM algorithm containing only the primal variable and the dual variables can be obtained.

Algorithm 5.2 Equivalent Parallel Alternating Direction Method of Multipliers (PADMM2)
Step 1: Initialize $x^0$, $v_h^0$ to 0, set $k = 0$ and $\beta_h > 0$, $h = 1, \dots, H$.
Step 2: Determine whether the termination condition is met; if not, perform the following steps.
Step 3: Execute iterative rule (5.26).
Step 4: $k = k + 1$.
Step 5: End the loop and output $x^{k+1}$.

Remark 5.1 Compared to Algorithm 5.1, Algorithm 5.2 is more compact due to the elimination of the auxiliary variables $a_1, \dots, a_H$. Although Algorithms 5.1 and 5.2 are equivalent, the lower per-iteration computational cost and the more balanced load across variables make Algorithm 5.2 more suitable for parallel computation. Theoretically, Algorithm 5.2 can be viewed as a direct algorithm for finding a saddle point of the Lagrangian function of problem Eq. (5.1). The Lagrangian function for problem Eq. (5.1) is

$$\mathcal{L}(x; v_1, \dots, v_H) = e(x) + \sum_{h=1}^{H} \left( \langle L_h x, v_h \rangle - f_h^*(v_h) \right). \tag{5.27}$$
Remark 5.2 The update of the primal variable $x$ in Algorithms 5.1 and 5.2 does not look easy. However, several special cases for which closed-form solutions can be derived are of interest, and they are general enough for image inverse problems in practice. First, if $e(x) = 0$, then the updates of $x$ in Algorithms 5.1 and 5.2 have the least-squares forms

$$x^{k+1} = \left( \sum_{h=1}^{H} \beta_h L_h^* L_h \right)^{-1} \sum_{h=1}^{H} L_h^* \left( \beta_h a_h^k - v_h^k \right), \tag{5.28}$$

and

$$x^{k+1} = \left( \sum_{h=1}^{H} \beta_h L_h^* L_h \right)^{-1} \sum_{h=1}^{H} L_h^* \left( \beta_h L_h x^k + v_h^k - 2 v_h^{k+1} \right), \tag{5.29}$$

respectively.
Second, if e(x) has a quadratic form, then the update of x in Algorithms 5.1 and 5.2 still has least-square form. Third, if e(x) is only proximity-approachable, i.e., its proximity operator exists in closed-form or can be solved easily, then the sub-steps on x in Algorithms 5.1 and 5.2 can be linearized so that the update of x can be achieved with the help of the proximity operator of e(x). The third case involves additional convergence conditions and its solution is discussed in detail in Chap. 6.
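The two iterative rules can be compared on a small problem with $e(x) = 0$. The following sketch is an illustration under stated assumptions rather than the authors' implementation: it takes the toy problem $\min_x \alpha \|Dx\|_1 + \frac{1}{2}\|x - b\|_2^2$ with $f_1 = \alpha\|\cdot\|_1$, $L_1 = D$, $f_2 = \frac{1}{2}\|\cdot - b\|_2^2$, $L_2 = I$, and uses the closed-form updates (5.28) and (5.29). The matrix `D`, the data `b`, and all parameter values are hypothetical.

```python
import numpy as np

def soft(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def padmm1(D, b, alpha, beta1, beta2, iters):
    """Rule (5.24) with e = 0, L1 = D, L2 = I; x-update as in Eq. (5.28)."""
    n = b.size
    a1, v1 = np.zeros(D.shape[0]), np.zeros(D.shape[0])
    a2, v2 = np.zeros(n), np.zeros(n)
    A = beta1 * D.T @ D + beta2 * np.eye(n)
    xs = []
    for _ in range(iters):
        x = np.linalg.solve(A, D.T @ (beta1 * a1 - v1) + (beta2 * a2 - v2))
        a1 = soft(D @ x + v1 / beta1, alpha / beta1)   # prox of f1/beta1
        a2 = (beta2 * x + v2 + b) / (beta2 + 1.0)      # prox of f2/beta2
        v1 = v1 + beta1 * (D @ x - a1)
        v2 = v2 + beta2 * (x - a2)
        xs.append(x)
    return np.array(xs)

def padmm2(D, b, alpha, beta1, beta2, iters):
    """Rule (5.26) with e = 0; x-update as in Eq. (5.29)."""
    n = b.size
    x = np.zeros(n)
    v1, v2 = np.zeros(D.shape[0]), np.zeros(n)
    A = beta1 * D.T @ D + beta2 * np.eye(n)
    xs = []
    for _ in range(iters):
        v1_new = np.clip(beta1 * (D @ x) + v1, -alpha, alpha)    # prox of beta1*f1^*
        v2_new = (beta2 * x + v2 - beta2 * b) / (beta2 + 1.0)    # prox of beta2*f2^*
        rhs = D.T @ (beta1 * (D @ x) + v1 - 2 * v1_new) + (beta2 * x + v2 - 2 * v2_new)
        x = np.linalg.solve(A, rhs)
        v1, v2 = v1_new, v2_new
        xs.append(x)
    return np.array(xs)

n = 20
D = np.diff(np.eye(n), axis=0)                     # (n-1) x n forward differences
b = np.concatenate([np.zeros(10), np.ones(10)])    # hypothetical step signal
xs1 = padmm1(D, b, 0.1, 1.0, 2.0, 300)
xs2 = padmm2(D, b, 0.1, 1.0, 2.0, 300)
# With zero initialization, iterate k+1 of PADMM1 matches iterate k of PADMM2.
print(np.max(np.abs(xs1[1:] - xs2[:-1])))
```

The one-step index shift in the comparison comes from moving the $x$-update behind the dual update when passing from (5.24) to (5.26). Both routines need the same linear solve, but `padmm2` stores no auxiliary variables.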
5.3 Convergence Analysis
This section proves that the iteration sequence generated by the proposed algorithm from any initial value converges to a saddle point of the augmented Lagrangian function Eq. (5.4), and that the proposed algorithm has a worst-case convergence rate of $O(1/k)$. The convergence analysis in this section is based on Algorithm 5.1, but by virtue of the equivalence of Algorithms 5.1 and 5.2, the analysis is equally applicable to Algorithm 5.2. The convergence analysis is inspired by [23], where the basic tool is the variational inequality.
5.3.1 Convergence Proof
According to Lemma 4.2, the problem Eq. (5.3) and the augmented Lagrangian function Eq. (5.21) can be described by the following variational inequality problem: find $(x^*, a^*, v^*) \in X \times V \times V$ such that, for all $(x, a, v) \in X \times V \times V$,

$$\begin{cases} e(x) - e(x^*) + \langle x - x^*, L^* v^* \rangle \ge 0; \\ f(a) - f(a^*) + \langle a - a^*, -v^* \rangle \ge 0; \\ \langle v - v^*, -L x^* + a^* \rangle \ge 0. \end{cases} \tag{5.30}$$

Letting $y = (x, a, v) \in Y \triangleq X \times V \times V$ and $F(y) = (L^* v, -v, -L x + a) \in Y$, Eq. (5.30) can be transformed into

$$\mathrm{VI}(Y, F, f): \quad e(x) + f(a) - e(x^*) - f(a^*) + \langle y - y^*, F(y^*) \rangle \ge 0 \quad \forall y \in Y. \tag{5.31}$$

Let the solution set of the problem $\mathrm{VI}(Y, F, f)$ be $Y^*$, i.e., the set of all $(x^*, a^*, v^*)$. Furthermore, define $z = (a, v) \in Z \triangleq V \times V$ and denote by $Z^*$ the set of all $z^* = (a^*, v^*)$. It is easy to verify that the linear map $F(y)$ is monotone, i.e., $\langle y - y', F(y) - F(y') \rangle \ge 0$ $\forall y, y' \in Y$. Lemma 5.2 gives the contraction property of the sequence produced by Algorithm 5.1.

Lemma 5.2 Let $\{x^k, a_1^k, \dots, a_H^k; v_1^k, \dots, v_H^k\}$ be the sequence generated by Algorithm 5.1. Then $\{z^k = (a^k, v^k)\} = \left\{\left(\sqrt{\beta_1} a_1^k, \dots, \sqrt{\beta_H} a_H^k, v_1^k / \sqrt{\beta_1}, \dots, v_H^k / \sqrt{\beta_H}\right)\right\}$ satisfies

$$\left\| z^{k+1} - z^* \right\|_2^2 \le \left\| z^k - z^* \right\|_2^2 - \left\| z^{k+1} - z^k \right\|_2^2 \quad \forall z^* \in Z^*. \tag{5.32}$$

Proof Since $a^{k+1}$ is a solution of the minimization problem in Eq. (5.23) with respect to $a$, by Lemma 4.2, we have

$$f(a) - f(a^{k+1}) + \langle a - a^{k+1}, a^{k+1} - L x^{k+1} - v^k \rangle \ge 0 \quad \forall a \in V. \tag{5.33}$$

Substituting the third equation in Eq. (5.23) into Eq. (5.33) yields

$$f(a) - f(a^{k+1}) + \langle a - a^{k+1}, -v^{k+1} \rangle \ge 0 \quad \forall a \in V. \tag{5.34}$$

By the same token, it follows that

$$f(a) - f(a^k) + \langle a - a^k, -v^k \rangle \ge 0 \quad \forall a \in V. \tag{5.35}$$

Substituting $a = a^k$ and $a = a^{k+1}$ into Eqs. (5.34) and (5.35), respectively, and adding them together, we get

$$\langle a^k - a^{k+1}, v^k - v^{k+1} \rangle \ge 0. \tag{5.36}$$
From the optimality condition for the subproblem on $x$ in Eq. (5.23), we know that

$$e(x) - e(x^{k+1}) + \langle x - x^{k+1}, L^* (L x^{k+1} - a^k + v^k) \rangle \ge 0 \quad \forall x \in X. \tag{5.37}$$

Substituting the third equation in Eq. (5.23) into Eq. (5.37) yields

$$e(x) - e(x^{k+1}) + \langle x - x^{k+1}, L^* (v^{k+1} + a^{k+1} - a^k) \rangle \ge 0 \quad \forall x \in X. \tag{5.38}$$

The third equation in the iterative rule Eq. (5.23) shows that

$$\langle v - v^{k+1}, -L x^{k+1} + a^{k+1} + v^{k+1} - v^k \rangle = 0 \quad \forall v \in V. \tag{5.39}$$

Adding Eqs. (5.34), (5.38), and (5.39), we get

$$e(x) + f(a) - e(x^{k+1}) - f(a^{k+1}) + \langle y - y^{k+1}, F(y^{k+1}) \rangle + \langle x - x^{k+1}, L^* (a^{k+1} - a^k) \rangle + \langle v - v^{k+1}, v^{k+1} - v^k \rangle \ge 0 \quad \forall y \in Y. \tag{5.40}$$

Inequality (5.40) shows that if

$$\left\| z^{k+1} - z^k \right\|_2^2 = \left\| a^{k+1} - a^k \right\|_2^2 + \left\| v^{k+1} - v^k \right\|_2^2 = 0 \tag{5.41}$$

holds, then

$$e(x) + f(a) - e(x^{k+1}) - f(a^{k+1}) + \langle y - y^{k+1}, F(y^{k+1}) \rangle \ge 0 \quad \forall y \in Y. \tag{5.42}$$

That is, $y^{k+1}$ is a solution to problem $\mathrm{VI}(Y, F, f)$ and $(x^{k+1}, a^{k+1}; v^{k+1})$ is a saddle point of the augmented Lagrangian function (5.21). Substituting $y = y^*$ into Eq. (5.40) yields

$$\begin{aligned} &\langle a^* - a^{k+1}, a^{k+1} - a^k \rangle + \langle v^* - v^{k+1}, v^{k+1} - v^k \rangle \\ &\quad \ge e(x^{k+1}) + f(a^{k+1}) - e(x^*) - f(a^*) + \langle y^{k+1} - y^*, F(y^{k+1}) \rangle \\ &\qquad + \langle a^* - a^{k+1}, a^{k+1} - a^k \rangle - \langle L (x^* - x^{k+1}), a^{k+1} - a^k \rangle. \end{aligned} \tag{5.43}$$

Since $y^*$ is an optimal solution to the problem $\mathrm{VI}(Y, F, f)$, it follows that

$$e(x^{k+1}) + f(a^{k+1}) - e(x^*) - f(a^*) + \langle y^{k+1} - y^*, F(y^*) \rangle \ge 0. \tag{5.44}$$
From the monotonicity of $F$, it follows that

$$\langle y^{k+1} - y^*, F(y^{k+1}) \rangle \ge \langle y^{k+1} - y^*, F(y^*) \rangle. \tag{5.45}$$

From Eq. (5.36) and $a^* = L x^*$, it follows that

$$\langle a^* - a^{k+1}, a^{k+1} - a^k \rangle - \langle L (x^* - x^{k+1}), a^{k+1} - a^k \rangle = \langle L x^{k+1} - a^{k+1}, a^{k+1} - a^k \rangle = \langle v^{k+1} - v^k, a^{k+1} - a^k \rangle \ge 0. \tag{5.46}$$

Combining Eqs. (5.43) to (5.46) yields

$$\langle a^* - a^{k+1}, a^{k+1} - a^k \rangle + \langle v^* - v^{k+1}, v^{k+1} - v^k \rangle = \langle z^* - z^{k+1}, z^{k+1} - z^k \rangle \ge 0. \tag{5.47}$$

Hence

$$\begin{aligned} \left\| z^k - z^* \right\|_2^2 &= \left\| z^* - z^{k+1} + z^{k+1} - z^k \right\|_2^2 \\ &= \left\| z^* - z^{k+1} \right\|_2^2 + \left\| z^{k+1} - z^k \right\|_2^2 + 2 \langle z^* - z^{k+1}, z^{k+1} - z^k \rangle \\ &\ge \left\| z^* - z^{k+1} \right\|_2^2 + \left\| z^{k+1} - z^k \right\|_2^2. \end{aligned} \tag{5.48}$$

Lemma 5.2 is proved.

Lemma 5.2 shows that the bounded nonnegative sequence $\left\{ \left\| z^k - z^* \right\|_2^2 \right\}$ is nonincreasing, so it must have a limit, and hence $\left\| z^{k+1} - z^k \right\|_2^2 \to 0$ as $k \to +\infty$. From Eq. (5.42), as $k \to +\infty$, $\{y^{k+1}\}$ converges to a solution of $\mathrm{VI}(Y, F, f)$ and the sequence $\{x^{k+1}, a^{k+1}; v^{k+1}\}$ converges to a saddle point of Eq. (5.21). Thus, by the equivalence of Eqs. (5.4) and (5.21), the following theorem is obtained.

Theorem 5.2 The sequence $\{x^k, a_1^k, \dots, a_H^k; v_1^k, \dots, v_H^k\}$ generated by Algorithm 5.1 converges to a saddle point of the augmented Lagrangian function Eq. (5.4), and in particular, $\{x^k\}$ converges to a solution of problem Eq. (5.1).
5.3.2 Convergence Rate Analysis
This subsection analyzes the convergence rate of Algorithm 5.1. It begins with Lemma 5.3, which shows the monotonicity of the sequence $\left\{ \left\| z^{k+1} - z^k \right\|_2^2 \right\}$.

Lemma 5.3 Let $\{x^k, a_1^k, \dots, a_H^k; v_1^k, \dots, v_H^k\}$ be the sequence generated by Algorithm 5.1. Then $\{z^k = (a^k, v^k)\} = \left\{\left(\sqrt{\beta_1} a_1^k, \dots, \sqrt{\beta_H} a_H^k, v_1^k / \sqrt{\beta_1}, \dots, v_H^k / \sqrt{\beta_H}\right)\right\}$ satisfies

$$\left\| z^{k+1} - z^k \right\|_2^2 \le \left\| z^k - z^{k-1} \right\|_2^2 \quad \forall k \ge 1. \tag{5.49}$$
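Proof Denote $\tilde{x}^k = x^{k+1}$, $\tilde{a}^k = a^{k+1}$, $\tilde{v}^k = L x^{k+1} - a^k + v^k = v^{k+1} + a^{k+1} - a^k$, $\tilde{z}^k = (\tilde{a}^k, \tilde{v}^k)$, and $\tilde{y}^k = (\tilde{x}^k, \tilde{a}^k, \tilde{v}^k)$. Define the linear operator $M : (a, v) \to (a, -a + v)$. Then we have

$$z^{k+1} = z^k - M (z^k - \tilde{z}^k). \tag{5.50}$$

Based on the above notation, inequality (5.40) can be transformed into

$$e(x) + f(a) - e(\tilde{x}^k) - f(\tilde{a}^k) + \langle y - \tilde{y}^k, F(\tilde{y}^k) \rangle + \langle z - \tilde{z}^k, M (\tilde{z}^k - z^k) \rangle \ge 0 \quad \forall y \in Y. \tag{5.51}$$

By the same token, it follows that

$$e(x) + f(a) - e(\tilde{x}^{k+1}) - f(\tilde{a}^{k+1}) + \langle y - \tilde{y}^{k+1}, F(\tilde{y}^{k+1}) \rangle + \langle z - \tilde{z}^{k+1}, M (\tilde{z}^{k+1} - z^{k+1}) \rangle \ge 0 \quad \forall y \in Y. \tag{5.52}$$

Substituting $y = \tilde{y}^{k+1}$ and $y = \tilde{y}^k$ into Eqs. (5.51) and (5.52), respectively, and adding them together yields

$$-\langle \tilde{y}^{k+1} - \tilde{y}^k, F(\tilde{y}^{k+1}) - F(\tilde{y}^k) \rangle + \langle \tilde{z}^{k+1} - \tilde{z}^k, M ((z^{k+1} - z^k) - (\tilde{z}^{k+1} - \tilde{z}^k)) \rangle \ge 0. \tag{5.53}$$

From the monotonicity of $F$, it follows that

$$\langle \tilde{z}^{k+1} - \tilde{z}^k, M ((z^{k+1} - z^k) - (\tilde{z}^{k+1} - \tilde{z}^k)) \rangle \ge 0. \tag{5.54}$$

Adding $\langle (z^{k+1} - z^k) - (\tilde{z}^{k+1} - \tilde{z}^k), M ((z^{k+1} - z^k) - (\tilde{z}^{k+1} - \tilde{z}^k)) \rangle$ to both sides of Eq. (5.54) and using the identity $\langle z, M z \rangle = \frac{1}{2} \langle z, (M + M^*) z \rangle$ (note that $\langle z, Q z \rangle = \|z\|_Q^2$ if $Q$ is positive semidefinite), it follows that

$$\langle z^{k+1} - z^k, M ((z^{k+1} - z^k) - (\tilde{z}^{k+1} - \tilde{z}^k)) \rangle \ge \frac{1}{2} \left\| (z^{k+1} - z^k) - (\tilde{z}^{k+1} - \tilde{z}^k) \right\|_{M + M^*}^2. \tag{5.55}$$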
From Eq. (5.50), it follows that

$$\langle M (z^k - \tilde{z}^k), M ((z^k - \tilde{z}^k) - (z^{k+1} - \tilde{z}^{k+1})) \rangle \ge \frac{1}{2} \left\| (z^k - \tilde{z}^k) - (z^{k+1} - \tilde{z}^{k+1}) \right\|_{M + M^*}^2. \tag{5.56}$$

Therefore,

$$\begin{aligned} &\left\| M (z^k - \tilde{z}^k) \right\|_2^2 - \left\| M (z^{k+1} - \tilde{z}^{k+1}) \right\|_2^2 \\ &\quad = 2 \langle M (z^k - \tilde{z}^k), M ((z^k - \tilde{z}^k) - (z^{k+1} - \tilde{z}^{k+1})) \rangle - \left\| M ((z^k - \tilde{z}^k) - (z^{k+1} - \tilde{z}^{k+1})) \right\|_2^2 \\ &\quad \ge \left\| (z^k - \tilde{z}^k) - (z^{k+1} - \tilde{z}^{k+1}) \right\|_{M + M^*}^2 - \left\| M ((z^k - \tilde{z}^k) - (z^{k+1} - \tilde{z}^{k+1})) \right\|_2^2 \\ &\quad = \left\| (z^k - \tilde{z}^k) - (z^{k+1} - \tilde{z}^{k+1}) \right\|_{M + M^* - M^* M}^2 \ge 0. \end{aligned} \tag{5.57}$$

Again using Eq. (5.50), we get $\left\| z^{k+1} - z^k \right\|_2^2 \le \left\| z^k - z^{k-1} \right\|_2^2$. Lemma 5.3 is proved.

Theorem 5.3 Let $\{x^k, a_1^k, \dots, a_H^k; v_1^k, \dots, v_H^k\}$ be the sequence generated by Algorithm 5.1. Then $\{z^k = (a^k, v^k)\} = \left\{\left(\sqrt{\beta_1} a_1^k, \dots, \sqrt{\beta_H} a_H^k, v_1^k / \sqrt{\beta_1}, \dots, v_H^k / \sqrt{\beta_H}\right)\right\}$ satisfies

$$\left\| z^{k+1} - z^k \right\|_2^2 \le \frac{\left\| z^0 - z^* \right\|_2^2}{k + 1} \quad \forall z^* \in Z^*. \tag{5.58}$$

That is, Algorithm 5.1 has a worst-case $O(1/k)$ convergence rate.

Proof From Eq. (5.32), it follows that

$$\sum_{i=0}^{\infty} \left\| z^{i+1} - z^i \right\|_2^2 \le \left\| z^0 - z^* \right\|_2^2 \quad \forall z^* \in Z^*. \tag{5.59}$$

From Lemma 5.3 it follows that

$$(k + 1) \left\| z^{k+1} - z^k \right\|_2^2 \le \sum_{i=0}^{k} \left\| z^{i+1} - z^i \right\|_2^2 \le \left\| z^0 - z^* \right\|_2^2 \quad \forall z^* \in Z^*. \tag{5.60}$$

Theorem 5.3 is proved.
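The monotonicity in Lemma 5.3 and the $O(1/k)$ bound of Theorem 5.3 can be observed numerically. The sketch below is an illustration under stated assumptions: it runs rule (5.24) on the toy problem $\min_x \alpha\|Dx\|_1 + \frac{1}{2}\|x - b\|_2^2$ (with $e = 0$, $L_1 = D$, $L_2 = I$), records the scaled iterates $z^k$, and uses the last iterate as a numerical stand-in for $z^*$. Iterates are recorded from the first computed one onward, so the run is treated as starting at that point. All names and parameter values are hypothetical.

```python
import numpy as np

def soft(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def scaled_z(a1, a2, v1, v2, beta1, beta2):
    """z = (sqrt(beta_h) a_h, v_h / sqrt(beta_h)) as in Lemmas 5.2 and 5.3."""
    return np.concatenate([np.sqrt(beta1) * a1, np.sqrt(beta2) * a2,
                           v1 / np.sqrt(beta1), v2 / np.sqrt(beta2)])

def padmm1_z_history(D, b, alpha, beta1, beta2, iters):
    """Rule (5.24) for the toy problem; returns the iterates z^1, z^2, ..."""
    n = b.size
    a1, v1 = np.zeros(D.shape[0]), np.zeros(D.shape[0])
    a2, v2 = np.zeros(n), np.zeros(n)
    A = beta1 * D.T @ D + beta2 * np.eye(n)
    zs = []
    for _ in range(iters):
        x = np.linalg.solve(A, D.T @ (beta1 * a1 - v1) + (beta2 * a2 - v2))
        a1 = soft(D @ x + v1 / beta1, alpha / beta1)
        a2 = (beta2 * x + v2 + b) / (beta2 + 1.0)
        v1 = v1 + beta1 * (D @ x - a1)
        v2 = v2 + beta2 * (x - a2)
        zs.append(scaled_z(a1, a2, v1, v2, beta1, beta2))
    return np.array(zs)

n = 20
D = np.diff(np.eye(n), axis=0)                     # forward differences
b = np.concatenate([np.zeros(10), np.ones(10)])    # hypothetical step signal
zs = padmm1_z_history(D, b, 0.1, 1.0, 2.0, 2000)

d = np.sum(np.diff(zs, axis=0) ** 2, axis=1)       # successive squared differences
# Bound of Eq. (5.58), with the last iterate standing in for z^*
bound = np.sum((zs[0] - zs[-1]) ** 2) / np.arange(1, d.size + 1)
monotone = bool(np.all(np.diff(d) <= 1e-12))
within_bound = bool(np.all(d <= bound + 1e-9))
print(monotone, within_bound)
```

In practice the observed decay is usually much faster than the worst-case bound, which only guarantees that the successive differences shrink at least like $1/k$.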
5.4 Application of PADMM to TGV/Shearlet Compound Regularized Image Restoration
This section deals with the implementation of PADMM for solving compound $l_1$-regularized inverse problems. The chosen compound regularizer incorporates two state-of-the-art regularization tools: the total generalized variation (TGV) [8] and the shearlet transform [24]. Like the TV model, the TGV model imposes a constraint on smoothness in the spatial domain, while the shearlet transform imposes a constraint on the sparsity of the image in the shearlet transform domain. It is worth mentioning that PADMM is not only applicable to regularization methods based on variational or frame theory. As a generalization of the TV model, the TGV model introduces a constraint on the higher-order derivatives of the image function, and thus it is better able to strike a balance between preserving image edges and suppressing staircasing effects. This strategy is also adopted by some other regularization tools based on variational partial differential equations [3–13]. Compared to the conventional wavelet transform, the shearlet transform is able to better characterize anisotropic information in images, such as image edges and curves. It can be expected that the combination of TGV and the shearlet transform provides stronger assurance for image detail preservation. Although PADMM can be applied to higher-order TGV models, for simplicity, this chapter considers only the second-order TGV model, which is sufficient in most practical applications. The shearlet transform used here is the FFST [24], whose shearlets are finitely supported in the frequency domain, i.e., they have limited bandwidth. The generic model used for image restoration in this chapter is

$$(u^*, p^*) = \arg\min_{u, p} \; \alpha_1 \|\nabla u - p\|_1 + \alpha_2 \|\mathcal{E} p\|_1 + \alpha_3 \sum_{r=1}^{N} \|\mathrm{SH}_r(u)\|_1 \quad \text{s.t.} \quad u \in \Psi \triangleq \left\{ u : \|K u - f\|_2^2 \le c \right\} \text{ or } \left\{ u : \|K u - f\|_1 \le c \right\}. \tag{5.61}$$
In the minimization model Eq. (5.61), $u, f \in \mathbb{R}^{mno}$ denote the vector representations of the original and observed images, respectively, both of which have support of size $m \times n \times o$; the first two $l_1$ terms form the second-order TGV model $\mathrm{TGV}_\alpha^2$, which degenerates to a TV model when $\alpha_2 = 0$ and $p = 0$ ($p$ is a variable introduced by the second-order TGV model); $\nabla$ is a first-order difference operator, while $\mathcal{E}$ is a symmetrized difference operator; $\mathrm{SH}_r(u) \in \mathbb{R}^{mno}$ in the third $l_1$ term is the $r$th non-downsampled shearlet transform subband of $u$. The total number of transform subbands $N$ is determined by the number of transform layers; $\alpha_1$, $\alpha_2$, and $\alpha_3$ are pre-determined weights, which serve to balance the three $l_1$-norm terms; and $\Psi$ is a data fidelity constraint. In this chapter, $\Psi$ has two forms, where the $l_2$ form corresponds to Gaussian noise and the $l_1$ form corresponds to impulse noise. Depending on the image degradation mechanism, the degradation matrix $K$ has different forms: if the degradation is image blurring, $K$ is a convolution matrix; if the degradation is pixel loss, $K$ is a diagonal selection matrix (whose elements are 1 or 0); if the problem is MRI reconstruction, $K$ is the product of a diagonal selection matrix and the two-dimensional Fourier transform matrix. In $\mathrm{TGV}_\alpha^2$, $\alpha_1 \|\nabla u - p\|_1$ represents the restriction on discontinuous elements, while $\alpha_2 \|\mathcal{E} p\|_1$ represents the restriction on smooth slope regions. Let $p \in \mathbb{R}^{mno} \times \mathbb{R}^{mno}$ with $p_{i,j,l} = (p_{i,j,l,1}, p_{i,j,l,2})$; then $(\mathcal{E} p)_{i,j,l}$, $1 \le i \le m$, $1 \le j \le n$, $1 \le l \le o$, is given by

$$(\mathcal{E} p)_{i,j,l} = \begin{bmatrix} (\mathcal{E} p)_{i,j,l,1} & (\mathcal{E} p)_{i,j,l,3} \\ (\mathcal{E} p)_{i,j,l,3} & (\mathcal{E} p)_{i,j,l,2} \end{bmatrix} = \begin{bmatrix} \nabla_1 p_{i,j,l,1} & \dfrac{\nabla_2 p_{i,j,l,1} + \nabla_1 p_{i,j,l,2}}{2} \\ \dfrac{\nabla_2 p_{i,j,l,1} + \nabla_1 p_{i,j,l,2}}{2} & \nabla_2 p_{i,j,l,2} \end{bmatrix}. \tag{5.62}$$
The $l_1$-norms of $p$ and $\mathcal{E} p$ are defined respectively as

$$\|p\|_1 = \sum_{i,j=1}^{m,n} \left\| p_{i,j} \right\|_2 = \sum_{i,j=1}^{m,n} \sqrt{\sum_{l=1}^{o} \left( p_{i,j,l,1}^2 + p_{i,j,l,2}^2 \right)}, \tag{5.63}$$

$$\|\mathcal{E} p\|_1 = \sum_{i,j=1}^{m,n} \left\| (\mathcal{E} p)_{i,j} \right\|_2 = \sum_{i,j=1}^{m,n} \sqrt{\sum_{l=1}^{o} \left( (\mathcal{E} p)_{i,j,l,1}^2 + (\mathcal{E} p)_{i,j,l,2}^2 + 2 (\mathcal{E} p)_{i,j,l,3}^2 \right)}. \tag{5.64}$$

The $r$th shearlet transform subband of $u$ can be obtained by a point-by-point product in the frequency domain, as described in the relevant part of Chap. 3. In this chapter, the multichannel shearlet transform is performed channel by channel, i.e., a two-dimensional shearlet transform is applied to each channel separately. To apply PADMM (Algorithm 5.2) to the solution of Eq. (5.61), the following assignments are made to the variables and operators: $x = (u, p)$, $e(x) = 0$, $f_1(L_1 x) = \alpha_1 \|\nabla u - p\|_1$, $f_2(L_2 x) = \alpha_2 \|\mathcal{E} p\|_1$, $f_3(L_{3,r} x) = \alpha_3 \|S_r u\|_1$, and $f_4(L_4 x) = \iota_\Psi(u)$. In addition, denote $\hat{v}_h^{k+1} = 2 v_h^{k+1} - v_h^k$. According to Algorithm 5.2 (PADMM2), the following PADMM algorithm for TGV/shearlet regularized image restoration can be obtained.

Algorithm 5.3 Parallel Alternating Direction Method of Multipliers for TGV/Shearlet Regularization (PADMM-TGVS)
Step 1: Set $k = 0$, $x^0 = 0$, $v_h^0 = 0$, and $\beta_h > 0$, $h = 1, \dots, H$.
Step 2: Determine whether the termination conditions are met; if not, perform the following steps.
Step 3: For $i = 1, \dots, m$; $j = 1, \dots, n$; $l = 1, \dots, o$:
Step 4: $v_{1,i,j,l}^{k+1} = P_{B_{\alpha_1}}\left( \beta_1 \left( (\nabla u^k)_{i,j,l} - p_{i,j,l}^k \right) + v_{1,i,j,l}^k \right)$.
Step 5: $v_{2,i,j,l}^{k+1} = P_{B_{\alpha_2}}\left( \beta_2 (\mathcal{E} p^k)_{i,j,l} + v_{2,i,j,l}^k \right)$.
Step 6: $v_{3,r,i,j,l}^{k+1} = P_{B_{\alpha_3}}\left( \beta_3 (S_r u^k)_{i,j,l} + v_{3,r,i,j,l}^k \right)$, $r = 1, \dots, N$.
Step 7: End the for loop.
Step 8: If the noise is Gaussian noise, perform the next step.
Step 9: $v_4^{k+1} = \beta_4 S_{\sqrt{c}}\left( \dfrac{v_4^k}{\beta_4} + K u^k - f \right)$.
Step 10: If the noise is impulse noise, perform the next step.
Step 11: $v_4^{k+1} = \beta_4 \left( \left( \dfrac{v_4^k}{\beta_4} + K u^k - f \right) - P_c\left( \dfrac{v_4^k}{\beta_4} + K u^k - f \right) \right)$.
Step 12: $u^{k+1} = u^k - \left( \beta_1 \nabla^* \nabla + \beta_3 \sum_{r=1}^{N} S_r^* S_r + \beta_4 K^* K \right)^{-1} \left( \nabla^* \hat{v}_1^{k+1} + \sum_{r=1}^{N} S_r^* \hat{v}_{3,r}^{k+1} + K^* \hat{v}_4^{k+1} \right)$.
Step 13: $p_1^{k+1} = p_1^k - \left[ \beta_1 I + \beta_2 \left( \nabla_1^* \nabla_1 + \dfrac{\nabla_2^* \nabla_2}{2} \right) \right]^{-1} \left( -\hat{v}_{1,1}^{k+1} + \nabla_1^* \hat{v}_{2,1}^{k+1} + \nabla_2^* \hat{v}_{2,3}^{k+1} \right)$.
Step 14: $p_2^{k+1} = p_2^k - \left[ \beta_1 I + \beta_2 \left( \dfrac{\nabla_1^* \nabla_1}{2} + \nabla_2^* \nabla_2 \right) \right]^{-1} \left( -\hat{v}_{1,2}^{k+1} + \nabla_1^* \hat{v}_{2,3}^{k+1} + \nabla_2^* \hat{v}_{2,2}^{k+1} \right)$.
Step 15: $k = k + 1$.
Step 16: End the loop and output $u^{k+1}$.

In Algorithm 5.3, $p_1$ and $p_2$ denote the collections of all $p_{i,j,l,1}$ and $p_{i,j,l,2}$ ($1 \le i \le m$, $1 \le j \le n$, $1 \le l \le o$), respectively. $v_{1,1}$ and $v_{1,2}$ are defined in the same manner. $v_2$ has the same structure as $\mathcal{E} p$, and $v_{2,1}$, $v_{2,2}$, and $v_{2,3}$ are the collections of all $v_{2,i,j,l,1}$, $v_{2,i,j,l,2}$, and $v_{2,i,j,l,3}$, respectively. $P_{B_{\alpha_1}}$, $P_{B_{\alpha_2}}$, and $P_{B_{\alpha_3}}$ denote the two-dimensional, four-dimensional, and one-dimensional projection operators, respectively. $S_{\sqrt{c}}$ is an $mno$-dimensional contraction operator. The point-by-point operations of $P_{B_\alpha}$ and $S_{\sqrt{c}}$ are defined as

$$P_{B_\alpha}(q_{i,j,l}) = \min\left( \left\| q_{i,j} \right\|_2, \alpha \right) \frac{q_{i,j,l}}{\left\| q_{i,j} \right\|_2}, \tag{5.65}$$

and

$$S_{\sqrt{c}}(z) = \max\left( \|z\|_2 - \sqrt{c}, 0 \right) \frac{z}{\|z\|_2}. \tag{5.66}$$
$P_c$ is the projection operator onto the $l_1$-ball, which is more difficult to implement than the projection onto the $l_2$-ball; this problem is solved in this book using the approach in [25], the basic implementation of which is shown in Sect. 2.4.4. The structure of Algorithm 5.3 is highly parallel, and each of its sub-steps has a closed-form solution. The subproblems on $v_1$, $v_2$, $v_3$, and $v_4$ are independent of each other and can be solved in parallel, and the subproblems on $u$, $p_1$, and $p_2$ have similar properties. Furthermore, the subproblems on $v_1$, $v_2$, and $v_3$ are themselves pixel-by-pixel operations. Thus, Algorithm 5.3 can be accelerated by parallel computing devices such as GPUs. The non-downsampled shearlet transform is the most time-consuming part of Algorithm 5.3, so the overall computational complexity of each iteration is $O(mn \log(mn))$ for an $m \times n$ image.
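The pointwise operators of Eqs. (5.65) and (5.66) can be sketched in a few lines of NumPy. The array layout is an assumption made for illustration: `q` has shape `(m, n, o, d)`, where `d` is the number of components per pixel and channel, and, as in Eqs. (5.63) and (5.64), the norm $\|q_{i,j}\|_2$ is taken over all channels and components of pixel $(i, j)$. The function names are hypothetical.

```python
import numpy as np

def project_pixelwise(q, alpha):
    """Eq. (5.65): scale each pixel (i, j) of q, shape (m, n, o, d), so that its
    l2 norm over channels and components is at most alpha."""
    norm = np.sqrt(np.sum(q ** 2, axis=(2, 3), keepdims=True))
    # alpha / max(norm, alpha) equals min(norm, alpha) / norm for norm > 0
    # and avoids a division by zero when a pixel is exactly zero.
    return q * (alpha / np.maximum(norm, alpha))

def shrink_global(z, c):
    """Eq. (5.66): shrink the whole vector z toward 0 by sqrt(c) in l2 norm."""
    norm = np.linalg.norm(z)
    if norm <= np.sqrt(c):
        return np.zeros_like(z)
    return (1.0 - np.sqrt(c) / norm) * z

q = np.zeros((2, 2, 1, 2))          # hypothetical 2 x 2 single-channel field
q[0, 0, 0] = [3.0, 4.0]             # pixel norm 5, gets scaled down
q[1, 1, 0] = [0.3, 0.4]             # pixel norm 0.5, stays unchanged
print(project_pixelwise(q, 1.0)[0, 0, 0])
print(shrink_global(np.array([3.0, 4.0]), 4.0))
```

Because `project_pixelwise` touches each pixel independently, it maps directly onto the pixel-by-pixel parallelism mentioned above, whereas `shrink_global` needs one global norm reduction before the scaling.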
Remark 5.3 In Algorithm 5.3, Algorithm 5.2 is applied instead of the equivalent Algorithm 5.1 to solve Eq. (5.61) for two reasons. On the one hand, as mentioned above, Algorithm 5.2 has a more compact form. On the other hand, if Algorithm 5.1 were applied to Eq. (5.61), the updates of u, p1, and p2 would be coupled because of the special form of the TGV, and Cramer's rule would then be needed to decouple the updates of the three variables. This strategy would significantly increase the per-iteration computational complexity of the algorithm and would partly destroy the parallelism of the u, p1, and p2 updates.
Remark 5.4 In Algorithm 5.3, the update of the regularization parameter λ is not given explicitly. However, based on the constrained model Eq. (5.61), and as described in Chap. 4, λ can be updated in closed form, which is exactly the idea of parameter estimation based on Morozov's discrepancy principle. On the one hand, if Eq. (1.6) has $D(Ku, f) = \|Ku - f\|_2^2 \le c$ (by convention, Eq. (1.7) should then have $D(Ku, f) = \frac{1}{2}\|Ku - f\|_2^2$), i.e., the observed data contain Gaussian noise, then λ can be updated in closed form (as in Chap. 4):
$$\lambda^{k+1} = \frac{\beta_4\left\|f - K u^k - \frac{v_4^k}{\beta_4}\right\|_2}{\sqrt{c}} - \beta_4.$$ (5.67)
On the other hand, if $D(Ku, f) = \|Ku - f\|_1 \le c$, i.e., the observations contain impulse noise, then λ can be computed by the method in [25]. It is worth mentioning that neither of the above methods requires an inner iterative algorithm such as the Newton method. From this point of view, the algorithm PADMM-TGVS contains as a special case the algorithm APE-ADMM of Chap. 4, which is fully equivalent to PADMM-TV.
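For the Gaussian-noise case, the closed-form update of Eq. (5.67) amounts to a single norm evaluation. The sketch below illustrates this under stated assumptions: the function name and argument names are hypothetical, `Ku` stands for the already computed product K u^k, and the clamp at zero is an assumption added here for the case where the discrepancy constraint is already satisfied (the text of this section does not state it).

```python
import numpy as np

def update_lambda(f, Ku, v4, beta4, c):
    """Closed-form update of Eq. (5.67):
    lambda = beta4 * ||f - K u - v4 / beta4||_2 / sqrt(c) - beta4.
    """
    residual_norm = np.linalg.norm(f - Ku - v4 / beta4)
    lam = beta4 * residual_norm / np.sqrt(c) - beta4
    # Assumption (not stated in this section): clamp at zero when the
    # discrepancy constraint is inactive.
    return max(lam, 0.0)
```

No inner solver is involved: the cost is one residual norm per iteration, which is why the parameter estimate comes essentially for free.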
5.5 Experimental Results
This section sets up several experiments to verify the effectiveness of the proposed PADMM algorithm: grayscale/color image deblurring under Gaussian/impulse noise, and MR image reconstruction from partial Fourier observations. The six test images used in the experiments are given in Fig. 5.1. The Kodim14 image is taken from Kodak's online image database¹, while the Foot image is a radial T1-weighted foot MR image². The quality of the degraded and restored images is evaluated quantitatively by two metrics: peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM).
¹ http://r0k.us/graphics/kodak/.
² http://www.mr-tip.com.
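As a reminder of how the first metric is computed, the following is a minimal sketch of PSNR. The peak value of 255 (8-bit images) and the function name are assumptions made here; the section does not state which peak value or implementation was used, and SSIM is omitted because its windowed statistics make it considerably longer.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).

    `peak` = 255 assumes 8-bit intensity images (an assumption of this sketch).
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```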
Fig. 5.1 Test images: (a) Lena (256×256), (b) Barbara (512×512), (c) Man (1024×1024), (d) Foot (512×512), (e) Peppers (512×512), (f) Kodim14 (768×512)
In the image deblurring experiments in this chapter, we set (α1, α2, α3) = (1, 3, 0.1) for grayscale images and (α1, α2, α3) = (2, 9, 0) for RGB images; for the MRI reconstruction problem, we set (α1, α2, α3) = (1, 3, 10). For image restoration under Gaussian noise, we set β1 = β2 = β3 = 1 and $\beta_4 = 10^{(0.1\,\mathrm{BSNR} - 1)}\beta_1$; for image restoration under impulse noise, we set β1 = β2 = β3 = 0.1 and $\beta_4 = 10^{(1 - \mathrm{LEVER})}\beta_1$, where LEVER is the proportion of impulse noise. In addition, $c = (1.09 - 0.006\,\mathrm{BSNR})\,mn\sigma^2$ and $c = (0.99 - 0.0009\,\mathrm{BSNR})\,mno\sigma^2$ are set for grayscale and color image restoration, respectively, where σ can be estimated by the median criterion based on the wavelet transform [26]. In the impulse noise condition, c is set to the l1-norm of the difference between the noisy and noise-free observations, which in practice needs to be estimated in advance. In cases where noise level estimation is not possible, c is chosen by trial and error. The shearlet transform uses 2 layers in the experiments, so N = 13 [24] in Eq. (5.61). The termination criterion of all algorithms is set uniformly as $\|u^{k+1} - u^k\|_2 / \|u^k\|_2 \le 10^{-4}$. By setting α3 = 0, the regularizer of model Eq. (5.61) contains only $\mathrm{TGV}_\alpha^2$, and Algorithm 5.3 is then denoted PADMM-TGV; furthermore, if α2 = 0 and p = 0 hold simultaneously, Algorithm 5.3 degenerates to PADMM-TV, which contains only the TV regularization term. In Table 5.2, the best results for each comparison metric are shown in bold.
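Two of the settings above translate directly into code: the relative-change stopping rule and the noise bound c for grayscale restoration under Gaussian noise. The sketch below is illustrative only; the function names are hypothetical, BSNR and σ are assumed to be supplied by the caller, and the guard for a zero iterate is an assumption added here.

```python
import numpy as np

def has_converged(u_new, u_old, tol=1e-4):
    """Stopping rule ||u^{k+1} - u^k||_2 / ||u^k||_2 <= tol (tol = 1e-4 in the text)."""
    denom = np.linalg.norm(u_old)
    if denom == 0.0:
        return False  # assumption: never stop on an all-zero iterate
    return np.linalg.norm(u_new - u_old) / denom <= tol

def noise_bound_gray(bsnr, m, n, sigma):
    """Upper bound c = (1.09 - 0.006 * BSNR) * m * n * sigma^2 for an
    m x n grayscale image under Gaussian noise of standard deviation sigma."""
    return (1.09 - 0.006 * bsnr) * m * n * sigma ** 2
```

For example, `noise_bound_gray(15, 256, 256, 2.0)` evaluates the bound for a 256×256 image at a BSNR of 15 dB, and `has_converged(u_next, u_curr)` would be checked once per outer iteration.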
5.5.1 Grayscale Image Deblurring Experiment
In this experiment, the test images are Lena, Barbara, and Man, and the following well-known algorithms are used for comparison: the adaptive TV algorithm Wen-Chan [26], the wavelet transform-based image restoration algorithm Cai-Osher-Shen [27], and the TV algorithm with box constraint Chan-Tao-Yuan [28]. The first two algorithms are compared under Gaussian noise, while the last one is compared under impulse noise. Cai-Osher-Shen and Chan-Tao-Yuan are both applications of the ADMM algorithm, and a detailed description of Wen-Chan has been given in the previous chapter. Unlike PADMM and Wen-Chan, both Cai-Osher-Shen and Chan-Tao-Yuan must be run multiple times to select the regularization parameter manually, which makes these two algorithms more time-consuming in practice. In addition, Chan-Tao-Yuan imposes a box constraint on pixels, whose role has been discussed in Chap. 4. To better compare the differences between the restored images, most restoration results are shown as local enlargements. Table 5.1 sets up three grayscale image deblurring problems under Gaussian noise and impulse noise. Table 5.2 gives the comparison results of the algorithms in terms of PSNR, SSIM, CPU time, and total number of iteration steps. The best result of each test item is highlighted in boldface. The following conclusions can be drawn from the results under Gaussian noise in Table 5.2. First, compared with the other algorithms, PADMM-TGVS obtains higher PSNR and SSIM, which is mainly due to the more complex TGV/shearlet compound regularization model; second, PADMM-TV consumes the least CPU time among all algorithms; third, the non-adaptive Cai-Osher-Shen based on the wavelet transform usually obtains higher SSIM than the TV algorithms PADMM-TV and Wen-Chan, but it takes longer; fourth, PADMM-TV has a higher single-step execution efficiency than Wen-Chan. It should be emphasized that, owing to the high parallelism of the PADMM algorithm, the execution time of PADMM-TGVS and PADMM-TGV can be reduced significantly if the computation is distributed on parallel computing devices such as GPUs.
Table 5.1 Details of experimental settings for grayscale image deblurring
Blur kernel   Image     Gaussian noise   Impulse noise (%)
G(9, 3)       Lena      σ = 2            50
A(9)          Barbara   σ = 3            60
M(30, 30)     Man       σ = 4            70
Table 5.2 The PSNR (dB), SSIM, CPU time (s), and total number of iteration steps of different algorithms in grayscale image deblurring experiments
Gaussian noise
                  Lena (23.88 dB, 0.6841)         Barbara (22.39 dB, 0.5474)      Man (21.68 dB, 0.4681)
Algorithm         PSNR    SSIM    Time    Step    PSNR    SSIM    Time    Step    PSNR    SSIM    Time    Step
PADMM-TGVS        27.89   0.8196  9.34    120     24.13   0.6817  40.91   97      27.02   0.6877  217.13  120
PADMM-TGV         27.70   0.8147  2.30    136     24.02   0.6751  10.57   105     26.95   0.6859  59.91   127
PADMM-TV          27.63   0.8094  1.30    138     23.97   0.6684  5.64    104     26.88   0.6850  30.66   121
Wen-Chan          27.46   0.8049  1.59    149     23.87   0.6639  7.63    119     26.65   0.6765  42.20   150
Cai-Osher-Shen    27.38   0.8120  6.60    64      23.95   0.6708  26.88   50      26.76   0.6851  154.15  70
Impulse noise
                  Lena (9.87 dB, 0.0288)          Barbara (8.22 dB, 0.0122)       Man (7.63 dB, 0.0070)
Algorithm         PSNR    SSIM    Time    Step    PSNR    SSIM    Time    Step    PSNR    SSIM    Time    Step
PADMM-TGVS        28.97   0.8600  20.41   237     24.00   0.6817  102.75  220     25.41   0.6558  441.38  219
PADMM-TGV         28.81   0.8565  5.16    228     23.84   0.6751  28.83   216     25.32   0.6544  146.41  236
PADMM-TV          28.75   0.8538  3.24    215     23.81   0.6684  17.26   202     25.20   0.6532  80.86   203
Chan-Tao-Yuan     28.71   0.8526  2.04    236     23.80   0.6708  7.98    191     25.22   0.6541  46.55   212
The restored Lena and Barbara images obtained by the different algorithms under Gaussian noise are given in Figs. 5.2 and 5.3, respectively. As can be seen from Figs. 5.2 and 5.3, PADMM-TGV effectively suppresses the staircasing effects prevalent in the restoration results of PADMM-TV and Wen-Chan; the edges in the results of PADMM-TGVS are sharper and neater than those in the results of PADMM-TGV; Cai-Osher-Shen does not introduce staircasing effects, but the edges in its results are not as sharp as those of the other algorithms. Figure 5.4a and b give the curves of PSNR versus CPU time for the different algorithms in the Lena and Barbara restoration experiments under Gaussian noise, respectively. It can be seen that, first, the PSNR of PADMM-TV rises and converges the fastest, owing to its simple regularization model and compact structure without inner iterations. Second, because of its more complex compound regularization model, PADMM-TGVS takes longer than the other algorithms. Since both PADMM-TV and Wen-Chan are TV algorithms and both estimate the regularization parameter adaptively, Fig. 5.4c and d compare the curves of their regularization parameters versus CPU time. Although the final regularization parameters of PADMM-TV and Wen-Chan are similar, the higher PSNR and SSIM indicate that PADMM-TV finds a more accurate regularization parameter in a shorter time. The following conclusions can be drawn from the comparison results in Table 5.2 under the impulse noise condition. First, the PSNR and SSIM of PADMM-TGVS are higher than those of the other algorithms, which again validates the advantages of the TGV/shearlet compound regularization. Second, compared with the TV-based algorithm Chan-Tao-Yuan, PADMM-TV obtains similar PSNR and SSIM when each algorithm is executed only once, but it consumes more time. The reason is that PADMM-TV involves a more complex l1-projection problem when dealing with impulse noise, which is not present in Chan-Tao-Yuan. Nevertheless, it is important to emphasize that PADMM can automate the image restoration process provided that the noise level can be reasonably estimated. In contrast, Chan-Tao-Yuan requires multiple executions to select a good regularization parameter, and this manual parameter selection process is often more time-consuming than running PADMM. Figure 5.5 gives the Man image restoration results of the different algorithms under impulse noise. The results of PADMM-TV and Chan-Tao-Yuan are similar, PADMM-TGV suppresses the staircasing effects better, and PADMM-TGVS obtains better results than PADMM-TGV.
Fig. 5.2 Restored Lena images of different algorithms under Gaussian blur G(9, 3) and Gaussian noise with σ = 2. Panels: degraded image (PSNR = 23.88 dB, SSIM = 0.6841); PADMM-TGVS (27.89 dB, 0.8196); PADMM-TGV (27.70 dB, 0.8147); PADMM-TV (27.63 dB, 0.8094); Wen-Chan (27.46 dB, 0.8049); Cai-Osher-Shen (27.38 dB, 0.8120)
Fig. 5.3 Restored Barbara images of different algorithms under average blur A(9) and Gaussian noise with σ = 3. Panels: degraded image (PSNR = 22.39 dB, SSIM = 0.5474); PADMM-TGVS (24.13 dB, 0.6817); PADMM-TGV (24.02 dB, 0.6751); PADMM-TV (23.97 dB, 0.6684); Wen-Chan (23.87 dB, 0.6639); Cai-Osher-Shen (23.95 dB, 0.6708)
Fig. 5.4 Curves of PSNR and λ relative to CPU time of different algorithms in the Lena and Barbara restoration experiments under Gaussian noise. Panels: (a) PSNR relative to CPU time for Lena; (b) PSNR relative to CPU time for Barbara; (c) λ relative to CPU time for Lena; (d) λ relative to CPU time for Barbara
5.5.2 RGB Image Deblurring Experiment
In this subsection, two RGB image deblurring problems are designed in the context of Gaussian noise and impulse noise, respectively. The design of the background problem is given in Table 5.3. Two sets of blurs are generated as follows:
(i) Generate 9 blur kernels: {A(13), A(15), A(17), G(11, 9), G(21, 11), G(31, 13), M(21, 45), M(41, 90), M(61, 135)};
(ii) Assign the above 9 blur kernels to {K11, K12, K13; K21, K22, K23; K31, K32, K33}, where Kii is intra-channel blur while the rest is inter-channel blur;
(iii) Multiply the above blur kernels by the weights {1, 0, 0; 0, 1, 0; 0, 0, 1} (blur 1) and {0.8, 0.1, 0.1; 0.2, 0.6, 0.2; 0.15, 0.15, 0.7} (blur 2) to obtain the final blur kernels.
After blurring the Peppers image and the Kodim14 image using the above blur kernels, the Gaussian noise or impulse noise shown in Table 5.3 is applied to them to obtain the final observed images.
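Steps (ii) and (iii) above describe a 3×3 grid of kernels in which output channel i is the weighted sum of the three input channels, each blurred by its own kernel. The sketch below illustrates this structure under several assumptions: circular boundary conditions (so the convolution can be done with the FFT), hypothetical function names, and averaging kernels used as stand-ins for all nine positions, since the exact construction of the G(·,·) and M(·,·) kernels is not given in this section.

```python
import numpy as np

def average_kernel(size):
    """size x size averaging kernel A(size), normalized to sum to 1."""
    return np.full((size, size), 1.0 / size ** 2)

def circular_blur(channel, kernel):
    """Circular 2-D convolution via the FFT (assumed boundary condition)."""
    m, n = channel.shape
    kh, kw = kernel.shape
    padded = np.zeros((m, n))
    padded[:kh, :kw] = kernel
    # Shift the kernel so that its center sits at index (0, 0).
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(channel) * np.fft.fft2(padded)))

def cross_channel_blur(image, kernels, weights):
    """Output channel i = sum_j weights[i][j] * (kernels[i][j] * channel j).

    image: array of shape (m, n, 3); kernels: 3x3 nested list of 2-D kernels
    K_ij; weights: 3x3 array (identity for blur 1, the mixed matrix for blur 2).
    """
    out = np.zeros(image.shape, dtype=float)
    for i in range(3):
        for j in range(3):
            if weights[i][j] != 0:
                out[..., i] += weights[i][j] * circular_blur(image[..., j], kernels[i][j])
    return out

WEIGHTS_BLUR1 = np.eye(3)
WEIGHTS_BLUR2 = np.array([[0.8, 0.1, 0.1],
                          [0.2, 0.6, 0.2],
                          [0.15, 0.15, 0.7]])
# Stand-in kernels only: averaging kernels in every position (an assumption).
KERNELS = [[average_kernel(s) for s in (13, 15, 17)] for _ in range(3)]
```

With the identity weights of blur 1 the channels stay decoupled, so each channel could be deblurred independently; the off-diagonal weights of blur 2 are what couple the channels and prevent the blur matrix from being fully diagonalized by the FFT.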
Fig. 5.5 Restored Man images of different algorithms under M(30, 30) blur and 70% impulse noise. Panels: degraded image (PSNR = 7.63 dB, SSIM = 0.0070); PADMM-TGVS (25.41 dB, 0.6558); PADMM-TGV (25.32 dB, 0.6544); PADMM-TV (25.20 dB, 0.6532); Chan-Tao-Yuan (25.22 dB, 0.6541)
Table 5.3 Details of RGB image deblurring experiment settings
Blur   Image               Gaussian noise   Impulse noise (%)
1      Peppers, Kodim14    σ = 2            50
2      Peppers, Kodim14    σ = 6            80
It is worth mentioning that few adaptive image deconvolution algorithms based on Morozov's discrepancy principle have been extended to multichannel image processing. This is because, in multichannel image deblurring, the presence of inter-channel blurring means that the blur matrix cannot be fully diagonalized by the FFT, and Gaussian elimination is subsequently required to solve the resulting linear equations [29]. This procedure significantly limits the execution efficiency of many commonly used inner iterative algorithms such as the Newton method. In contrast, because each subproblem can be solved analytically, the proposed algorithm PADMM-TGVS can be extended smoothly to multichannel image deconvolution by borrowing the strategy employed in FTVD-v4. In this experiment, the classical FTVD-v4 is compared with PADMM. Unlike PADMM, FTVD-v4 requires manual selection of the regularization parameter. Thanks to the additional information provided by multiple channels, PADMM-TGV obtains results similar to those of PADMM-TGVS in the restoration of RGB images. Therefore, only PADMM-TGV and PADMM-TV are compared with FTVD-v4 in this experiment. Table 5.4 gives the comparison results of the algorithms in four aspects: PSNR, SSIM, CPU time, and total number of iteration steps. The best result of each test item is highlighted in boldface. From Table 5.4, it can be seen that, first, PADMM-TGV yields the highest PSNR and SSIM in the comparison; second, PADMM-TGV and PADMM-TV are more time-consuming than FTVD-v4 in a single execution because of the l2 (Gaussian noise) and l1 (impulse noise) projection problems involved. However, as with grayscale image deblurring, in practice FTVD-v4 may be more time-consuming than PADMM because of the manual parameter selection process. Figures 5.6 and 5.7 further demonstrate the advantages of the TGV-based algorithm over the TV-based algorithms under Gaussian noise and impulse noise. From Figs. 5.6 and 5.7, it can be seen that staircasing effects appear on the surfaces of the raft and the peppers in the restoration results of PADMM-TV and FTVD-v4. In contrast, almost no staircasing exists in the restored images of PADMM-TGV.
Table 5.4 Comparison in terms of PSNR (dB), SSIM, CPU time (s) consumed, and number of iteration steps in RGB image deblurring experiment
Gaussian noise
                  Degraded         PADMM-TGV                       PADMM-TV                        FTVD-v4
Blur  Image       PSNR    SSIM     PSNR    SSIM    Time    Step    PSNR    SSIM    Time    Step    PSNR    SSIM    Time    Step
1     Peppers     20.51   0.6351   28.00   0.8125  38.40   86      27.82   0.8082  22.79   77      27.73   0.8076  15.38   95
1     Kodim14     20.69   0.4246   25.12   0.6410  53.94   79      25.08   0.6391  35.65   79      25.04   0.6377  24.37   98
2     Peppers     18.17   0.5610   25.83   0.7691  61.53   138     25.82   0.7657  28.34   96      25.79   0.7653  16.77   103
2     Kodim14     19.75   0.3882   23.50   0.5512  135.07  197     23.45   0.5506  66.51   146     23.43   0.5493  24.20   97
Impulse noise
                  Degraded         PADMM-TGV                       PADMM-TV                        FTVD-v4
Blur  Image       PSNR    SSIM     PSNR    SSIM    Time    Step    PSNR    SSIM    Time    Step    PSNR    SSIM    Time    Step
1     Peppers     8.36    0.0231   30.36   0.8652  87.56   154     29.96   0.8582  58.03   140     29.93   0.8550  37.89   138
1     Kodim14     7.94    0.0161   27.15   0.7717  174.29  198     26.94   0.7588  124.76  194     26.77   0.7462  86.39   200
2     Peppers     6.38    0.0127   25.65   0.7747  107.12  185     25.51   0.7679  75.89   177     25.49   0.7652  41.28   150
2     Kodim14     5.97    0.0096   22.83   0.5279  186.76  209     22.80   0.5262  130.46  197     22.81   0.5263  82.74   193
Fig. 5.6 Restored Kodim14 images of different algorithms under Blur 1 and Gaussian noise with σ = 2. Panels: degraded image (PSNR = 20.69 dB, SSIM = 0.4246); PADMM-TGV (25.12 dB, 0.6410); PADMM-TV (25.08 dB, 0.6391); FTVD-v4 (25.04 dB, 0.6377)
5.5.3 MRI Reconstruction Experiment
This subsection demonstrates the potential of the PADMM algorithm for magnetic resonance image (MRI) reconstruction, a well-known application of compressed sensing techniques. MRI is a slow medical image acquisition process, and applying compressive sampling techniques to MRI can significantly reduce imaging scan times and thus significantly cut medical expenses. The successful application of compressive sampling to MRI is due to two factors [30]: (i) medical images are usually sparse in some transform domain; and (ii) MRI scanning systems usually acquire encoded samples rather than direct pixel values. As in image deconvolution, MRI reconstruction is more often performed by nonlinear methods, because linear methods usually produce a large number of artifacts in the reconstruction results, which can seriously interfere with the subsequent clinical diagnosis of lesions [30]. In this experiment, PADMM is compared with the edge guided compressed sensing reconstruction method [31] (edge-CS) and the TV-based C-SALSA [32]. The Foot image used as the original image has clear edge features and rich soft tissue structure. The background problem here is to recover the Foot image from 50 radial sampling lines in the 2-D Fourier domain (sampling ratio of 10.64%), with an SNR of 40 dB for the observation data.
Fig. 5.7 Restored Peppers images of different algorithms under Blur 2 and 80% impulse noise. Panels: degraded image (PSNR = 6.38 dB, SSIM = 0.0127); PADMM-TGV (25.65 dB, 0.7747); PADMM-TV (25.51 dB, 0.7679); FTVD-v4 (25.49 dB, 0.7652)
Figure 5.8 gives local enlargements of the Foot images reconstructed by the different algorithms. It can be seen that, first, PADMM-TV achieves higher PSNR and SSIM than edge-CS and C-SALSA. The reconstruction results of these three algorithms are similar, and all of them contain obvious staircasing effects. Second, PADMM-TGV effectively suppresses the staircasing effects present in the TV-based algorithms. However, it cannot reconstruct the texture details that are abundant in the original image. Third, compared with the other algorithms, PADMM-TGVS obtains the highest PSNR and SSIM and, more importantly, it reconstructs the oblique texture on the bones and the finely varying regions between bones and soft tissues.
Fig. 5.8 Reconstruction results of the Foot image with different algorithms when the number of sampled radiation lines in the frequency domain is 50. Panels: local enlargement of Foot; PADMM-TGVS (PSNR = 31.59 dB, SSIM = 0.8922); PADMM-TGV (31.24 dB, 0.8908); PADMM-TV (30.16 dB, 0.8852); Edge-CS (29.61 dB, 0.8839); C-SALSA (29.63 dB, 0.8809)
References
1. Chan T, Shen J (2005) Image processing and analysis: variational, PDE, wavelet, and stochastic methods. SIAM, Philadelphia
2. Guo WH, Qin J, Yin WT (2014) A new detail-preserving regularization scheme. SIAM J Imag Sci 7(2):1309–1334
3. Chambolle A, Lions PL (1997) Image recovery via total variation minimization and related problems. Numer Math 76(2):167–188
4. Chan T, Marquina A, Mulet P (2000) Higher order total variation-based image restoration. SIAM J Sci Comput 22(2):503–516
5. Chan T, Esedoglu S, Park FE (2005) A fourth order dual method for staircase reduction in texture extraction and image restoration problems. UCLA CAM Report 05-28, UCLA, Los Angeles
6. Maso GD, Fonseca I, Leoni G, Morini M (2009) A higher order model for image restoration: the one-dimensional case. SIAM J Math Anal 40(6):2351–2391
7. Stefan W, Renaut RA, Gelb A (2010) Improved total variation-type regularization using higher order edge detectors. SIAM J Imag Sci 3(2):232–251
8. Bredies K, Kunisch K, Pock T (2010) Total generalized variation. SIAM J Imag Sci 3(3):492–526
9. Bredies K, Dong Y, Hintermüller M (2013) Spatially dependent regularization parameter selection in total generalized variation models for image restoration. Int J Comput Math 90(1):109–123
10. Yang Z, Jacob M (2013) Nonlocal regularization of inverse problems: a unified variational framework. IEEE Trans Image Proc 22(8):3192–3203
11. Hu Y, Jacob M (2012) Higher degree total variation (HDTV) regularization for image recovery. IEEE Trans Image Proc 21(5):2559–2571
12. Hu Y, Ongie G, Ramani S, Jacob M (2014) Generalized higher degree total variation (HDTV) regularization. IEEE Trans Image Proc 23(6):2423–2435
13. Lefkimmiatis S, Ward JP, Unser M (2013) Hessian Schatten-norm regularization for linear inverse problems. IEEE Trans Image Proc 22(5):1873–1888
14. Tian D, Xue D, Wang D (2015) A fractional-order adaptive regularization primal-dual algorithm for image denoising. Inf Sci 296:147–159
15. He N, Lu K, Bao B, Zhang L, Wang J (2014) Single-image motion deblurring using an adaptive image prior. Inf Sci 281:736–749
16. Hu W, Li W, Zhang X, Maybank S (2015) Single and multiple object tracking using a multi-feature joint sparse representation. IEEE Trans Pattern Anal Mach Intell 37(4):816–833
17. Gao H, Cai J, Shen Z, Zhao H (2011) Robust principal component analysis-based four-dimensional computed tomography. Phys Med Biol 56:3181–3198
18. Li J, Gong W, Li W (2015) Dual-sparsity regularized sparse representation for single image super-resolution. Inf Sci 298:257–273
19. Liu J, Huang T, Selesnick I, Lv X, Chen P (2015) Image restoration using total variation with overlapping group sparsity. Inf Sci 295:232–246
20. He B, Yuan X (2014) On the direct extension of ADMM for multi-block separable convex programming and beyond: from variational inequality perspective. http://www.optimization-online.org/DB_HTML/2014/03/4293.html
21. Deng W, Lai M-J, Peng Z, Yin W (2013) Parallel multi-block ADMM with o(1/k) convergence. UCLA CAM Report 13-64. UCLA, Los Angeles
22. He C, Hu C, Li X et al (2016) A parallel alternating direction method with application to compound l1-regularized imaging inverse problems. Inf Sci 348:179–197
23. He B, Yuan X. On non-ergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. http://www.optimization-online.org/DBHTML/2012/01/3318.html
24. Häuser S, Steidl G (2014) Fast finite shearlet transform. Preprint, arXiv: 1202.1773
25. Weiss P, Blanc-Féraud L, Aubert G (2009) Efficient schemes for total variation minimization under constraints in image processing. SIAM J Sci Comput 31(3):2047–2080
26. Wen Y, Chan RH (2012) Parameter selection for total-variation-based image restoration using discrepancy principle. IEEE Trans Image Proc 21(4):1770–1781
27. Cai J-F, Osher S, Shen Z (2010) Split Bregman methods and frame based image restoration. Multiscale Model Simulat 8(2):337–369
28. Chan RH, Tao M, Yuan X (2013) Constrained total variational deblurring models and fast algorithms based on alternating direction method of multipliers. SIAM J Imag Sci 6(1):680–697
29. Yang J, Yin W, Zhang Y, Wang Y (2009) A fast algorithm for edge-preserving variational multichannel image restoration. SIAM J Imag Sci 2(2):569–592
30. Lustig M, Donoho DL, Santos JM, Pauly JM (2008) Compressed Sensing MRI: a look at how CS can improve on current imaging techniques. IEEE Sig Proc Mag 25(3):72–82
31. Guo W, Yin W (2012) Edge guided reconstruction for compressive imaging. SIAM J Imag Sci 5(3):809–834
32. Afonso MV, Bioucas-Dias JM, Figueiredo MAT (2011) An augmented Lagrange approach to the constrained optimization formulation of imaging inverse problems. IEEE Trans Image Proc 20(3):681–695
Chapter 6
Parallel Primal-dual Method with Application to Image Restoration
© Chemical Industry Press 2023 C. He and C. Hu, Parallel Operator Splitting Algorithms with Application to Imaging Inverse Problems, Advanced and Intelligent Manufacturing in China, https://doi.org/10.1007/978-981-99-3750-9_6
6.1 Summarize
From the analysis in Chap. 3, it is clear that image restoration is usually an ill-posed linear inverse problem and that its solution often involves the inversion of a linear operator (matrix). The ADMM-based algorithms used in Chaps. 4 and 5 both require such an inversion. However, this inversion may cause difficulties: in some cases, inverting the linear operator (matrix) may be impossible or very complicated. In fact, the multichannel image deblurring experiment in Chap. 5 encountered exactly this problem. In multichannel image deblurring, because of the inter-channel blurring, the blur matrix K usually cannot be fully diagonalized by the FFT as in grayscale image deblurring, which significantly increases the cost of the matrix inversion. Eliminating the matrix inversion is one of the key issues in further improving the execution efficiency of operator splitting algorithms, and a feasible way to improve the efficiency of parallel processing of large-scale image data. The linear ADMM (LADMM) method and the linear splitting Bregman method mentioned in Chap. 1 provide a good solution to this problem: by taking a Taylor series expansion of the quadratic terms in the original variables, these two equivalent methods eliminate the inversion of the linear operator acting on the original variables.
The PADMM algorithm presented in Chap. 5 can eliminate auxiliary variables through the Moreau decomposition to obtain a more concise and parallel-friendly primal-dual form. Is it therefore possible to derive a useful primal-dual algorithm directly from the Lagrangian function of the minimization problem? In 2013, Condat proposed a primal-dual method for solving linear inverse problems [1, 2]. The method solves the primal and dual problems simultaneously by finding the saddle point of the Lagrangian function of the problem, and it eliminates the linear inversions that are common in ill-posed linear inverse problems through a “full splitting” strategy. However, a shortcoming of Condat's primal-dual algorithm is that all function terms are treated as equivalent, whereas in practice different function terms may have different physical meanings. Deriving highly parallel algorithms for imaging inverse problems by means of primal-dual splitting is a current research hotspot in academia. In addition, the connection between primal-dual splitting methods and other splitting methods is also an issue worth exploring in depth.
In this chapter, a novel parallel primal-dual splitting (PPDS) method [3] is proposed based on the Condat primal-dual algorithm. Unlike in the Condat algorithm, every linear operator in the proposed algorithm is assigned a positive weight in order to enhance the flexibility of the algorithm and to speed up its practical convergence. The structure of the algorithm is highly parallel. Because the proposed algorithm eliminates linear inversions, it can be applied naturally under different image boundary conditions, such as circular or symmetric boundary conditions. This chapter proves the convergence of the proposed algorithm and analyzes its o(1/k) convergence rate using the theory of maximally monotone and nonexpansive operators rather than the more commonly used variational inequalities, which makes the convergence analysis more concise and straightforward. It is shown that the PPDS algorithm is a relaxed, generalized form of the parallel linear alternating direction method of multipliers (PLADMM), and it is further extended to optimization problems with Lipschitz continuous gradient terms. Furthermore, this chapter applies the PPDS algorithm to image restoration problems with TGV/shearlet compound regularization. Finally, the theory and methods of this chapter are validated by several image restoration experiments built on different databases.
This chapter is structured as follows: Sect. 6.2 formulates a generic optimization model for imaging inverse problems with specific properties and gives its saddle point condition, together with the derivation and two equivalent forms of the parallel primal-dual splitting method for solving the generic model. Section 6.3 gives a convergence proof and a convergence rate analysis of the proposed algorithm. Section 6.4 describes the relationship between PPDS and PLADMM and further generalizes the PPDS algorithm. Section 6.5 details the strategy for applying the PPDS algorithm to TGV/shearlet compound regularized image restoration. Section 6.6 gives the comparative experimental results.
6.2 Parallel Primal-dual Splitting Method
6.2.1 A General Description of the Objective Function for Image Restoration with Proximity Splitting Terms
As in Chap. 5, the compound regularizer considered in this chapter is still a linear combination of multiple regularizers. Let $\Gamma_0(X)$ denote the set of all proper, convex, and lower semicontinuous functions from the Hilbert space X to R ∪ {+∞}. The minimization model for the image inverse problem developed in this chapter is
$$\min_{x \in X}\; g(x) + \sum_{h=1}^{H} f_h(L_h x),$$ (6.1)
where $g \in \Gamma_0(X)$ and $f_h \in \Gamma_0(V_h)$. g and $f_h$ are sufficiently “simple” in the sense that their proximity operators exist in closed form or can be easily computed. Unlike in Chap. 5, the “proximity” of g is required here. $L_h : X \to V_h$ is a bounded linear operator whose Hilbert adjoint is denoted by $L_h^*$ and whose induced norm is $\|L_h\| = \sup\{\|L_h x\|_2 : \|x\|_2 = 1\} < +\infty$. Furthermore, assume that an optimal solution of problem Eq. (6.1) exists. As in Eq. (5.1), by defining the indicator function of a set, the model Eq. (6.1) can uniformly describe constrained or unconstrained image inverse problems, such as image deblurring, inpainting, compressed sensing, and segmentation. The difficulties involved in practical image inverse problems are discussed in Chap. 5 and will not be repeated here. In the subsequent sections, all Hilbert spaces are assumed to be finite-dimensional unless stated otherwise; this condition is sufficiently general for practical applications and computations.
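As a concrete instance of a “simple” function, the proximity operator of τ‖·‖₁ is soft thresholding, and the proximity operator of its Fenchel conjugate follows from the Moreau decomposition prox_{σf*}(v) = v − σ·prox_{f/σ}(v/σ). The sketch below illustrates both; the choice of the l1-norm as the example and all function names are assumptions made here for illustration, not taken from the chapter.

```python
import numpy as np

def prox_l1(x, tau):
    """Proximity operator of tau * ||.||_1: componentwise soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_conjugate(prox_f, v, sigma):
    """Proximity operator of sigma * f^* via the Moreau decomposition:
    prox_{sigma f^*}(v) = v - sigma * prox_{f / sigma}(v / sigma).

    prox_f(x, t) must return prox_{t f}(x).
    """
    return v - sigma * prox_f(v / sigma, 1.0 / sigma)

# Example with f = tau * ||.||_1 (illustrative choice): f^* is the indicator
# of the l-infinity ball of radius tau, so prox_{sigma f^*} is a clipping.
tau = 1.0
prox_f = lambda x, t: prox_l1(x, t * tau)
v = np.array([-3.0, -0.5, 0.2, 4.0])
dual_prox = prox_conjugate(prox_f, v, sigma=2.5)
```

This is why requiring closed-form proximity operators for the f_h is enough: the dual updates need prox of f_h^*, which the identity above delivers without any extra solver.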
6.2.2 Variational Conditions for Optimization of the Objective Function Consider the Fenchel conjugate of the minimization function (6.1), one can obtain its Lagrangian problem min
max
x∈X v 1 ∈V1 ,...,v H ∈VH
g(x) −
H ∑ (
) f h∗ (v h ) − ⟨L h x, v h ⟩ ,
(6.2)
h=1
where f h∗ is the Fenchel conjugate function of f h . The proposed algorithm is called “Primal-dual” because it solves both the primal problem (6.1) and its dual problem by finding the saddle point of the Lagrangian function Eq. (6.2) ( ( max
v 1 ∈V1 ,...,v H ∈VH
− g
∗
−
H ∑ h=1
) L ∗h v h
+
H ∑
) f h∗ (v h )
.
(6.3)
h=1
( ) That is, if x ∗ , v ∗1 , . . . , v ∗H is a saddle point of Lagrangian function ( ) Eq. (6.2), then x* is a solution of the original problem Eq. (6.1) and v ∗1 , . . . , v ∗H is a solution to the dual problem Eq. (6.3). By the classical Karush-Kuhn-Tucker (KKT) theorem, the saddle point of the Lagrangian function Eq. (6.2) satisfies the following variational condition
$$\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \in \begin{pmatrix} \partial g(x^*) + \sum_{h=1}^{H} L_h^* v_h^* \\ -L_1 x^* + \partial f_1^*(v_1^*) \\ \vdots \\ -L_H x^* + \partial f_H^*(v_H^*) \end{pmatrix}. \tag{6.4}$$
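Condat constructed his Primal-dual algorithm using the Lagrangian function Eq. (6.2) and the variational condition Eq. (6.4). In this chapter, to weight each $f_h^*$ individually, consider the following equivalent weighted Lagrangian function

$$\min_{x \in X} \; \max_{\frac{v_1}{\sqrt{\beta_1}} \in V_1, \ldots, \frac{v_H}{\sqrt{\beta_H}} \in V_H} \; g(x) - \sum_{h=1}^{H} \left( \bar f_h^*\!\left( \frac{v_h}{\sqrt{\beta_h}} \right) - \left\langle \sqrt{\beta_h}\, L_h x, \frac{v_h}{\sqrt{\beta_h}} \right\rangle \right), \tag{6.5}$$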
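where $\bar f_h^*\!\left( v_h / \sqrt{\beta_h} \right) = f_h^*(v_h)$, $h = 1, \ldots, H$. By the KKT theorem, its corresponding variational condition should be

$$\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \in \begin{pmatrix} \partial g(x^*) + \sum_{h=1}^{H} \sqrt{\beta_h}\, L_h^* \dfrac{v_h^*}{\sqrt{\beta_h}} \\ -\sqrt{\beta_1}\, L_1 x^* + \partial \bar f_1^*\!\left( \dfrac{v_1^*}{\sqrt{\beta_1}} \right) \\ \vdots \\ -\sqrt{\beta_H}\, L_H x^* + \partial \bar f_H^*\!\left( \dfrac{v_H^*}{\sqrt{\beta_H}} \right) \end{pmatrix}. \tag{6.6}$$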
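6.2.3 Algorithm Derivation

To simplify the subsequent derivation, denote

$$v = \left( \frac{v_1}{\sqrt{\beta_1}}, \ldots, \frac{v_H}{\sqrt{\beta_H}} \right) \in V \triangleq V_1 \times \cdots \times V_H. \tag{6.7}$$

Define

$$f^*(v) \triangleq \sum_{h=1}^{H} f_h^*(v_h) = \sum_{h=1}^{H} \bar f_h^*\!\left( \frac{v_h}{\sqrt{\beta_h}} \right) \quad \left( \bar f_h^*\!\left( \frac{v_h}{\sqrt{\beta_h}} \right) = f_h^*(v_h) \right). \tag{6.8}$$

Define the linear operator $L : X \to V$ as

$$Lx \triangleq \left( \sqrt{\beta_1}\, L_1 x, \ldots, \sqrt{\beta_H}\, L_H x \right) \in V. \tag{6.9}$$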
Let the adjoint operator of $L$ be $L^*$. Based on the above notation and definitions, it is easy to verify that the following properties hold

$$L^* v = \sum_{h=1}^{H} \sqrt{\beta_h}\, L_h^* \frac{v_h}{\sqrt{\beta_h}} = \sum_{h=1}^{H} L_h^* v_h \in X, \tag{6.10}$$

$$\| L^* L \| = \left\| \sum_{h=1}^{H} \beta_h L_h^* L_h \right\| \leq \sum_{h=1}^{H} \beta_h \| L_h^* L_h \|. \tag{6.11}$$
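Properties (6.10) and (6.11) can be checked numerically. The following is a minimal sketch, assuming small random matrices as stand-ins for the operators $L_h$; the names `Ls`, `betas`, and the problem sizes are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, H = 6, 3
betas = np.array([0.5, 2.0, 1.3])          # illustrative weights beta_h > 0
# Hypothetical random matrices standing in for L_1, ..., L_H.
Ls = [rng.standard_normal((4 + h, n)) for h in range(H)]

# Stacked operator L x = (sqrt(beta_1) L_1 x, ..., sqrt(beta_H) L_H x), Eq. (6.9).
L = np.vstack([np.sqrt(b) * Lh for b, Lh in zip(betas, Ls)])

# Eq. (6.10): L^* v = sum_h L_h^* v_h for the scaled variable v = (v_h / sqrt(beta_h))_h.
vs = [rng.standard_normal(Lh.shape[0]) for Lh in Ls]
v = np.concatenate([vh / np.sqrt(b) for b, vh in zip(betas, vs)])
lhs = L.T @ v
rhs = sum(Lh.T @ vh for Lh, vh in zip(Ls, vs))

# Eq. (6.11): ||L^* L|| <= sum_h beta_h ||L_h^* L_h|| in the spectral norm.
norm_LtL = np.linalg.norm(L.T @ L, 2)
bound = sum(b * np.linalg.norm(Lh.T @ Lh, 2) for b, Lh in zip(betas, Ls))
```

The bound is what makes the step-size rule practical: each $\|L_h^* L_h\|$ can be estimated separately, without forming the stacked operator.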
With the help of the above notation and properties, the optimality condition Eq. (6.6) can be transformed into

$$\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \begin{pmatrix} \partial g(x^*) + L^* v^* \\ -L x^* + \partial f^*(v^*) \end{pmatrix}. \tag{6.12}$$

Define the operator

$$M : (x, v) \mapsto \left( \partial g(x) + L^* v, \; -Lx + \partial f^*(v) \right), \tag{6.13}$$
and denote $y = (x, v) \in Y \triangleq X \times V$. By Theorem 20.40, Lemma 16.24, Proposition 20.22, and Proposition 20.23 of [4], the operator $(x, v) \mapsto (\partial g(x), \partial f^*(v))$ is maximally monotone. Similarly, by Example 20.30 of [4], the operator $(x, v) \mapsto (L^* v, -Lx)$ is also maximally monotone. Thus, by Lemma 25.4 of [4], the operator $M$ is maximally monotone; in particular, $\langle My - My', y - y' \rangle \geq 0$ holds for all $y, y' \in Y$. The variational condition Eq. (6.12) shows that a saddle point of the Lagrangian function Eq. (6.2) is a zero of the maximally monotone operator $M$, and vice versa. On the other hand, the zeros of $M$ coincide with the fixed points of its nonexpansive, single-valued resolvent $(I + M)^{-1}$. Thus, a zero of $M$ can be found by the following relaxed fixed-point iteration [4]:

$$\begin{cases} \tilde y^{k+1} = (I + M)^{-1}(y^k); \\ y^{k+1} = \rho^k \tilde y^{k+1} + (1 - \rho^k)\, y^k. \end{cases} \tag{6.14}$$
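Expanding the first equation of the above iterative rule yields

$$\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \begin{pmatrix} \partial g(\tilde x^{k+1}) + L^* \tilde v^{k+1} \\ -L \tilde x^{k+1} + \partial f^*(\tilde v^{k+1}) \end{pmatrix} + \begin{pmatrix} \tilde x^{k+1} - x^k \\ \tilde v^{k+1} - v^k \end{pmatrix}. \tag{6.15}$$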
Solving Eq. (6.15) directly is very difficult because the updates of $\tilde x^{k+1}$ and $\tilde v^{k+1}$ are coupled. To decouple the updates of these two variables, a bounded, nonnegative definite, self-adjoint operator $R$ is introduced and (6.15) is reshaped as

$$\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \underbrace{\begin{pmatrix} \partial g(\tilde x^{k+1}) + L^* \tilde v^{k+1} \\ -L \tilde x^{k+1} + \partial f^*(\tilde v^{k+1}) \end{pmatrix}}_{M \tilde y^{k+1}} + \underbrace{\begin{pmatrix} \frac{1}{t} I & L^* \\ L & I \end{pmatrix}}_{R} \underbrace{\begin{pmatrix} \tilde x^{k+1} - x^k \\ \tilde v^{k+1} - v^k \end{pmatrix}}_{\tilde y^{k+1} - y^k}. \tag{6.16}$$
$R$ in Eq. (6.16) can be found by the method of undetermined coefficients. Expanding Eq. (6.16) and adding a relaxation step yields the following iterative rule for solving Eq. (6.12):

$$\begin{cases} \tilde v^{k+1} = \mathrm{prox}_{f^*}\!\left( L x^k + v^k \right); \\ \tilde x^{k+1} = \mathrm{prox}_{tg}\!\left( x^k - t L^* \left( 2 \tilde v^{k+1} - v^k \right) \right); \\ \left( x^{k+1}, v^{k+1} \right) = \rho^k \left( \tilde x^{k+1}, \tilde v^{k+1} \right) + (1 - \rho^k) \left( x^k, v^k \right). \end{cases} \tag{6.17}$$
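Since

$$\begin{aligned} \frac{\tilde v_h^{k+1}}{\sqrt{\beta_h}} &= \mathrm{prox}_{\bar f_h^*}\!\left( \sqrt{\beta_h}\, L_h x^k + \frac{v_h^k}{\sqrt{\beta_h}} \right) \\ &= \arg\min_{\frac{v_h}{\sqrt{\beta_h}}} \; \bar f_h^*\!\left( \frac{v_h}{\sqrt{\beta_h}} \right) + \frac{1}{2} \left\| \frac{v_h}{\sqrt{\beta_h}} - \left( \sqrt{\beta_h}\, L_h x^k + \frac{v_h^k}{\sqrt{\beta_h}} \right) \right\|_2^2 \\ &= \frac{1}{\sqrt{\beta_h}} \arg\min_{v_h} \; f_h^*(v_h) + \frac{1}{2 \beta_h} \left\| v_h - \left( \beta_h L_h x^k + v_h^k \right) \right\|_2^2, \end{aligned} \tag{6.18}$$

we have

$$\tilde v_h^{k+1} = \mathrm{prox}_{\beta_h f_h^*}\!\left( \beta_h L_h x^k + v_h^k \right). \tag{6.19}$$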
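The proximity operator of $\beta_h f_h^*$ in Eq. (6.19) is usually evaluated through the proximity operator of $f_h$ by the standard Moreau decomposition $\mathrm{prox}_{\beta f^*}(u) = u - \beta\, \mathrm{prox}_{f/\beta}(u/\beta)$. The sketch below illustrates this for the assumed example $f = \alpha \|\cdot\|_1$, for which $\mathrm{prox}_{\beta f^*}$ is the projection onto the box $[-\alpha, \alpha]$; the function names and the test vector are hypothetical.

```python
import numpy as np

def soft_threshold(z, thr):
    """Proximity operator of thr * ||.||_1 (componentwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

def prox_conj_l1(u, alpha, beta):
    """prox of beta * f^* for f = alpha * ||.||_1, via the Moreau decomposition."""
    return u - beta * soft_threshold(u / beta, alpha / beta)

u = np.array([-3.0, -0.4, 0.0, 0.9, 2.5])
alpha, beta = 1.0, 0.7
via_moreau = prox_conj_l1(u, alpha, beta)
# For this f, the conjugate is the indicator of the box [-alpha, alpha],
# so its proximity operator is a clipping that does not depend on beta.
direct = np.clip(u, -alpha, alpha)
```

The same pattern applies to any $f_h$ with an inexpensive proximity operator, so no conjugate needs to be derived by hand.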
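Therefore, Eq. (6.17) is equivalent to

$$\begin{cases} \tilde v_h^{k+1} = \mathrm{prox}_{\beta_h f_h^*}\!\left( \beta_h L_h x^k + v_h^k \right), \quad h = 1, \ldots, H; \\ \tilde x^{k+1} = \mathrm{prox}_{tg}\!\left( x^k - t \sum_{h=1}^{H} L_h^* \left( 2 \tilde v_h^{k+1} - v_h^k \right) \right); \\ v_h^{k+1} = \rho^k \tilde v_h^{k+1} + (1 - \rho^k)\, v_h^k, \quad h = 1, \ldots, H; \\ x^{k+1} = \rho^k \tilde x^{k+1} + (1 - \rho^k)\, x^k. \end{cases} \tag{6.20}$$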
Algorithm 6.1 gives the parallel Primal-dual splitting (PPDS) algorithm that summarizes the above discussion.

Algorithm 6.1 Parallel Primal-dual Splitting Algorithm (PPDS1)
Step 1: Initialize $x^0 = 0$, $v_h^0 = 0$, $\beta_h > 0$, $h = 1, \ldots, H$, $k = 0$, $0 < t \leq 1 \big/ \left( \sum_{h=1}^{H} \beta_h \| L_h^* L_h \| \right)$.
Step 2: Determine whether the termination conditions are met; if not, perform the following steps.
Step 3: $\tilde v_h^{k+1} = \mathrm{prox}_{\beta_h f_h^*}\!\left( \beta_h L_h x^k + v_h^k \right)$, $h = 1, \ldots, H$.
Step 4: $\tilde x^{k+1} = \mathrm{prox}_{tg}\!\left( x^k - t \sum_{h=1}^{H} L_h^* \left( 2 \tilde v_h^{k+1} - v_h^k \right) \right)$.
Step 5: $v_h^{k+1} = \rho^k \tilde v_h^{k+1} + (1 - \rho^k)\, v_h^k$, $h = 1, \ldots, H$.
Step 6: $x^{k+1} = \rho^k \tilde x^{k+1} + (1 - \rho^k)\, x^k$.
Step 7: $k = k + 1$.
Step 8: End the loop and output $x^{k+1}$.

In Eq. (6.16), introduce the following linear operator to replace $R$

$$R' = \begin{pmatrix} \frac{1}{t} I & -L^* \\ -L & I \end{pmatrix}, \tag{6.21}$$
then $\tilde x^{k+1}$ is updated before $\tilde v^{k+1}$, which yields the following equivalent PPDS algorithm.

Algorithm 6.2 Equivalent Parallel Primal-dual Splitting Algorithm (PPDS2)
Step 1: Initialize $x^0 = 0$, $v_h^0 = 0$, $\beta_h > 0$, $h = 1, \ldots, H$, $k = 0$, $0 < t \leq 1 \big/ \left( \sum_{h=1}^{H} \beta_h \| L_h^* L_h \| \right)$.
Step 2: Determine whether the termination conditions are met; if not, perform the following steps.
Step 3: $\tilde x^{k+1} = \mathrm{prox}_{tg}\!\left( x^k - t \sum_{h=1}^{H} L_h^* v_h^k \right)$.
Step 4: $\tilde v_h^{k+1} = \mathrm{prox}_{\beta_h f_h^*}\!\left( \beta_h L_h \left( 2 \tilde x^{k+1} - x^k \right) + v_h^k \right)$, $h = 1, \ldots, H$.
Step 5: $x^{k+1} = \rho^k \tilde x^{k+1} + (1 - \rho^k)\, x^k$.
Step 6: $v_h^{k+1} = \rho^k \tilde v_h^{k+1} + (1 - \rho^k)\, v_h^k$, $h = 1, \ldots, H$.
Step 7: $k = k + 1$.
Step 8: End the loop and output $x^{k+1}$.

The condition on $t$ in both algorithms stems from the nonnegative definiteness of the linear operators $R$ and $R'$; the derivation of this condition is given in detail in the convergence discussion in the next section. As can be seen from Algorithms 6.1 and 6.2, the proposed PPDS method has a highly parallel structure, with independent parallel updates of the dual variables, and is therefore suitable for distributed computation.
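The steps of Algorithm 6.1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the operators $L_h$ are dense matrices, the function name `ppds1` and its signature are hypothetical, and the toy problem (whose exact solution is a soft-thresholding of `b`) is chosen only so that the result can be verified.

```python
import numpy as np

def ppds1(prox_tg, prox_conj, Ls, betas, x0, t, rho=1.0, iters=500):
    """Relaxed PPDS1 iteration (Algorithm 6.1) for dense matrices L_h.

    prox_tg(z, t)       -> prox of t*g evaluated at z
    prox_conj[h](u, b)  -> prox of b*f_h^* evaluated at u
    """
    x = x0.astype(float).copy()
    vs = [np.zeros(L.shape[0]) for L in Ls]
    for _ in range(iters):
        # Step 3: the H dual updates are mutually independent.
        v_new = [pc(b * (L @ x) + v, b)
                 for pc, L, b, v in zip(prox_conj, Ls, betas, vs)]
        # Step 4: primal update with the extrapolated duals 2*v_new - v.
        corr = sum(L.T @ (2.0 * vn - v) for L, vn, v in zip(Ls, v_new, vs))
        x_new = prox_tg(x - t * corr, t)
        # Steps 5-6: relaxation with parameter rho.
        vs = [rho * vn + (1.0 - rho) * v for vn, v in zip(v_new, vs)]
        x = rho * x_new + (1.0 - rho) * x
    return x, vs

# Toy problem (an assumption for illustration):
#   min_x 0.5*||x - b||^2 + lam1*||x||_1 + lam2*||2x||_1,
# whose solution is the soft-thresholding of b at lam1 + 2*lam2.
b = np.array([3.0, -2.0, 0.5, 0.0, -0.2])
lam1, lam2 = 0.4, 0.3
n = b.size
Ls = [np.eye(n), 2.0 * np.eye(n)]
betas = [0.5, 0.5]
# Step-size rule of Algorithm 6.1: t <= 1 / sum_h beta_h * ||L_h^* L_h||.
t = 1.0 / sum(bh * np.linalg.norm(L.T @ L, 2) for bh, L in zip(betas, Ls))

def prox_tg(z, step):
    """prox of step*g for g = 0.5*||. - b||^2."""
    return (z + step * b) / (1.0 + step)

prox_conj = [lambda u, bh: np.clip(u, -lam1, lam1),   # f_1 = lam1*||.||_1
             lambda u, bh: np.clip(u, -lam2, lam2)]   # f_2 = lam2*||.||_1

x_hat, _ = ppds1(prox_tg, prox_conj, Ls, betas, np.zeros(n), t,
                 rho=1.5, iters=3000)
expected = np.sign(b) * np.maximum(np.abs(b) - (lam1 + 2.0 * lam2), 0.0)
```

Because the list comprehension in Step 3 has no dependence between its elements, it is the natural place to dispatch work to separate devices or processes.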
6.3 Convergence Analysis

This section shows that the sequence generated by the PPDS algorithm from an arbitrary starting point converges to a saddle point of the Lagrangian function Eq. (6.2), and that the algorithm has a convergence rate of $o(1/k)$.
6.3.1 Convergence Proof

Lemma 6.1 below is derived from Theorem 6.14 and its proof in [4]. The proof of Theorem 6.1 is inspired by the convergence analysis in [1].

Lemma 6.1 [4] Let $W$ be a nonempty closed convex set in a finite-dimensional Hilbert space $\mathcal{H}$; let $T : W \to W$ be a nonexpansive operator, i.e., $\| Tw - Tw' \| \leq \| w - w' \|$, $\forall w, w' \in W$, with $\mathrm{Fix}\, T \neq \emptyset$ (i.e., $T$ has a fixed point); let $\{\mu^k\}_{k \in \mathbb{N}}$ be a sequence in $(0, 1]$, let $\tau^k = \mu^k (1 - \mu^k)$ with $\sum_{k \in \mathbb{N}} \tau^k = +\infty$; let $w^0 \in W$ and

$$w^{k+1} = (1 - \mu^k)\, w^k + \mu^k\, T w^k. \tag{6.22}$$
Then, for every $w^* \in \mathrm{Fix}\, T$, the following statements hold.
(i) $\{ \| w^k - w^* \|^2 \}$ is monotonically non-increasing.
(ii) $\{ \| T w^k - w^k \|^2 \}$ is monotonically non-increasing and converges to 0.
(iii) $\{ \tau^k \| T w^k - w^k \|^2 \}$ is summable and $\sum_{i=0}^{\infty} \tau^i \| T w^i - w^i \|^2 \leq \| w^0 - w^* \|^2$.
(iv) $\{ w^k \}$ converges to a point in $\mathrm{Fix}\, T$.

Lemma 6.2 [5] Let $\{a^k\}$, $\{b^k\}$, and $\{c^k\}$ be non-negative sequences. If $c^k < 1$, $a^{k+1} \leq c^k a^k + b^k$, $\sum_{k \in \mathbb{N}} (1 - c^k) = +\infty$, and $b^k / (1 - c^k) \to 0$ hold, then $a^k \to 0$.

Theorem 6.1 Let $\{x^k, v_1^k, \ldots, v_H^k\}$ be the sequence generated by the PPDS algorithm, and suppose the following two conditions are satisfied: (i) $t > 0$, $\beta_h > 0$, $h = 1, \ldots, H$, and $t \sum_{h=1}^{H} \beta_h \| L_h^* L_h \| \leq 1$ hold; (ii) $\rho^k \in [\varepsilon, 2 - \varepsilon]$ and $\sum_{k \in \mathbb{N}} \rho^k (2 - \rho^k) = +\infty$ hold for some $\varepsilon > 0$. Then $\{x^k, v_1^k, \ldots, v_H^k\}$ converges to a saddle point of problem Eq. (6.2) and, in particular, $\{x^k\}$ converges to a solution of problem Eq. (6.1).

Proof Let $P$ be the orthogonal projection onto the range $\mathrm{ran}\, R$ (or $\mathrm{ran}\, R'$) of $R$ (or $R'$). Since $R$ is positive semidefinite, $P$ is positive semidefinite and self-adjoint, and $I - P$ is the orthogonal projection onto the null space $\mathrm{zer}\, R = (\mathrm{ran}\, R)^{\perp}$ of $R$. It is easy to verify that $Q \triangleq R + I - P$ is a positive definite operator, so that an inner product $\langle \cdot, \cdot \rangle_Q$ can be defined by $\langle y, y' \rangle_Q = \langle y, Q y' \rangle$ and a norm $\| \cdot \|_Q$ by $\| y \|_Q = \sqrt{\langle y, Q y \rangle}$. Because $Q$ is positive definite, the inner product $\langle \cdot, \cdot \rangle_Q$ and the norm $\| \cdot \|_Q$ are equivalent to the inner product $\langle \cdot, \cdot \rangle$ and the norm $\| \cdot \|$ of the Hilbert space, respectively. Define

$$T : y^k \mapsto \tilde y^{k+1}. \tag{6.23}$$
It is easy to verify that $R \circ P = R$, $P \circ R = R$, and $T \circ P = T$ hold, and that

$$P y^{k+1} = (1 - \rho^k)\, P y^k + \rho^k\, P\!\left( T y^k \right). \tag{6.24}$$

Next, we prove that the composite operator $P \circ T$ is firmly non-expansive. From the definition of $T$ and Eq. (6.16), it follows that

$$0 \in M(Ty) + R(Ty) - Ry \quad \forall y \in Y. \tag{6.25}$$
That is, $(Ty, Ry - R(Ty))$ belongs to the graph $\mathrm{gra}\, M$ of the operator $M$. Since $M$ is maximally monotone, for all $y, y' \in Y$ the following holds

$$\begin{aligned} 0 &\leq \left\langle Ty - Ty', \; Ry - R(Ty) - Ry' + R(Ty') \right\rangle \\ &= \left\langle P(Ty) - P(Ty'), \; Ry - R(Ty) - Ry' + R(Ty') \right\rangle \qquad (\text{by } P \circ R = R) \\ &= \left\langle P(Ty) - P(Ty'), \; Ry - Ry' - R P(Ty) + R P(Ty') \right\rangle \\ &= \left\langle P(Ty) - P(Ty'), \; y - y' \right\rangle_Q - \left\| P(Ty) - P(Ty') \right\|_Q^2. \end{aligned} \tag{6.26}$$
Thus, by Proposition 5.2 (iv) of [4], the composite operator $P \circ T$ is firmly non-expansive. Next, we establish the connection between the zeros of the maximally monotone operator $M$ and the fixed points of $P \circ T$. Let $y^* \in \mathrm{zer}\, M \neq \emptyset$; then both $(Ty^*, Ry^* - R(Ty^*))$ and $(y^*, 0)$ belong to $\mathrm{gra}\, M$. By the monotonicity of $M$, we have

$$0 \leq \left\langle Ty^* - y^*, \; Ry^* - R(Ty^*) \right\rangle. \tag{6.27}$$

Since $R$ is non-negative definite, we have

$$0 \leq \left\langle Ty^* - y^*, \; R(Ty^*) - Ry^* \right\rangle. \tag{6.28}$$
Combining Eqs. (6.27) and (6.28) yields

$$\left\langle Ty^* - y^*, \; R(Ty^* - y^*) \right\rangle = 0, \tag{6.29}$$

and

$$R(Ty^* - y^*) = 0, \tag{6.30}$$
i.e., $(Ty^* - y^*) \in \mathrm{zer}\, R$. This conclusion, together with the definition of $P$, shows that $P(Ty^* - y^*) = 0$ holds. Since $T \circ P = T$, it follows that

$$P\!\left( T(P y^*) \right) = P(T y^*) = P y^*. \tag{6.31}$$

That is, $P y^* \in \mathrm{Fix}\, P \circ T$ holds. Conversely, if $z^* \in \mathrm{Fix}\, P \circ T$, then, using Eq. (6.25) in the last step,

$$z^* = P(T z^*) \;\Rightarrow\; R z^* = R P(T z^*) \;\Rightarrow\; R z^* = R(T z^*) \;\Rightarrow\; T z^* \in \mathrm{zer}\, M. \tag{6.32}$$

The iterative rule Eq. (6.17) can be transformed into

$$P y^{k+1} = \left( 1 - \frac{\rho^k}{2} \right) P y^k + \frac{\rho^k}{2}\, (2 P \circ T - I)\!\left( P y^k \right). \tag{6.33}$$
Since $P \circ T$ is firmly non-expansive, it follows from Proposition 5.2 (iii) of [4] that $2 P \circ T - I$ is non-expansive. It follows from the fourth conclusion of Lemma 6.1 that $\{P y^k\}$ converges to some $z^* \in \mathrm{Fix}\, P \circ T$. Because the proximity operator is continuous, the operator $T$ is continuous, and hence, since $\tilde y^{k+1} = T y^k = T(P y^k)$, the sequence $\{\tilde y^{k+1}\}$ converges to $T z^* \in \mathrm{zer}\, M$. Moreover, we have

$$\left\| y^{k+1} - T z^* \right\| \leq \rho^k \left\| \tilde y^{k+1} - T z^* \right\| + \left| 1 - \rho^k \right| \left\| y^k - T z^* \right\| \quad \forall k \in \mathbb{N}. \tag{6.34}$$

Let $a^k = \| y^k - T z^* \|$, $b^k = \rho^k \| \tilde y^{k+1} - T z^* \|$, and $c^k = | 1 - \rho^k |$; then $b^k \to 0$. By the second assumption in Theorem 6.1, $1 - c^k \geq \varepsilon$, so $\sum_{k \in \mathbb{N}} (1 - c^k) = +\infty$ and $b^k / (1 - c^k) \to 0$. Thus, by Lemma 6.2, $a^k \to 0$, i.e., $\{y^{k+1}\}$ converges to $T z^* \in \mathrm{zer}\, M$, which is a saddle point of the Lagrangian function Eq. (6.5). By the equivalence of Eqs. (6.2) and (6.5), $\{x^k; v_1^k, \ldots, v_H^k\}$ converges to a saddle point of Eq. (6.2), and in particular, $\{x^k\}$ converges to a solution of problem Eq. (6.1). Theorem 6.1 is proved.
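The proof above rests on the relaxed fixed-point iteration of Lemma 6.1. Its conclusions can be observed on a very small example. The sketch below assumes, purely for illustration, that $T$ is the rotation of the plane by 90 degrees (an isometry, hence nonexpansive, with $\mathrm{Fix}\, T = \{0\}$) and that $\mu^k \equiv 0.3$; neither choice comes from the text.

```python
import numpy as np

# Rotation by 90 degrees: nonexpansive, and its only fixed point is w* = 0.
T = np.array([[0.0, -1.0],
              [1.0,  0.0]])
mu = 0.3                      # constant relaxation mu^k in (0, 1]
tau = mu * (1.0 - mu)         # tau^k = mu^k * (1 - mu^k)

w = np.array([1.0, 2.0])
dist, resid = [], []
for k in range(60):
    dist.append(np.sum(w ** 2))              # ||w^k - w*||^2 with w* = 0
    resid.append(np.sum((T @ w - w) ** 2))   # ||T w^k - w^k||^2
    w = (1.0 - mu) * w + mu * (T @ w)        # Eq. (6.22)

weighted_sum = tau * sum(resid)              # partial sum in conclusion (iii)
```

Without relaxation ($\mu^k = 1$) the plain iteration $w^{k+1} = T w^k$ would rotate forever; the averaging with $w^k$ is what produces convergence.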
6.3.2 Convergence Rate Analysis

This subsection establishes the $o(1/k)$ convergence rate of the proposed PPDS algorithm, which is stronger than the $O(1/k)$ convergence rate described in the previous chapter [6].
Theorem 6.2 Let $y = \left( x, v_1 / \sqrt{\beta_1}, \ldots, v_H / \sqrt{\beta_H} \right) = (x, v) \in X \times V$, and let $\{x^k; v_1^k, \ldots, v_H^k\}$ be the sequence generated by the PPDS algorithm under the conditions of Theorem 6.1; let $P$ be the self-adjoint orthogonal projection onto the range $\mathrm{ran}\, R$ (or $\mathrm{ran}\, R'$) of $R$ (or $R'$); and denote

$$\tau = \inf_{k \in \mathbb{N}} \frac{\rho^k}{2} \left( 1 - \frac{\rho^k}{2} \right) > 0, \tag{6.35}$$

and

$$\left\| y^{k+1} - y^k \right\|_P^2 = \left\langle y^{k+1} - y^k, \; P\!\left( y^{k+1} - y^k \right) \right\rangle. \tag{6.36}$$

Then the following two statements hold.
(i)

$$\left\| y^{k+1} - y^k \right\|_P^2 \leq \frac{1}{\tau} \left( \frac{\rho^k}{2} \right)^2 \frac{\left\| y^0 - y^* \right\|_P^2}{k + 1}; \tag{6.37}$$

(ii) $\left\| y^{k+1} - y^k \right\|_P^2 = o(1/k)$ holds, i.e., $\left\| y^{k+1} - y^k \right\|_P^2$ is an infinitesimal of higher order than $1/k$.

Proof Let

$$e^k = \left\| (2 P \circ T - I)\!\left( P y^k \right) - P y^k \right\|_2^2, \tag{6.38}$$

and

$$\tau^k = \frac{\rho^k}{2} \left( 1 - \frac{\rho^k}{2} \right). \tag{6.39}$$
Define $\hat\tau_k = \sum_{i=0}^{k} \tau^i$. By the second and third conclusions of Lemma 6.1, $\{e^k\}$ is monotonically non-increasing and $\sum_{i=0}^{+\infty} \tau^i e^i \leq \left\| P y^0 - P y^* \right\|_2^2 < +\infty$. Therefore, we have

$$\hat\tau_k e^k = e^k \sum_{i=0}^{k} \tau^i \leq \sum_{i=0}^{k} \tau^i e^i \leq \sum_{i=0}^{+\infty} \tau^i e^i, \tag{6.40}$$

and

$$e^k \leq \frac{\left\| P y^0 - P y^* \right\|_2^2}{\hat\tau_k}. \tag{6.41}$$
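Therefore, using Eq. (6.33) and $\hat\tau_k \geq \tau (k + 1)$, it holds that

$$\begin{aligned} \left\| y^{k+1} - y^k \right\|_P^2 &= \left\| P y^{k+1} - P y^k \right\|_2^2 = \left( \frac{\rho^k}{2} \right)^2 \left\| (2 P \circ T - I)\!\left( P y^k \right) - P y^k \right\|_2^2 \\ &\leq \left( \frac{\rho^k}{2} \right)^2 \frac{\left\| P y^0 - P y^* \right\|_2^2}{\hat\tau_k} \leq \frac{1}{\tau} \left( \frac{\rho^k}{2} \right)^2 \frac{\left\| y^0 - y^* \right\|_P^2}{k + 1}. \end{aligned} \tag{6.42}$$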
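On the other hand, because

$$\left( \hat\tau_{2k} - \hat\tau_k \right) e^{2k} \leq \tau^{2k} e^{2k} + \cdots + \tau^{k+1} e^{k+1} = \sum_{i=k+1}^{2k} \tau^i e^i, \tag{6.43}$$

we have

$$\left( \hat\tau_k - \hat\tau_{\lceil k/2 \rceil} \right) e^k \leq \sum_{i=\lceil (k+1)/2 \rceil}^{k} \tau^i e^i \;\xrightarrow{k \to +\infty}\; 0, \tag{6.44}$$

where $\lceil k/2 \rceil$ denotes $k/2$ rounded up. Inequality (6.44) shows that

$$e^k = o\!\left( \frac{1}{\hat\tau_k - \hat\tau_{\lceil k/2 \rceil}} \right). \tag{6.45}$$

Moreover,

$$\hat\tau_k - \hat\tau_{\lceil k/2 \rceil} \geq \tau \left( k - \lceil k/2 \rceil \right) \geq \tau\, \frac{k - 1}{2}. \tag{6.46}$$

Equations (6.45) and (6.46) show that

$$\left\| y^{k+1} - y^k \right\|_P^2 = \left( \frac{\rho^k}{2} \right)^2 e^k = o(1/k). \tag{6.47}$$

Theorem 6.2 is proved.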
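The bound (6.41) and the $o(1/k)$ behaviour can be observed numerically on a generic relaxed iteration of the form (6.33). The following sketch is only an illustration under assumptions: the firmly non-expansive map is taken to be the projection onto the unit ball (standing in for $P \circ T$), the relaxation is constant, $\rho^k \equiv 0.6$, and all variable names are hypothetical.

```python
import numpy as np

def proj_ball(w):
    """Projection onto the unit ball, a firmly non-expansive map."""
    return w / max(1.0, np.linalg.norm(w))

rho = 0.6
mu = rho / 2.0                      # relaxation used in Eq. (6.33)
tau = mu * (1.0 - mu)               # constant tau^k, see Eq. (6.39)

w0 = np.array([3.0, 4.0])
w_star = proj_ball(w0)              # a fixed point of the reflection 2*proj_ball - I
w = w0.copy()
e = []
for _ in range(200):
    Tw = 2.0 * proj_ball(w) - w     # non-expansive reflection
    e.append(np.sum((Tw - w) ** 2)) # e^k as in Eq. (6.38)
    w = (1.0 - mu) * w + mu * Tw

e = np.array(e)
k = np.arange(e.size)
tau_hat = tau * (k + 1)             # hat(tau)_k for a constant tau^k
bound = np.sum((w0 - w_star) ** 2) / tau_hat   # right-hand side of Eq. (6.41)
scaled = k * e                      # k * e^k, which should tend to 0
```

On this example the residual actually decays geometrically, so the $o(1/k)$ statement is far from tight; the theorem describes the worst case.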
6.4 Further Discussion and Extension of the Primal-Dual Splitting Method

This section further explores the relationship between the PPDS method and the linear ADMM algorithm, and extends the method to objectives that contain a differentiable term with a Lipschitz-continuous gradient.
6.4.1 Relation to Parallel Linear Alternating Direction Method of Multipliers

The parallel linear alternating direction method of multipliers (PLADMM) for solving problem Eq. (6.1) is derived first. The augmented Lagrangian function of Eq. (6.1) is given by

$$\mathcal{L}_A(x, a_1, \ldots, a_H; v_1, \ldots, v_H) = g(x) + \sum_{h=1}^{H} \left( f_h(a_h) + \langle v_h, L_h x - a_h \rangle + \frac{\beta_h}{2} \| L_h x - a_h \|_2^2 \right). \tag{6.48}$$
According to the relevant theory in Chap. 5, the PADMM iterative rule for finding the saddle point of Eq. (6.48) is as follows

$$\begin{cases} a_h^{k+1} = \mathrm{prox}_{f_h / \beta_h}\!\left( L_h x^k + \dfrac{v_h^k}{\beta_h} \right), \quad h = 1, \ldots, H; \\ v_h^{k+1} = v_h^k + \beta_h \left( L_h x^k - a_h^{k+1} \right), \quad h = 1, \ldots, H; \\ x^{k+1} = \arg\min_x \; g(x) + \sum_{h=1}^{H} \dfrac{\beta_h}{2} \left\| L_h x - a_h^{k+1} + \dfrac{v_h^{k+1}}{\beta_h} \right\|_2^2. \end{cases} \tag{6.49}$$
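Expanding the quadratic term in the third equation above in a Taylor series around $x^k$, keeping the first-order term, and adding the proximal term $\frac{1}{2t} \| x - x^k \|_2^2$, we have

$$\begin{aligned} x^{k+1} &= \arg\min_x \; g(x) + \left\langle x - x^k, \; \sum_{h=1}^{H} \beta_h L_h^* \left( L_h x^k - a_h^{k+1} + \frac{v_h^{k+1}}{\beta_h} \right) \right\rangle + \frac{1}{2t} \left\| x - x^k \right\|_2^2 \\ &= \arg\min_x \; g(x) + \frac{1}{2t} \left\| x - x^k + t \sum_{h=1}^{H} \beta_h L_h^* \left( L_h x^k - a_h^{k+1} + \frac{v_h^{k+1}}{\beta_h} \right) \right\|_2^2 \\ &= \mathrm{prox}_{tg}\!\left( x^k - t \sum_{h=1}^{H} \beta_h L_h^* \left( L_h x^k - a_h^{k+1} + \frac{v_h^{k+1}}{\beta_h} \right) \right). \end{aligned} \tag{6.50}$$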
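Therefore, the PLADMM iterative rule can be obtained as follows

$$\begin{cases} a_h^{k+1} = \mathrm{prox}_{f_h / \beta_h}\!\left( L_h x^k + \dfrac{v_h^k}{\beta_h} \right), \quad h = 1, \ldots, H; \\ v_h^{k+1} = v_h^k + \beta_h \left( L_h x^k - a_h^{k+1} \right), \quad h = 1, \ldots, H; \\ x^{k+1} = \mathrm{prox}_{tg}\!\left( x^k - t \sum_{h=1}^{H} \beta_h L_h^* \left( L_h x^k - a_h^{k+1} + \dfrac{v_h^{k+1}}{\beta_h} \right) \right). \end{cases} \tag{6.51}$$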
Treating $v_h^k + \beta_h L_h x^k$ as a whole and applying the Moreau decomposition to the first two steps of Eq. (6.51) yields

$$\begin{cases} v_h^{k+1} = \mathrm{prox}_{\beta_h f_h^*}\!\left( \beta_h L_h x^k + v_h^k \right), \quad h = 1, \ldots, H; \\ x^{k+1} = \mathrm{prox}_{tg}\!\left( x^k - t \sum_{h=1}^{H} L_h^* \left( 2 v_h^{k+1} - v_h^k \right) \right). \end{cases} \tag{6.52}$$
It follows that the PLADMM algorithm is the PPDS algorithm with $\rho^k \equiv 1$. Therefore, PPDS can be viewed as a relaxed, generalized form of PLADMM. The convergence condition of PLADMM is likewise given by $0 < t \leq 1 \big/ \left( \sum_{h=1}^{H} \beta_h \| L_h^* L_h \| \right)$.
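The equivalence between the PLADMM rule (6.51) and the unrelaxed PPDS rule (6.52) can be confirmed numerically by running both side by side. The sketch below assumes a single term ($H = 1$) with $f = \lambda \|\cdot\|_1$, $g = \frac{1}{2} \|x - b\|_2^2$, and a random matrix in place of $L_1$; all of these choices and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 4
L = rng.standard_normal((m, n))     # hypothetical stand-in for L_1
b = rng.standard_normal(n)
lam, beta = 0.3, 0.8
t = 1.0 / (beta * np.linalg.norm(L.T @ L, 2))

def soft(z, thr):
    """Proximity operator of thr * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

def prox_tg(z):
    """Proximity operator of t*g for g = 0.5*||x - b||^2."""
    return (z + t * b) / (1.0 + t)

x_a, v_a = np.zeros(n), np.zeros(m)   # PLADMM iterates, Eq. (6.51)
x_p, v_p = np.zeros(n), np.zeros(m)   # PPDS iterates with rho = 1, Eq. (6.52)
for _ in range(25):
    # PLADMM: auxiliary variable, multiplier, then linearized primal step.
    a = soft(L @ x_a + v_a / beta, lam / beta)
    v_a = v_a + beta * (L @ x_a - a)
    x_a = prox_tg(x_a - t * beta * (L.T @ (L @ x_a - a + v_a / beta)))

    # PPDS: dual prox (projection onto [-lam, lam]), then primal step.
    v_new = np.clip(beta * (L @ x_p) + v_p, -lam, lam)
    x_p = prox_tg(x_p - t * (L.T @ (2.0 * v_new - v_p)))
    v_p = v_new
```

The PPDS form never stores the auxiliary variable `a`, which is one reason it is the more economical of the two formulations.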
6.4.2 Further Extensions of the Parallel Primal-Dual Splitting Method

This subsection considers the following more general optimization problem

$$\min_{x \in X} \; p(x) + g(x) + \sum_{h=1}^{H} f_h(L_h x). \tag{6.53}$$
Problem Eq. (6.53) adds to problem Eq. (6.1) a convex differentiable function $p(x)$ whose gradient $\nabla p$ is $\gamma$-Lipschitz continuous, i.e., there exists some $\gamma$ such that

$$\left\| \nabla p(x) - \nabla p(x') \right\| \leq \gamma \left\| x - x' \right\| \quad \forall x, x' \in X. \tag{6.54}$$
The Lagrangian function corresponding to the problem Eq. (6.53) is

$$\min_{x \in X} \; \max_{v_1 \in V_1, \ldots, v_H \in V_H} \; p(x) + g(x) - \sum_{h=1}^{H} \left( f_h^*(v_h) - \langle L_h x, v_h \rangle \right). \tag{6.55}$$
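Its equivalent weighted Lagrangian function is

$$\min_{x \in X} \; \max_{\frac{v_1}{\sqrt{\beta_1}} \in V_1, \ldots, \frac{v_H}{\sqrt{\beta_H}} \in V_H} \; p(x) + g(x) - \sum_{h=1}^{H} \left( \bar f_h^*\!\left( \frac{v_h}{\sqrt{\beta_h}} \right) - \left\langle \sqrt{\beta_h}\, L_h x, \frac{v_h}{\sqrt{\beta_h}} \right\rangle \right). \tag{6.56}$$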
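By the KKT theorem, its corresponding variational condition should be

$$\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \in \begin{pmatrix} \nabla p(x^*) + \partial g(x^*) + \sum_{h=1}^{H} \sqrt{\beta_h}\, L_h^* \dfrac{v_h^*}{\sqrt{\beta_h}} \\ -\sqrt{\beta_1}\, L_1 x^* + \partial \bar f_1^*\!\left( \dfrac{v_1^*}{\sqrt{\beta_1}} \right) \\ \vdots \\ -\sqrt{\beta_H}\, L_H x^* + \partial \bar f_H^*\!\left( \dfrac{v_H^*}{\sqrt{\beta_H}} \right) \end{pmatrix}. \tag{6.57}$$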
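(1) Generalized Primal-dual splitting method

Using the same simplified notation as in Sect. 6.2.3 and introducing the same weighting operator $R$, the following relation can be obtained

$$-\underbrace{\begin{pmatrix} \nabla p(x^k) \\ 0 \end{pmatrix}}_{B y^k} \in \underbrace{\begin{pmatrix} \partial g(\tilde x^{k+1}) + L^* \tilde v^{k+1} \\ -L \tilde x^{k+1} + \partial f^*(\tilde v^{k+1}) \end{pmatrix}}_{M \tilde y^{k+1}} + \underbrace{\begin{pmatrix} \frac{1}{t} I & L^* \\ L & I \end{pmatrix}}_{R} \underbrace{\begin{pmatrix} \tilde x^{k+1} - x^k \\ \tilde v^{k+1} - v^k \end{pmatrix}}_{\tilde y^{k+1} - y^k}. \tag{6.58}$$

Expanding Eq. (6.58) and adding a relaxation step yields the following iterative rule for solving (6.53); note that multiplying the first row of Eq. (6.58) by $t$ places the factor $t$ in front of the gradient term:

$$\begin{cases} \tilde v^{k+1} = \mathrm{prox}_{f^*}\!\left( L x^k + v^k \right); \\ \tilde x^{k+1} = \mathrm{prox}_{tg}\!\left( x^k - t \nabla p(x^k) - t L^* \left( 2 \tilde v^{k+1} - v^k \right) \right); \\ \left( x^{k+1}, v^{k+1} \right) = \rho^k \left( \tilde x^{k+1}, \tilde v^{k+1} \right) + (1 - \rho^k) \left( x^k, v^k \right). \end{cases} \tag{6.59}$$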
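Further expansion gives

$$\begin{cases} \tilde v_h^{k+1} = \mathrm{prox}_{\beta_h f_h^*}\!\left( \beta_h L_h x^k + v_h^k \right), \quad h = 1, \ldots, H; \\ \tilde x^{k+1} = \mathrm{prox}_{tg}\!\left( x^k - t \nabla p(x^k) - t \sum_{h=1}^{H} L_h^* \left( 2 \tilde v_h^{k+1} - v_h^k \right) \right); \\ v_h^{k+1} = \rho^k \tilde v_h^{k+1} + (1 - \rho^k)\, v_h^k, \quad h = 1, \ldots, H; \\ x^{k+1} = \rho^k \tilde x^{k+1} + (1 - \rho^k)\, x^k. \end{cases} \tag{6.60}$$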
Algorithm 6.3 Generalized Parallel Primal-Dual Splitting Algorithm (PPDS3)
Step 1: Set $k = 0$, $x^0 = 0$, $v_h^0 = 0$, $\beta_h > 0$, $h = 1, \ldots, H$, and $t > 0$ such that $\frac{1}{t} - \sum_{h=1}^{H} \beta_h \| L_h^* L_h \| \geq \frac{\gamma}{2}$.
Step 2: Determine whether the termination conditions are met; if not, perform the following steps.
Step 3: $\tilde v_h^{k+1} = \mathrm{prox}_{\beta_h f_h^*}\!\left( \beta_h L_h x^k + v_h^k \right)$, $h = 1, \ldots, H$.
Step 4: $\tilde x^{k+1} = \mathrm{prox}_{tg}\!\left( x^k - t \nabla p(x^k) - t \sum_{h=1}^{H} L_h^* \left( 2 \tilde v_h^{k+1} - v_h^k \right) \right)$.
Step 5: $v_h^{k+1} = \rho^k \tilde v_h^{k+1} + (1 - \rho^k)\, v_h^k$, $h = 1, \ldots, H$.
Step 6: $x^{k+1} = \rho^k \tilde x^{k+1} + (1 - \rho^k)\, x^k$.
Step 7: $k = k + 1$.
Step 8: End the loop and output $x^{k+1}$.

If $R'$ is used in Eq. (6.58) instead of $R$, then $\tilde x^{k+1}$ is updated before $\tilde v^{k+1}$, which leads to the equivalent generalized PPDS algorithm in Algorithm 6.4.

Algorithm 6.4 Equivalent Generalized Parallel Primal-Dual Splitting Algorithm (PPDS3)
Step 1: Set $k = 0$, $x^0 = 0$, $v_h^0 = 0$, $\beta_h > 0$, $h = 1, \ldots, H$, and $t > 0$ such that $\frac{1}{t} - \sum_{h=1}^{H} \beta_h \| L_h^* L_h \| \geq \frac{\gamma}{2}$.
Step 2: Determine whether the termination conditions are met; if not, perform the following steps.
Step 3: $\tilde x^{k+1} = \mathrm{prox}_{tg}\!\left( x^k - t \nabla p(x^k) - t \sum_{h=1}^{H} L_h^* v_h^k \right)$.
Step 4: $\tilde v_h^{k+1} = \mathrm{prox}_{\beta_h f_h^*}\!\left( \beta_h L_h \left( 2 \tilde x^{k+1} - x^k \right) + v_h^k \right)$, $h = 1, \ldots, H$.
Step 5: $x^{k+1} = \rho^k \tilde x^{k+1} + (1 - \rho^k)\, x^k$.
Step 6: $v_h^{k+1} = \rho^k \tilde v_h^{k+1} + (1 - \rho^k)\, v_h^k$, $h = 1, \ldots, H$.
Step 7: $k = k + 1$.
Step 8: End the loop and output $x^{k+1}$.

(2) Principle of the algorithm and convergence analysis

Although PPDS Algorithm 6.3 (6.4), which incorporates the term with Lipschitz-continuous gradient, is structurally similar to Algorithm 6.1, the basis of its derivation is quite different. In fact, the generalized PPDS Algorithm 6.3 (6.4) can be seen as a special case of the forward-backward splitting algorithm, while the PPDS Algorithm 6.1 (6.2) can be seen as a special case of the proximity iteration algorithm.
The following theorem establishes the convergence of the extended PPDS algorithm and gives its convergence conditions.
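Lemma 6.3 Let $M_1 : \mathcal{H} \to 2^{\mathcal{H}}$ be a maximally monotone operator and let $M_2 : \mathcal{H} \to \mathcal{H}$ be a $\kappa$-cocoercive map (i.e., $\kappa M_2$ is a firmly nonexpansive operator); let $\tau \in (0, 2\kappa]$ and define $\delta \triangleq 2 - \tau / (2\kappa)$. Furthermore, let $\rho^k \in (0, \delta)$ with $\sum_{k \in \mathbb{N}} \rho^k (2 - \rho^k) = +\infty$. Suppose $\mathrm{zer}(M_1 + M_2) \neq \emptyset$ and let

$$w^{k+1} = \rho^k (I + \tau M_1)^{-1}\!\left( w^k - \tau M_2 w^k \right) + (1 - \rho^k)\, w^k. \tag{6.61}$$

Then $\{w^k\}$ converges to a point in $\mathrm{zer}(M_1 + M_2)$.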
Theorem 6.3 Let $\{x^k; v_1^k, \ldots, v_H^k\}$ be the sequence generated by the extended PPDS Algorithm 6.3 (6.4) with $t > 0$, $\beta_h > 0$, $h = 1, \ldots, H$. Suppose the following two conditions hold.
(i)

$$\frac{1}{t} - \sum_{h=1}^{H} \beta_h \| L_h^* L_h \| \geq \frac{\gamma}{2}; \tag{6.62}$$

(ii) $\forall k \in \mathbb{N}$, $\rho^k \in (0, \delta)$, where

$$\delta \triangleq 2 - \frac{\gamma}{2} \left( \frac{1}{t} - \sum_{h=1}^{H} \beta_h \| L_h^* L_h \| \right)^{-1} \in [1, 2), \tag{6.63}$$

and $\sum_{k \in \mathbb{N}} \rho^k (2 - \rho^k) = +\infty$.
Then $\{x^k; v_1^k, \ldots, v_H^k\}$ converges to a saddle point of the problem Eq. (6.55), and in particular, $\{x^k\}$ converges to a solution of the problem Eq. (6.53).

Proof It follows from condition (i) that $R$ is an invertible operator. Writing its inverse as $R^{-1}$, Eq. (6.58) is equivalent to

$$\tilde y^{k+1} = \left( I + R^{-1} \circ M \right)^{-1} \circ \left( I - R^{-1} \circ B \right) y^k. \tag{6.64}$$
As mentioned before, $M$ is a maximally monotone operator, so it follows from the injectivity of $R$ that $R^{-1} \circ M$ is monotone in the Hilbert space $Y_R$ induced by the inner product $\langle \cdot, \cdot \rangle_R$. By the method of undetermined coefficients, $R^{-1}$ is

$$R^{-1} = \begin{pmatrix} \left( \frac{I}{t} - L^* L \right)^{-1} & -t L^* \left( I - t L L^* \right)^{-1} \\ -L \left( \frac{I}{t} - L^* L \right)^{-1} & \left( I - t L L^* \right)^{-1} \end{pmatrix}. \tag{6.65}$$
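The cocoercivity of $R^{-1} \circ B$ is proved below. For any $y = (x, v)$ and $y' = (x', v')$, we have

$$\begin{aligned} \left\| R^{-1} \circ B(y) - R^{-1} \circ B(y') \right\|_R^2 &= \left\langle R^{-1} \circ B(y) - R^{-1} \circ B(y'), \; B(y) - B(y') \right\rangle \\ &= \left\langle \left( \frac{I}{t} - L^* L \right)^{-1} \left( \nabla p(x) - \nabla p(x') \right), \; \nabla p(x) - \nabla p(x') \right\rangle \\ &\leq \left( \frac{1}{t} - \| L^* L \| \right)^{-1} \left\| \nabla p(x) - \nabla p(x') \right\|_2^2 \\ &\leq \gamma^2 \left( \frac{1}{t} - \| L^* L \| \right)^{-1} \left\| x - x' \right\|_2^2 = \frac{\gamma}{\kappa} \left\| x - x' \right\|_2^2, \quad \text{where } \kappa = \frac{\frac{1}{t} - \| L^* L \|}{\gamma}. \end{aligned} \tag{6.66}$$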
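Define the linear operator $Q : (x, v) \mapsto (x, 0)$. Since $\gamma \kappa = \frac{1}{t} - \| L^* L \|$, we have

$$R - \gamma \kappa Q = \begin{pmatrix} \frac{I}{t} & L^* \\ L & I \end{pmatrix} - \gamma \kappa \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} \| L^* L \|\, I & L^* \\ L & I \end{pmatrix}. \tag{6.67}$$

It is easy to verify that $R - \gamma \kappa Q$ is positive semidefinite, so we have

$$\gamma \kappa \left\| x - x' \right\|_2^2 = \left\langle y - y', \; \gamma \kappa Q (y - y') \right\rangle \leq \left\langle y - y', \; R (y - y') \right\rangle = \left\| y - y' \right\|_R^2. \tag{6.68}$$

Combining Eqs. (6.66) and (6.68) yields

$$\left\| \kappa R^{-1} \circ B(y) - \kappa R^{-1} \circ B(y') \right\|_R^2 \leq \left\| y - y' \right\|_R^2. \tag{6.69}$$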
Therefore, $\kappa R^{-1} \circ B$ is non-expansive in $Y_R$. Define the function $q : (x, v) \mapsto p(x)$; then $\nabla q = R^{-1} \circ B$ in $Y_R$. Therefore, by Corollary 18.16 of [4], $\kappa R^{-1} \circ B$ is firmly non-expansive in $Y_R$, i.e., $R^{-1} \circ B$ is $\kappa$-cocoercive. By condition (i) of Theorem 6.3 and Eq. (6.11), $\kappa \geq 1/2$, so in Lemma 6.3 we can set $\tau = 1$ and $\delta = 2 - 1/(2\kappa)$; the requirement $\rho^k \in (0, \delta)$ is guaranteed by Eq. (6.63). According to Lemma 6.3, $\{y^k\}$ converges to a point of $\mathrm{zer}\!\left( R^{-1} \circ M + R^{-1} \circ B \right) = \mathrm{zer}(M + B)$, i.e., to a solution of the variational condition Eq. (6.57), which is a saddle point of the Lagrangian function Eq. (6.56). By the equivalence of Eqs. (6.55) and (6.56), $\{x^k; v_1^k, \ldots, v_H^k\}$ converges to a saddle point of the problem Eq. (6.55), and in particular, $\{x^k\}$ converges to a solution of the problem Eq. (6.53). Theorem 6.3 is proved.

From the above discussion, it follows that Algorithm 6.3 (6.4) can be regarded as a special form of the forward-backward splitting method. If $p(x) = 0$, Algorithm 6.3 (6.4) reduces to Algorithm 6.1 (6.2), which can then be considered a special form of the proximity iteration algorithm.
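The generalized iteration can be sketched on a problem with a known solution. The example below is an assumption-laden illustration rather than the authors' code: it takes $p(x) = \frac{1}{2} \|x - b\|_2^2$ (so $\gamma = 1$), $g$ the indicator of the nonnegative orthant, a single term $f_1 = \lambda \|\cdot\|_1$ with $L_1 = I$, and it scales the gradient step by $t$, as the expansion of Eq. (6.58) requires. The exact minimizer is then $\max(b - \lambda, 0)$.

```python
import numpy as np

b = np.array([2.0, 0.5, -1.0, 3.0])
lam, beta, gamma = 1.0, 1.0, 1.0    # gamma: Lipschitz constant of grad p
t = 1.0 / (beta + gamma)            # gives 1/t - beta*||L^*L|| = gamma >= gamma/2 for L = I
delta = 2.0 - 0.5 * gamma / (1.0 / t - beta)   # Eq. (6.63)
rho = 1.2                           # relaxation, must lie in (0, delta)

def grad_p(x):
    """Gradient of the smooth term p(x) = 0.5*||x - b||^2."""
    return x - b

x = np.zeros_like(b)
v = np.zeros_like(b)
for _ in range(500):
    # Step 3: dual prox of beta*f^*, the projection onto [-lam, lam].
    v_new = np.clip(beta * x + v, -lam, lam)
    # Step 4: forward (gradient) step on p, backward (prox) step on g;
    # prox of the indicator of {x >= 0} is the projection max(., 0).
    x_new = np.maximum(x - t * grad_p(x) - t * (2.0 * v_new - v), 0.0)
    # Steps 5-6: relaxation.
    v = rho * v_new + (1.0 - rho) * v
    x = rho * x_new + (1.0 - rho) * x

expected = np.maximum(b - lam, 0.0)
```

Compared with treating $p$ through its proximity operator, the explicit gradient step only needs $\nabla p$, at the price of the smaller step size demanded by condition (6.62).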
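6.5 Application of PPDS to TGV/Shearlet Compound Regularized Image Restoration

This section considers the following regularized image inverse problem

$$\begin{aligned} (u^*, p^*) = \arg\min_{u, p} \;\; & \alpha_1 \| \nabla u - p \|_1 + \alpha_2 \| \mathcal{E} p \|_1 + \alpha_3 \sum_{r=1}^{N} \| \mathrm{SH}_r(u) \|_1 \\ \text{s.t.} \;\; & u \in \Omega \triangleq \{ u : 0 \leq u \leq 255 \}, \\ & u \in \Psi \triangleq \left\{ u : \| K u - f \|_2^2 \leq c \right\} \text{ or } \left\{ u : \| K u - f \|_1 \leq c \right\}. \end{aligned} \tag{6.70}$$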
The difference between the above model and the model in Chap. 5 is the interval constraint imposed on the image pixels, which can significantly improve the quality of the restoration when many pixel values lie on the boundary of the interval. To apply PPDS to the solution of Eq. (6.70), the variables and operators are assigned according to Algorithm 6.1 as follows: $x = (u, p)$, $g(x) = \iota_\Omega(u)$, $f_1(L_1 x) = \alpha_1 \| \nabla u - p \|_1$, $f_2(L_2 x) = \alpha_2 \| \mathcal{E} p \|_1$, $f_3(L_{3,r} x) = \alpha_3 \| S_r u \|_1$, and $f_4(L_4 x) = \iota_\Psi(u)$. In addition, denote $\hat v_h^{k+1} = 2 \tilde v_h^{k+1} - v_h^k$. According to Algorithm 6.1 (PPDS1), the following PPDS algorithm for the solution of the TGV/shearlet regularized inverse problem can be obtained.

Algorithm 6.5 Parallel Primal-dual Splitting Algorithm for Total Generalized Variation/Shearlet Regularization (PPDS-TGVS)
Step 1: Set $k = 0$, $x^0 = 0$, $v_h^0 = 0$, $\beta_h > 0$, $h = 1, \ldots, 4$, and $0 < t \leq 1 \big/ \left( \sum_{h=1}^{4} \beta_h \| L_h^* L_h \| \right)$.
Step 2: Determine whether the termination conditions are met; if not, perform the following steps.
Step 3: For $i = 1, \ldots, m$; $j = 1, \ldots, n$; $l = 1, \ldots, o$:
Step 4: $\tilde v_{1,i,j,l}^{k+1} = P_{B_{\alpha_1}}\!\left( \beta_1 \left( (\nabla u^k)_{i,j,l} - p_{i,j,l}^k \right) + v_{1,i,j,l}^k \right)$.
Step 5: $\tilde v_{2,i,j,l}^{k+1} = P_{B_{\alpha_2}}\!\left( \beta_2 (\mathcal{E} p^k)_{i,j,l} + v_{2,i,j,l}^k \right)$.
Step 6: $\tilde v_{3,r,i,j,l}^{k+1} = P_{B_{\alpha_3}}\!\left( \beta_3 (S_r u^k)_{i,j,l} + v_{3,r,i,j,l}^k \right)$, $r = 1, \ldots, N$.
Step 7: End the for loop.
Step 8: If the noise is Gaussian noise, perform the next step.
Step 9: $\tilde v_4^{k+1} = \beta_4 S_{\sqrt{c}}\!\left( \dfrac{v_4^k}{\beta_4} + K u^k - f \right)$.
Step 10: If the noise is impulse noise, perform the next step.
Step 11: $\tilde v_4^{k+1} = \beta_4 \left( \left( \dfrac{v_4^k}{\beta_4} + K u^k - f \right) - P_c\!\left( \dfrac{v_4^k}{\beta_4} + K u^k - f \right) \right)$.
Step 12: $\tilde u^{k+1} = P_\Omega\!\left( u^k - t \left( \nabla^* \hat v_1^{k+1} + \sum_{r=1}^{N} S_r^* \hat v_{3,r}^{k+1} + K^* \hat v_4^{k+1} \right) \right)$.
Step 13: $\tilde p_1^{k+1} = p_1^k - t \left( -\hat v_{1,1}^{k+1} + \nabla_1^* \hat v_{2,1}^{k+1} + \nabla_2^* \hat v_{2,3}^{k+1} \right)$.
Step 14: $\tilde p_2^{k+1} = p_2^k - t \left( -\hat v_{1,2}^{k+1} + \nabla_1^* \hat v_{2,3}^{k+1} + \nabla_2^* \hat v_{2,2}^{k+1} \right)$.
Step 15: $\left( u^{k+1}, p^{k+1} \right) = \rho \left( \tilde u^{k+1}, \tilde p^{k+1} \right) + (1 - \rho) \left( u^k, p^k \right)$.
Step 16: $v_h^{k+1} = \rho\, \tilde v_h^{k+1} + (1 - \rho)\, v_h^k$, $h = 1, \ldots, H$.
Step 17: $k = k + 1$.
Step 18: End the loop and output $u^{k+1}$.

$P_\Omega$ is the projection operator that projects the pixel values onto the interval constraint $\Omega$; its computation is described in detail in Chap. 5. The structure of Algorithm 6.5 is highly parallel, and each of its sub-steps has a closed-form solution. The subproblems on $v_1$, $v_2$, $v_3$, and $v_4$ are independent of each other and can be solved in parallel, and the subproblems on $u$, $p_1$, and $p_2$ have the same property. Furthermore, the subproblems on $v_1$, $v_2$, and $v_3$ are separable pixel by pixel. Thus, Algorithm 6.5 can be accelerated by parallel computing devices such as GPUs. The non-downsampled shearlet transform is the most time-consuming operation in Algorithm 6.5, which therefore has an overall computational complexity of $O(mn \log mn)$ for an $m \times n$ image.

In many inverse problems, the data acquisition process is accompanied by the loss of certain data components; for example, image blurring attenuates the high-frequency information of the image. Such processes usually satisfy $\| K^* K \| \leq 1$ [7], which is the case in this chapter. In addition,

$$\| \nabla u \|_2^2 = \sum_{i,j} \left( \left( u_{i,j} - u_{i-1,j} \right)^2 + \left( u_{i,j} - u_{i,j-1} \right)^2 \right) \leq 2 \sum_{i,j} \left( u_{i-1,j}^2 + 2 u_{i,j}^2 + u_{i,j-1}^2 \right) \leq 8 \| u \|_2^2 \tag{6.71}$$
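together with

$$\begin{aligned} \| \mathcal{E} p \|_2^2 &= \sum_{i,j} \left[ \left( p_{i,j,1} - p_{i-1,j,1} \right)^2 + \frac{1}{2} \left( p_{i,j,1} - p_{i,j-1,1} + p_{i,j,2} - p_{i-1,j,2} \right)^2 + \left( p_{i,j,2} - p_{i,j-1,2} \right)^2 \right] \\ &\leq \sum_{i,j} \left[ 2 \left( p_{i,j,1}^2 + p_{i-1,j,1}^2 \right) + 2 \left( p_{i,j,1}^2 + p_{i,j-1,1}^2 + p_{i,j,2}^2 + p_{i-1,j,2}^2 \right) + 2 \left( p_{i,j,2}^2 + p_{i,j-1,2}^2 \right) \right] \\ &= 2 \sum_{i,j} \left[ \left( p_{i,j,1}^2 + p_{i,j,2}^2 \right) + \left( p_{i,j,1}^2 + p_{i,j,2}^2 \right) + \left( p_{i-1,j,1}^2 + p_{i-1,j,2}^2 \right) + \left( p_{i,j-1,1}^2 + p_{i,j-1,2}^2 \right) \right] \\ &\leq 8 \| p \|_2^2. \end{aligned} \tag{6.72}$$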
Thus, according to the operator assignment scheme,

$$\sum_{h=1}^{4} \beta_h \| L_h^* L_h \| = \beta_1 \| \nabla^T \nabla \| + \beta_2 \| \mathcal{E}^T \mathcal{E} \| + \beta_3 \sum_{r=1}^{N} \| S_r^* S_r \| + \beta_4 \| K^* K \| \leq 8 \beta_1 + 8 \beta_2 + N \beta_3 + \beta_4. \tag{6.73}$$

Therefore, in this chapter, we set

$$t = \frac{1}{8 \beta_1 + 8 \beta_2 + N \beta_3 + \beta_4}. \tag{6.74}$$
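The constant 8 in Eq. (6.71) and the step size (6.74) can be checked on a small grid. The following sketch assumes a backward-difference gradient with zero values outside the image, assembled as an explicit matrix for an $8 \times 8$ image; the values in `betas` are hypothetical, while $N = 13$ is the number of shearlet subbands used in the experiments of this chapter.

```python
import numpy as np

def backward_diff(n):
    """1-D backward difference u_i - u_{i-1} with u_{-1} = 0, as in Eq. (6.71)."""
    return np.eye(n) - np.eye(n, k=-1)

m, n = 8, 8
Dm, Dn = backward_diff(m), backward_diff(n)
# Discrete gradient acting on an m x n image stored row-major as a vector:
# vertical differences stacked on top of horizontal differences.
grad = np.vstack([np.kron(Dm, np.eye(n)), np.kron(np.eye(m), Dn)])
norm_grad = np.linalg.eigvalsh(grad.T @ grad).max()   # ||grad^T grad||

betas = (1.0, 1.0, 0.5, 1.0)   # hypothetical beta_1, ..., beta_4
N = 13                         # shearlet subbands
t = 1.0 / (8 * betas[0] + 8 * betas[1] + N * betas[2] + betas[3])  # Eq. (6.74)
```

Because the bound 8 is never attained on a finite grid, the step size (6.74) is slightly conservative; estimating the norms numerically would permit a marginally larger $t$.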
By setting $\alpha_3 = 0$, the model Eq. (6.70) contains only $\mathrm{TGV}_\alpha^2$ as a regularizer, in which case Algorithm 6.5 is denoted PPDS-TGV; furthermore, if $\alpha_2 = 0$ and $p = 0$ hold simultaneously, Algorithm 6.5 reduces to PPDS-TV, which contains only the TV regularization term. In the experimental tables, the best result for each comparison metric is shown in bold. It is important to emphasize that the PPDS algorithm eliminates the linear-operator inversion required by the PADMM algorithm, which gives PPDS greater potential for multichannel image processing. In multichannel image processing, the degradation operator (matrix) $K$ often has a more complex form. For example, in multichannel image deblurring, the blur matrix $K$ may not be fully diagonalized by the FFT because of inter-channel blur, which can significantly increase the cost of matrix inversion.
6.6 Experimental Results

The following four image inverse problem experiments are set up in this section to verify the effectiveness of the proposed PPDS algorithm: (i) grayscale/color image deblurring under Gaussian/impulse noise; (ii) inpainting of incomplete image data; (iii) MR image reconstruction from partial Fourier observations; and (iv) experiments testing the effectiveness of the pixel interval constraint. The quality of the degraded and restored images is evaluated quantitatively by two metrics: peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). To better demonstrate the single-step execution efficiency of PPDS, the stopping criterion in most of the following experiments is a fixed number of iteration steps. In the image deblurring/inpainting experiments in this chapter, set $(\alpha_1, \alpha_2, \alpha_3) = (1, 3, 0.1)$ if the image is a grayscale image, and $(\alpha_1, \alpha_2, \alpha_3) = (2, 9, 0.05)$ if the image is an RGB image; for the MRI reconstruction problem, set $(\alpha_1, \alpha_2, \alpha_3) = (1, 3, 10)$. The number of shearlet transform layers is set to 2, i.e., the total number of subbands is 13. In all experiments, let $\rho^k \equiv 1.9$ in the PPDS algorithm. The rest of the parameter settings are the same as in the experiments in Chap. 5.

Table 6.1 Details of experimental settings for grayscale image deblurring

Blur kernel    Images                            Gaussian noise    Impulse noise (%)
A(9)           Barbara, boat, couple, Elaine     σ = 2             50
G(9, 3)        Goldhill, Lena, man, mandrill     σ = √8            60
M(30, 30)      Peppers, plane, stream, zelda     σ = 4             70
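For reference, PSNR can be computed with the standard definition $10 \log_{10}(\mathrm{peak}^2 / \mathrm{MSE})$. The sketch below assumes 8-bit images with peak value 255, consistent with the pixel range used in this chapter; the function name `psnr` and its signature are illustrative.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0.0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((4, 4))
value = psnr(ref, ref + 1.0)         # unit error at every pixel, so MSE = 1
```

SSIM is more involved, since it compares local means, variances, and covariances, and is usually taken from an existing image-processing library.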
6.6.1 Image Deblurring Experiment
To test the applicability of the PPDS algorithm (Algorithm 6.5) to grayscale and multichannel image deconvolution, this subsection sets up several image deblurring experiments on the Lansel image database [8] and the Kodak image database (http://r0k.us/graphics/kodak/). The Lansel image database contains 12 grayscale images of size 512 × 512, while the Kodak image database contains 24 high-resolution RGB images of size 768 × 512 or 512 × 768. PPDS is compared with three algorithms: the parallel alternating direction method of multipliers (PADMM) of Chap. 5 and the TV-based Wen-Chan and C-SALSA algorithms. All four algorithms are adaptive and avoid manual selection of the regularization parameter λ. Both PPDS and PADMM can deblur images under Gaussian or impulse noise, while Wen-Chan and C-SALSA can only handle grayscale images under Gaussian noise. Unlike PPDS, the other three algorithms (PADMM, Wen-Chan, and C-SALSA) involve a matrix inversion, and the latter two even contain a nested iterative structure.
(1) Grayscale image deblurring
Table 6.1 gives the background problems for the grayscale image deblurring experiments under Gaussian noise and impulse noise, with three sub-problems for each type of noise. All algorithms stop after 150 iteration steps. The PSNR, SSIM, and CPU time obtained by each algorithm under Gaussian noise and impulse noise are given in Tables 6.2 and 6.3, respectively. The best result of each test item is highlighted in boldface. From Tables 6.2 and 6.3, the following phenomena can be observed. First, PPDS obtains PSNR and SSIM values similar to but slightly higher than those of PADMM, which is due to the interval constraint on image
pixel values introduced by PPDS. In particular, PPDS-TGVS obtains the best results. Second, the more complex the regularization model, the higher the quality of the restored image, but the longer the running time. Note that in some cases the PSNR of the TGV algorithms is not significantly better than that of the TV algorithms, but the higher SSIM and the better visual results (Figs. 6.1 and 6.2) still support the superiority of TGV over TV. Third, a single iteration step of PPDS is more efficient than one of PADMM, which requires a linear inverse operation in each iteration step. This efficiency advantage is weakened when the shearlet transform is included in the regularization model, because the non-downsampled shearlet transform consumes most of the computing time. It is also not significant under impulse noise, because of the time-consuming ℓ1-projection problem involved in both algorithms. Fourth, although PPDS-TGV, which introduces the second-order derivative of the image function, has a more complex regularization model than C-SALSA and Wen-Chan, its single-step execution efficiency is higher than that of C-SALSA and comparable to that of Wen-Chan, owing to its simpler algorithm structure. Figures 6.1 and 6.2 show the restored images of Boat under Gaussian noise and Lena under impulse noise for the different algorithms, respectively. These restored images show that the TGV model effectively suppresses the staircasing effects present in the TV results under both types of noise; compare the surface of the boat in the PPDS-TV, PADMM-TV, Wen-Chan, and C-SALSA results in Fig. 6.1, and Lena's face in the PPDS-TV and PADMM-TV results in Fig. 6.2.
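The PSNR values quoted throughout this section can be illustrated with a minimal sketch. It assumes the common definition PSNR = 10·log10(peak² / MSE) with a peak value of 255 for 8-bit images; the exact definition used for the tables is not restated in this section, and the helper names `mse` and `psnr` are hypothetical. Pure Python is used only to keep the example dependency-free.

```python
import math

def mse(reference, estimate):
    """Mean squared error between two equally sized 2-D images (lists of rows)."""
    total, count = 0.0, 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            total += (r - e) ** 2
            count += 1
    return total / count

def psnr(reference, estimate, peak=255.0):
    """PSNR in dB; `peak` is the assumed maximum pixel value (255 for 8-bit data)."""
    err = mse(reference, estimate)
    if err == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / err)

clean = [[0, 255], [128, 64]]
noisy = [[5, 250], [123, 69]]  # every pixel is off by exactly 5 gray levels
print(round(psnr(clean, noisy), 2))  # MSE = 25, so about 34.15 dB
```

On this scale, the gaps of a few hundredths of a dB that separate PPDS from PADMM in the tables correspond to very small changes in mean squared error, which is why the text also relies on SSIM and visual inspection.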
In contrast to the TGV model, the TGV/shearlet compound model preserves the edges in the image more clearly and neatly, because the shearlet transform better represents anisotropic features such as edges and curves. Figures 6.3a and b further demonstrate the convergence of each algorithm on the two background problems mentioned above through the PSNR curves relative to CPU time. First, Fig. 6.3a shows that PPDS-TV and PADMM-TV have an advantage over the TV-based Wen-Chan and C-SALSA in terms of both convergence speed and PSNR. Second, although PPDS may lag behind PADMM at the beginning because of the linearization it employs, it reaches convergence earlier than PADMM thanks to its higher single-step execution efficiency.
(2) Multichannel image deblurring
For multichannel image deblurring, every algorithm stops after 300 iteration steps. A detailed description of the background problems is given in Table 6.4. The three inter-channel blurs (Blur 1 to Blur 3) are generated as follows:
(i) Generate 9 blur kernels: {A(13), A(15), A(17), G(11, 9), G(21, 11), G(31, 13), M(21, 45), M(41, 90), M(61, 135)};
(ii) Assign the above 9 blur kernels to {K11, K12, K13; K21, K22, K23; K31, K32, K33}, where Kii is an intra-channel blur while the rest are inter-channel blurs;
SSIM
CPU(s)
0.6930
0.6919
0.6875
24.35
24.32
24.42
24.34
24.34
24.20
24.22
PPDS-TV
PADMM-TGVS
PADMM-TGV
PADMM-TV
Wen-Chan
C-SALSA
22.22
9.65
7.97
14.94
64.14
5.97
10.07
59.61
0.7229
0.7164
0.7127
0.7007
0.7109
28.32
28.54
28.37
28.33
28.17
28.26
PPDS-TV
PADMM-TGVS
PADMM-TGV
PADMM-TV
Wen-Chan
C-SALSA
21.96
9.54
7.84
14.55
63.48
5.91
9.97
59.26
30.40
30.33
PPDS-TGVS
PPDS-TGV
0.8369
0.8393 9.98
59.96
Peppers (24.88, 0.6618)
0.7125
0.7168
28.39
PPDS-TGV
0.7237
28.58
PPDS-TGVS
Goldhill (25.80, 0.5992)
0.6854
0.6843
0.6988
0.6865
0.6992
24.44
PPDS-TGV
Barbara (22.45, 0.5691)
PSNR
PPDS-TGVS
Method SSIM
0.7659
0.7631
0.7690
0.7721
0.7782
0.7689
0.7723
0.7802
0.8154
0.8135
0.8210
0.8302
0.8368
0.8198
0.8301
0.8362
27.93
28.00 0.8557
0.8588
Plane (23.02, 0.6381)
29.76
29.73
29.99
30.13
30.36
29.93
30.15
30.40
Lena (26.69, 0.7214)
28.48
28.31
28.53
28.59
28.76
28.49
28.57
28.85
Boat (23.30, 0.5428)
PSNR
10.02
59.89
21.99
9.48
7.85
14.58
63.40
5.93
9.99
59.82
21.90
9.56
7.81
14.53
63.61
5.99
10.11
59.89
CPU(s)
Table 6.2 Experimental results of grayscale image deblurring under Gaussian noise SSIM
CPU(s)
0.7793
0.7759
0.7795
0.7820
0.7890
0.7800
0.7821
0.7916
0.7508
0.7471
0.7495
0.7537
0.7612
0.7495
0.7550
0.7620
21.99
24.50
24.56
0.6549
0.6590
9.51
7.85
14.58
63.42
5.92
10.07
60.00
21.85
9.57
7.84
14.55
63.31
5.91
9.96
59.92
10.06
60.29
Stream (21.11, 0.4249)
27.97
27.92
27.96
28.02
28.17
28.02
28.06
28.22
Man (25.29, 0.6263)
28.32
28.25
28.32
28.37
28.58
28.37
28.41
28.65
Couple (23.19, 0.5060)
PSNR
SSIM
CPU(s)
0.7272
0.7265
0.7296
0.7343
0.7408
0.7319
0.7374
0.7419
21.91
9.61
7.93
15.08
63.81
5.98
10.00
60.02
0.5020
0.5026
0.5044
0.5063
0.5120
0.5053
0.5097
0.5139
32.23
32.32
0.8527
0.8561
Zelda (28.00, 0.7006)
21.81
21.83
21.93
21.95
22.02
21.92
21.96
22.03
(continued)
9.99
59.98
22.01
9.57
7.85
14.58
63.37
5.92
10.01
59.87
Mandrill (20.85, 0.3704)
31.06
31.05
31.11
31.24
31.39
31.16
31.28
31.43
Elaine (27.30, 0.6572)
PSNR
0.8331
0.8382
0.8371
30.15
30.36
30.29
30.12
29.41
29.80
PPDS-TV
PADMM-TGVS
PADMM-TGV
PADMM-TV
Wen-Chan
C-SALSA
0.8305
0.8295
0.8325
SSIM
PSNR
Method
Table 6.2 (continued)
21.99
9.58
7.85
14.57
63.32
5.94
CPU(s)
27.85
27.83
27.86
27.96
27.98
27.89
PSNR
0.8547
0.8546
0.8556
0.8570
0.8584
0.8549
SSIM
22.00
9.70
7.85
14.57
63.31
5.92
CPU(s)
24.46
24.53
24.49
24.51
24.52
24.48
PSNR
0.6441
0.6498
0.6531
0.6551
0.6585
0.6527
SSIM
22.13
9.61
7.90
14.67
64.11
5.98
CPU(s)
31.52
31.45
31.58
32.21
32.23
31.56
PSNR
0.8358
0.8274
0.8336
0.8525
0.8560
0.8328
SSIM
21.99
9.48
7.85
14.58
63.36
5.93
CPU(s)
24.41
24.26
24.24
PADMM-TGVS
PADMM-TGV
PADMM-TV
12.77
20.08
73.40
12.03
0.7868
0.7824
26.62
26.69
PADMM-TGV
PADMM-TV
0.7871
26.88
PADMM-TGVS
13.72
21.36
74.24
12.49
24.22
24.05
24.29
24.17
0.7756
0.7778
0.7812
0.7751
0.7783
0.7819
0.7840
24.08
26.76
17.18
68.91
PPDS-TV
0.8402
0.8431
24.37
30.15
30.20
0.7882
13.32
21.06
0.8512
0.7948
0.7383
0.7392
30.66
0.8405
0.8435
26.69
28.20
PADMM-TV
74.07
30.16
30.22
27.11
28.22
PADMM-TGV
0.7433
12.02
16.27
0.8534
PPDS-TGV
28.58
PADMM-TGVS
0.7407
0.7430
30.77
PPDS-TGVS
28.56
PPDS-TV
68.22
Plane (7.52, 0.0115)
28.62
PPDS-TGV
0.7490
0.8099
0.8119
0.8184
0.8112
Lena (8.50, 0.0137)
29.05
29.12
29.44
29.04
0.8130
Peppers (7.96, 0.0098)
28.89
PPDS-TGVS
Goldhill (8.20, 0.0115)
0.7156
0.7203
0.7256
0.7163
24.26
PPDS-TV
16.66
29.10
0.7207
24.29
0.8230
PPDS-TGV
68.22
Boat (9.15, 0.0140)
SSIM
29.59
PSNR
0.7284
CPU(s)
Barbara (8.99, 0.0163)
SSIM
24.45
PSNR
PPDS-TGVS
Method
13.04
20.02
72.36
12.31
16.47
69.62
13.10
20.64
73.74
11.94
16.23
69.78
12.94
20.37
74.84
12.13
16.22
70.01
CPU(s)
Table 6.3 Experimental results of grayscale image deblurring under impulse noise SSIM
0.8199
0.8207
0.8294
0.8228
0.8242
0.8359
0.7768
0.7789
0.7829
0.7790
0.7810
0.7869
21.18
21.24
21.37
21.27
21.35
21.56
0.4665
0.4689
0.4787
0.4680
0.4702
0.4801
Stream (6.90, 0.0096)
28.37
28.39
28.54
28.39
28.44
28.63
Man (8.73, 0.0140)
28.80
28.82
29.16
28.83
28.87
29.33
Couple (9.36, 0.0158)
PSNR
13.81
21.33
76.05
12.67
16.92
70.24
12.93
20.39
74.86
12.14
16.32
69.41
12.81
20.18
73.64
11.99
16.17
68.64
CPU(s)
SSIM
0.7661
0.7677
0.7710
0.7658
0.7682
0.7740
12.90
20.39
73.98
11.89
16.26
69.86
CPU(s)
0.5066
0.5081
0.5113
0.5122
0.5143
0.5173
29.64
29.80
30.01
29.64
29.83
30.19
0.8001
0.8214
0.8252
0.8003
0.8219
0.8279
Zelda (10.01, 0.0127)
21.56
21.54
21.77
21.71
21.72
21.84
13.47
20.83
74.71
12.52
16.84
70.13
13.15
20.58
74.14
12.12
16.49
69.13
Mandrill (9.05, 0.0142)
31.90
31.96
32.25
31.88
31.98
32.27
Elaine (8.87, 0.0140)
PSNR
Fig. 6.1 Restored Boat images of different algorithms under A(9) blur and Gaussian noise with σ = 2. Panels: ground truth; degraded image (PSNR = 23.30 dB, SSIM = 0.5428); PPDS-TGVS (28.85 dB, 0.7802); PPDS-TGV (28.57 dB, 0.7723); PPDS-TV (28.49 dB, 0.7689); PADMM-TGVS (28.76 dB, 0.7782); PADMM-TGV (28.59 dB, 0.7721); PADMM-TV (28.53 dB, 0.7690); Wen-Chan (28.31 dB, 0.7631); C-SALSA (28.48 dB, 0.7659)
(iii) Multiply the above blur kernels by the weights {0.8, 0.1, 0.1; 0.2, 0.6, 0.2; 0.15, 0.15, 0.7} (Blur 1), {0.6, 0.2, 0.2; 0.15, 0.7, 0.15; 0.1, 0.1, 0.8} (Blur 2), and {0.7, 0.15, 0.15; 0.1, 0.8, 0.1; 0.2, 0.2, 0.6} (Blur 3) to obtain the final blur kernels.
The experimental results of PPDS-TGV, PPDS-TV, PADMM-TGV, and PADMM-TV under the two noise conditions are given in Tables 6.5 and 6.6, respectively. The best result of each test item is highlighted in boldface. Compared to the TGV model, the TGV/shearlet compound regularization model offers no advantage in deblurring RGB images and takes longer, so its results are not included in the tables. Tables 6.5 and 6.6 show that the TGV model has an advantage over the TV model in terms of PSNR and SSIM. Figure 6.4 (Gaussian noise) and Fig. 6.5 (impulse noise) further illustrate the visual advantages of the TGV-based restoration results over the TV-based ones. In the TV-based restoration results, the girl's face and the plane's belly show significant staircasing effects, in contrast to the TGV-based
Fig. 6.2 Restored Lena images of different algorithms under G(9, 3) blur and 60% impulse noise. Panels: ground truth; degraded image (PSNR = 8.50 dB, SSIM = 0.0137); PPDS-TGVS (30.77 dB, 0.8534); PPDS-TGV (30.22 dB, 0.8435); PPDS-TV (30.16 dB, 0.8405); PADMM-TGVS (30.66 dB, 0.8512); PADMM-TGV (30.20 dB, 0.8431); PADMM-TV (30.15 dB, 0.8402)
Fig. 6.3 PSNR curves relative to CPU time: (a) Boat restoration experiment under Gaussian noise (curves: PPDS-TGVS, PPDS-TGV, PPDS-TV, PADMM-TGVS, PADMM-TGV, PADMM-TV, Wen-Chan, C-SALSA); (b) Lena restoration experiment under impulse noise (curves: PPDS-TGVS, PPDS-TGV, PPDS-TV, PADMM-TGVS, PADMM-TGV, PADMM-TV). Axes: CPU time (s) versus PSNR (dB)
Table 6.4 RGB image deblurring experimental settings
| Blur | Image | Gaussian noise | Impulse noise (%) |
|---|---|---|---|
| 1 | Kodim1–Kodim8 | σ = 2 | 50 |
| 2 | Kodim9–Kodim16 | σ = 4 | 60 |
| 3 | Kodim17–Kodim24 | σ = 6 | 70 |
model restoration results, which contain almost no staircasing effects. In addition, PPDS is slightly better than PADMM in both PSNR and SSIM. Tables 6.5 and 6.6 also show that PPDS is faster than PADMM and that its single-step execution efficiency is significantly better, for two reasons. On the one hand, as with grayscale image deblurring, PPDS does not need matrix inversion operations; on the other hand, to carry out its matrix inversions, PADMM needs a Gaussian elimination process to update the primal variables, which further increases its computational burden. Figure 6.6 shows that PPDS reaches convergence faster than PADMM, although it may lag behind at the beginning.
(3) Comparison of PPDS with the Condat and PLADMM algorithms
The PPDS method can be considered a generalization of the PLADMM and Condat primal-dual methods. To further demonstrate the superiority of PPDS over these two methods, the Kodim05 images restored by the PPDS-TGV, PLADMM-TGV, and Condat-TGV methods under Blur 1 and Gaussian noise are given in Fig. 6.7. The number of iteration steps is 300 for all three methods. The parameters of the PLADMM-TGV and Condat-TGV algorithms are set so that their restoration results are optimal. From Fig. 6.7, we can see that the results of PPDS-TGV are clearer than the other
0.3282
0.4533
18.01
5
0.7306
20.41
20
0.6354
0.4657
0.3869
20.21
20.09
18
19
0.5429
22.49
17
0.6387
0.5488
20.26
24.20
15
16
0.4102
19.67
14
0.6382
0.2357
22.38
18.00
12
0.4865
0.6054
13
21.20
11
0.6287
22.24
22.89
9
10
0.2956
16.25
8
0.6022
20.87
21.18
6
7
0.6381
22.81
21.95
3
4
0.3037
0.6636
19.25
20.89
1
2
SSIM
PSNR
Image
Degraded
26.50
24.01
23.35
26.48
27.12
27.80
24.02
20.55
28.41
25.15
27.12
27.58
21.63
27.36
24.63
22.60
28.64
29.83
29.14
23.34
PSNR
0.8039
0.6767
0.5684
0.7401
0.6631
0.7788
0.5859
0.4090
0.7613
0.6492
0.7600
0.7953
0.6820
0.8401
0.6254
0.6258
0.7575
0.8329
0.7630
0.5863
SSIM
PPDS-TGV
140.19
137.96
138.57
137.13
140.62
137.54
140.11
141.70
140.15
140.01
136.96
137.62
140.31
139.13
139.23
139.04
137.93
139.67
139.93
140.13
CPU(s)
26.38
24.00
23.28
26.51
27.10
27.76
23.97
20.53
28.35
25.11
27.01
27.52
21.49
27.24
24.61
22.48
28.54
29.81
29.08
23.22
PSNR
0.8018
0.6749
0.5666
0.7385
0.6593
0.7770
0.5807
0.4074
0.7608
0.6466
0.7540
0.7939
0.6728
0.8351
0.6238
0.6171
0.7516
0.8310
0.7591
0.5841
SSIM
PPDS-TV
Table 6.5 Experimental results of RGB image deblurring under Gaussian noise
96.69
94.21
94.12
94.23
96.70
96.77
96.71
96.63
96.64
96.83
94.13
94.17
96.64
96.54
96.78
96.68
94.14
96.59
96.58
96.82
CPU(s) 23.32
26.48
24.02
23.35
26.50
27.10
27.78
24.02
20.54
28.36
25.13
27.14
27.53
21.51
27.33
24.62
22.52
28.63
29.82
29.14
0.8036
0.6765
0.5679
0.7403
0.6616
0.7779
0.5845
0.4084
0.7611
0.6485
0.7603
0.7943
0.6724
0.8389
0.6256
0.6191
0.7567
0.8326
0.7622
0.5857
SSIM
PADMM-TGV PSNR
206.37
202.31
202.43
203.49
206.95
206.82
206.79
206.96
207.39
207.39
201.82
201.80
206.41
206.75
206.45
206.94
201.36
206.27
206.34
206.14
CPU(s) 23.21
26.37
23.99
23.26
26.49
27.08
27.72
23.99
20.52
28.31
25.11
27.01
27.46
21.45
27.21
24.60
22.46
28.56
29.81
29.09
0.8013
0.6749
0.5659
0.7377
0.6587
0.7764
0.5811
0.4076
0.7603
0.6463
0.7544
0.7932
0.6712
0.8341
0.6233
0.6163
0.7522
0.8304
0.7602
0.5835
SSIM
PADMM-TV PSNR
(continued)
134.28
131.87
131.72
132.19
134.17
134.37
134.18
134.21
134.30
134.12
131.61
131.89
134.35
134.11
134.36
134.14
131.65
134.41
134.32
134.48
CPU(s)
20.19
19.44
23
24
0.4098
0.6691
0.4830
0.4670
20.22
21.34
21
22
SSIM
PSNR
Image
Degraded
Table 6.5 (continued)
23.83
22.60
27.73
25.42
0.6200
0.8528
0.6404
0.6842
SSIM
PPDS-TGV
PSNR
140.64
140.28
139.24
139.95
CPU(s) 23.78
22.57
27.61
25.38 0.6189
0.8446
0.6388
0.6827
SSIM
PPDS-TV PSNR
96.58
96.80
96.77
96.84
CPU(s) 23.81
22.58
27.73
25.43 0.6193
0.8524
0.6400
0.6837
SSIM
PADMM-TGV PSNR
206.89
206.41
206.65
206.62
CPU(s) 23.77
22.54
27.56
25.37
0.6183
0.8436
0.6383
0.6822
SSIM
PADMM-TV PSNR
134.25
134.21
134.18
134.16
CPU(s)
0.0189
0.0160
7.72
5
0.0176
5.48
20
0.0103
0.0115
0.0101
6.30
6.87
18
19
0.0105
6.39
17
0.0123
0.0149
6.55
7.58
15
16
0.0137
7.18
14
0.0143
0.0131
7.44
7.20
12
0.0133
0.0149
13
7.37
11
0.0147
7.80
7.79
9
10
0.0201
7.77
8
0.0192
8.04
8.36
6
7
0.0181
8.21
8.17
3
4
0.0183
0.0164
8.30
7.68
1
2
SSIM
PSNR
Image
Degraded
27.62
24.37
23.99
27.95
28.59
29.77
25.82
21.39
30.76
26.79
29.49
30.09
22.90
30.91
26.18
24.25
30.94
32.18
31.43
25.16
PSNR
0.8442
0.7216
0.6488
0.8073
0.7569
0.8410
0.7130
0.5380
0.8349
0.7533
0.8532
0.8614
0.7743
0.9219
0.7539
0.7560
0.8376
0.8896
0.8464
0.7273
SSIM
PPDS-TGV
200.14
198.33
197.43
198.15
206.52
194.50
205.93
203.83
198.67
210.43
216.66
198.05
197.57
197.80
197.96
203.79
203.00
201.37
195.45
198.84
CPU(s)
27.58
24.30
23.98
27.84
28.55
29.72
25.75
21.33
30.71
26.78
29.47
30.03
22.82
30.77
26.11
24.17
30.88
32.00
31.31
25.10
PSNR
0.8414
0.7202
0.6459
0.8007
0.7549
0.8394
0.7107
0.5365
0.8316
0.7521
0.8523
0.8591
0.7713
0.9195
0.7475
0.7506
0.8362
0.8863
0.8426
0.7215
SSIM
PPDS-TV
Table 6.6 Experimental results of RGB image deblurring under impulse noise
158.54
156.41
155.86
155.22
156.89
153.86
156.75
158.42
155.56
156.55
153.73
153.33
155.73
153.61
154.88
155.26
150.57
152.23
152.63
156.42
CPU(s) 25.14
27.58
24.35
23.96
27.91
28.57
29.74
25.80
21.38
30.75
26.79
29.50
30.06
22.83
30.87
26.14
24.20
30.90
32.15
31.44
0.8424
0.7208
0.6475
0.8062
0.7561
0.8407
0.7125
0.5383
0.8340
0.7530
0.8534
0.8609
0.7727
0.9208
0.7526
0.7540
0.8367
0.8889
0.8466
0.7263
SSIM
PADMM-TGV PSNR
264.96
264.40
265.44
263.36
272.03
268.84
272.90
278.40
269.34
273.59
277.88
273.65
267.93
266.02
271.59
267.33
268.28
262.66
269.77
268.79
CPU(s) 25.08
27.51
24.31
23.95
27.82
28.54
29.70
25.72
21.31
30.68
26.77
29.45
30.02
22.76
30.71
26.04
24.11
30.85
31.96
31.29
0.8386
0.7189
0.6410
0.8000
0.7543
0.8392
0.7098
0.5360
0.8309
0.7517
0.8525
0.8594
0.7701
0.9192
0.7443
0.7479
0.8346
0.8857
0.8418
0.7211
SSIM
PADMM-TV PSNR
(continued)
196.26
194.91
195.09
195.01
195.98
194.26
197.19
197.56
194.81
195.88
192.99
192.29
195.96
193.61
194.82
195.81
191.22
192.44
193.18
196.45
CPU(s)
6.57
6.67
23
24
0.0117
0.0112
0.0123
0.0112
6.96
6.89
21
22
SSIM
PSNR
Image
Degraded
Table 6.6 (continued)
24.52
23.17
28.81
26.38
0.6938
0.8881
0.7083
0.7366
SSIM
PPDS-TGV
PSNR
201.45
199.06
200.94
201.25
CPU(s) 24.49
23.13
28.61
26.32 0.6880
0.8835
0.7044
0.7347
SSIM
PPDS-TV PSNR
159.43
157.10
159.05
159.95
CPU(s) 24.50
23.12
28.74
26.33 0.6917
0.8843
0.7074
0.7360
SSIM
PADMM-TGV PSNR
269.70
267.14
268.05
270.24
CPU(s) 24.46
23.09
28.52
26.29
0.6832
0.8811
0.7018
0.7324
SSIM
PADMM-TV PSNR
198.45
196.13
197.81
198.47
CPU(s)
Fig. 6.4 Restored Kodim15 images for different algorithms under Blur 2 and Gaussian noise with σ = 4. Panels: ground truth; degraded image (PSNR = 20.26 dB, SSIM = 0.6387); PPDS-TGV (27.80 dB, 0.7788); PPDS-TV (27.76 dB, 0.7770); PADMM-TGV (27.78 dB, 0.7779); PADMM-TV (27.72 dB, 0.7764)
results, and that PPDS-TGV converges faster than PLADMM-TGV and Condat-TGV.
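Steps (i) to (iii) of the multichannel experiment, which build a 3 × 3 grid of weighted blur kernels, can be sketched as follows. This is only an illustration under stated assumptions: A(n) is taken to be an n × n averaging kernel, row i of the weight matrix is assumed to collect the contributions to output channel i, and averaging kernels stand in for the Gaussian and motion kernels of the text. All function names are hypothetical.

```python
def average_kernel(size):
    """size x size averaging kernel whose entries sum to 1 (assumed meaning of A(size))."""
    value = 1.0 / (size * size)
    return [[value] * size for _ in range(size)]

def scale_kernel(kernel, weight):
    """Multiply every kernel entry by a scalar weight."""
    return [[weight * v for v in row] for row in kernel]

def kernel_sum(kernel):
    """Total mass of a kernel."""
    return sum(sum(row) for row in kernel)

# Weights quoted in step (iii) for Blur 1.
BLUR1_WEIGHTS = [
    [0.80, 0.10, 0.10],
    [0.20, 0.60, 0.20],
    [0.15, 0.15, 0.70],
]

def build_cross_channel_blur(kernels, weights):
    """kernels[i][j] plays the role of K_ij; returns the weighted 3 x 3 grid of kernels."""
    return [
        [scale_kernel(kernels[i][j], weights[i][j]) for j in range(3)]
        for i in range(3)
    ]

# Stand-in kernels of sizes 3, 5, ..., 19 (illustrative only, not the kernels of step (i)).
stand_in = [[average_kernel(3 + 2 * (3 * i + j)) for j in range(3)] for i in range(3)]
blur1 = build_cross_channel_blur(stand_in, BLUR1_WEIGHTS)

# Each row of weights sums to 1, so the total mass feeding each output channel is 1.
row_mass = [sum(kernel_sum(blur1[i][j]) for j in range(3)) for i in range(3)]
print([round(m, 6) for m in row_mass])
```

Because every row of the three weight matrices sums to 1 and the base kernels are normalized, the overall brightness of each channel is preserved while the off-diagonal kernels mix information between channels.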
6.6.2 Image Inpainting Experiment
This subsection illustrates the potential application of PPDS to grayscale and RGB image inpainting through two experiments. The methods compared are PPDS-TGVS, PPDS-TGV, PPDS-TV, and C-SALSA (for grayscale images only). Barbara and Kodim23, both of which contain rich detail, are selected as the original images. They are set to lose 0, 10, …, 70% of their pixels, and Gaussian noise with σ = 10 (Barbara) or σ = 20 (Kodim23) is subsequently added. The inpainting problem degenerates to a pure denoising problem
Fig. 6.5 Restored Kodim20 images for different algorithms under Blur 3 and 70% impulse noise. Panels: ground truth; degraded image (PSNR = 5.48 dB, SSIM = 0.0103); PPDS-TGV (27.62 dB, 0.8442); PPDS-TV (27.58 dB, 0.8414); PADMM-TGV (27.58 dB, 0.8424); PADMM-TV (27.51 dB, 0.8386)
at 0% pixel loss. The total number of iteration steps is set to 100 for grayscale image inpainting and 150 for RGB image inpainting. Figures 6.8a and b plot the PSNR and SSIM for the Barbara image as the percentage of missing pixels varies, and Figs. 6.8c and d plot the PSNR and SSIM for the Kodim23 image. From Figs. 6.8a–d, it can be observed that, first, the TGV/shearlet compound model is consistently superior to the TGV model and the TV model. This advantage diminishes as the pixel loss rate rises; in particular, when the loss rate reaches 70%, the PSNR and SSIM of the three models are already quite close. Second, in grayscale Barbara image inpainting, the advantage of the TGV/shearlet compound model over the TGV model
Fig. 6.6 PSNR curves relative to CPU time: (a) Kodim15 restoration experiment under Gaussian noise; (b) Kodim20 restoration experiment under impulse noise. Curves: PPDS-TGV, PPDS-TV, PADMM-TGV, PADMM-TV. Axes: CPU time (s) versus PSNR (dB)
is more pronounced than in RGB Kodim23 image inpainting. Third, PPDS clearly outperforms C-SALSA in both PSNR and SSIM. Figure 6.9 gives the restored images and the corresponding error images of the four methods at 50% pixel loss for Barbara (the original error image is affinely mapped onto the [0, 255] interval for visual enhancement), while Fig. 6.10 gives the restored images and the corresponding locally zoomed images of the three methods at 40% pixel loss for Kodim23. On the one hand, PPDS-TGV suppresses well the staircasing effects prevalent in the results of PPDS-TV and C-SALSA. On the other hand, PPDS-TGVS recovers the texture details in the images better than PPDS-TGV, as can be seen from Barbara's dress in Fig. 6.9 and the partially zoomed images in Fig. 6.10. Figures 6.8e and f give the PSNR curves relative to CPU time for each method on the above two problems, respectively. It can be seen that PPDS-TGVS is more time consuming than the other algorithms, whereas PPDS-TGV, despite its more complex regularization model, reaches convergence faster than the TV-based C-SALSA.
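The degradation used in the inpainting experiments (a fixed percentage of lost pixels plus additive Gaussian noise) can be sketched as below. This is a minimal illustration, not the authors' code: it assumes the lost pixels are chosen uniformly at random, that noise is added only to the kept pixels, and that lost pixels are stored as 0. The function name `degrade_for_inpainting` and the toy image are hypothetical.

```python
import random

def degrade_for_inpainting(image, loss_rate, sigma, seed=0):
    """Return (observed, mask): mask[i][j] is 1 for kept pixels and 0 for lost ones.

    Gaussian noise with standard deviation `sigma` is added to the kept pixels;
    lost pixels are stored as 0.0 (an arbitrary placeholder value).
    """
    rng = random.Random(seed)
    rows, cols = len(image), len(image[0])
    n_lost = round(loss_rate * rows * cols)
    lost = set(rng.sample(range(rows * cols), n_lost))  # exactly n_lost positions
    observed, mask = [], []
    for i in range(rows):
        obs_row, mask_row = [], []
        for j in range(cols):
            if i * cols + j in lost:
                obs_row.append(0.0)
                mask_row.append(0)
            else:
                obs_row.append(image[i][j] + rng.gauss(0.0, sigma))
                mask_row.append(1)
        observed.append(obs_row)
        mask.append(mask_row)
    return observed, mask

# Toy 20 x 20 image standing in for Barbara.
toy = [[float((i * 20 + j) % 256) for j in range(20)] for i in range(20)]
observed, mask = degrade_for_inpainting(toy, loss_rate=0.5, sigma=10.0)
kept = sum(sum(row) for row in mask)
print(kept, "of", 20 * 20, "pixels kept")  # prints "200 of 400 pixels kept"
```

With `loss_rate=0.0` the mask is all ones and the problem reduces to pure denoising, which matches the 0% case described in the text.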
6.6.3 Image Compressed Sensing Experiments
This subsection examines the applicability of PPDS to a compressed sensing problem, MRI reconstruction. The algorithms compared in this experiment are the edge-guided compressed sensing reconstruction method [9] (Edge-CS) and the TV-based C-SALSA algorithm. The original image is the Foot image from the MRI experiment in Chap. 4. In this experiment, the sampling rate of the Fourier data is set to 4.57, 6.45, 8.79, 10.64, 12.92, 14.74, 16.94, 18.73, and 20.87%, and the corresponding number of radial sampling lines in the frequency domain is 20, 30, …, 100, respectively. The Gaussian noise variance in the observed data is kept constant, such that the signal-to-noise ratio of the data is 40 dB at a sampling rate
Fig. 6.7 Restored Kodim05 images of the PPDS, PLADMM, and Condat methods and the corresponding PSNR curves relative to CPU time. Panels: ground truth; degraded image (PSNR = 18.02 dB, SSIM = 0.3282); PPDS-TGV (22.60 dB, 0.6258); PLADMM-TGV (22.50 dB, 0.6183); Condat-TGV (22.15 dB, 0.6044); PSNR (dB) versus CPU time (s) for PPDS-TGV, PLADMM-TGV, and Condat-TGV
of 10.64%, i.e., when the number of radial sampling lines in the frequency domain is 50. The number of iteration steps for all algorithms is 300. The curves of PSNR and SSIM versus the frequency-domain sampling rate are given in Fig. 6.11a and b. Figure 6.11c plots PSNR versus CPU time at a sampling rate of 6.45% (30 radial lines in the frequency-domain sampling mask). It can be seen that, first, PPDS (including PPDS-TV) consistently outperforms Edge-CS and C-SALSA in terms of
Fig. 6.8 PSNR and SSIM curves for Barbara and Kodim23: (a) PSNR curve relative to pixel loss level for Barbara; (b) SSIM curve relative to pixel loss level for Barbara; (c) PSNR curve relative to pixel loss level for Kodim23; (d) SSIM curve relative to pixel loss level for Kodim23; (e) PSNR curve relative to CPU time (s) for Barbara at 50% pixel loss and σ = 10; (f) PSNR curve relative to CPU time (s) for Kodim23 at 40% pixel loss and σ = 20. Curves: PPDS-TGVS, PPDS-TGV, PPDS-TV, and (Barbara only) C-SALSA
Fig. 6.9 Restored and error Barbara images at 50% pixel loss and Gaussian noise with σ = 10. Panels: ground truth; degraded image (PSNR = 8.89 dB, SSIM = 0.0949); PPDS-TGVS (25.70 dB, 0.7874) with its error image; PPDS-TGV (25.48 dB, 0.7812) with its error image; PPDS-TV (25.47 dB, 0.7789) with its error image; C-SALSA (25.31 dB, 0.7664) with its error image
both PSNR and SSIM at different sampling rates. A plausible explanation is that interval constraints on pixel values play a crucial role in improving MRI reconstruction quality, because MRI images usually have a large number of pixel values located at the boundaries of the given dynamic range; for example, all pixel values in the background portion of the Foot image are 0. As shown in Fig. 6.12, the edges between the foot and the black background reconstructed by PPDS-TV are no worse than those in the Edge-CS results. Second, the Edge-CS method is more effective at lower sampling rates, but it does not improve, and may even degrade, the reconstruction quality when the number of radial lines in the frequency-domain sampling mask is larger than 50. Third, PPDS-TGV effectively eliminates the staircasing effects present in the TV method, but it does not reconstruct well the large amount of texture detail present in the original image. In contrast, PPDS-TGVS successfully reconstructs the texture features of the foot at a very low sampling rate, because the shearlet transform better characterizes image details in the frequency domain. Finally, Fig. 6.11c shows that the PSNR of PPDS-TV rises faster than that of Edge-CS and C-SALSA, and can step
Fig. 6.10 Restored and partially zoomed Kodim23 images at 40% pixel loss and Gaussian noise with σ = 20. Panels: ground truth; degraded image (PSNR = 10.55 dB, SSIM = 0.0474); PPDS-TGVS (30.89 dB, 0.8905) with its partially zoomed image; PPDS-TGV (30.80 dB, 0.8887) with its partially zoomed image; PPDS-TV (30.71 dB, 0.8869) with its partially zoomed image
Fig. 6.11 PSNR and SSIM curves of reconstructed Foot images relative to the frequency-domain sampling rate, and PSNR curves relative to CPU time (s) of the different methods at a sampling rate of 6.45% and a signal-to-noise ratio of 42 dB. Curves: PPDS-TGVS, PPDS-TGV, PPDS-TV, Edge-CS, C-SALSA
into convergence earlier. Note that Edge-CS even uses hybrid C/MATLAB programming for acceleration.
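The relation between the number of radial lines in the frequency-domain mask and the resulting sampling rate can be illustrated with a small sketch. It assumes equally spaced angles, a mask centred on the zero frequency, and a simple half-pixel rasterization, so it is not expected to reproduce the exact percentages quoted above, which depend on the image size and on how the authors rasterize the lines. The names `radial_mask` and `sampling_rate` are hypothetical.

```python
import math

def radial_mask(n, num_lines):
    """n x n binary mask with `num_lines` lines through the centre, at equally spaced angles."""
    mask = [[0] * n for _ in range(n)]
    centre = n // 2
    for k in range(num_lines):
        theta = math.pi * k / num_lines
        dx, dy = math.cos(theta), math.sin(theta)
        # Walk along the line in half-pixel steps so that no pixel on it is skipped.
        for t in range(-2 * n, 2 * n + 1):
            x = round(centre + 0.5 * t * dx)
            y = round(centre + 0.5 * t * dy)
            if 0 <= x < n and 0 <= y < n:
                mask[y][x] = 1
    return mask

def sampling_rate(mask):
    """Fraction of frequency samples kept by the mask."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total

for lines in (20, 30, 50):
    rate = sampling_rate(radial_mask(256, lines))
    print(lines, "lines:", round(100 * rate, 2), "% of Fourier samples kept")
```

Because all lines pass through the centre, low frequencies are sampled densely and high frequencies sparsely, so the sampling rate grows more slowly than the number of lines.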
6.6.4 Experiments on the Validity of Pixel Interval Constraints
By comparing PPDS-TV with four other algorithms, this subsection illustrates the effectiveness of the interval constraint in specific cases, especially when a large number of image pixel values lie on the boundaries of the interval constraint, as well as the superiority of the PPDS algorithm over the other algorithms. The four algorithms involved in the comparison are Chan-Tao-Yuan [10], the box-constrained multiplicative iterative algorithm [11] (BCMI), APEBCADMM-TV [12] (the APE-BCADMM algorithm in Chap. 4), and APEADMM-TV (the APE-ADMM algorithm in Chap. 4). The first three algorithms, along with PPDS-TV, use interval constraints on pixel values, and all five methods can handle Gaussian noise and impulse noise. In addition, BCMI is able to handle Poisson noise. The three test images shown in Fig. 6.13 are used in the subsequent comparison experiments. The percentages of pixels taking boundary values (0 or 255) are 100%, 89.81%, and 28.87% for the Text, Satellite, and Fingerprint images, respectively. The design of the background problems is given in Table 6.7, and all algorithms stop when $\|u^{k+1} - u^{k}\|_2 / \|u^{k}\|_2 \le 10^{-4}$ or when the number of iteration steps reaches 2000. Tables 6.8 and 6.9 give the PSNR, SSIM, number of iteration steps, and CPU time for the different algorithms under Gaussian noise and impulse noise, respectively. The best result of each test item is highlighted in boldface. Figures 6.14 and 6.15 then give the restored Satellite images under G(9, 3) blur and Gaussian noise with σ = 3.5, and the restored Text images under A(9) blur and 50% impulse noise, respectively. The experimental results show that, first, interval constraints on pixel values are crucial for improving the quality of the restored image when a large number of pixel values take the boundary values of the given dynamic range. In fact, the higher
Fig. 6.12 Reconstructed Foot images of different methods at a frequency-domain sampling rate of 6.45% (30 radial lines in the mask) and an SNR of 42 dB. Panels: frequency sampling mask; PPDS-TGVS (PSNR = 29.19 dB, SSIM = 0.8958); PPDS-TGV (28.67 dB, 0.8878); PPDS-TV (28.46 dB, 0.8857); Edge-CS (26.27 dB, 0.7955); C-SALSA (25.14 dB, 0.5834)
Fig. 6.13 Test images: text, satellite, and fingerprint, with a size of 256 × 256
Table 6.7 Image deblurring experiment setting details
| Blur kernel | Image | Gaussian noise | Impulse noise (%) |
|---|---|---|---|
| A(9) | Text | σ = 4 | 50 |
| G(9, 3) | Satellite | σ = 3.5 | 45 |
| M(15, 30) | Fingerprint | σ = 3 | 40 |
Table 6.8 Experimental results under Gaussian noise
| Image (degraded PSNR, SSIM) | Method | PSNR (dB) | SSIM | Steps | Time (s) |
|---|---|---|---|---|---|
| Text (13.69 dB, 0.4645) | PPDS-TV | 21.96 | 0.7541 | 364 | 2.37 |
| | Chan-Tao-Yuan | 21.64 | 0.7524 | 206 | 1.41 |
| | BCMI | 22.03 | 0.7443 | 1562 | 9.82 |
| | APEBCADMM-TV | 21.65 | 0.7534 | 313 | 2.96 |
| | APEADMM-TV | 20.20 | 0.7219 | 251 | 2.19 |
| Satellite (23.84 dB, 0.6343) | PPDS-TV | 27.40 | 0.7474 | 298 | 1.98 |
| | Chan-Tao-Yuan | 27.35 | 0.7471 | 211 | 1.47 |
| | BCMI | 27.35 | 0.7397 | 458 | 2.85 |
| | APEBCADMM-TV | 27.38 | 0.7470 | 293 | 2.78 |
| | APEADMM-TV | 27.01 | 0.7353 | 296 | 2.61 |
| Fingerprint (14.74 dB, 0.3355) | PPDS-TV | 23.36 | 0.8642 | 179 | 1.15 |
| | Chan-Tao-Yuan | 23.36 | 0.8658 | 175 | 1.23 |
| | BCMI | 21.21 | 0.8527 | 1157 | 8.36 |
| | APEBCADMM-TV | 23.32 | 0.8635 | 125 | 1.19 |
| | APEADMM-TV | 22.78 | 0.8208 | 110 | 0.98 |
the percentage of pixels taking boundary values, the more pronounced this effect becomes. BCMI fails to achieve good restoration results despite using interval constraints because its convergence rate is significantly slower than that of the other algorithms, especially under impulse noise. Second, in general,
Table 6.9 Experimental results under impulse noise
| Image (degraded PSNR, SSIM) | Method | PSNR (dB) | SSIM | Steps | Time (s) |
|---|---|---|---|---|---|
| Text (7.27 dB, 0.0014) | PPDS-TV | 24.74 | 0.9842 | 589 | 7.50 |
| | Chan-Tao-Yuan | 24.05 | 0.9812 | 872 | 7.02 |
| | BCMI | 19.40 | 0.9277 | 2000 | 15.37 |
| | APEBCADMM-TV | 24.48 | 0.9840 | 674 | 9.17 |
| | APEADMM-TV | 19.75 | 0.9233 | 280 | 3.78 |
| Satellite (7.43 dB, 0.0109) | PPDS-TV | 30.07 | 0.9649 | 565 | 6.89 |
| | Chan-Tao-Yuan | 29.45 | 0.9625 | 871 | 7.12 |
| | BCMI | 28.41 | 0.9471 | 2000 | 15.36 |
| | APEBCADMM-TV | 29.79 | 0.9632 | 293 | 8.04 |
| | APEADMM-TV | 29.36 | 0.9545 | 414 | 5.19 |
| Fingerprint (8.23 dB, 0.0390) | PPDS-TV | 20.67 | 0.8605 | 271 | 3.79 |
| | Chan-Tao-Yuan | 20.97 | 0.8626 | 319 | 2.82 |
| | BCMI | 20.35 | 0.8446 | 2000 | 17.29 |
| | APEBCADMM-TV | 20.66 | 0.894 | 423 | 6.32 |
| | APEADMM-TV | 20.29 | 0.8377 | 256 | 3.69 |
Fig. 6.14 Restored Satellite images of different algorithms under Gaussian noise. Panels (PSNR, SSIM): degraded image (23.84 dB, 0.6343); BCMI (27.35 dB, 0.7397); PPDS-TV (27.40 dB, 0.7474); Chan-Tao-Yuan (27.35 dB, 0.7471); APEBCADMM-TV (27.38 dB, 0.7470); APEADMM-TV (27.01 dB, 0.7353)
Fig. 6.15 Restored Text images of different algorithms under impulse noise. Panels (PSNR, SSIM): degraded image (7.27 dB, 0.0014); BCMI (19.40 dB, 0.9277); PPDS-TV (24.74 dB, 0.9842); Chan-Tao-Yuan (24.05 dB, 0.9812); APEBCADMM-TV (24.48 dB, 0.9840); APEADMM-TV (19.75 dB, 0.9233)
interval constraints may reduce the convergence rate of the algorithm, as a comparison of the iteration steps and time consumed by APEBCADMM-TV and APEADMM-TV shows. Third, PPDS-TV matches the other well-performing algorithms in terms of PSNR, SSIM, and convergence rate. As discussed previously, PPDS requires no matrix inversion when solving imaging inverse problems, which makes it easier to generalize.
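The PSNR values reported in Tables 6.8 and 6.9 and the effect of interval constraints discussed above can be illustrated with a minimal sketch. It assumes image intensities scaled to [0, 1] and the common PSNR definition 10 log10(peak^2 / MSE); the helper names psnr, project_box, and boundary_fraction are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB (assumes intensities in [0, peak])."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


def project_box(image, low=0.0, high=1.0):
    """Orthogonal projection onto the interval (box) constraint set [low, high]."""
    return np.clip(image, low, high)


def boundary_fraction(image, low=0.0, high=1.0):
    """Fraction of pixels that sit exactly on one of the interval bounds."""
    image = np.asarray(image)
    return float(np.mean((image == low) | (image == high)))
```

Because the projection onto a convex set that contains the true image never increases the distance to it, projecting an estimate onto the interval cannot lower its PSNR. The gain tends to be largest for images such as Text, where most pixels of the true image take the boundary values 0 or 1.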
References
1. Condat L (2013) A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J Optim Theory Appl 158(2):460–479
2. Condat L (2014) A generic proximal algorithm for convex optimization: application to total variation minimization. IEEE Sig Proc Lett 21(8):985–989
3. He C, Hu C, Li X et al (2016) A parallel primal-dual splitting method for image restoration. Inf Sci 358–359:73–91
4. Bauschke HH, Combettes PL (2011) Convex analysis and monotone operator theory in Hilbert spaces. Springer, New York
5. Polyak BT (1987) Introduction to optimization. Optimization Software, New York
6. Deng W, Lai M-J, Peng Z, Yin W (2013) Parallel multi-block ADMM with o(1/k) convergence. UCLA CAM Report 13-64, UCLA, Los Angeles
7. Chan T, Shen J (2005) Image processing and analysis: variational, PDE, wavelet, and stochastic methods. SIAM, Philadelphia
8. Rajwade A, Rangarajan A, Banerjee A (2013) Image denoising using the higher order singular value decomposition. IEEE Trans Pattern Anal Mach Intell 35(4):849–862
9. Guo W, Yin W (2012) Edge guided reconstruction for compressive imaging. SIAM J Imag Sci 5(3):809–834
10. Chan RH, Tao M, Yuan X (2013) Constrained total variational deblurring models and fast algorithms based on alternating direction method of multipliers. SIAM J Imag Sci 6(1):680–697
11. Chan RH, Ma J (2012) A multiplicative iterative algorithm for box-constrained penalized likelihood image restoration. IEEE Trans Image Proc 21(7):3168–3181
12. He C, Hu C, Zhang W, Shi B (2014) Box-constrained total-variation image restoration with automatic parameter estimation. Acta Automatica Sinica 40(8):1804–1811
Appendix A
Table of key variable symbols

Symbol        Meaning
ℝ             Set of real numbers
ℝ+            Set of positive real numbers
ℕ             Set of non-negative integers
∇             Gradient operator
∇1            Differential operator in the horizontal direction
∇2            Differential operator in the vertical direction
div           Divergence operator
ε             Symmetric difference operator
∂             Subdifferential (subgradient) operator or partial derivative operator
H             Hilbert space
⟨·, ·⟩        Inner product in a Hilbert space
2^X           Power set of the set X
Γ0(D)         Set of proper lower semicontinuous convex functions from D to (−∞, +∞]
inf, min      Greatest lower bound (infimum) or minimum value
sup, max      Least upper bound (supremum) or maximum value
f*            Fenchel conjugate of the convex function f
prox_f        Proximity operator of the convex function f
ι_Ω           Indicator function of the convex set Ω
σ_Ω           Support function of the convex set Ω
L*            Hilbert adjoint operator of the linear operator L
||L||         Norm of the linear operator L
ran M         Range of the operator M
zer M         Set of zeros of the operator M
gra M         Graph of the operator M
Fix M         Set of fixed points of the non-expansive operator M
Appendix B
Description of Key Abbreviations

Abbreviation  English name
ADM           alternating direction method
ADMM          alternating direction method of multipliers
ALM           augmented Lagrangian method
APE-ADMM      adaptive parameter estimation for ADMM
APE-SBA       adaptive parameter estimation for SBA
BCMI          box-constrained multiplicative iterative algorithm
BGV           bounded generalized variation
BSNR          blurred signal-to-noise ratio
BV            bounded variation
CS            compressed sensing
DCT           discrete cosine transform
DRS           Douglas-Rachford splitting
Edge-CS       edge guided compressed sensing reconstruction method
EDF           equivalent degrees of freedom
FISTA         fast iterative shrinkage/thresholding algorithm
FBS           forward-backward splitting
FFST          fast finite shearlet transform
FFT           fast Fourier transform
FPR           fixed-point residual
FTVD          fast total variation deconvolution algorithm
GCV           generalized cross-validation
HDTV          higher degree total variation
ISNR          improved signal-to-noise ratio
LADMM         linearized alternating direction method of multipliers
MRF           Markov random field
MRI           magnetic resonance imaging
MSE           mean square error
PDE           partial differential equation
PDHG          primal-dual hybrid gradient
PDS           primal-dual splitting
PLADMM        parallel LADMM
PPA           proximal point algorithm
PPDS          parallel primal-dual splitting
PRS           Peaceman-Rachford splitting
PSF           point spread function
PSNR          peak signal-to-noise ratio
RPCA          robust principal component analysis
SBA           splitting Bregman algorithm
SSIM          structural similarity index measurement
TGV           total generalized variation
TV            total variation
UPRE          unbiased predictive risk estimator
VI            variational inequality
VS            variable splitting