English Pages 507 Year 2020
Arcangelo Distante Cosimo Distante
Handbook of Image Processing and Computer Vision Volume 1 From Energy to Image
Arcangelo Distante Institute of Applied Sciences and Intelligent Systems Consiglio Nazionale delle Ricerche Lecce, Italy
Cosimo Distante Institute of Applied Sciences and Intelligent Systems Consiglio Nazionale delle Ricerche Lecce, Italy
ISBN 978-3-030-38147-9
ISBN 978-3-030-38148-6 (eBook)
https://doi.org/10.1007/978-3-030-38148-6
© Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my parents and my family, Maria and Maria Grazia—AD To my parents, to my wife Giovanna and to my children Francesca and Davide—CD
Preface
In the past 20 years, interdisciplinary research in physics, information technology and cybernetics, the numerical processing of signals and images, and electrical and electronic technologies has led to the development of Intelligent Systems. The so-called Intelligent Systems (or Intelligent Agents) represent an ever more advanced and innovative frontier of research in the electronic and computer field. They are able to directly influence the quality of life and the competitiveness and production methods of companies, to monitor and evaluate environmental impact, to make public service and management activities more efficient, and to protect people’s safety. The study of an intelligent system, regardless of the area of use, can be simplified into three essential components:
1. The first interacts with the environment to acquire data from the domain of interest, using appropriate sensors (acquisition of Signals and Images);
2. The second analyzes and interprets the data collected by the first component, also using learning techniques to build and update adequate representations of the complex reality in which the system operates (Computational Vision);
3. The third chooses the most appropriate actions to achieve the objectives assigned to the intelligent system (choice of Optimal Decision Models), interacting with the first two components and with human operators in the case of application solutions based on man–machine cooperative paradigms (the current evolution of automation, including industrial automation).
This manuscript is framed within this scenario of advancing knowledge for the development of Intelligent Systems. It reports the authors' multiyear research and teaching experience, together with the scientific insights available in the literature.
In particular, the manuscript, divided into three parts (volumes), deals with aspects of the sensory subsystem through which an intelligent system perceives the environment in which it is immersed and acts autonomously.
The first volume describes the set of fundamental processes of artificial vision that lead to the formation of the digital image from energy. It analyzes the phenomena of light propagation (Chaps. 1 and 2), the theory of color perception (Chap. 3), the impact of the optical system (Chap. 4), the transduction of luminous energy (the optical flow) into an electrical signal (by the photoreceptors), and the transduction of the electrical signal from continuous values into discrete values (pixels), i.e., the conversion of the signal from analog to digital (Chap. 5). These first five chapters summarize the process of acquiring the 3D scene in symbolic form, represented numerically by the pixels of the digital image (2D projection of the 3D scene). Chapter 6 describes the geometric, topological, quality, and perceptual information of the digital image. It defines the metrics and the modalities of aggregation and correlation between pixels, which are useful for defining symbolic structures of the scene at a higher level than the pixel. The organization of the data for the different processing levels is described in Chap. 7, while Chap. 8 presents the representation and description of the homogeneous structures of the scene. Chapter 9 begins the description of the image processing algorithms for improving the visual quality of the image, based on point, local, and global operators. Algorithms operating in the spatial domain and in the frequency domain are presented, with examples highlighting the significant differences between the various algorithms, including in terms of computational load.
The second volume begins with a chapter describing the boundary extraction algorithms based on local operators in the spatial domain and on filtering techniques in the frequency domain. Chapter 2 presents the fundamental linear transformations that have immediate application in image processing, in particular for extracting the essential characteristics contained in the images.
These characteristics, which effectively summarize the global informational character of the image, are then used by other image processing tasks: classification, compression, description, etc. Linear transforms are also used, as global operators, to improve the visual quality of the image (enhancement), to attenuate noise (restoration), or to reduce the dimensionality of the data (data reduction). Chapter 3 describes the geometric transformations of images, which are needed in various applications of artificial vision, either to correct geometric distortions introduced during acquisition (for example, in images acquired while the objects or the sensors are moving, as in satellite and/or aerial acquisitions) or to introduce desired geometric visual effects. In both cases, the geometric operator must reproduce the image as accurately as possible, with the same initial information content, through the image resampling process. Chapter 4, Reconstruction of the degraded image (image restoration), describes a set of techniques that perform quantitative corrections on the image to compensate for the degradations introduced during the acquisition and transmission process. These degradations include the fog or blurring effect caused by the optical system and by the motion of the object or the observer, the noise caused by the optoelectronic system and by the nonlinear response of the sensors, and the random noise due to atmospheric turbulence or, more generally, to the process of digitization and transmission. While enhancement techniques tend to reduce the degradations present in the image in qualitative terms, improving its visual quality even when there is no knowledge of the degradation model, restoration techniques are used instead to eliminate or quantitatively attenuate the degradations present in the image, starting also from the hypothesis that the degradation models are known. Chapter 5, Image Segmentation, describes different algorithms for segmentation, the process of dividing the image into homogeneous regions in which all the pixels that correspond to an object in the scene are grouped together. The grouping of pixels into regions is based on a homogeneity criterion that distinguishes the regions from one another. The chapter reports segmentation algorithms based on criteria of similarity of pixel attributes (color, texture, etc.) or on geometric criteria of spatial proximity of pixels (Euclidean distance, etc.). These criteria are not always valid, and in various applications it is necessary to integrate other information drawn from the a priori knowledge of the application context (application domain). In this last case, the grouping of the pixels is based on comparing the hypothesized regions with the a priori modeled regions. Chapter 6, Detectors and descriptors of points of interest, describes the most widely used algorithms for automatically detecting significant structures (known as points of interest, corners, features) in the image that correspond to stable physical parts of the scene. The strength of such algorithms is their ability to detect and identify physical parts of the same scene in a repeatable way, even when the images are acquired under varying lighting conditions and from different observation points, with a possible change of the scale factor.
The third volume describes the artificial vision algorithms that detect objects in the scene and attempt their identification, their 3D reconstruction, their arrangement and location with respect to the observer, and their possible movement.
Chapter 1, Object recognition, describes the fundamental algorithms of artificial vision for automatically recognizing the objects of the scene, an essential capability of the vision systems of all living organisms. While a human observer recognizes even complex objects in an apparently easy and timely manner, for a vision machine the recognition process is difficult, requires considerable computation time, and does not always give optimal results. The algorithms for selecting and extracting features are therefore fundamental to the process of object recognition. In various applications, it is possible to have a priori knowledge of all the objects to be classified because we know the sample patterns (meaningful features) from which useful information can be extracted for deciding (decision making) how to associate each individual of the population with a certain class. These sample patterns (training set) are used by the recognition system to learn significant information about the population of objects (extraction of statistical parameters, relevant characteristics, etc.). The recognition process compares the features of the unknown objects with the model pattern features in order to uniquely identify their class of membership. Over the years, researchers in various disciplinary sectors (machine learning, image analysis, object recognition, information research, bioinformatics, biomedicine, intelligent data analysis, data mining, …) and application sectors (robotics, remote sensing, artificial vision, …) have proposed different recognition methods and developed different algorithms based on different classification models. Although the proposed algorithms share a single purpose, they differ in the properties attributed to the classes of objects (the clusters) and in the model with which these classes are defined (connectivity, statistical distribution, density, …). The diversity of disciplines, especially between automatic data extraction (data mining) and machine learning, has led to subtle differences, especially in the use of results and in terminology, which is sometimes contradictory, perhaps because of the different objectives. For example, in data mining the dominant interest is the automatic extraction of groupings, whereas in automatic classification the discriminating power of the classes to which the patterns belong is fundamental. The topics of this chapter overlap between aspects related to machine learning and those of recognition based on statistical methods. For simplicity, the algorithms described are divided, according to the way they classify objects, into supervised methods (based on deterministic, statistical, neural, and nonmetric models such as syntactic models and decision trees) and unsupervised methods, i.e., methods that do not use any prior knowledge to extract the classes to which the patterns belong. Chapter 2 describes four different types of neural networks: Radial Basis Functions (RBF), Self-Organizing Maps (SOM), Hopfield networks, and deep neural networks. RBF takes a different approach to the design of a neural network, based on a hidden layer (the only one in the network) composed of neurons in which radial basis functions are defined, hence the name Radial Basis Functions, and which performs a nonlinear transformation of the input data supplied to the network. These neurons form the basis for the input data (vectors).
A nonlinear transformation is used in the hidden layer, followed by a linear one in the output layer, because it allows a pattern classification problem to be cast in a much larger space (through the nonlinear transformation from the input layer to the hidden one), where it is more likely to be linearly separable than in a low-dimensional space. This observation explains why the hidden layer is generally larger than the input one (i.e., the number of hidden neurons is greater than the cardinality of the input signal). The SOM network, on the other hand, has an unsupervised learning model and has the originality of autonomously grouping input data on the basis of their similarity, without evaluating the convergence error against external information on the data. It is useful when there is no exact knowledge of the data with which to classify them. It is inspired by the topology of the brain cortex model, considering the connectivity of the neurons and, in particular, the behavior of an activated neuron and its influence on neighboring neurons, whose connections are reinforced while those with neurons farther away become weaker. With the Hopfield network, the learning model is supervised and has the ability to store information and retrieve it even from partial content of the original information. Its originality lies in its physical foundations, which have revitalized the entire field of neural networks. The network is associated with an energy function to be minimized during its evolution through a succession of states, until it reaches a final state corresponding to the minimum of the energy function. This feature allows it to be used to set up and solve an optimization problem by associating the objective function with an energy function. The chapter concludes with the description of convolutional neural networks (CNN), by now the most widespread since 2012, based on the deep learning architecture. Chapter 3, Texture Analysis, presents the algorithms that characterize the texture present in images. Texture is an important component for the recognition of objects. In image processing, the term texture has come to denote any geometric and repetitive arrangement of the gray levels (or colors) of an image. In this context, texture becomes an additional strategic component for solving the problems of object recognition, image segmentation, and synthesis. Some of the algorithms described are based on the mechanisms of human visual perception of texture. They are useful for developing systems for the automatic analysis of the information content of an image, obtaining a partition of the image into regions with different textures. Chapter 4, 3D Vision Paradigms, reports the algorithms that analyze 2D images to reconstruct a scene, typically of 3D objects. A 3D vision system faces the fundamental difficulty typical of inverse problems: from single 2D images, which are only a 2D projection of the 3D world (partial acquisition), it must be able to reconstruct the 3D structure of the observed scene and possibly define the relationships between the objects. 3D reconstruction starts from 2D images that contain only partial information about the 3D world (loss of information in the 3D→2D projection), possibly using the geometric and radiometric calibration parameters of the acquisition system. The mechanisms of human vision, which are also based on a priori prediction and knowledge of the world, are illustrated. In the field of artificial vision, the current trend is to develop 3D systems oriented to specific domains but with characteristics that move toward imitating certain functions of the human visual system.
3D reconstruction methods are described that use multiple cameras observing the scene from multiple points of view, or sequences of time-varying images acquired from a single camera. Theories of vision are described, from the Gestalt laws to Marr's paradigm of vision and the computational models of stereovision. Chapter 5, Shape from Shading (SfS), reports the algorithms for reconstructing the shape of the visible 3D surface using only the brightness variation information (shading, that is, the variations in gray level or color) present in the image. The inverse problem of reconstructing the shape of the visible surface from the changes in brightness in the image is known as the Shape from Shading problem. The reconstruction of the visible surface should not be strictly understood as a 3D reconstruction of the surface. In fact, from a single observation point of the scene, a monocular vision system cannot estimate a distance measure between the observer and the visible object, so SfS algorithms yield a qualitative rather than metric reconstruction of the 3D surface. The theory of SfS is described, based on the knowledge of the light source (direction and distribution), the reflectance model of the scene, the observation point, and the geometry of the visible surface, which together contribute to the image formation process. The relationships between the light intensity values of the image and the geometry of the visible surface (in terms of the orientation of the surface, point by point) are derived under given lighting conditions and a given reflectance model. Other 3D surface reconstruction algorithms based on the Shape from xxx paradigm are also described, where xxx can be texture, structured light projected onto the surface to be reconstructed, or 2D images of the focused or defocused surface. Chapter 6, Motion Analysis, reports the algorithms for perceiving the dynamics of the scene, analogous to what happens in the vision systems of different living beings. With motion analysis algorithms it is possible to derive the 3D motion, almost in real time, from the analysis of sequences of time-varying 2D images. Paradigms of movement analysis have shown that the perception of movement derives from information about the objects, evaluating the presence of occlusions, texture, contours, etc. The algorithms described address the perception of the movement that occurs in physical reality rather than apparent movement. Different methods of movement analysis are examined, from those with a limited computational load, such as those based on the difference of time-varying images, to the more complex ones based on optical flow, considering application contexts with different amounts of motion and scene environments of different complexity. In the context of rigid bodies, algorithms are described that, from the motion analysis of a sequence of time-varying images, estimate not only the movement (translation and rotation) but also the 3D structure of the scene and the distance of this structure from the observer. In the case of a mobile observer (robot or vehicle), this yields useful information for estimating the time to collision. These methods solve the problem of 3D reconstruction of the scene by acquiring a sequence of images with a single camera whose intrinsic parameters remain constant, even if not known (uncalibrated camera), and without knowledge of the motion.
The proposed methods fall within the class of inverse problems. Algorithms are described for reconstructing the 3D structure of the scene (and the motion), i.e., for calculating the coordinates of the 3D points of the scene whose 2D projection is known in each image of the time-varying sequence. Finally, Chap. 7, Camera Calibration and 3D Reconstruction, reports the algorithms for calibrating the image acquisition system (normally a single camera or a stereo vision system), which are fundamental for extracting metric information about the scene from the image (detecting an object’s size or determining accurate measurements of the object–observer distance). The various camera calibration methods are described that determine the intrinsic parameters (focal length, horizontal and vertical dimensions of the single photoreceptor of the sensor or the aspect ratio, the size of the sensor matrix, the coefficients of the radial distortion model, and the coordinates of the principal point or optical center) and the extrinsic parameters that define the geometric transformation from the world reference system to that of the camera. The epipolar geometry introduced in Chap. 5 is described in this chapter to solve the problem of correspondence of homologous points in a stereo vision system with the two cameras either calibrated or not. Epipolar geometry simplifies the search for homologous points between the stereo images by introducing the Essential matrix and the Fundamental matrix. The algorithms for estimating these matrices are also described, given a priori the corresponding points of a calibration platform. With epipolar geometry, the problem of searching for homologous points is reduced to mapping a point of one image onto the corresponding epipolar line in the other image. The correspondence problem can be further simplified to a 1D point-to-point search between the stereo images. This is accomplished with the image alignment procedure known as stereo image rectification. Different algorithms are described, some based on the constraints of epipolar geometry (uncalibrated cameras, where the fundamental matrix includes the intrinsic parameters) and others on whether or not the intrinsic and extrinsic parameters of calibrated cameras are known. Chapter 7 ends with a section on the 3D reconstruction of the scene in relation to the knowledge available about the stereo acquisition system. The triangulation procedures for the unambiguous 3D reconstruction of the geometry of the scene are described, given the 2D projections of the homologous points of the stereo images and the known calibration parameters of the stereo system. If only the intrinsic parameters are known, the 3D geometry of the scene is reconstructed by estimating the extrinsic parameters of the system up to an undeterminable scale factor. If the calibration parameters of the stereo system are not available and only the correspondences between the stereo images are known, the structure of the scene is recovered up to an unknown homography transformation.
Francavilla Fontana, Italy
February 2020
Arcangelo Distante Cosimo Distante
Acknowledgments
We thank all the fellow researchers of the Department of Physics of Bari, of the Institute of Intelligent Systems for Automation of the CNR (National Research Council) of Bari, and of the Institute of Applied Sciences and Intelligent Systems “Eduardo Caianiello” of the Unit of Lecce, who pointed out errors and parts to be reviewed. We mention them in chronological order: Grazia Cicirelli, Marco Leo, Giorgio Maggi, Rosalia Maglietta, Annalisa Milella, Pierluigi Mazzeo, Paolo Spagnolo, Ettore Stella, and Nicola Veneziani. Our thanks go to Arturo Argentieri for his support with the graphic aspects of the figures and the cover. Finally, special thanks go to Maria Grazia Distante, who helped us with the electronic composition of the volumes by verifying the accuracy of the text and the formulas.
Contents
1 Image Formation Process
  1.1 Introduction
  1.2 From Energy to Image
  1.3 Electromagnetic Energy, Photons and Light
    1.3.1 Characteristic of Electromagnetic Waves
  1.4 The Energy of Electromagnetic Waves
  1.5 Sources of Electromagnetic Waves
  1.6 Light–Matter Interaction
  1.7 Photons
  1.8 Propagation of Electromagnetic Waves in Matter
  1.9 The Spectrum of Electromagnetic Radiation
  1.10 The Light
    1.10.1 Propagation of Light
    1.10.2 Reflection and Refraction
  1.11 The Physics of Light
  1.12 Energy of an Electromagnetic Wave
  1.13 Reflectance and Transmittance
    1.13.1 Angle of Brewster
    1.13.2 Internal Reflection
  1.14 Thermal Radiation
  1.15 Photometric Magnitudes
  1.16 Functions of Visual Luminosity
  References
2 Radiometric Model
  2.1 Introduction
  2.2 Light Sources and Radiometric Aspects
  2.3 Bidirectional Reflectance Distribution Function—BRDF
    2.3.1 Lambertian Model
    2.3.2 Model of Specular Reflectance
    2.3.3 Lambertian–Specular Compound Reflectance Model
    2.3.4 Phong Model
  2.4 Fundamental Equation in the Process of Image Formation
  References
3 Color
  3.1 Introduction
    3.1.1 The Theory of Color Perception
  3.2 The Human Visual System
  3.3 Visual Phenomena: Sensitivity to Contrast
  3.4 Visual Phenomena: Simultaneous Contrast
  3.5 Visual Phenomena: Bands of Mach
  3.6 Visual Phenomena: Color Blindness
  3.7 The Colors of Nature
  3.8 Constancy of Color
  3.9 Colorimetry
    3.9.1 Metamerism and Grassmann’s Law
  3.10 Additive Synthesis Method
    3.10.1 Tristimulus Curves of Equal Radiance
    3.10.2 Chromaticity Coordinates
  3.11 3D Representation of RGB Color
  3.12 XYZ Color Coordinates
  3.13 Chromaticity Diagram—RGB
  3.14 Chromaticity Diagram—XYZ
    3.14.1 Calculation of the Positions of the RGB Primaries in the Chromaticity Diagram xy
    3.14.2 Analysis of the Transformation from RGB to the XYZ System
  3.15 Geometric Representation of Color
  3.16 HSI Color Space
  3.17 The Color in Image Processing
  3.18 RGB to the HSI Space Conversion
    3.18.1 RGB → HSI
    3.18.2 HSI → RGB
  3.19 HSV and HSL Color Space
  3.20 CIE 1960/64 UCS Color Space
  3.21 CIE 1976 L*a*b* Color Space
  3.22 CIE 1976 L*u*v* Color Space
  3.23 CIELab LCh and CIELuv LCh Color Spaces
  3.24 YIQ Color Space
  3.25 Subtractive Synthesis Method
  3.26 Color Reproduction Technologies
  3.27 Summary and Conclusions
  References
4 Optical System
  4.1 Introduction
  4.2 Reflection of Light on Spherical Mirrors
  4.3 Refraction of Light on Spherical Surfaces
  4.4 Thin Lens
    4.4.1 Diagram of the Main Rays for Thin Lenses
    4.4.2 Optical Magnification: Microscope and Telescope
  4.5 Optical Aberrations
    4.5.1 Parameters of an Optical System
  References
5 Digitization and Image Display
  5.1 Introduction
  5.2 The Human Optical System
  5.3 Image Acquisition Systems
  5.4 Representation of the Digital Image
  5.5 Resolution and Spatial Frequency
  5.6 Geometric Model of Image Formation
  5.7 Image Formation with a Real Optical System
  5.8 Resolution of the Optical System
    5.8.1 Contrast Modulation Function—MTF
    5.8.2 Optical Transfer Function (OTF)
  5.9 Sampling
  5.10 Quantization
  5.11 Digital Image Acquisition Systems—DIAS
    5.11.1 Field of View—FoV
    5.11.2 Focal Length of the Optical System
    5.11.3 Spatial Resolution of Optics
    5.11.4 Spatial Size and Resolution of the Sensor
    5.11.5 Time Resolution of the Sensor
    5.11.6 Depth of Field and Focus
    5.11.7 Depth of Field Calculation
    5.11.8 Calculation of Hyperfocal
    5.11.9 Depth of Focus
    5.11.10 Camera
    5.11.11 Video Camera
    5.11.12 Infrared Camera
    5.11.13 Time-of-Flight Camera—ToF
  5.12 Microscopy
  5.13 Telescopic
  5.14 The MTF Function of an Image Acquisition System
  References
6 Properties of the Digital Image
6.1 Digital Binary Image
6.2 Pixel Neighborhood
6.3 Image Metric
6.3.1 Euclidean Distance
6.3.2 City Block Distance
6.3.3 Chessboard Distance
6.4 Distance Transform
6.5 Path
6.6 Adjacency and Connectivity
6.7 Region
6.7.1 Connected Component
6.7.2 Foreground Background and Holes
6.7.3 Object
6.7.4 Contour
6.7.5 Edges
6.8 Topological Properties of the Image
6.8.1 Euler Number
6.8.2 Convex Hull
6.8.3 Area, Perimeter and Compactness
6.9 Property Independent of Pixel Position
6.9.1 Histogram
6.10 Correlation-Dependent Property Between Pixels
6.10.1 The Image as a Stochastic Process → Random Field
6.10.2 Correlation Measurement
6.11 Image Quality
6.11.1 Image Noise
6.11.2 Gaussian Noise
6.11.3 Salt-and-Pepper Noise
6.11.4 Impulsive Noise
6.11.5 Noise Management
6.12 Perceptual Information of the Image
6.12.1 Contrast
6.12.2 Acuteness
References
7 Data Organization
7.1 Data in the Different Levels of Processing
7.2 Data Structures
7.2.1 Matrix
7.2.2 Co-Occurrence Matrix
7.3 Contour Encoding (Chain Code)
7.4 Run-Length Encoding
7.4.1 Run-Length Code for Grayscale and Color Images
7.5 Topological Organization of Data-Graph
7.5.1 Region Adjacency Graph (RAG)
7.5.2 Features of RAG
7.5.3 Algorithm to Build RAG
7.5.4 Relational Organization
7.6 Hierarchical Structure of Data
7.6.1 Pyramids
7.6.2 Quadtree
7.6.3 T-Pyramid
7.6.4 Gaussian and Laplacian Pyramid
7.6.5 Octree
7.6.6 Operations on Quadtree and Octree
References
8 Representation and Description of Forms
8.1 Introduction
8.2 External Representation of Objects
8.2.1 Chain Code
8.2.2 Polygonal Approximation: Perimeter
8.2.3 Polygonal Approximation: Splitting
8.2.4 Polygonal Approximation: Merging
8.2.5 Contour Approximation with Curved Segments
8.2.6 Signature
8.2.7 Representation by Convex Hull
8.2.8 Representation by Means of Skeletonization
8.3 Description of the Forms
8.3.1 Shape Elementary Descriptors
8.3.2 Statistical Moments
8.3.3 Moments Based on Orthogonal Basis Functions
8.3.4 Fourier Descriptors
References
9 Image Enhancement Techniques
9.1 Introduction to Computational Levels
9.2 Improvement of Image Quality
9.2.1 Image Histogram
9.2.2 Probability Density Function and Cumulative Distribution Function of Image
9.2.3 Contrast Manipulation
9.2.4 Gamma Transformation
9.3 Histogram Modification
9.3.1 Histogram Equalization
9.3.2 Adaptive Histogram Equalization (AHE)
9.3.3 Contrast Limited Adaptive Histogram Equalization (CLAHE)
9.4 Histogram Specification
9.5 Homogeneous Point Operations
9.6 Nonhomogeneous Point Operations
9.6.1 Point Operator to Correct the Radiometric Error
9.6.2 Local Statistical Operator
9.7 Color Image Enhancement
9.7.1 Natural Color Images
9.7.2 Pseudo-color Images
9.7.3 False Color Images
9.8 Improved Quality of Multispectral Images
9.9 Towards Local and Global Operators
9.9.1 Numerical Spatial Filtering
9.10 Spatial Convolution
9.10.1 1D Spatial Convolution
9.10.2 2D Spatial Convolution
9.11 Filtering in the Frequency Domain
9.11.1 Discrete Fourier Transform DFT
9.11.2 Frequency Response of Linear System
9.11.3 Convolution Theorem
9.12 Local Operators: Smoothing
9.12.1 Arithmetic Average
9.12.2 Average Filter
9.12.3 Nonlinear Filters
9.12.4 Median Filter
9.12.5 Minimum and Maximum Filter
9.12.6 Gaussian Smoothing Filter
9.12.7 Binomial Filters
9.12.8 Computational Analysis of Smoothing Filters
9.13 Low Pass Filtering in the Fourier Domain
9.13.1 Ideal Low Pass Filter
9.13.2 Butterworth Low Pass Filter
9.13.3 Gaussian Low Pass Filter
9.13.4 Trapezoidal Low Pass Filter
9.13.5 Summary of the Results of the Smoothing Filters
References
Index
1 Image Formation Process
1.1 Introduction
Living organisms, through a process of adaptation to the environment, have developed over time sensory organs suited to gathering information about the surrounding world, which is fundamental for their survival and reproduction. In particular, the sense organs, together with the perceptual capacities, allow them to determine the structures and events of the surrounding world. Given the diversity of living organisms, there are different sensory mechanisms with which the surrounding world is perceived, and different forms of energy are used to capture information about the environment. One form, chemical energy, is sensed through air or water. Another is mechanical, in the form of pressure exerted on the body, and a third, also used by people, is electromagnetic energy. While the first two forms of energy (chemical and mechanical) provide information about the immediate vicinity, electromagnetic energy provides information about objects and organisms even when they are distant. The natural process of evolution has developed a diversity of sensory organs with different levels of complexity, from the simplest, such as that of an insect, to a complex one such as the binocular vision system of humans, able to detect instant by instant the static and moving objects of the observed environment.
Contrary to what is commonly thought, in all natural organisms the observation of a scene is not attributable to the capabilities of a single sensory organ alone. For example, in humans, the eyes acquire two two-dimensional images of the scene, projected on the retinas, through localized receptors on the retinas stimulated by electromagnetic energy. The optic nerve transmits the stimuli of the receptors to the brain, and the latter, through the neural cells, converts the sensory information from the optic nerve into shape, color, and texture. This information is useful for identifying and locating the observed objects.
© Springer Nature Switzerland AG 2020 A. Distante and C. Distante, Handbook of Image Processing and Computer Vision, https://doi.org/10.1007/978-3-030-38148-6_1
Note that, to get information about the scene, the eye is used as a transduction device that converts electromagnetic energy into symbolic form (color, shape, geometric structure, ...), while the brain is used as a decision-making instrument that processes this symbolic information to reconstruct and identify the objects in the scene. A system with these functions, that is, one equipped with sensory organs that detect and identify the objects present without direct contact with the environment, is called a vision system or vision machine.
1.2 From Energy to Image
In the common meaning, an image represents the exterior form of objects perceived directly or indirectly through sight or other sensory systems, or represents an artistic reconstruction of a scene that is real or produced by fantasy. The informative content of an image varies with the mechanism of interaction between the sensory system and the external physical world: through a transduction process, various physical quantities are acquired that capture the essential characteristics of the observed objects. For example, the images acquired by an artificial sensor such as a camera and by the natural sensory systems (of a human, a cat, or an insect) observing the same scene are not identical. It can be deduced that the informative content of the image depends on the complexity of the sensory system and on the type of transducers used, which may or may not be sensitive to the particular electromagnetic radiation that makes objects visible. The set of measurements detected by the transducers and adequately converted generates the image and constitutes the indispensable informational content for the reconstruction and interpretation of the scene. In the psychological context, the interpretation of what is observed is achieved through a set of perceptual processes that are currently thought to be expressible in terms of information processing (computational model). Given this, for an artificial sensory system there is a mutual relationship between the transduction process that leads to the formation of the image and the way in which the image will be correctly interpreted. The image derived from the transduction process can be seen as the result of the light–matter interaction, conditioned by the sensitivity (to the visible, infrared, etc.) of the sensors to the light emitted by the objects of the scene, by how the light is projected and focused on the sensitive surface (photoreceptors), and by the methods of converting light energy into chemical form (for example, a photograph) or into electrical impulses used to generate the image (for a camera). These conditions determine the peculiar characteristics of an artificial sensory system, to be adapted to the various application contexts.
We do not yet know much about the mechanisms by which our brain reconstructs its own model of the external world starting from the stimuli of the photoreceptors of our visual system. In some not very complex natural organisms, it would appear that photoreceptors stimulate neural network cells that perform the first levels of processing. The analogy between brain and computer, although the two are profoundly different, can lead us to study artificial sensory systems that are inspired by natural ones, with all conceivable limits (transmission speed, number of processors, sensors, etc.), which will be highlighted in the following. Thus, for an artificial sensory system, it is necessary to convert the signal produced by the transducers into a numerical format, which is the only form acceptable for digital processing. In other words, the artificial sensory system, through a process of digitization that will be examined in more depth later, produces a digital image that is a numerical representation of the observed world. The tandem of artificial sensory system and digital processing component is the basis for developing robust computational models for the automatic interpretation of a scene.
1.3 Electromagnetic Energy, Photons and Light
A first formalism on the nature of light was given by Isaac Newton, based on the corpuscular theory, probably because light appears to propagate in straight rays in a medium of uniform density. In the same period, Christiaan Huygens gave a different formalism based on the wave propagation of light in all directions. With Maxwell, in the 1800s, the electromagnetic nature of light became established, as we describe later. The light propagation model is important for studying and understanding how light interacts with the environment [1–4]. For example, a light source propagates luminous energy in the form of light beams that vary in intensity and wavelength.¹ The propagation of light in the vacuum can easily be modeled with straight rays, while it becomes more complex when light has to cross a medium, even a transparent one such as air or water. In these cases, absorption phenomena occur when the photons collide with the particles of the medium, releasing their energy and fading away. The photon constitutes a unit quantity of electromagnetic radiation, which travels at the speed of light. Einstein theorized photons, calling them quanta² of light, to explain the phenomenon of photoelectric emission (emission of photoelectrons from materials struck by low-intensity light). This phenomenon is explained only by assuming that the energy of electromagnetic radiation is not uniformly distributed in space, but propagates localized in granules, or photons.

¹ The wavelength is the distance that a wave travels as it completes one full cycle, i.e., for the field to return to its initial value after passing through a maximum and a minimum. More concisely, a wavelength is the distance between two successive identical peaks (high points) or troughs (low points) in a wave (a repeating pattern of traveling energy like light or sound). The wavelength λ is linked to the frequency ν (the number of complete cycles per second) by the relation λν = c, where c is the speed of light. Often in the mathematical treatment, it is advisable to introduce the wave number k, defined as k = 2π/λ. In essence, k is proportional to the inverse of the wavelength λ.
² The singular of quanta is quantum, meaning the minimum amount of any physical entity (physical property) involved in an interaction. For example, a photon is a single quantum of light (or of any other form of electromagnetic radiation), and can be referred to as a light quantum or as a light particle.
1.3.1 Characteristic of Electromagnetic Waves
From physics, it is known that an electric field E is generated by an electrostatic charge or by a time-varying magnetic field. Similarly, a magnetic field B is generated by an electric current or by a time-varying electric field (see Fig. 1.1). The physical–mathematical formulation expressed in Maxwell’s equations describing electromagnetic waves shows three characteristics: the general perpendicularity of the electric field E and the magnetic field B, their interdependence, and their symmetry. These characteristics are the basis for studying the phenomena of light propagation. When an electric charge is stationary, it is associated with a radial and uniform electric field that extends to infinity. As soon as the electric charge moves, the electric field E is altered in the vicinity of the charge, and this alteration propagates outward at a certain speed. This time-varying electric field induces a magnetic field, as predicted by Maxwell’s equations. The same electric charge, being in motion, induces a time-varying magnetic field. Consequently, the magnetic field generates an electric field E, according to Maxwell’s theory. The process continues over time: the vector fields E and B, cooperating and varying simultaneously, generate an impulse (or electromagnetic perturbation), or better an electromagnetic wave, that propagates outward from its source, independently of it. In essence, the time-varying electric and magnetic fields, which can ideally be considered as a single entity, regenerate each other in an endless cyclic process. Electromagnetic waves have been coming to earth from the sun for billions of years.
Fig. 1.1 Electric and magnetic field: (a) the electric field E is produced by a time-varying magnetic field; (b) the magnetic field B is produced by a time-varying electric field
Fig. 1.2 Plane electromagnetic wave in motion along the x axis with speed c
In summary, electric and magnetic fields can be considered as two aspects of a single physical phenomenon, i.e., the electromagnetic field, whose source is an electric charge in motion. The symmetry that persists in Maxwell’s theory suggests that electromagnetic waves propagate in the direction that is symmetrical to both vector fields E and B. In empty space, electromagnetic waves propagate in the direction transverse to both fields E and B (see Fig. 1.2). Once a perturbation in the magnetic field is generated, it propagates as a wave, moving away independently of its source. Fields E and B vary in time and space, and both regenerate in an endless cyclical way. The physical and mathematical aspects describing the propagation of electromagnetic waves in the vacuum are expressed in vectorial terms by the following Maxwell differential equations:

$$\nabla^2 E = \varepsilon_0 \mu_0 \frac{\partial^2 E}{\partial t^2} \quad \text{and} \quad \nabla^2 B = \varepsilon_0 \mu_0 \frac{\partial^2 B}{\partial t^2} \tag{1.1}$$

where μ₀ is known as the magnetic permeability of the vacuum (it expresses the tendency of a material to magnetize in the presence of a magnetic field) and ε₀ is called the permittivity (also called dielectric constant) of the vacuum (also called free space), which describes the behavior of a dielectric material in the presence of an electric field (a parameter that characterizes the tendency of the material to counteract the intensity of the electric field). The Laplace operator (also called Laplacian operator) ∇² is applied to the Cartesian (x, y and z) components of the E and B fields, thus generating 6 scalar equations. Maxwell’s theory demonstrates that if the propagation of electromagnetic waves takes place in the direction of the x axis, the variation of fields E and B depends only on x and t, and at every instant tᵢ these fields have the same value at each point of the planes perpendicular to the x axis, as shown in Fig. 1.2. Equations (1.1) describe in general the dynamics of propagation of a wave in the vacuum at velocity c = 1/√(μ₀ε₀) along the direction of the x axis and without distortion:

$$\frac{\partial^2 E}{\partial t^2} = c^2 \frac{\partial^2 E}{\partial x^2} \qquad \frac{\partial^2 B}{\partial t^2} = c^2 \frac{\partial^2 B}{\partial x^2} \tag{1.2}$$

Maxwell also assessed the speed of light propagation by skillfully using the experimental results obtained at that time by several researchers, and formulated theoretically that the velocity of propagation of electromagnetic waves in space would be:

$$v = \frac{1}{\sqrt{\mu_0 \varepsilon_0}} \approx 3 \times 10^8\ \text{m/s} \tag{1.3}$$
This theoretical result is in remarkable agreement with the speed of light estimated in earlier experiments (315,300 km/s). Maxwell’s theory of electromagnetism, validated by various experimental results, has turned out to be one of the most important discoveries of all time. Subsequently, the scientific community designated the symbol c, derived from the Latin word celer (fast), to indicate the speed of light in a vacuum. Currently, after the new definition of the meter, the speed of light in the vacuum is c = 2.99792458 × 10⁸ m/s. Maxwell’s theory is valid not only under the hypothesis of propagation of plane electromagnetic waves, but also for perturbations described by impulsive or continuous waves. Waveforms of particular interest are the harmonic waves. In the particular case of harmonic waves of frequency ν = ω/2π and wavelength λ = 2π/k, the electric field E(x, t) and the magnetic field B(x, t), which propagate in time t along the x axis, are described by the equations:

$$E(x, t) = E_0 \sin k(x - ct) = E_0 \sin(kx - \omega t) \tag{1.4}$$

$$B(x, t) = B_0 \sin k(x - ct) = B_0 \sin(kx - \omega t) \tag{1.5}$$

where E₀ and B₀ are the respective amplitudes of the two waves, ω is the angular frequency of the wave (expressed in rad/s), and k is the wave number, representing the number of wavelengths in the distance 2π. The parameters characterizing the propagation dynamics of electromagnetic waves are related to each other as follows:

$$c = \lambda \nu \quad \text{and} \quad \omega = ck \tag{1.6}$$

where ν is the frequency of the wave (expressed in cycles/s, also called hertz), which indicates how the electromagnetic perturbation, described by fields E and B, varies at each point along the x axis. In this context, the wave propagation velocity c represents the phase velocity of the wave, with spatial period 2π/k of the waveform, i.e., the waveform repeats itself every wavelength λ. The oscillation period T at each point is given by 1/ν. Remember that sometimes the wave number is instead defined as the inverse of the wavelength, 1/λ = k/2π, which corresponds to the number of wavelengths in a unit of length (m⁻¹). The repeatability of the waveform is also justified by the relationship c = λν, which gives λ = c/ν = cT, showing that the wavelength λ can also be seen as the distance traveled by the waveform in a period T.
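The relations among the wave parameters can be checked numerically. The following is a minimal sketch, not part of the original treatment: the numerical values of the vacuum constants are CODATA 2018 figures supplied here as an assumption, and the function names and the 550 nm wavelength are purely illustrative.

```python
import math

# Vacuum constants. The numerical values are CODATA 2018 figures and are an
# assumption of this sketch; the text only introduces the symbols.
MU_0 = 1.25663706212e-6       # magnetic permeability of the vacuum, N/A^2
EPSILON_0 = 8.8541878128e-12  # permittivity of the vacuum, F/m


def speed_of_light(mu_0=MU_0, epsilon_0=EPSILON_0):
    """Eq. (1.3): c = 1 / sqrt(mu_0 * epsilon_0), in m/s."""
    return 1.0 / math.sqrt(mu_0 * epsilon_0)


def wave_parameters(wavelength, c):
    """Derive nu, T, k and omega for a harmonic wave of given wavelength (m)."""
    nu = c / wavelength              # frequency in Hz, from c = lambda * nu
    period = 1.0 / nu                # oscillation period T in s
    k = 2.0 * math.pi / wavelength   # wave number in rad/m
    omega = c * k                    # angular frequency in rad/s, omega = c * k
    return {"nu": nu, "T": period, "k": k, "omega": omega}


c = speed_of_light()
params = wave_parameters(550e-9, c)  # 550 nm: illustrative visible wavelength

print(f"c     = {c:.6e} m/s")
print(f"nu    = {params['nu']:.4e} Hz")
print(f"T     = {params['T']:.4e} s")
print(f"k     = {params['k']:.4e} rad/m")
print(f"omega = {params['omega']:.4e} rad/s")
```

With these definitions, ω equals 2πν and the product c · T returns the chosen wavelength up to rounding error, which is the relation λ = cT stated above.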
Fig. 1.3 Electrical and magnetic fields of a harmonic plane electromagnetic wave
From this, it follows that in sinusoidal wave motion there are two periodicities: one in time, expressed by the period T, and one in space, expressed by the wavelength λ, linked by the relation λ = c · T. Figure 1.3 shows graphically how an electromagnetic wave can be considered as the simultaneous superposition of two sinusoidal waves representing the oscillations E_y(x, t) and B_z(x, t), respectively, of the electric field in the y-x plane and of the magnetic field in the z-x plane. Basically, they correspond to linearly polarized (plane-polarized) waves. In some regions of space, different waves can be concentrated, producing a superposition of harmonic wave motions with frequencies ω, 2ω, ..., nω or periods T, T/2, ..., T/n. These composite wave motions can be modeled with sinusoidal waves described in terms of Fourier components, as we will see in the following chapters.
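The two periodicities, and the fact that a superposition of harmonics ω, 2ω, ..., nω still repeats with the fundamental period T, can be illustrated with a short sketch. The wavelength, the sample point, and the harmonic weights below are hypothetical values chosen only for illustration; the function names are likewise not from the text.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def harmonic_field(x, t, amplitude, wavelength):
    """Eq. (1.4): E(x, t) = E0 * sin(k*x - omega*t) for a plane harmonic wave."""
    k = 2.0 * math.pi / wavelength
    omega = C * k
    return amplitude * math.sin(k * x - omega * t)


def harmonic_sum(t, period, amplitudes):
    """Superpose harmonics with angular frequencies omega, 2*omega, ..., n*omega."""
    omega = 2.0 * math.pi / period
    return sum(a * math.sin(n * omega * t)
               for n, a in enumerate(amplitudes, start=1))


wavelength = 500e-9       # illustrative value, m
period = wavelength / C   # T, from lambda = c * T
x, t = 1.3e-7, 0.4e-15    # arbitrary sample point (m) and instant (s)

e_ref = harmonic_field(x, t, 1.0, wavelength)
e_later = harmonic_field(x, t + period, 1.0, wavelength)        # shifted by T
e_farther = harmonic_field(x + wavelength, t, 1.0, wavelength)  # shifted by lambda

amplitudes = [1.0, 0.5, 0.25]  # hypothetical weights of the first 3 harmonics
s_ref = harmonic_sum(t, period, amplitudes)
s_later = harmonic_sum(t + period, period, amplitudes)

# All three differences vanish up to floating-point rounding.
print(abs(e_ref - e_later), abs(e_ref - e_farther), abs(s_ref - s_later))
```

The composite signal is no longer sinusoidal, yet shifting it by T leaves it unchanged, which is why such motions lend themselves to a description in terms of Fourier components.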
1.4 The Energy of Electromagnetic Waves
One of the significant properties of wave motions, as in the case of electromagnetic waves, is that such waves carry energy. The light coming from the sun, which has been emitted since the birth of the solar system, travels about 150 million kilometers to reach the earth, transporting significant amounts of energy, and will continue to do so for billions of years. For a given electromagnetic field in a region of space, it is reasonable to define an energy density, i.e., a radiant energy per unit of volume. The energy intensity of an electromagnetic wave (i.e., the energy passing through the unit area in unit time, in W/m²; see Fig. 1.4), flowing in the direction of wave propagation, is given by the contribution of the energies induced by fields E and B and is expressed by the Poynting vector:

$$I_P = c^2 \varepsilon_0\, \vec{E} \times \vec{B} \tag{1.7}$$
Fig. 1.4 Flow of electromagnetic energy through a surface A
The energy flowing through a surface A in the unit of time is obtained by calculating the flow of the Poynting vector through A (see Fig. 1.4), i.e.,

$$\frac{dI}{dt} = c^2 \varepsilon_0 \int_A (\vec{E} \times \vec{B})\, dA \tag{1.8}$$

called radiant power or radiant flow. The quantity expressed by the vector product E × B, in the case of harmonic waves, varies cyclically between a maximum and a minimum value, and at optical frequencies (ν = 10¹⁴ to 10¹⁵ Hz) the flow intensity I varies so rapidly in time that its instantaneous value cannot be measured. For this reason, an average value is considered, measured as the energy absorbed in a certain time interval using, for example, films or photoreceptors. The mean value of the magnitude of the Poynting vector is a significant measure of the irradiance, i.e., the energy flow per unit of area per unit of time:

$$I \cong \langle I_P \rangle = \frac{c\, \varepsilon_0}{2} E_0^2 \tag{1.9}$$

where E₀ is the maximum amplitude of oscillation of the electric field, c is the speed of light in the vacuum, ε₀ is the permittivity of the vacuum, and ⟨I_P⟩ indicates the average value of I_P.
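As a rough numerical illustration of the averaging argument, the sketch below compares the closed-form irradiance (c ε₀ / 2) E₀² with a direct average of c² ε₀ E B over one oscillation period. It assumes the plane-wave relation B₀ = E₀/c, which the text does not state explicitly, and uses the CODATA 2018 value of ε₀; the function names and the field amplitude are hypothetical.

```python
import math

C = 299_792_458.0             # speed of light in vacuum, m/s
EPSILON_0 = 8.8541878128e-12  # permittivity of the vacuum, F/m (CODATA 2018)


def irradiance(e0):
    """Mean Poynting magnitude I = (c * eps0 / 2) * E0**2, in W/m^2."""
    return 0.5 * C * EPSILON_0 * e0 ** 2


def averaged_poynting(e0, samples=10_000):
    """Average c^2 * eps0 * E * B over one period, assuming B0 = E0 / c."""
    b0 = e0 / C  # plane-wave assumption, not stated in the text
    total = 0.0
    for i in range(samples):
        phase = 2.0 * math.pi * i / samples  # uniform sampling of one cycle
        e = e0 * math.sin(phase)             # instantaneous electric field
        b = b0 * math.sin(phase)             # instantaneous magnetic field, in phase
        total += C ** 2 * EPSILON_0 * e * b  # instantaneous Poynting magnitude
    return total / samples


e0 = 1000.0  # illustrative field amplitude, V/m
print(f"closed form : {irradiance(e0):.3f} W/m^2")
print(f"time average: {averaged_poynting(e0):.3f} W/m^2")
```

The two values agree because the average of sin² over a full cycle is 1/2; for the amplitude chosen here the irradiance is roughly 1.3 kW/m².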
1.5 Sources of Electromagnetic Waves
The sources of electromagnetic waves are the same as those of the magnetic field, i.e., electric charges in motion. In general, electromagnetic radiation is generated by the electromagnetic field that the charges produce during their oscillation. The frequency of oscillation determines the type of electromagnetic radiation emitted. These radiations, which all propagate with the same velocity c in the vacuum, are characterized by the frequency ν and the wavelength λ. In nature there is a continuous exchange of energy between atoms, molecules, and electromagnetic radiation.
A source whose charges oscillate with a systematic phase relationship is said to be coherent. If the charges oscillate independently, i.e., the relationship between the phases is random, the electromagnetic radiation produced by such a source is said to be incoherent. Ordinary light sources used in the optical field, such as tungsten lamps and fluorescent lamps, are incoherent. These sources do not generate continuous trains of plane electromagnetic waves but rather pulsed ones, i.e., short wave trains randomly shifted in phase with respect to one another. Laser sources are instead an example of coherent light sources, as are radio and microwave sources. The latter are characterized by low-frequency coherent radiation generated by electronic oscillators using various amplification devices.
1.6 Light–Matter Interaction
Previously, we considered the mechanisms by which electromagnetic waves, produced by moving electrical charges, propagate in a vacuum. For image formation, it is important to understand the inverse mechanisms, i.e., the phenomena that occur when a beam of light interacts with matter. In the light–matter interaction, the most important phenomena are the emission and absorption of electromagnetic radiation, which are governed by the atomic structure of matter, consisting of electrons (negative particles) surrounding central nuclei (positive particles). When a beam of light interacts with atoms, the electric and magnetic fields disturb the state of the electrons of each atom. Under the influence of the electromagnetic fields, the electrons are forced to oscillate, the more so the greater the amount of energy absorbed from the electromagnetic field that has struck them. Quantum theory has shown that matter (atoms and molecules) absorbs electromagnetic radiation most readily when the frequency of the beam of light coincides with one of the frequencies of the emission spectrum of the atom or molecule (see Fig. 1.5). The absorption of energy by an atom or molecule brings the electronic motion to equilibrium in a new energy state of the atom (or molecule), called the excited state. In this new state, the electrons forced to oscillate behave like electrical charges that in turn emit (diffuse) electromagnetic radiation with the same frequency as the incident electromagnetic waves.

Fig. 1.5 a Intensity of radiation absorbed and b the transmitted radiation in the passage through a substance. [Plots of radiation intensity versus spectral frequency: a initial spectral distribution of the incident radiation and energy absorbed by the atom around ω; b spectral distribution of the transmitted radiation]

Essentially, the so-called diffusion phenomenon occurs: the radiation emitted is the diffuse wave, which carries away the energy that the atom had previously absorbed from the incident wave through the electrons bound to the atom itself. Experimentally, it is observed that the flow of incident energy is attenuated by diffusion, because the absorbed energy is reemitted in all directions, so that the resulting energy flow has lower intensity than the incident flow. In other words, a substance in thermal equilibrium with its environment emits as much radiant energy as it absorbs. A substance with good energy absorption capacity is also a good emitter. It has also been shown that the intensity of the diffuse radiation depends on the frequency of the incident radiation and on the diffusion angle. For a quantitative measure of this intensity, it is necessary to assess the magnitude of the perturbation that the incident radiation induces on the motion of the atomic electrons. Quantum theory has analyzed these quantitative aspects of the diffusion phenomenon in detail. One experimental result is that the diffused radiation is most intense when the frequency of the incident radiation is equal to one of the frequencies $\omega_1, \omega_2, \ldots$ of the emission spectrum of the substance considered. This susceptibility of materials has a considerable influence on their optical behavior. This behavior of the substance is known as luminescence. In the visible, the brightness induced in a substance by absorption and subsequent emission of radiation is called fluorescence when the time delay between absorption and emission is less than $10^{-8}$ s (the emission takes place after the exciting agent has been removed).
When the duration of the phenomenon is greater than $10^{-8}$ s, we speak of phosphorescence. In some substances the luminescence can last for hours or even a few days. The frequencies of fluorescent and phosphorescent radiation normally differ from one another.
1.7 Photons
In the previous section, it was highlighted that the phenomenon of diffusion is a two-phase process. In the first phase, the electrons of the atoms of a substance quickly absorb the energy of the incident electromagnetic wave, and in the next phase they immediately radiate this energy again as diffuse radiation. Let us now consider the diffusion of electromagnetic radiation by a free electron, unlike the previous case in which the electromagnetic wave struck electrons bound to the atoms of a substance. This phenomenon can be explained by recalling that an electromagnetic wave transports energy $E$ and momentum $p$, and that the special theory of relativity establishes the relationship $p = E/c$ between them. In the diffusion process, if energy $E$ is subtracted from the incident electromagnetic wave, a corresponding momentum $p = E/c$ must also be subtracted from the wave. It follows that a free electron cannot absorb a certain
amount of energy $E$ (which must be reemitted to account for the diffusion) and at the same time acquire the momentum $p_e = E/c$ (because a wave transmits both energy and momentum) without violating the principles of conservation of momentum and energy. It should be concluded that a free electron cannot absorb electromagnetic energy in the way a bound electron does, where the absorbed energy and momentum are shared between the electron and the remaining structure of the atom, the ion (with much greater mass than the electron). In the case of a free electron, there is no other particle with which the electron can share the energy and the momentum, and no absorption or diffusion would be possible. Reality, however, is different. Experimentally, when monochromatic radiation strikes a region with free electrons (a metal contains many free electrons not bound to atoms), radiation of a different frequency appears in addition to the incident radiation. This phenomenon was discovered by A. H. Compton (Compton effect), who interpreted the new radiation as radiation diffused by the free electrons. In particular, he observed that the frequency of the diffuse radiation was lower than that of the incident radiation, with a correspondingly longer wavelength. Compton also established experimentally that the wavelength $\lambda'$ of the diffuse radiation depends on $\theta$, the angle between the direction of the incident radiation (with wavelength $\lambda$) and the direction in which the scattered radiation is observed. To account for the exchange of energy and momentum, it is useful to picture diffusion by an electron as an impact between the incident wave and the electron.
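The statement that an isolated free electron cannot completely absorb the energy of the wave can be checked with a short calculation. The sketch below assumes an electron initially at rest with rest mass $m_e$ and uses the relativistic energy–momentum relation, which is not written out in the text; it is meant as an illustration of the argument, not as the authors' derivation.

```latex
\begin{align*}
\text{momentum conservation:}\quad & p_e = \frac{E}{c} \\
\text{energy conservation:}\quad & E + m_e c^2 = \sqrt{(p_e c)^2 + (m_e c^2)^2}
                                   = \sqrt{E^2 + (m_e c^2)^2} \\
\text{squaring both sides:}\quad & E^2 + 2 E\, m_e c^2 + (m_e c^2)^2 = E^2 + (m_e c^2)^2 \\
\Longrightarrow\quad & 2 E\, m_e c^2 = 0 \;\Longrightarrow\; E = 0
\end{align*}
```

The two conservation laws are compatible only for $E = 0$, so a free electron cannot absorb the whole energy of the wave; part of it must leave as diffuse radiation, which is what the Compton effect shows.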
The incident wave propagates with velocity $c$ and its energy–momentum relationship is $E = c \cdot p$, equivalent to that of a particle with zero rest mass. The diffusion phenomenon is therefore equivalent to an impact between a particle with zero rest mass moving with velocity $c$ and the electron, with rest mass $m_e$. After the impact, the energy of the zero rest mass particle becomes $E'$, different from the energy $E$ before the impact ($E > E'$). The analogy between the diffusion discovered by Compton and the dynamics of an impact between two particles, one with zero rest mass and the other the electron, leads us to connect the frequency $\nu$ of the incident wave with the energy $E$, i.e., $E = h \cdot \nu$, where $h$ is a universal constant expressing the proportionality between the frequency of the electromagnetic wave and the energy associated with it in the impact. This universal constant is called Planck's constant; it first appeared in another context at the end of the nineteenth century, when the German physicist Max Planck studied the intensity of electromagnetic radiation in equilibrium with matter. Given the Compton effect and the phenomenon of diffusion by a free electron, we can now formalize the concept of photon on the following basis:
(a) The diffusion of electromagnetic radiation by a free electron is considered as an impact between the electron and a particle with zero rest mass.
Fig. 1.6 Relationship of energies and momentum in the Compton diffusion. [Diagram: incident photon with $E = h\nu$, $p = h/\lambda$; scattered photon with $E' = h\nu'$, $p' = h/\lambda'$ at angle $\theta$; electron after the impact with $p_e$, $E_k$ at angle $\Phi$]
(b) The incident electromagnetic radiation is seen as a particle with zero rest mass, called a photon for simplicity. The energy and momentum of the photon (highlighting the dual aspect between particles and electromagnetic radiation) are related, respectively, to the frequency and wavelength of the electromagnetic radiation:
$$E = h \cdot \nu \qquad \text{and} \qquad p = \frac{E}{c} = \frac{h \cdot \nu}{c} = h \cdot \frac{1}{\lambda} \tag{1.10}$$
Figure 1.6 illustrates the complete phenomenon of diffusion, as experimentally discovered by Compton, when a photon with frequency $\nu$ strikes an electron at rest. The energy $E = h \cdot \nu$ and the momentum $p = h/\lambda$ of the incident photon are partly transferred to the electron; after the impact, the energy of the diffuse photon, $E' = h\nu'$, is lower because its frequency $\nu'$ is lower than that of the incident photon. Experiments have shown that the momentum of the electron after diffusion is equal to the difference between the momentum of the incident photon and that of the diffuse one. In the diffusion process, the electron absorbs the energy and momentum of the incident photon, reemits as diffuse radiation the energy $E' = h\nu'$ and the momentum $p' = h/\lambda'$, and acquires the kinetic energy $E_k = E - E'$ and the momentum $p_e = p - p'$. The experimental evidence leads us to consider the photon as the quantum of electromagnetic energy and momentum absorbed or emitted in a single process by a charged particle. The concept of photon also extends to contexts in which the incident radiation interacts not only with a single free electron but, in general, with the atomic structures of matter. It is part of the fundamental laws of physics that characterize all radiative processes involving electromagnetic fields and charged particles.
The concept of photon is developed further by quantum mechanics (also known as quantum physics or quantum theory) to explain the emission and absorption of electromagnetic radiation in atoms, molecules, and nuclei. The energy of matter (atoms and molecules) is quantized and can only assume discrete values associated with stationary states or energy levels. The concept of photon connects energy quantization with the ability of an atom or molecule to absorb or emit electromagnetic radiation only at certain frequencies (characteristic of the substance). In fact, if an atom in the stationary state of energy $E$ absorbs electromagnetic radiation with frequency $\nu$ and passes to another stationary state of higher energy $E'$, the energy absorbed by the atom is $E' - E$, which by conservation of energy must be $h \cdot \nu$, the energy of the absorbed photon. The relationship $E' - E = h \cdot \nu$ was formulated by the Danish physicist Niels Bohr in 1913. This relationship also applies when the atom passes from the stationary state of energy $E'$ to the lower-energy state $E$. In image-capturing applications, the quantization of radiation energy plays an important role when using radiation-sensitive detectors that measure the energy absorbed from single photons, each of which carries a minimum energy $h \cdot \nu$ (with $\nu$ the frequency of the radiation), with a level of uncertainty due to the background noise of the detector.
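The relations $E = h \cdot \nu$, $p = h/\lambda$, and $E_k = E - E'$ can be tried out numerically. The following is a minimal sketch: the function names are illustrative, the constants are the standard SI values, and the dependence of $\lambda'$ on $\theta$ is taken from the standard Compton formula $\lambda' - \lambda = \frac{h}{m_e c}(1 - \cos\theta)$, which is an assumption here because the text states the dependence only qualitatively.

```python
import math

H = 6.62607015e-34      # Planck's constant, J s
C = 2.99792458e8        # speed of light in vacuum, m/s
M_E = 9.1093837015e-31  # electron rest mass, kg


def photon_energy(wavelength_m):
    """Energy E = h * nu = h * c / lambda of a photon, in joules."""
    return H * C / wavelength_m


def photon_momentum(wavelength_m):
    """Momentum p = h / lambda of a photon, in kg m/s."""
    return H / wavelength_m


def compton_scattered_wavelength(wavelength_m, theta_rad):
    """Wavelength lambda' of the photon scattered at angle theta.

    Assumption (not written out in the text): the standard Compton
    formula lambda' - lambda = h / (m_e * c) * (1 - cos(theta)).
    """
    return wavelength_m + H / (M_E * C) * (1.0 - math.cos(theta_rad))


def electron_kinetic_energy(wavelength_m, theta_rad):
    """Kinetic energy E_k = E - E' acquired by the electron, in joules."""
    scattered = compton_scattered_wavelength(wavelength_m, theta_rad)
    return photon_energy(wavelength_m) - photon_energy(scattered)


# Example: an X-ray photon (lambda = 0.1 nm) scattered at 90 degrees.
lam = 1e-10
lam_prime = compton_scattered_wavelength(lam, math.pi / 2)
print(f"lambda' - lambda = {lam_prime - lam:.3e} m")
print(f"E_k = {electron_kinetic_energy(lam, math.pi / 2):.3e} J")
```

For visible light the shift is negligible compared with the wavelength, which is consistent with the effect being observed with short-wavelength radiation.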
1.8 Propagation of Electromagnetic Waves in Matter
Experimentally, it has been shown that the velocity of propagation of electromagnetic waves in matter differs from the velocity of propagation in a vacuum. Maxwell's equations for the electric and magnetic fields are still valid if the substance considered is homogeneous and isotropic, except that the constants $\varepsilon_0$ and $\mu_0$ are replaced, respectively, by the electrical permittivity $\varepsilon$ and the magnetic permeability $\mu$ characteristic of the substance. Consequently, the phase velocity $v$, i.e., the rate at which the phase of the electromagnetic wave propagates in space, becomes
$$v = \frac{1}{\sqrt{\varepsilon \mu}} \tag{1.11}$$
The ratio of the velocity $c$ of an electromagnetic wave in a vacuum to the velocity of propagation $v$ in matter is called the absolute index of refraction $n$, given by
$$n = \frac{c}{v} = \sqrt{\frac{\varepsilon \mu}{\varepsilon_0 \mu_0}} \tag{1.12}$$
which is useful for characterizing a material with respect to how light, or an electromagnetic wave in general, propagates in it. If we consider the ratios $\varepsilon/\varepsilon_0 = \varepsilon_r$ and $\mu/\mu_0 = \mu_r$, respectively the relative permittivity (or dielectric constant) and the relative permeability, the absolute index of refraction $n$ becomes
$$n = \sqrt{\varepsilon_r \mu_r} \tag{1.13}$$
In optical (nonmagnetic) materials, i.e., in transparent materials that normally transmit electromagnetic waves in the visible, the relative permeability $\mu_r$ differs only slightly from 1, and in this case the index of refraction becomes
$$n = \sqrt{\varepsilon_r} \tag{1.14}$$
Table 1.1 Index of refraction versus the square root of the relative permittivity

Substance        $n$          $\sqrt{\varepsilon_r}$
Air (1 atm)      1.0002926    1.000295
CO2 (1 atm)      1.00045      1.0005
Polystyrene      1.59         1.60
Glass            1.5–1.7      2.0–3.0
Fused quartz     1.46         1.94
Water            1.33         9.0
Ethyl alcohol    1.36         5.0
known as the Maxwell relation. Table 1.1 shows the values of $n$ and $\sqrt{\varepsilon_r}$ for some substances. The values of $\sqrt{\varepsilon_r}$ and $n$ agree well for some substances, while they do not for others, such as water and alcohol. It can be shown that $\varepsilon_r$ and $n$ depend on the frequency (and hence on the wavelength) of the electromagnetic wave, in particular for transparent optical materials. Consequently, the phase velocity $v = c/n$ of the electromagnetic wave in matter depends on the frequency of the radiation. This variation of the absolute index of refraction with frequency leads to the phenomenon of dispersion: when a wave train containing different frequencies propagates in matter, each component travels at a different phase speed $v = c/n$, and the wave train is distorted. Dispersion in glass is responsible for the decomposition of white light into its color components when it crosses a prism (an experiment performed by I. Newton). The phenomenon of dispersion can be explained by analyzing the mechanisms of interaction between the incident electromagnetic wave and the atoms constituting the substance.
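The comparison made in Table 1.1 can be reproduced with a few lines of code. The sketch below uses the single-valued rows of the table (glass is omitted because it is given as a range) and evaluates the phase velocity $v = c/n$ and the relative deviation between $\sqrt{\varepsilon_r}$ and $n$; the function names are illustrative choices, not part of the text.

```python
import math

C = 2.99792458e8  # speed of light in vacuum, m/s

# (substance, measured n, square root of relative permittivity), Table 1.1
TABLE_1_1 = [
    ("Air (1 atm)", 1.0002926, 1.000295),
    ("CO2 (1 atm)", 1.00045, 1.0005),
    ("Polystyrene", 1.59, 1.60),
    ("Fused quartz", 1.46, 1.94),
    ("Water", 1.33, 9.0),
    ("Ethyl alcohol", 1.36, 5.0),
]


def refractive_index(eps_r, mu_r=1.0):
    """Absolute index of refraction n = sqrt(eps_r * mu_r), Eq. (1.13)."""
    return math.sqrt(eps_r * mu_r)


def phase_velocity(n):
    """Phase velocity v = c / n in a medium with refractive index n."""
    return C / n


def maxwell_relation_error(n, sqrt_eps_r):
    """Relative deviation between sqrt(eps_r) and the measured n."""
    return abs(sqrt_eps_r - n) / n


for name, n, sqrt_eps_r in TABLE_1_1:
    print(f"{name:14s} n={n:<10} v={phase_velocity(n):.4e} m/s "
          f"deviation={maxwell_relation_error(n, sqrt_eps_r):.1%}")
```

The deviation is below 1% for the gases and polystyrene and very large for water and ethyl alcohol, which is the disagreement the text attributes to the frequency dependence of $\varepsilon_r$ and $n$.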
1.9 The Spectrum of Electromagnetic Radiation
Electromagnetic waves are characterized by the frequency of oscillation of the source that produces them. A source can emit radiation with different frequencies and wavelengths. A set of radiations, ordered by wavelength or frequency, is called the spectrum of radiation. The full spectrum of a source includes all the frequencies or wavelengths that the source itself can emit. There is no universal source that can emit radiation at any frequency; different sources instead cover different regions of the electromagnetic spectrum. The devices used to detect a region of the spectrum are called spectroscopes. A spectrum is continuous if it includes all frequencies, or discrete if it includes only some. A region or restricted range of the spectrum is also called a spectral band.

Fig. 1.7 The spectrum of electromagnetic radiation (from Wikimedia). [Figure rows: radiation type and wavelength (Radio $10^{3}$ m, Microwave $10^{-2}$ m, Infrared $10^{-5}$ m, Visible $0.5 \times 10^{-6}$ m, Ultraviolet $10^{-8}$ m, X-ray $10^{-10}$ m, Gamma ray $10^{-12}$ m); approximate scale of wavelength (buildings, humans, butterflies, needle point, protozoans, molecules, atoms, atomic nuclei); frequency axis from $10^{4}$ to $10^{20}$ Hz; temperature of objects at which this radiation is the most intense wavelength emitted (from 1 K to 10,000,000 K); penetration of the Earth's atmosphere]

Table 1.2 Unit of measurement of λ

Unit         Symbol    Value
Micron       µ         $10^{-6}$ m
Nanometer    nm        $10^{-9}$ m
Angstrom     Å         $10^{-10}$ m

The various spectroscopes differ according to the region of interest within the entire electromagnetic spectrum. Similarly, for the acquisition of images there are several sensors operating in different regions of the electromagnetic spectrum for the various applications. For example, the so-called optical region extends from the far infrared to the far ultraviolet, as shown in Fig. 1.7, and includes the visible region. The optical region is characterized by the fact that electromagnetic radiation can be focused, directed, and controlled by mirrors and lenses, and can be decomposed into a spectrum when it crosses a prism, through the phenomenon of dispersion. Unlike thermal radiation (emitted by metallic bodies that emanate radiation with a continuous spectrum), the radiation emitted by the excited atoms and molecules of optical materials (glass, gas, air, liquid, etc.) has various discrete frequencies. The term spectral line indicates a given frequency or wavelength corresponding to one such radiation. The wavelength units commonly used in the optical region are given in Table 1.2. The various regions of the spectrum do not have strictly defined limits because sources can emit radiation in overlapping regions of the spectrum. Microwaves have wavelengths ranging from 1 mm to about 30 cm. This radiation is able to penetrate the Earth's atmosphere and is particularly useful for monitoring the Earth's surface.
The infrared, also called IR, extends over a range of the spectrum that is normally divided into four regions: near infrared, 780–3,000 nm (near in the sense of close to the visible region); intermediate infrared, 3,000–6,000 nm; far infrared, 6,000–15,000 nm; and extreme infrared, from 15,000 nm to 1 mm. The extreme infrared is adjacent to microwave radiation, and therefore its radiant energy can be generated either by a microwave oscillator or by an incandescent source (molecular oscillators). Approximately half of the electromagnetic energy coming from the sun is infrared, and common bulb lamps emit infrared radiation in the range 3,000–10,000 nm. Extreme infrared sensors are used in particular for the detection of heat sources (for example, locating the heat dispersion of a building), for night vision, for observing the state of ground vegetation from satellites, and for observing particular phenomena of astronomical interest in space. They are also widely used to determine temperature differences between the objects of the observed scene. Infrared cameras that produce thermographic images are now available.
The ultraviolet, also called UV, begins immediately beyond violet (455–390 nm), the last region of visible light, and extends to the X-ray region. Ultraviolet radiation with a wavelength of less than 290 nm destroys microorganisms. People cannot see ultraviolet rays well because the cornea absorbs them, especially those with short wavelengths, while the crystalline lens absorbs those with wavelengths above 300 nm. In fact, people who have had the crystalline lens removed because of cataracts can see UV rays with wavelengths above 300 nm. Some creatures have developed UV-sensitive receptors and use UV rays from the sun to navigate even on cloudy days. UV sensing systems have been developed for various research areas in astrophysics. UV lasers, films, and microscopes are also made for different applications.
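The four infrared regions listed above can be encoded as a small lookup, for instance to label the operating band of a sensor. This is only a sketch: the function name and the convention for wavelengths that fall exactly on a shared limit are assumptions made for the example, since the text gives the limits without saying to which region they belong.

```python
# Band limits in nanometers, as listed in the text.
INFRARED_BANDS = [
    ("near infrared", 780, 3_000),
    ("intermediate infrared", 3_000, 6_000),
    ("far infrared", 6_000, 15_000),
    ("extreme infrared", 15_000, 1_000_000),  # 1 mm = 1,000,000 nm
]


def infrared_band(wavelength_nm):
    """Return the infrared region name, or None outside 780 nm to 1 mm.

    A wavelength on a shared limit is assigned to the longer-wavelength
    region (an arbitrary convention chosen for this sketch).
    """
    for name, lower, upper in INFRARED_BANDS:
        if lower <= wavelength_nm < upper:
            return name
    if wavelength_nm == INFRARED_BANDS[-1][2]:
        return INFRARED_BANDS[-1][0]
    return None


for wavelength in (555, 1_000, 4_500, 10_000, 50_000):
    print(wavelength, "nm ->", infrared_band(wavelength))
```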
X-rays have very short wavelengths, less than 1 nm, and high electromagnetic energy. They are used in particular for the 3D reconstruction of objects and of the internal organs of living beings by means of appropriate methodologies based on tomography (by reflection, emission, or time-of-flight measurement, the latter also with ultrasound); more complex reconstruction methodologies are based on nuclear magnetic resonance (NMR). Tomography uses radiation that penetrates an object from different spatial directions. The object can be penetrated by radiation emitted by a point source placed in different positions or by a beam of parallel radiation (see Fig. 1.8); once the object has been crossed, the radiation is projected onto a screen in the same way as in the image formation process of an optical system. The radiation that strikes the screen is attenuated according to the path traveled from the source and the absorption caused by the object, which one wants to reconstruct from the different projections. Sensing systems can also be classified according to the methods used to reconstruct the observed objects.
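The attenuation of the radiation along its path can be illustrated with a toy parallel projection. The sketch below assumes the Beer–Lambert law, $I = I_0 \exp(-\sum_k \mu_k \, \Delta x)$, with $\mu_k$ the attenuation coefficients met along a ray; the text only says that the radiation is attenuated by the path and by absorption, so this law, the grid of values, and the function names are all assumptions of the example.

```python
import math


def attenuated_intensity(mu_along_ray, step, i0=1.0):
    """Intensity reaching the detector after crossing the object.

    Assumes the Beer-Lambert law, I = I0 * exp(-sum(mu_k * step)), where
    mu_k are the attenuation coefficients met along the ray (an assumption
    of this sketch, not a formula given in the text).
    """
    return i0 * math.exp(-sum(mu * step for mu in mu_along_ray))


def parallel_projection(mu_grid, step, direction="rows"):
    """Detector readings for a parallel beam crossing a 2D grid of mu values.

    direction="rows": one ray per row (beam travels left to right).
    direction="columns": one ray per column (beam travels top to bottom).
    """
    if direction == "rows":
        rays = mu_grid
    elif direction == "columns":
        rays = [list(col) for col in zip(*mu_grid)]
    else:
        raise ValueError("direction must be 'rows' or 'columns'")
    return [attenuated_intensity(ray, step) for ray in rays]


# Hypothetical 3 x 3 object: one absorbing cell surrounded by empty space.
phantom = [
    [0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0],
]
print(parallel_projection(phantom, step=0.5, direction="rows"))
print(parallel_projection(phantom, step=0.5, direction="columns"))
```

Each projection shows a dip only on the detector whose ray crosses the absorbing cell; combining projections taken from different directions is what allows the position of the cell to be recovered.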
Fig. 1.8 In tomography the radiation can be emitted a in a parallel projection and b by using a point source. [Both panels: radiant beams cross the object and the attenuated radiation R is measured by detectors]
1.10 The Light
Of the entire electromagnetic spectrum, light is defined as the radiation that can be detected by the human eye. Figure 1.9 shows the sensitivity of the human visual system of a sample observer exposed to radiation of various wavelengths; it highlights a peak at 555 nm (1 nanometer = $10^{-9}$ m), which produces the sensation of a yellow-green color. This peak represents the center of the visible region, whose limits are not sharply defined given the asymptotic nature of the sensitivity curve. Through genetic evolution in interaction with solar radiation, the human visual system has developed photoreceptors sensitive only to electromagnetic radiation within the limits that delimit the visible region of the spectrum. Appropriate devices, for example photographic plates or infrared cameras, must be used to capture images outside that region. A visible object appears more or less bright to the observer according to the amount of light radiation that it absorbs and reflects. In the case of opaque objects, the brightness of the surface can be estimated by considering only the light source, although the contribution of light reflected by other nearby objects should also be taken into account. Light is produced by atoms or molecules as a result of an internal rearrangement of the motion of the outermost electrons.
Fig. 1.9 Relative human sensitivity function. [Plot: relative sensitivity, from 0 to 1.0, versus wavelength, from 400 to 700 nm]
1.10.1 Propagation of Light
The process of image formation is closely linked to the physical principles of the interaction between light and matter. In general, when electromagnetic radiation interacts with objects, these in turn emit radiation that can be detected, through transduction, by suitable devices. In the case of a camera, the transduction process consists of converting the luminous energy emitted by the objects and the surrounding environment into electrical energy, in the form of a current or voltage proportional to the amount of energy received by the photoreceptors. It follows that the acquired image depends on the type of transduction used and on the limits of the transduction measurements, which may be affected by errors. These depend essentially on the following physical phenomena of propagation of electromagnetic radiation: reflection, refraction, absorption, and scattering (dispersion or diffusion). The phenomena of absorption and diffusion (scattering) have already been analyzed in the light–matter interaction, where we observed how the optical characteristics of substances are closely linked to their ability to absorb and reemit light when struck by incident light. Diffraction is another characteristic of the propagation of light; it manifests itself when light meets an obstacle (transparent or opaque) and undergoes a perturbation in its direction of propagation and intensity. This perturbation becomes appreciable when the wavelength of the incident light is comparable with the size of the obstacle. Normally, diffraction is not easily visible to the naked eye because obstacles are almost always large compared to the wavelength of the incident light, about $5 \times 10^{-7}$ m. A visible effect of scattering is the blue color of the sky, which is explained as follows.
The molecules of the air resonate when they are hit by the ultraviolet component of the light coming from the sun, and consequently they absorb and reemit energy in all directions by diffusion (scattering), i.e., the blue component, leaving the other components of the visible spectrum unchanged. In the absence of an atmosphere the sky would be black, as the sky of the moon, which has no atmosphere, appeared to the astronauts. Scattering is also fundamental to the phenomena of reflection and refraction. When light propagates from a medium with a given optical density into another with a larger optical density, it travels with lower speed in the latter. In this context, the two phenomena of reflection and refraction are observed experimentally. They occur at the separating surface, from which the reflected light emerges and propagates backwards in the same material in which the incident light was propagating, while the refracted (transmitted) light propagates into the denser material (see Fig. 1.10a). The incident light is divided into reflected and refracted light, in proportions that depend on the optical characteristics of the material. When the reflected light represents almost all the incident light, the reflecting surface is a mirror. When the transmitted (i.e., refracted) light represents the totality of the incident light, the material is transparent and is normally used to make optical components for the acquisition of images. These components (lenses made of glass or special plastic) have the optical characteristic of projecting (transmitting) onto the image plane most of the luminous energy emitted by the objects of the scene. A natural event due to the dispersion of sunlight is the rainbow (see Fig. 1.10b).

Fig. 1.10 Scattering of sunlight. a Caused by the propagation of light between media with different densities (solar energy crossing the atmosphere); b rainbow phenomenon caused by the optical dispersion of sunlight through the raindrops
Fig. 1.11 Plane waves incident, reflected, and refracted (transmitted). [Panels a and b: incident plane waves $S_i$ at angle $\theta_i$ and reflected plane waves $S_r$ at angle $\theta_r$ in medium 1; refracted/transmitted plane waves $S_t$ at angle $\theta_t$ in medium 2]
1.10.2 Reflection and Refraction
Figure 1.11 shows what happens when a plane wave of monochromatic light propagates in a transparent medium (medium 1) and strikes the surface separating it from another transparent medium (medium 2), in which the light continues to propagate. The incident light, represented by the vector $S_i$, on striking the separating surface of the two media splits and propagates in two directions: the reflected light, represented by the vector $S_r$, which propagates in medium 1, and the refracted light, represented by the vector $S_t$, which is transmitted into medium 2. Experimentally, it has been shown that the directions of the light propagation vectors (i.e., of the wavefront, see footnote 3), defined by the angles $\theta_i$, $\theta_r$, and $\theta_t$, called respectively the angles of incidence, reflection, and transmission (refraction), are linked together by the following laws of reflection and refraction:
(a) The directions of incidence, reflection, and transmission all lie in a plane normal to the surface of separation, i.e., all the rays of light lie in the plane of incidence.
(b) The reflection angle $\theta_r$ is equal to the angle of incidence $\theta_i$, that is, $\theta_r = \theta_i$.
3 The form of any wave (matter or electromagnetic, as in the case of light) is determined by its source and described by the shape of its wavefront, i.e., the locus of points of constant phase. If a traveling wave is emitted by a planar source, then the points of constant phase form a plane surface parallel to the face of the source. Such a wave is called a plane wave, and travels in one direction ideally represented by a vector.
Fig. 1.12 a Incident, reflected, and transmitted rays. b Case of a ray of light incident on a block of glass that reflects it and transmits it into the block. [Panel a: incident ray $S_i$ at angle $\theta_i$ and reflected ray $S_r$ at angle $\theta_r$ in medium 1; refracted ray $S_t$ at angle $\theta_t$ in medium 2]
(c) Snell's law (also known as the Snell–Descartes law and the law of refraction) states that, for any angle of incidence $\theta_i$, the ratio between the sines of the angles of incidence and refraction is equal to the ratio of the refractive indices, a constant that is determined experimentally for the two physical media:
$$n_i \sin\theta_i = n_t \sin\theta_t \quad \text{(symmetrical form)} \tag{1.15}$$
i.e.,
$$\frac{\sin\theta_i}{\sin\theta_t} = \frac{n_t}{n_i} = n_{ti} \tag{1.16}$$
expressing that the ratio between the sine of the angle of incidence and the sine of the angle of refraction (transmission) is constant and coincides numerically with the ratio of the absolute indices of refraction of the two media (1) and (2). In other words, $n_{ti}$ is the relative index of refraction of medium (2) with respect to medium (1), which depends on the nature of the light and on the optical properties of the two media. Figure 1.12 represents the same situation as Fig. 1.11 where, without loss of generality, the wavefront of the incident light is represented by a vector indicating the direction of the incident energy flow. In essence, this vector (or ray of light) ideally represents a beam of light (or light brush) resulting from the summation of many elementary vectors orthogonal to the wavefront. In a homogeneous isotropic medium, the trajectories of light propagation are rectilinear and, because the velocity of propagation of light is identical at every point of the medium, the spatial separation between two wavefronts measured along any trajectory must be the same (see Fig. 1.13). Every ray of light (A, B, ...) of the wavefront, whatever its optical path and even when crossing different homogeneous media, takes the same time to go from one wavefront configuration to the next (Fig. 1.13 highlights 3 different instants and 3 wave configurations). The process of image formation is strongly influenced by the laws of reflection and refraction. Considering that the optics used are normally made of glass ($n_t \approx 1.5$) and that the refractive index of air is $n_i \approx 1.0$, Fig. 1.14a highlights how the rays of light entering a medium with a larger refractive index ($n_t/n_i > 1$), coming from a medium with a lower refractive index, are refracted toward the normal ($\theta_i > \theta_t$), and vice versa ($\theta_i < \theta_t$ with $n_t/n_i < 1$).

Fig. 1.13 Wavefronts and rays of rectilinear light. [Labels in the figure: S, Ω]

Fig. 1.14 a The incident ray changes direction at the interface separating two optical media with different refractive indices; b incident polarized light is entirely transmitted and not reflected when the angle between the reflected and transmitted rays is $90^\circ$; c real image of an incident light beam on a glass block, doubly reflected and transmitted
This implies that the rays of light emitted from one point of an object and crossing the lens converge at a different point in the image plane, according to Snell's law applied in the double air–lens and lens–air passage, as shown in Fig. 1.14b. In fact, in the first passage, air–lens, one has $\theta_i > \theta_t$: since the ratio between the indices is $n_t/n_i > 1$, Snell's law requires $\sin\theta_i > \sin\theta_t$, and therefore $\theta_i > \theta_t$, considering that both angles range from $0^\circ$ to $90^\circ$, where the sine function is increasing. In the second passage, lens–air (see also Fig. 1.14c), the opposite situation occurs: the ratio is less than 1 because the refractive index of the medium where the ray is refracted (air) is equal to 1.0, while the refractive index of the medium where the ray is incident (lens) is equal to 1.5, i.e.,
$$\frac{n_t\,(\text{air})}{n_i\,(\text{lens})} = \frac{1}{1.5} < 1$$
Consequently, the angle $\theta_t$ of the second refraction is now greater than the incident angle $\theta_i$ ($\theta_i < \theta_t$). Figure 1.15 shows a particular situation, in which the light propagates through the lens–air interface, with the ratio between the refractive index of the air (the medium where the refraction occurs) and the refractive index of the lens (the medium of the incident light) defined by the relative refractive index $n_{ti} = n_t/n_i < 1$. In this context, from Snell's law (1.15) we have
$$\sin\theta_i = \frac{n_t}{n_i} \sin\theta_t \;\Longrightarrow\; \theta_i < \theta_t \tag{1.17}$$
where $n_i > n_t$ because $n_i \cong 1.5$ (glass) and $n_t \cong 1.0$ (air), and consequently $n_{ti} < 1$. The angle of refraction $\theta_t$ must therefore be greater than the angle of incidence $\theta_i$, as confirmed by (1.17) and shown in Fig. 1.15.
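The double air–lens and lens–air passage can be followed numerically with Snell's law (1.15). The sketch below is a simplification: it assumes a glass slab with flat, parallel faces instead of a curved lens, uses the approximate indices 1.0 and 1.5 quoted in the text, and the function name `refraction_angle` is an illustrative choice.

```python
import math


def refraction_angle(theta_i_deg, n_i, n_t):
    """Refraction angle in degrees from n_i sin(theta_i) = n_t sin(theta_t).

    Returns None when no refracted ray exists (sin(theta_t) would exceed 1).
    """
    sin_t = n_i * math.sin(math.radians(theta_i_deg)) / n_t
    if sin_t > 1.0:
        return None
    return math.degrees(math.asin(sin_t))


N_AIR = 1.0    # approximate value used in the text
N_GLASS = 1.5  # approximate value used in the text

theta_in = 30.0
theta_glass = refraction_angle(theta_in, N_AIR, N_GLASS)   # air -> glass
theta_out = refraction_angle(theta_glass, N_GLASS, N_AIR)  # glass -> air
print(f"air -> glass: {theta_in:.2f} deg -> {theta_glass:.2f} deg")
print(f"glass -> air: {theta_glass:.2f} deg -> {theta_out:.2f} deg")
```

Entering the glass, the ray bends toward the normal ($\theta_i > \theta_t$); leaving it, the ray bends away from the normal ($\theta_i < \theta_t$). With parallel faces the exit angle equals the original incidence angle; a lens changes the direction of the ray because its two faces are not parallel.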
In fact, the illustrations show that, as $\theta_i$ increases, the angle $\theta_t$ of the transmitted light beam increases more and more, until the beam grazes the interface with $\theta_t = 90^\circ$. In this limit case $\sin\theta_t = 1$ and consequently
$$\sin\theta_i = \sin\theta_c = n_{ti} \tag{1.18}$$
indicating that the direction of the refracted light is parallel to the surface separating the two media, glass and air, when the angle of incidence assumes the particular value $\theta_i = \theta_c$; $\theta_c$ is called the limit angle or critical angle. For angles of incidence greater than or equal to $\theta_c$, all the light is reflected back into the medium from which it comes (the medium of incidence, in this case the glass), and this phenomenon is called total internal reflection. Mathematically, with the constant $n_{ti} < 1$ and $\theta_i > \theta_c$ we would have $\sin\theta_i > n_{ti}$ and consequently $\sin\theta_t > 1$, which is absurd: Snell's law admits no refracted ray, and this explains the phenomenon of total reflection. A useful application of total reflection is, for example, to deviate a light beam by $90^\circ$ with a $45^\circ$ optical prism (see Fig. 1.16). The critical angle $\theta_c$ for the glass–air interface is about $42^\circ$; therefore a beam of light incident on the sides of the prisms indicated in the figure normally has an incident angle $\theta_i > 42^\circ$ and is always reflected internally, as shown in Fig. 1.16.
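Equation (1.18) gives the critical angle directly from the two refractive indices. The following minimal sketch evaluates it for the glass–air case with the approximate indices used in the text; the function names are illustrative.

```python
import math


def critical_angle(n_i, n_t):
    """Critical angle in degrees, from sin(theta_c) = n_t / n_i, Eq. (1.18).

    Defined only when light goes from the denser to the rarer medium
    (n_i > n_t); returns None otherwise.
    """
    if n_i <= n_t:
        return None
    return math.degrees(math.asin(n_t / n_i))


def is_totally_reflected(theta_i_deg, n_i, n_t):
    """True when the incidence angle reaches or exceeds the critical angle."""
    theta_c = critical_angle(n_i, n_t)
    return theta_c is not None and theta_i_deg >= theta_c


theta_c = critical_angle(1.5, 1.0)  # glass -> air, indices as in the text
print(f"critical angle: {theta_c:.1f} deg")
print("45 deg incidence totally reflected:", is_totally_reflected(45.0, 1.5, 1.0))
```

With $n_i = 1.5$ and $n_t = 1.0$ the critical angle is close to $42^\circ$, so a ray striking the inner face of a $45^\circ$ prism at $45^\circ$ is totally reflected, as described for Fig. 1.16.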
Fig. 1.15 Internal reflection and critical angle $\theta_c$. Light propagates through the interface between the lens (medium of the incident light, index $n_i$) and air (medium where the light is refracted, index $n_t$), with $n_i > n_t$. This is the situation observed with a slide resting on a glass plane and illuminated by a vertical source (maximum brightness). a, b The source begins to tilt and the transmitted light begins to decrease. c, d As the source is tilted further, the brightness of the slide decreases until it is completely dark when the critical angle is reached, $\theta_i \geq \theta_c$ (which depends on the relative refractive index $n_{ti}$): at $\theta_i = \theta_c$ the refracted ray grazes the interface ($\theta_t = 90°$, $\theta_r = \theta_c$), and for $\theta_i > \theta_c$ total internal reflection occurs ($\theta_r = \theta_i$). e Real representation of total reflection, which takes place starting from the fifth inclination of the light beam
Fig. 1.16 Total internal reflection. a Different ways of deflecting a beam of light by 90° or 180° using 45° glass prisms. b Typical application of total internal reflection in binoculars, where a pair of 45° prisms acts as optimal reflectors in place of mirrors. Mirrors reflect only 95% of the light that a prism reflects through total internal reflection. Prisms also avoid the refraction distortion that can arise when using a mirror with a reflective coating
Fig. 1.17 a Specular reflection; b diffuse reflection (scattered rays) due to surface structures and scattering phenomena; c scene with dominant specular reflection and d scene with dominant diffuse light
In the phenomena of reflection and refraction studied so far, the reflected light beam has the same size as the incident one, i.e., it has not undergone any deformation. In other words, the reflecting surface has been assumed to be specular (mirror-like). Figure 1.17a shows a flat surface whose geometric irregularities are smaller than the wavelength λ of the incident light. Under such conditions the incident light is reflected in a single direction, $\theta_i = \theta_r$ (specular reflection). Conversely, diffuse reflection occurs when the surface has significant irregularity or roughness, as shown in Fig. 1.17b. In many applications glass with a rough surface is used to diffuse the incident light in all directions. In these cases, the laws of reflection and refraction remain valid in the sense that they apply to each small surface element, which can be considered flat. Between these two extremes, specular reflection and diffuse reflection, real scenes present various intermediate situations (see Fig. 1.17c, d).
1.11 The Physics of Light

This section formally summarizes some aspects of the physics of light that are fundamental for studying the mechanisms of image formation and the algorithms of artificial vision. An image acquisition system (e.g., a camera) measures the light energy (light intensity) reflected by the objects in the scene and projected onto the image plane, expressed as a gray or color value. In the preceding paragraphs, we analyzed how the sensory system is influenced by the lighting conditions and by the capacity of materials to reflect, diffuse, and absorb light. The field of physics that quantitatively analyzes light energy (responsible for the visibility of objects in the scene) in the process of light–matter interaction is radiometry. The devices used to measure radiometric quantities such as energy intensity, energy flow, directionality, and spectral distribution are called radiometers. A sensory system is stimulated by a quantity of light, i.e., energy, coming from the sun or from an artificial source and transported by electromagnetic waves (radiation). The magnitude of the stimulus depends on the wavelength λ of the light considered (for example, visible light lies between 350 and 780 nm) and on the energy transported (or converted) per unit of time (power, expressed in watts).
1.12 Energy of an Electromagnetic Wave

Electromagnetic radiation that propagates as waves (changing the state of motion of the atomic structure of matter) produces work, and therefore energy that propagates in the direction of motion of the wave (i.e., of the perturbation). If this energy is supplied over a limited period of time, a limited disturbance, or impulse, is produced. If instead continuous energy is to be produced, a wave train must be generated continuously. In a region of space, an electromagnetic wave transports radiant energy per unit of volume, also called electromagnetic energy density, produced by the contributions of the electric and magnetic fields. If $u$ is the electromagnetic energy density, from Fig. 1.4 it can be seen that the overall radiant energy $U$, which travels at speed $c$ and crosses the transverse section $A$ in a time interval $\Delta t$, is given by

$$U = u\,c\,A\,\Delta t \tag{1.19}$$

expressed in watt-seconds (joules). The radiant flow, or radiation power,

$$P = \frac{dU}{dt} \tag{1.20}$$

represents the energy transported per unit of time, expressed in watts. A light source is characterized by the distribution of its spectral components $C(\lambda)$, each expressing the energy emitted per unit of time and per unit of wavelength (also called spectral power or radiant flux per wavelength, expressed in W/m).
The total radiation power emitted by the radiant source is calculated by integrating the radiant flow of its spectral components over the various wavelengths:

$$P = \int_0^{\infty} C(\lambda)\,d\lambda \tag{1.21}$$
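In practice $C(\lambda)$ is known only at sampled wavelengths, so the integral (1.21) is approximated numerically. The sketch below uses the trapezoidal rule on a purely hypothetical source with a flat spectrum between 400 and 700 nm; the spectrum values and the function name `total_radiant_power` are assumptions made for illustration, not data from the text.

```python
def total_radiant_power(wavelengths_m, spectral_power_w_per_m):
    """Trapezoidal approximation of P = integral of C(lambda) d(lambda), Eq. (1.21)."""
    total = 0.0
    for k in range(len(wavelengths_m) - 1):
        d_lambda = wavelengths_m[k + 1] - wavelengths_m[k]
        mean_c = 0.5 * (spectral_power_w_per_m[k] + spectral_power_w_per_m[k + 1])
        total += mean_c * d_lambda
    return total

# Hypothetical source: flat spectrum of 2.0e6 W/m sampled every 1 nm from 400 to 700 nm
wavelengths = [400e-9 + k * 1e-9 for k in range(301)]
spectrum = [2.0e6] * len(wavelengths)

P = total_radiant_power(wavelengths, spectrum)
print(f"P = {P:.3f} W")  # flat spectrum: 2.0e6 W/m * 300e-9 m
```

For this flat spectrum the result is simply the spectral power times the bandwidth, about 0.6 W, which makes the sketch easy to verify by hand.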
In the case of a harmonic electromagnetic wave, we have already defined the vector that gives the average intensity of the wave, i.e., the associated energy it transports. We now define the radiant energy that passes through an area $A$ per unit of time and per unit of surface. This quantity is called irradiance (irradiation) or radiant flow density:

$$I = \frac{dP}{dA} \tag{1.22}$$

expressed in W/m². While the radiant power $P$ expresses the rate of the radiant energy flow (i.e., radiant energy per unit time), the radiometric quantity $I$ expresses the radiant power per unit of area (radiant power incident on a surface, in W/m²). The radiant flux density can refer to the energy that hits or passes through the surface of a body, in which case it is known as irradiance, or to the radiant power emitted or reflected from the surface of a body, in which case it is called radiant emittance, radiant exitance, or radiosity.

Reflectance is defined as the fraction of incident light that is reflected. In Fig. 1.18 it can be seen that if the incident irradiance is $I_i$, the incident power on surface $A$ is $I_i A\cos\theta_i$, where $A\cos\theta_i$ is the area of surface $A$ projected along the direction of the incident beam, which forms the angle $\theta_i$ with the normal. The same holds for the reflected power ($I_r A\cos\theta_r$) and the transmitted power ($I_t A\cos\theta_t$). The reflectance in this case becomes

$$R = \frac{I_r A\cos\theta_r}{I_i A\cos\theta_i} = \frac{I_r}{I_i} \tag{1.23}$$

since $\theta_r = \theta_i$. Likewise, transmittance is defined as the fraction of incident light that is transmitted, given by

$$T = \frac{I_t\cos\theta_t}{I_i\cos\theta_i} \tag{1.24}$$

Fig. 1.18 Beam of reflected and transmitted light of an incident beam striking the area $A$ of the interface between medium $i$ and medium $t$; the cross sections of the incident, reflected, and transmitted beams are $A\cos\theta_i$, $A\cos\theta_r$, and $A\cos\theta_t$

By definition, the polarization of an electromagnetic wave is conventionally determined by the direction of the electric field. If the incident wave (see Fig. 1.19) is polarized parallel (or perpendicular) to the plane of incidence, the electric field vector is parallel ∥ (or perpendicular ⊥) to that plane, which contains the directions of the incident, reflected, and transmitted flow densities. Under these conditions, the process of reflection and refraction of electromagnetic waves is expressed by the Fresnel equations, which relate the reflectance to the refractive indices of the materials, the angles of incidence, reflection, and refraction, and the polarization state of the incident flow. In essence, the angles of reflection and transmission of an incident light beam (related to the refractive indices of the materials) are described by Snell's law (see Sect. 1.10.2), which provides no information on the intensity of the reflected and transmitted light; these latter aspects are defined by the Fresnel equations.

Fig. 1.19 Electric and magnetic fields of the incident, reflected, and refracted waves for polarization perpendicular to the plane of incidence
The Fresnel equations can be derived from electromagnetic theory through Maxwell's equations. If the light beam⁴ is polarized in the plane of incidence (∥), the ratios between the amplitude of the reflected beam and that of the incident beam, and between the amplitude of the beam transmitted by refraction and that of the incident beam, are given by the following general Fresnel equations:

$$r_\parallel = \left(\frac{E_r}{E_i}\right)_\parallel = \frac{n_t\cos\theta_i - n_i\cos\theta_t}{n_i\cos\theta_t + n_t\cos\theta_i} \qquad t_\parallel = \left(\frac{E_t}{E_i}\right)_\parallel = \frac{2n_i\cos\theta_i}{n_i\cos\theta_t + n_t\cos\theta_i} \tag{1.25}$$

where $r_\parallel$ and $t_\parallel$ are, respectively, the amplitude reflection and transmission coefficients, and $E_i$, $E_r$, and $E_t$ are the amplitudes of the electric field components of the incident, reflected, and transmitted (refracted) light beams. If the incident light beam is polarized perpendicularly to the plane of incidence (⊥), the general Fresnel equations are as follows:

$$r_\perp = \left(\frac{E_r}{E_i}\right)_\perp = \frac{n_i\cos\theta_i - n_t\cos\theta_t}{n_i\cos\theta_i + n_t\cos\theta_t} \qquad t_\perp = \left(\frac{E_t}{E_i}\right)_\perp = \frac{2n_i\cos\theta_i}{n_i\cos\theta_i + n_t\cos\theta_t} \tag{1.26}$$

where $r_\perp$ and $t_\perp$ are, respectively, the amplitude reflection and transmission coefficients. The Fresnel equations can be rewritten using Snell's law, assuming nonmagnetic optical media (magnetic permeability is neglected). If the beam is polarized parallel to the plane of incidence, they become

$$r_\parallel = \frac{\tan(\theta_i - \theta_t)}{\tan(\theta_i + \theta_t)} \qquad t_\parallel = \frac{2\cos\theta_i\sin\theta_t}{\sin(\theta_i + \theta_t)\cos(\theta_i - \theta_t)} \tag{1.27}$$

Similarly, when the incident light beam is polarized perpendicularly to the plane of incidence, the Fresnel equations, according to Snell's law, are

$$r_\perp = -\frac{\sin(\theta_i - \theta_t)}{\sin(\theta_i + \theta_t)} \qquad t_\perp = \frac{2\cos\theta_i\sin\theta_t}{\sin(\theta_i + \theta_t)} \tag{1.28}$$

4 By convention, the polarization direction of a light wave is that of the electric field vector (nonmagnetic).
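The equivalence between the general form (1.25)–(1.26) and the Snell form (1.27)–(1.28) can be verified numerically. The following is a minimal sketch, assuming an air–glass interface ($n_i = 1.0$, $n_t = 1.5$) and an arbitrary angle of incidence of 40°; the function names are hypothetical.

```python
import math

def refraction_angle(theta_i, n_i, n_t):
    """Angle of refraction in radians from Snell's law (theta_i in radians)."""
    s = (n_i / n_t) * math.sin(theta_i)
    if s > 1.0:
        raise ValueError("beyond the critical angle: no refracted beam")
    return math.asin(s)

def fresnel_general(theta_i, n_i, n_t):
    """(r_par, t_par, r_perp, t_perp) from Eqs. (1.25) and (1.26)."""
    theta_t = refraction_angle(theta_i, n_i, n_t)
    ci, ct = math.cos(theta_i), math.cos(theta_t)
    r_par = (n_t * ci - n_i * ct) / (n_i * ct + n_t * ci)
    t_par = 2 * n_i * ci / (n_i * ct + n_t * ci)
    r_perp = (n_i * ci - n_t * ct) / (n_i * ci + n_t * ct)
    t_perp = 2 * n_i * ci / (n_i * ci + n_t * ct)
    return r_par, t_par, r_perp, t_perp

def fresnel_snell_form(theta_i, theta_t):
    """Same coefficients from Eqs. (1.27) and (1.28); undefined at theta_i = 0 (0/0)."""
    r_par = math.tan(theta_i - theta_t) / math.tan(theta_i + theta_t)
    t_par = 2 * math.cos(theta_i) * math.sin(theta_t) / (
        math.sin(theta_i + theta_t) * math.cos(theta_i - theta_t))
    r_perp = -math.sin(theta_i - theta_t) / math.sin(theta_i + theta_t)
    t_perp = 2 * math.cos(theta_i) * math.sin(theta_t) / math.sin(theta_i + theta_t)
    return r_par, t_par, r_perp, t_perp

theta_i = math.radians(40.0)  # illustrative angle of incidence
general = fresnel_general(theta_i, 1.0, 1.5)
snell = fresnel_snell_form(theta_i, refraction_angle(theta_i, 1.0, 1.5))
for name, g, s in zip(("r_par", "t_par", "r_perp", "t_perp"), general, snell):
    print(f"{name}: general = {g:+.6f}, Snell form = {s:+.6f}")
```

The two forms should agree to within rounding error. The general form remains usable at normal incidence, where the Snell form degenerates into an indeterminate ratio.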
1.13 Reflectance and Transmittance

Let us now look at the physical meaning of the Fresnel equations. We first analyze how the amplitude coefficients vary as the angle $\theta_i$ of the incident light beam changes. We then calculate the fraction of light reflected and transmitted by refraction and the corresponding flow densities. With incident light nearly perpendicular to the interface we have $\theta_i \cong 0$; it follows that $\cos\theta_i = \cos\theta_t \cong 1$, and in the previous Fresnel equations $\tan\theta$ can be replaced with $\sin\theta$. Substituting appropriately, the first equation of (1.27) becomes

$$[r_\parallel]_{\theta_i=0} = [-r_\perp]_{\theta_i=0} = \left[\frac{\sin(\theta_i - \theta_t)}{\sin(\theta_i + \theta_t)}\right]_{\theta_i=0} \tag{1.29}$$

From the first equation of (1.27), after appropriate substitutions, applying the addition and subtraction formulas for the sine function and considering Snell's law (Eq. 1.15), we arrive at the first Fresnel equation (1.25). Considering the limit case with $\theta_i$ tending to 0, it follows that $\cos\theta_i = \cos\theta_t = 1$, and the first equations of (1.25) and (1.26) become

$$[r_\parallel]_{\theta_i=0} = [-r_\perp]_{\theta_i=0} = \frac{n_t - n_i}{n_t + n_i} \tag{1.30}$$

For air the refractive index is $n_i = 1.0$ and for glass $n_t = 1.5$; with the incident light beam perpendicular to the surface separating the two media, the reflection coefficient is equal to ±0.2 ($r_\parallel = +0.2$, $r_\perp = -0.2$). In this example, with $n_t > n_i$, Snell's law gives $\theta_i > \theta_t$, and it follows that $r_\perp$ is negative for any value of $\theta_i$ (see Fig. 1.20a). For $r_\parallel$, according to the first equation of (1.27), the opposite holds: at $\theta_i = 0$ it is positive and decreases slowly to zero, which is reached when $\theta_i + \theta_t = 90°$ because $\tan(\pi/2)$ is infinite. The particular value of the angle of incidence at which this happens is indicated with $\theta_p$ and is called the polarization angle. When the angle of incidence exceeds the polarization angle, $r_\parallel$ assumes increasingly negative values, reaching −1.0 at 90°.
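The sign behavior just described can be tabulated directly from the general equations (1.25) and (1.26). The sketch below assumes the air–glass indices of the example ($n_i = 1.0$, $n_t = 1.5$); the function name `reflection_coefficients` and the sampled angles are illustrative choices.

```python
import math

def reflection_coefficients(theta_i_deg, n_i=1.0, n_t=1.5):
    """Return (r_par, r_perp) from Eqs. (1.25) and (1.26) for external reflection."""
    ti = math.radians(theta_i_deg)
    tt = math.asin((n_i / n_t) * math.sin(ti))  # Snell's law, valid for n_i < n_t
    ci, ct = math.cos(ti), math.cos(tt)
    r_par = (n_t * ci - n_i * ct) / (n_i * ct + n_t * ci)
    r_perp = (n_i * ci - n_t * ct) / (n_i * ci + n_t * ct)
    return r_par, r_perp

# Polarization angle for this interface, where r_par is expected to vanish
theta_p = math.degrees(math.atan(1.5))

for angle in (0.0, 30.0, theta_p, 75.0, 89.9):
    r_par, r_perp = reflection_coefficients(angle)
    print(f"theta_i = {angle:5.1f} deg   r_par = {r_par:+.3f}   r_perp = {r_perp:+.3f}")
```

The table shows $r_\perp$ negative throughout, while $r_\parallel$ starts at +0.2, crosses zero near 56.3°, and approaches −1 toward grazing incidence.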
When $\theta_i$ approaches 90°, with the reflection coefficient equal to −1.0, the reflecting surface behaves like a perfect mirror. With a normally incident light beam, $\theta_i = \theta_r = 0$, from the second equations of (1.25) and (1.26) we obtain

$$[t_\parallel]_{\theta_i=0} = [t_\perp]_{\theta_i=0} = \frac{2n_i}{n_t + n_i} \tag{1.31}$$

From Fig. 1.20a, it is noted that with $n_t > n_i$ (external reflection) the reflection coefficient $r_\perp$ is negative, according to the first equation of (1.28), for any value of the angle of incidence $\theta_i$, which varies from $\theta_i = 0°$ (normal incidence) to $\theta_i = 90°$ (grazing incidence). Furthermore, the reflected light component always undergoes a phase change of 180°. From the figure, it is also observed that $r_\parallel > 0$ for $\theta_i < \theta_p$, where $\theta_p$ is the Brewster angle (described in Sect. 1.13.1). Moreover, since the transmission coefficients $t_\perp$ and $t_\parallel$ are always positive, the transmitted wave undergoes no phase change with respect to the incident wave. Finally, the figure shows that the graphs of $t_\perp$ and $t_\parallel$ are very similar.

Fig. 1.20 Amplitude of the reflection and transmission coefficients (from −1.0 to 1.0) as a function of the angle of incidence $\theta_i$ (in degrees, 0° to 90°), with the phase changes Δφ = 0 and Δφ = π marked on the curves. a Case of external reflection at the air–glass interface, $n_t > n_i$ ($n_{ti} = 1.5$), with the corresponding Brewster angle $\theta_p = 56.3°$ (zero reflection for parallel polarization). The reflection coefficient $r_\perp$ is always negative for any value of $\theta_i$ and the reflected light undergoes a phase change of π, while $r_\parallel > 0$ for $\theta_i < \theta_p$. The similar behavior of the two transmission coefficients is also noted. b Case of internal reflection, with light propagating from a denser to a less dense medium (glass–air interface); both reflection coefficients reach the value +1 at the critical angle $\theta_c = 41.8°$, while $r_\parallel = 0$ at the Brewster polarization angle $\theta'_p = 33.7°$

Figure 1.21 shows an illuminated corridor with incident light perpendicular to it ($\theta_i = 0$), where the typical reflections of a specular surface are noticeable even though the walls and floor are not made of materials with good reflecting characteristics.

Fig. 1.21 Example of a scene illuminated with incident light at $\theta_i = 0$, where the surfaces show specular reflections even though the material of the walls and the floor is not a good reflector

Another phenomenon revealed by the Fresnel equations is internal reflection, obtained when light propagates from a denser medium to a less dense one, i.e., with $n_i > n_t$. In this case $\theta_t > \theta_i$ and, according to the first of equations (1.28), $r_\perp > 0$ for any value of the angle of incidence $\theta_i$ (see Fig. 1.20b). For the glass–air interface, (1.30) gives the initial values $[r_\parallel]_{\theta_i=0} = \frac{1 - 1.5}{1 + 1.5} = -0.2$ and $[r_\perp]_{\theta_i=0} = -[r_\parallel]_{\theta_i=0} = +0.2$. From the figure, it is observed that both coefficients reach +1 at the critical angle $\theta_c$, the angle of incidence at which total reflection of light⁵ occurs (no light is transmitted) and for which $\theta_t = 90°$. It is also noted that $r_\parallel$ passes through zero at $\theta_i = \theta'_p$, that is, Brewster's angle in the context of light propagation with internal reflection. For $\theta_i > \theta_c$ the values of the coefficients $r_\parallel$ and $r_\perp$ can be interpreted as complex.

5 Internal reflection occurs with $n_i > n_t$ and for $\theta_i \geq \theta_c$, where $\theta_t = \pi/2$. From Snell's law we have

$$n_i\sin\theta_i = n_t\sin\theta_t \;\Longrightarrow\; \sin\theta_t = \sin\frac{\pi}{2} = 1 = \frac{n_i}{n_t}\sin\theta_i \;\Longrightarrow\; \sin\theta_i = \frac{n_t}{n_i}$$

With $n_i = 1.5$ and $n_t = 1$, in order not to violate Snell's law (i.e., to avoid $\frac{n_i}{n_t}\sin\theta_i > 1$), the critical angle results from $\sin\theta_i = 2/3 \Longrightarrow \theta_i \cong 0.73$ radians $\cong 41.8° \equiv \theta_c$, thus obtaining total internal reflection for $\theta_i \geq \theta_c$.

In real applications of artificial vision, it is more useful to analyze the light–matter interaction in terms of reflected and transmitted intensity (radiant flux) rather than in terms of the amplitude coefficients of reflection and transmission. We now evaluate the reflected and refracted radiant flux density at a surface as shown in Fig. 1.18, where an incident circular light beam strikes a surface of area $A$. From Sect. 1.4, we know that the power per unit area carried by a beam of light striking a surface transversely is given by the amplitude of the Poynting vector, which expresses the irradiance or radiant flux density, in W/m²:

$$I = \frac{c\,\varepsilon_0}{2}E^2 \tag{1.32}$$
where $\varepsilon_0$ is the electrical permittivity of the vacuum, $c$ the speed of light, and $E$ the intensity of the electric field. As we will see later, it is useful to express the light intensity given by the Poynting vector also as a function of the refractive index $n$ of the medium where the light propagates. From (1.13) we know that the refractive index is $n = \sqrt{\varepsilon_r\mu_r}$, where $\varepsilon_r = \varepsilon/\varepsilon_0$ is the relative electrical permittivity and $\mu_r = \mu/\mu_0$ is the relative magnetic permeability, which in optical (nonmagnetic) materials can be assumed equal to 1, thus obtaining $n = \sqrt{\varepsilon_r}$. From (1.12), the velocity of light propagation in a medium is $v = c/n$. Therefore, considering also $\varepsilon = \varepsilon_r\varepsilon_0$ and replacing $c$ with $v$ and $\varepsilon_0$ with $\varepsilon$ in (1.32), we obtain the intensity expressed as a function of the refractive index:

$$I = \frac{n\,c\,\varepsilon_0}{2}E^2 \tag{1.33}$$

In the two previous equations, $I$ expresses the average luminous energy per unit of time (i.e., the power) that strikes the unit surface of $A$ at normal incidence. In other words, the luminous power arriving on the surface is proportional to the product of the amplitude of the Poynting vector and the cross-sectional area of the incident light beam. If the beam of intensity $I_i$ strikes the same surface with an angle of incidence $\theta_i$ with respect to the normal, the cross sections of the incident, reflected, and transmitted beams are, respectively, $A\cos\theta_i$, $A\cos\theta_r$, and $A\cos\theta_t$. Consequently, on surface $A$ (see Fig. 1.18) the incident, reflected, and transmitted light powers are $I_i A\cos\theta_i$, $I_r A\cos\theta_r$, and $I_t A\cos\theta_t$. The reflectance $R$ is defined as the ratio between the reflected power and the incident power:

$$R = \frac{I_r A\cos\theta_r}{I_i A\cos\theta_i} = \frac{I_r}{I_i} \tag{1.34}$$

Similarly, the transmittance $T$ is defined as the ratio between the transmitted power and the incident power:

$$T = \frac{I_t\cos\theta_t}{I_i\cos\theta_i} \tag{1.35}$$

We observe that the incident and reflected light propagate in the same medium (see Fig. 1.18). It follows that their propagation velocities are the same, $v_r = v_i$ (indicated with $v$ rather than $c$, which refers to propagation in the vacuum). For the same reason the permittivities are also equal, $\varepsilon_r = \varepsilon_i$ (here the subscript $r$ denotes the reflected beam). Substituting the equation of the Poynting vector (1.32) in (1.34) and simplifying, the reflectance becomes

$$R = \left(\frac{E_r}{E_i}\right)^2 = r^2 \tag{1.36}$$

thus reducing to the square of the amplitude reflection coefficient (reflectivity), defined previously by the first equations of (1.27) and (1.28) for light polarized parallel and perpendicular to the plane of incidence, respectively, so that

$$R_\parallel = r_\parallel^2 = \frac{\tan^2(\theta_i - \theta_t)}{\tan^2(\theta_i + \theta_t)} \tag{1.37}$$

$$R_\perp = r_\perp^2 = \frac{\sin^2(\theta_i - \theta_t)}{\sin^2(\theta_i + \theta_t)} \tag{1.38}$$
Similarly, for the transmittance $T$ we substitute the equation of the Poynting vector (1.33) in (1.35); simplifying, the transmittance becomes

$$T = \frac{n_t\cos\theta_t}{n_i\cos\theta_i}\left(\frac{E_t}{E_i}\right)^2 = \frac{n_t\cos\theta_t}{n_i\cos\theta_i}\,t^2 \tag{1.39}$$

From (1.39), we observe that the equation for the transmittance is more complex than that for the reflectance, for the following reasons:

(a) Unlike the reflectance, it uses the form of the Poynting vector amplitude that includes the refractive index $n$, since the medium where refraction occurs has a refractive index $n_t$ different from the index $n_i$ of the medium where the incident light propagates. The propagation velocity also differs in the two media.
(b) The intensity is calculated per unit of wavefront area. The incident and transmitted wavefronts are inclined with respect to the interface at different angles, $\theta_i$ and $\theta_t$ respectively, unlike the reflectance case, where the beam is reflected at the same angle as the incidence, $\theta_r = \theta_i$.

For light polarized parallel and perpendicular to the plane of incidence, the transmittances $T_\parallel$ and $T_\perp$, by virtue of (1.39), are

$$T_\parallel = \frac{n_t\cos\theta_t}{n_i\cos\theta_i}\,t_\parallel^2 \tag{1.40}$$

$$T_\perp = \frac{n_t\cos\theta_t}{n_i\cos\theta_i}\,t_\perp^2 \tag{1.41}$$

where $t_\parallel$ and $t_\perp$ are given, respectively, by the second equations of (1.27) and (1.28). It may also be useful to express the transmittance (1.39) in terms of the refractive indices and the angle of incidence only, as follows⁶:

$$T = \left[\frac{\sqrt{n_t^2 - n_i^2\sin^2\theta_i}}{n_i\cos\theta_i}\right]\cdot t^2 \tag{1.42}$$

6 According to Snell's law, the relationship between the incident and the transmitted angle is

$$n_i\sin\theta_i = n_t\sin\theta_t \;\Longrightarrow\; \sin\theta_t = \frac{n_i}{n_t}\sin\theta_i$$

Replacing $\sin\theta_t$ from this expression in $\cos\theta_t = \sqrt{1 - \sin^2\theta_t}$ (derived from the fundamental relation of trigonometry), we obtain

$$\cos\theta_t = \sqrt{1 - \left(\frac{n_i}{n_t}\sin\theta_i\right)^2}$$

which, substituted in the transmittance equation (1.39), gives (1.42).
It is evident that for a beam of incident light perpendicular to the surface, a situation of considerable practical interest, we have $\theta_t = \theta_i = 0$ and, according to Eqs. (1.34) and (1.35), the reflectance $R$ and the transmittance $T$ both reduce to the ratios of the irradiances, $(I_r/I_i)$ and $(I_t/I_i)$, respectively. The relationship between the incident light power and the reflected and transmitted light power across the interface of two media with different refractive indices can be derived from the conservation of energy: the incident power must balance the reflected and transmitted power, i.e., $P_i = P_r + P_t$. With reference to Fig. 1.18, conservation of energy expressed in terms of the flow densities $I$ on area $A$ requires the following equality:

$$I_i A\cos\theta_i = I_r A\cos\theta_r + I_t A\cos\theta_t \tag{1.43}$$

Substituting the intensity $I$ given by (1.33) and simplifying (with $n_r = n_i$ and $\theta_r = \theta_i$), the previous expression becomes

$$n_i E_i^2\cos\theta_i = n_r E_r^2\cos\theta_r + n_t E_t^2\cos\theta_t \;\Longrightarrow\; 1 = \underbrace{\left(\frac{E_r}{E_i}\right)^2}_{R} + \underbrace{\frac{n_t\cos\theta_t}{n_i\cos\theta_i}\left(\frac{E_t}{E_i}\right)^2}_{T} \tag{1.44}$$

from which, in the hypothesis of zero absorption, we have

$$R + T = 1 \tag{1.45}$$
It can be shown that the incident power can be separated into the components of the electric field parallel and perpendicular to the plane of incidence (i.e., $P_{i\parallel} = P_{r\parallel} + P_{t\parallel}$ and $P_{i\perp} = P_{r\perp} + P_{t\perp}$), arriving at the result (1.45) for each component, i.e., $R_\parallel + T_\parallel = 1$ and $R_\perp + T_\perp = 1$.

Figure 1.22 shows the reflectance and transmittance curves for the components of the electric field parallel (Eqs. 1.37 and 1.40) and perpendicular (Eqs. 1.38 and 1.41) to the plane of incidence, for air–glass and glass–air interfaces. The agreement with the equations $R_\parallel + T_\parallel = 1$ and $R_\perp + T_\perp = 1$ can be observed: for each component of the electric field, the sum of transmittance and reflectance is always 1, thus guaranteeing the conservation of energy. It is noted that for $\theta_i = \pi/2$ (grazing incidence) the reflectance is equal to 1 and the transmittance is zero. In other words, the light is entirely reflected, as happens with a specular surface (perfect mirror). When light propagates from a denser (more refractive) medium to a less dense (less refractive) one, we have to consider the limit angle of incidence (see Sect. 1.10.2) above which total reflection occurs, for which $R = 1$ and $T = 0$, as shown in the figure.

Fig. 1.22 Reflectance and transmittance (from 0 to 1.0) as a function of the angle of incidence $\theta_i$ (in degrees, 0° to 90°), for parallel and perpendicular polarization, for an air–glass interface ($n_{ti} = 1.5$) and a glass–air interface ($n_{ti} = 2/3$)

For normally incident light, with $\theta_t = \theta_i = 0$, the area $A$ of the incident and transmitted beams is the same, so the area factor of the transmittance no longer appears (see Eqs. 1.40, 1.41, and 1.44). It follows that, in accordance with the reflection (1.30) and transmission (1.31) coefficients calculated for normal incidence, the reflectance and transmittance reduce to the following:

$$R = R_\parallel = R_\perp = \left(\frac{n_t - n_i}{n_t + n_i}\right)^2 \tag{1.46}$$

$$T = T_\parallel = T_\perp = \frac{4n_t n_i}{(n_t + n_i)^2} \tag{1.47}$$

Light normally incident on the air–glass separation surface ($n_{ti} = 1.5$, with $n_i < n_t$) is reflected backwards for 4%, as results from Eq. (1.46), $R = \left(\frac{1.5 - 1.0}{1.5 + 1.0}\right)^2 = 0.04$, and transmitted for $T = \frac{4\cdot 1.5\cdot 1.0}{(1.5 + 1.0)^2} = 0.96$; the same values hold when the light propagates internally, with $n_i > n_t$, at the glass–air interface ($n_{ti} = 0.67$). This means that 4% of the incident light is reflected backwards and lost. The optics of a normal acquisition system (camera) can contain several lenses, each surface of which loses about 4% in transmittance. Figure 1.23 shows how the transmittance varies for different materials as the number of surfaces increases. To mitigate this loss of transmittance, the lenses are treated with special antireflection coatings.
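The cumulative loss through several surfaces can be estimated by applying (1.47) once per surface. The sketch below rests on simplifying assumptions not stated in the text: normal incidence at every surface, zero absorption, and no contribution from multiple reflections between surfaces. The function names are hypothetical, and whether Fig. 1.23 was produced in exactly this way is not confirmed by the source.

```python
def normal_transmittance(n_i: float, n_t: float) -> float:
    """Transmittance at normal incidence from Eq. (1.47)."""
    return 4.0 * n_t * n_i / (n_t + n_i) ** 2

def transmittance_through_surfaces(n_material: float, n_surfaces: int,
                                   n_air: float = 1.0) -> float:
    """Overall transmittance after n_surfaces air/material surfaces.

    Assumes normal incidence, no absorption, and no multiple reflections.
    Eq. (1.47) is symmetric in n_i and n_t, so entering and leaving the
    material cost the same fraction of light.
    """
    return normal_transmittance(n_air, n_material) ** n_surfaces

for n_material in (1.5, 2.0, 2.5, 3.0):
    row = ", ".join(f"{100 * transmittance_through_surfaces(n_material, k):5.1f}"
                    for k in (0, 2, 4, 6, 8, 10))
    print(f"n_t = {n_material}: {row}  (% after 0, 2, 4, 6, 8, 10 surfaces)")
```

For glass with $n_t = 1.5$ each surface passes 96% of the light, so ten surfaces pass about 66% under these assumptions.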
Fig. 1.23 Transmittance (in %) through a number of surfaces (from 0 to 10) in the air ($n_i = 1.0$) with perpendicular incident angle, for $n_t = 1.5$, 2.0, 2.5, and 3.0
1.13.1 Angle of Brewster

A particular case occurs when $\theta_i + \theta_t = 90°$, i.e., when the refracted and reflected rays are mutually perpendicular ($\theta_r + \theta_t = 90°$). In this case $r_\parallel$ and $R_\parallel$ become zero (see Figs. 1.20 and 1.22, respectively) for a particular angle of incidence $\theta_i$, since the denominator $\tan(\theta_i + \theta_t)$ in the Fresnel equation for $r_\parallel$ (the first of 1.27) becomes infinitely large. This means that, at this angle $\theta_i$, of the non-polarized incident light (with perpendicular component $E_\perp$ and parallel component $E_\parallel$ of the electric field) only the perpendicular component $E_\perp$, parallel to the interface, is reflected; the reflected parallel component $E_\parallel$ is null, while both components are transmitted into the second medium. Consequently, the reflected light is totally polarized perpendicular to the plane of incidence (see Fig. 1.24). The corresponding angle of incidence is called the angle of polarization $\theta_p$. Therefore, with $\theta_i + \theta_t = 90°$ and $\theta_p = \theta_i$, we have $\theta_t = 90° - \theta_p$ and $\sin\theta_t = \sin(90° - \theta_p) = \cos\theta_p$, and Snell's law gives

$$n_i\sin\theta_p = n_t\sin\theta_t \;\Longrightarrow\; n_i\sin\theta_p = n_t\sin(90° - \theta_p) = n_t\cos\theta_p \tag{1.48}$$

from which the polarization angle results:

$$\tan\theta_p = \frac{n_t}{n_i} \;\Longrightarrow\; \theta_p = \arctan\frac{n_t}{n_i} \tag{1.49}$$
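Equation (1.49) is easily evaluated for the interfaces mentioned in this chapter. The following minimal sketch assumes the refractive indices used in the text (1.0 for air, 1.5 for glass, 1.33 for water) and also checks the defining condition $\theta_p + \theta_t = 90°$; the function name `brewster_angle_deg` is hypothetical.

```python
import math

def brewster_angle_deg(n_i: float, n_t: float) -> float:
    """Polarization angle from Eq. (1.49): theta_p = arctan(n_t / n_i)."""
    return math.degrees(math.atan(n_t / n_i))

interfaces = (("air-glass", 1.0, 1.5), ("glass-air", 1.5, 1.0), ("air-water", 1.0, 1.33))
for label, n_i, n_t in interfaces:
    theta_p = brewster_angle_deg(n_i, n_t)
    # Refraction angle at theta_p from Snell's law; the sum should be 90 degrees
    theta_t = math.degrees(math.asin((n_i / n_t) * math.sin(math.radians(theta_p))))
    print(f"{label}: theta_p = {theta_p:.1f} deg, theta_p + theta_t = {theta_p + theta_t:.1f} deg")
```

The values obtained are about 56.3° for air–glass, 33.7° for glass–air, and 53.1° for air–water, with the angles of incidence and refraction summing to 90° in each case.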
In other words, when the angle of incidence is chosen in such a way that it equals the arcotangent of the relative refractive index n ti , the total linear polarization occurs in the reflected light. This phenomenon is known as Brewster’s Law and θp is also called Brewster’s angle7 and in fact represents a method of polarization of light for reflection. With Brewster’s law, we have a concrete method to determine the polarization axis of a linear polarizer. Although energy conservation is maintained
7 In
this context, the flow of electrons in the incident plane will not emit radiation in the angle required by the reflection law.
38 Fig. 1.24 Polarization of incident light on dielectric material (glass, water, plastic) with reflection and refraction according to the angle of incidence of Brewster θ p , which occurs when θr + θt = π/2. When the angle of reflection and refraction are perpendicular to each other, the reflected light is totally polarized with the electric field Er⊥ perpendicular to the plane of incidence (i.e., parallel to the interface formed by the two medium)
1 Image Formation Process
y Medium 1
Plane
of Inc
Ei Ei
θi
idenc e
θr Er
z π/2
x
θt Et Medium 2
Et
and the reflected light is completely polarized, it is experimentally observed that the reflected energy can be very low compared to the transmitted energy (which increases with the increase of θt ), thus compromising to determine correctly the polarization direction. An estimate of the polarization level R p can be determined by evaluating R −R the ratio R p = R⊥⊥ +R
which, according to Brewster’s law, results in maximum value when the angle of incidence of the light beam corresponds to that of Brewster. In nature sunlight (non-polarized) is often polarized by reflection with a polarization level that depends on the inclination of the sun’s incident rays with respect to the scene and an observer using polarizing filters can reduce considerably the annoying flares generated by the reflection of the emergent sunlight, for example, from a pond, lake, sea, road (see Fig. 1.25). With the experimental determination of the Brewster angle, it is then possible to estimate with the (1.49) a measure of the relative refractive index n ti = nnit associated to the two media (see Fig. 1.20b). In applications where the non-polarized light affects the air–glass interface, the polarization angle is about 56◦ while in the opposite direction with the glass–air interface θ p = 33.7◦ . For the air–water interface we have θp ∼ = 52.5◦ with the water refractive index n t = 1.33. Simpler and more widespread techniques (in photography, sunglasses, 3D viewers, ...), as the Polaroid 8 are used to
8 The
polaroid polarizing filters are transparent plastic laminae on which a thin layer of gelatine consisting of organic macromolecules in the form of long parallel chains has been deposited. Such molecules have the property of transmitting the electrical field component that normally oscillates to the direction of the chains (direction representing the axis of the polarizer), while the other components of the input light beam are entirely absorbed. Polaroids are also used as antireflective filters, suitably oriented so as to allow only the R component of the rays reflected from the surface (that is, the component that is canceled at the angle of Brewster). The functionality of the filter depends on how much more light emerges at an angle very close to that of Brewster.
1.13 Reflectance and Transmittance
39
P
Non-polarized sunlight
with rotation to 90° Filtered glow Polarized light
3°
θp=5 Angle of Brewster
Fig. 1.25 Polarization of light by reflection. Typical situation of natural light incident near the Brewster angle and reflected by a smooth surface (in this example the sea, but it could be a pond, a road, ...), where a human observer perceives annoying glare (left image). The dominant reflected light (the glare) oscillates perpendicular to the plane of incidence (that is, parallel to the air–water interface) and is linearly polarized. The glare can be attenuated, as shown in the right image, with a polarizing filter whose polarization axis is rotated by 90° with respect to its initial position, parallel to the interface, with which the left image was acquired
polarize the light. Lasers incorporate windows oriented at the Brewster angle (see footnote 9) to eliminate reflection losses at the mirrors and thereby also to produce polarized laser light.
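The polarization angles quoted above can be checked numerically. The following is a minimal sketch, assuming that the Brewster relation used in the text has the form tan θp = n_t/n_i; the function names and the rounded refractive indices are illustrative choices, not part of the original text.

```python
import math


def brewster_angle_deg(n_i: float, n_t: float) -> float:
    """Polarization (Brewster) angle in degrees for light travelling from a
    medium of index n_i into a medium of index n_t: tan(theta_p) = n_t / n_i."""
    return math.degrees(math.atan(n_t / n_i))


def relative_index_from_brewster(theta_p_deg: float) -> float:
    """Relative refractive index n_ti = n_t / n_i estimated from a measured
    Brewster angle (hypothetical helper for the experimental use case)."""
    return math.tan(math.radians(theta_p_deg))


# Illustrative refractive indices (assumed round values).
AIR, GLASS, WATER = 1.0, 1.5, 1.33

for label, n_i, n_t in [
    ("air-glass", AIR, GLASS),
    ("glass-air", GLASS, AIR),
    ("air-water", AIR, WATER),
]:
    print(f"{label}: theta_p = {brewster_angle_deg(n_i, n_t):.1f} deg")
```

With these assumed indices the air–glass and glass–air angles are complementary (they add up to 90°), which is a quick consistency check on any pair of measured values.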
1.13.2 Internal Reflection
Total internal reflection is a phenomenon that can occur when light travels from an optically denser medium (higher refractive index) to a less dense one (lower refractive index), as happens, for example, at a glass–air or water–air interface. In Sect. 1.10.2, we have already described the phenomenon of internal reflection
9 A Brewster window is an uncoated substrate that is positioned at Brewster's angle within a laser, instead of external mirrors. This substrate acts as a polarizer: the $R_\parallel$-polarized component enters and exits the window without reflection losses, while the $R_\perp$-polarized component is partially reflected. Brewster windows are fabricated using UV-fused silica, which does not show laser-induced fluorescence and is therefore well suited for applications that use ultraviolet to near-infrared light. With a Brewster window in the laser path, the laser light hits the window at Brewster's angle, thereby producing completely polarized light. In general, the reflectivity of any uncoated glass plate at normal incidence is several percent. With an antireflection coating, this percentage can be reduced to as low as 0.2%. Brewster windows can have losses over 10 times lower by comparison.
1 Image Formation Process
explained through Snell's law, $n_i \sin\theta_i = n_t \sin\theta_t$. When $n_i > n_t$, for a sufficiently large angle of incidence $\theta_i$ we would have

$$\sin\theta_t = \frac{n_i}{n_t}\sin\theta_i \;\Longrightarrow\; \sin\theta_t > 1 \tag{1.50}$$

which has no real solution for $\theta_t$, so Snell's law of refraction can no longer be satisfied. This situation corresponds to the physical event in which no light is refracted and all the incident light energy is totally reflected by the interface back into the denser medium of incidence. It begins as soon as $\theta_i$ becomes large enough (see Fig. 1.15) for the refracted light to graze the interface. In this situation $\theta_t = 90^\circ$, and the angle of incidence at which $\sin\theta_t = 1$ is called the limit or critical angle $\theta_c$, given by

$$\sin\theta_c = \frac{n_t}{n_i} \tag{1.51}$$

For angles of incidence greater than $\theta_c$ total internal reflection persists, that is, all the energy incident in the denser medium is totally reflected into the same medium, as shown in Fig. 1.15e.

We summarize the phenomenon of internal reflection as follows: when light travels from an optically denser medium to a less dense one, it refracts away from the normal. If the angle of incidence is gradually increased, at some point the refracted ray deviates so far from the normal that reflection into the medium of incidence takes the place of refraction. This happens whenever the refraction angle predicted by Snell's law would exceed 90°. From (1.50), we have $\theta_t > \theta_i$ and consequently, by virtue of the first of the Fresnel equations (1.28), the reflection coefficient $r_\perp$ is always positive: it starts from its normal-incidence value (given by (1.30)) and reaches the value 1 at the critical angle $\theta_c$, as shown in Fig. 1.20b. The figure also shows that the coefficient $r_\parallel$ starts from a negative value, according to Eq. (1.30) with $\theta_i = 0$, and reaches the value 1 when $\theta_i$ equals the critical angle (i.e., $\theta_i = \theta_c$), by virtue of (1.25) (which reduces to unity for $\theta_t = 90^\circ$). Figure 1.20b also shows that $\theta_c > \theta_p$. This relationship follows from the Brewster angle equation (1.49) and the critical angle equation (1.51), which give $\tan\theta_p = \sin\theta_c = n_{ti}$; since $\tan\theta > \sin\theta$ for $0 < \theta < 90^\circ$, it follows that $\theta_c > \theta_p$.

Total internal reflection allows light to be transmitted inside thin glass fibers (optical fibers) over long distances (see Fig. 1.26b). The light is reflected internally by the sides of the fiber (consisting of a central core of about 10 µm diameter and refractive index 1.5, coated with a material having a lower refractive index, of about 1.45) and therefore follows the path of the fiber. Light is transmitted when it enters the central glass core so that it strikes the interface between core and coating at an angle of incidence greater than the critical one; it then propagates by repeated reflections at that interface. Obviously, the fiber must not be bent so sharply as to interrupt the continuous internal reflection: the light must always strike the interface at angles of incidence greater than the critical angle. The entire field of optical fibers, with its many useful applications, is based on this effect. Another phenomenon of internal reflection (see Fig. 1.26a) occurs when an observer placed in a denser medium (for example, water with index 1.33) can see in the other
[Fig. 1.26 labels: (a) external observer, air, water, critical angle θc, underwater observer, seabed; (b) light, internal cylindrical core, 2.5 µm (high index of refraction: 1.5), external mantle (cladding), 40 µm (low index of refraction: 1.475)]
Fig. 1.26 Total internal reflection of light. a An observer placed inside the more refractive medium (in this case water) can see in the other medium (air) only the portion of the sky included in the cone whose axis is normal to the water–air interface and passes through the point of observation, and whose vertex angle is twice the critical angle θc; as soon as the direction of view falls outside this cone, the observer sees the mirror reflection of the bottom on the water–air interface, due to the total internal reflection of the light reflected from the bottom itself. b Optical fibers exploit total internal reflection to propagate light: the light that enters the central core strikes the core–mantle interface at angles greater than the critical angle θc and, through a series of reflections at this interface (core and mantle are made with silica polymers, optically denser for the core and less dense for the mantle), is propagated to the other end of the fiber
less dense medium (for example, air) only the objects of the scene that lie inside a cone whose axis is normal to the water–air interface and passes through the observer's eye, and whose vertex angle is twice the critical angle. Outside this cone, the whole interface appears to the observer as a mirror reflecting the scene below. This is the typical scenario when you are underwater (in a pool or at sea) and look up: you see the ceiling of the pool or the sky, while by tilting your head to look toward the horizon you observe the reflection of the bottom or of the pool walls. In essence, the rays of light reflected from the bottom of the sea or the pool hit the water–air interface at an angle of incidence greater than the critical angle, total internal reflection occurs, and the interface appears to the observer as a mirror.
Another effect originating from internal reflection is the sparkle observed in cut diamonds. Diamond, whose refractive index is 2.42 (optically very dense), has a critical angle θc of only about 24.4° (diamond–air interface). The facets of a diamond are cut in such a way that much of the light entering the diamond undergoes many successive total internal reflections before it emerges. These facets, together with the reflection properties of the diamond–air interface, play an important role in the brilliance of a diamond gem. Because the critical angle is small, light tends to become trapped inside a diamond once it enters: most of the rays strike the internal diamond–air interface at angles of incidence greater than the critical angle, so a ray of light is typically subjected to total internal reflection several times before finally refracting out of the diamond.
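The critical angles mentioned in this section can be reproduced with a short calculation. The sketch below assumes the relation sin θc = n_t/n_i of Eq. (1.51) and Snell's law; the function names are hypothetical, and the refractive indices are the rounded values quoted in the text (glass 1.5, water 1.33, diamond 2.42, fiber coating 1.45).

```python
import math


def critical_angle_deg(n_i: float, n_t: float) -> float:
    """Critical angle in degrees for light going from the denser medium n_i
    to the less dense medium n_t, from sin(theta_c) = n_t / n_i."""
    if n_i <= n_t:
        raise ValueError("total internal reflection requires n_i > n_t")
    return math.degrees(math.asin(n_t / n_i))


def refraction_angle_deg(n_i: float, n_t: float, theta_i_deg: float):
    """Refraction angle in degrees from Snell's law, or None when
    sin(theta_t) would exceed 1 (total internal reflection)."""
    s = (n_i / n_t) * math.sin(math.radians(theta_i_deg))
    if s > 1.0:
        return None
    return math.degrees(math.asin(s))


AIR = 1.0
for label, n_i, n_t in [
    ("glass-air", 1.5, AIR),
    ("water-air", 1.33, AIR),
    ("diamond-air", 2.42, AIR),
    ("fiber core-coating", 1.5, 1.45),
]:
    print(f"{label}: theta_c = {critical_angle_deg(n_i, n_t):.1f} deg")

# A ray inside glass hitting the glass-air interface at 45 degrees is beyond
# the critical angle, so no refracted ray exists.
print(refraction_angle_deg(1.5, AIR, 45.0))
```

The small index difference between fiber core and coating gives a large critical angle, which is why only rays travelling nearly parallel to the fiber axis are guided.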
1.14 Thermal Radiation
In any incandescent material (including the filament of a lamp and the solar sphere), the electrons reach an excited state of random and continuous motion with frequent collisions. This state of the material leads to the spontaneous emission of electromagnetic energy, which is called thermal radiation. This radiation is composed of a continuous spectrum of frequencies which, depending on the type of material, can be more or less broad. Many light sources are based on the mechanism of thermal radiation. By contrast, the light produced by gas-discharge lamps is based on the principle that an electric discharge passing through the gas excites atoms, which emit electromagnetic radiation, or light, with spectral characteristics related to the particular gas used. Experimentally it has been observed that the spectral distribution and the amount of radiated energy depend substantially on the temperature of the material (see Fig. 1.27). In particular, if one measures the spectral distribution associated with a given temperature of the material, there is a particular frequency ν or wavelength λ at which the radiated power has a peak. The wavelength corresponding to the maximum of the thermal emission varies inversely with the absolute temperature T of the hot body, according to Wien's law:

$$\lambda_{max}\, T = 2.8977685 \cdot 10^{-3}\ \mathrm{m\,K} \tag{1.52}$$

where the constant was found experimentally. As T increases, the maximum of the radiated power occurs at decreasing values of λ. The curve through the maximum energy peaks (see Fig. 1.27) is Wien's hyperbola (the law is also known as Wien's displacement law).
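As a quick numerical illustration of Eq. (1.52), the sketch below evaluates the peak wavelength for a few temperatures. The function name is hypothetical and the temperatures are illustrative assumptions (in particular, about 5800 K for the solar surface is an assumed round value).

```python
WIEN_CONSTANT_M_K = 2.8977685e-3  # m K, the constant quoted in Eq. (1.52)


def wien_peak_wavelength_m(temperature_k: float) -> float:
    """Wavelength (in meters) of maximum thermal emission at absolute
    temperature temperature_k, from lambda_max * T = constant."""
    if temperature_k <= 0:
        raise ValueError("absolute temperature must be positive")
    return WIEN_CONSTANT_M_K / temperature_k


for label, t in [
    ("room temperature", 300.0),
    ("lamp filament", 3000.0),
    ("solar surface (assumed)", 5800.0),
]:
    peak_um = wien_peak_wavelength_m(t) * 1e6  # meters -> micrometers
    print(f"{label}: T = {t:.0f} K, lambda_max = {peak_um:.2f} um")
```

Multiplying the temperature by ten divides the peak wavelength by ten, which is the hyperbolic behavior described above.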
[Fig. 1.27 axis label: radiance R(λ, T)]
Fig. 1.27 Black body radiation curves. The hyperbola passing through the peak points corresponds to Wien's law
Fig. 1.28 Definition of solid angle [labels: axes x, y, z; angle θ; solid angle dΩ; sphere of radius 1 m; surface area 1 m²]
At room temperature (300 K), the maximum emission occurs in the far infrared (about 10 µm) and is therefore not visible to the naked eye. At higher temperatures, the maximum thermal emission shifts to shorter wavelengths, i.e., to higher frequencies. Around an absolute temperature of 800 K (about 500 °C), a hot body begins to be visible, emitting radiation perceived first as red and yellow; at higher temperatures it becomes increasingly visible, tending toward white and then toward blue-white. The incandescent filament of a lamp reaches 3000 K, and its dominant emission lies in the near infrared (λ ≈ 1 µm). A measure of the ability of a source or surface to emit (or reflect, transmit) radiation, expressed as radiant flux per unit of surface, is the radiometric quantity radiosity M(λ) (also referred to as radiant exitance or radiant emittance). When instead a directional measurement of the radiant flux emitted by an extended source is of interest, the radiometric quantity radiance R(λ) is used, expressed as radiant flux per unit of surface and per steradian, the steradian being the solid angle subtended by 1 m² of surface of a sphere with a radius of 1 m (see Fig. 1.28). Radiance is also used to express the directional radiant flux emitted, reflected, transmitted, or received by a surface. It has been experimentally observed that the radiation R emitted by a hot body also depends closely on the temperature T. From experimental measurements the Stefan–Boltzmann law ($R = \sigma T^4$) was derived, which establishes that the emitted radiation increases in proportion to the fourth power of the absolute temperature of the hot body. The experimentally determined constant σ, called the Stefan–Boltzmann constant, is equal to 5.6697 × 10⁻⁸ W m⁻² K⁻⁴. A body in thermal equilibrium with its environment must emit thermal radiation in an amount equal to that absorbed.
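The fourth-power dependence of the Stefan–Boltzmann law is easy to underestimate, so a small numerical sketch may help. It assumes an ideal emitter and uses the value of σ quoted above; the function name and the sample temperatures are illustrative.

```python
SIGMA = 5.6697e-8  # W m^-2 K^-4, the value of the constant quoted in the text


def emitted_power_per_area(temperature_k: float) -> float:
    """Radiated power per unit area (W/m^2) of an ideal emitter at absolute
    temperature temperature_k, from R = sigma * T^4."""
    if temperature_k < 0:
        raise ValueError("absolute temperature cannot be negative")
    return SIGMA * temperature_k ** 4


for t in (300.0, 600.0, 3000.0):
    print(f"T = {t:6.0f} K -> R = {emitted_power_per_area(t):.4g} W/m^2")

# Doubling the temperature multiplies the emitted power by 2^4 = 16.
print(emitted_power_per_area(600.0) / emitted_power_per_area(300.0))
```

Under these assumptions a surface at room temperature radiates a few hundred watts per square meter, while a 3000 K filament radiates ten thousand times more per unit area.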
A body that emits as much thermal energy as it absorbs is both a good absorber and a good emitter. The ideal case, a body that absorbs all the incident energy radiated by the environment, independently of the wavelength, is called the black body. In the laboratory, a black body (ideal radiator) is realized as a small cavity (produced
in a wall), well insulated and kept at a constant temperature; the thermal radiation emitted by this cavity is equivalent to that emitted by the ideal black body. We now want to calculate the thermal radiation emitted through a hole made in an ideal black body, assumed to have a spherical cavity. If $S(\lambda)$ is the density of the radiant thermal flux for a given wavelength λ, also called spectral density, the total energy density for a hole of unit area is

$$R = \int_0^\infty S(\lambda)\,d\lambda \tag{1.53}$$

This radiation is emitted in all directions at speed c. For a solid angle $d\Omega$ in the direction θ with respect to the normal to the hole surface, a fraction $d\Omega/4\pi$ of that radiation can be considered. In a unit time interval, the energy that emerges from this solid angle is $R \cdot c\,\cos\theta\, d\Omega/4\pi$. The total energy emitted $R_T$, passing through the hole and propagated in all possible directions, that is, within the solid angle $2\pi$ corresponding to the hemisphere, is given by

$$R_T = \int_{\theta=0}^{\pi/2}\int_{\phi=0}^{2\pi} \frac{R \cdot c \cdot \cos\theta\,\sin\theta}{4\pi}\,d\theta\,d\phi = \frac{R\cdot c}{4} \tag{1.54}$$

expressed per unit of time and per unit of area. $R_T$ is the total radiance of a black body, and it follows that the spectral radiance is $R_S(\lambda) = R_T(\lambda)\,c/4$, expressed in watts per m² and per bandwidth interval centered at λ. Radiance is also expressed in watts per unit of area and per unit of solid angle. The radiation normal to the hole surface is given by

$$R_n = \frac{R_T}{\pi} = \frac{R\cdot c}{4\pi} \qquad R_n(\lambda) = \frac{R(\lambda)\,c}{4\pi} \tag{1.55}$$

Planck was able to derive a relationship for the radiation emitted by a black body in perfect agreement with the experimental results. He reasoned that excited atoms behave like electric oscillators, absorbing and emitting radiation in amounts proportional to the available frequencies ν of the electromagnetic spectrum. He also hypothesized that the energy emitted by the black body exists only in integer multiples of a smallest possible quantity, called quantum, equal to $h\nu$ (previously defined as the photon). According to Planck's hypothesis, the energy radiated by the black body depends on the number of photons, and this quantized energy can take only the discrete values 0, hν, 2hν, 3hν, ... for each frequency present, where we recall that h is Planck's constant, determined from experimental data (h = 6.6256 × 10⁻³⁴ J s). In essence, the hot body, in a state of thermal equilibrium, absorbs and emits radiation in quantized amounts of energy mhν, with m an integer. Planck derived the following formula for the spectral distribution of the energy emitted by the hot body:

$$R(\lambda) = \frac{2\pi h c^2}{\lambda^5}\,\frac{1}{e^{hc/\lambda k T} - 1} \tag{1.56}$$
where c is the speed of light in vacuum and k is the Boltzmann constant. The laws of Wien and Stefan–Boltzmann (S-B) can be deduced from Planck's formula above:

$$R_T = \int_{\lambda=0}^{\infty} R(\lambda)\,d\lambda = \frac{2\pi^5 k^4}{15 c^2 h^3}\cdot T^4 = \sigma T^4 \quad \text{(Law of S-B)} \tag{1.57}$$
where σ ≈ 5.67 × 10⁻⁸ W m⁻² K⁻⁴ is the S-B constant. Likewise, we can derive Wien's law, which gives the wavelength $\lambda_{max}$ or the frequency $\nu_{max}$ at which the radiated energy is maximum for a given absolute temperature T:

$$\lambda_{max} \cong \frac{2.898 \times 10^{-3}\ \mathrm{m\,K}}{T} \qquad \nu_{max} \cong 2.820 \cdot \frac{kT}{h}$$

The black body emits thermal radiation with equal energy in all directions, i.e., independently of the angle of observation θ (Lambertian radiator). The black body is an ideal radiator model. In reality, a hot body brought to a certain temperature T emits less thermal radiation than a black body at the same temperature, i.e., it does not behave like an ideal absorber and emitter. All bodies in nature, whether heated by the sun or artificially, emit thermal radiation in the infrared (IR) region of the electromagnetic spectrum. Living beings are also good infrared radiators. The human body emits IR radiation with wavelengths from 3000 to 10,000 nm. For various applications, it is useful to acquire images of the environment with infrared sensors (for example, with a camera), especially to observe at night structures (people, animals, heat loss from buildings, etc.) not easily visible with normal cameras. Detectors that accurately measure thermal radiation are called thermographs. Infrared-sensitive films ( n 1 ).

Concave (or divergent or negative) lenses, on the contrary, have minimum thickness on the optical axis and tend to diverge the parallel light rays that cross them. As a result of refraction, the divergent light rays seem to come from a point of the optical axis (called the focus of the lens), located in front of the lens, where the virtual image is formed. In reality, the refracted rays originate from an object far from the lens. Concave lenses modify the shape of a beam of light that crosses them in a controlled way, based on their characteristic parameters (optical axis, radii of curvature, and lens shape).
Figure 4.7 summarizes the two types of lens, converging and diverging, with a graphic representation of the effect of refraction on incident parallel rays and of the focal points generated. As can be seen, the parallel rays associated with plane wave fronts are refracted so that they converge at the focal point F, forming the real image behind the convex lens (Fig. 4.7a), whereas they are refracted so that they diverge as if they came from the focal point F located in front of the concave lens (Fig. 4.7b), where the virtual image is formed. Note also that the wave fronts are modified: in both cases they become spherical, converging for the convex lens and diverging for the concave lens. This physical phenomenon, explained in more detail later, occurs because light propagates more slowly in glass than in air, so the rays are delayed more where they pass through the thicker areas of the lens than through the thinner ones. We specify that, as in the case of mirrors, the real image is located in a plane in space where all the refracted rays coming from the object converge, and it is
4.3 Refraction of Light on Spherical Surfaces
Fig. 4.7 Image formation by refraction: a Convex lens (positive) and b Concave lens (negative)
possible to observe it by placing a reflecting screen there. Any observer positioned in line with the refracted rays would see a representation of the object. A virtual image is instead formed in a region of space that the refracted light does not reach, so it makes no sense to place a screen there to observe it. An observer would nevertheless see the virtual image of the object, analogous to the virtual images produced by mirrors, if positioned along the divergent refracted rays, which appear to come from the focal point F where the virtual image is formed (in front of the lens). A good lens must converge the light coming from the objects so as to generate a sharp image. Normally the image formed presents some deformations and errors caused by lens defects, known as aberrations, due, for example, to a nonspherical shape or imperfect symmetry of the lens. Such aberrations can be controlled by correcting the lenses adequately, while the effects due to diffraction remain.
Now let us see what happens (see Fig. 4.8) when a ray of light coming from the point S of the object hits the spherical surface of the lens, which has radius of curvature R and center C. Let A be the point of incidence of the ray SA, which is refracted ($n_2 > n_1$) and intersects the optical axis at P. To have a sharp image of S in P, every ray of light emitted by S must reach P in the same time interval, as proposed by Huygens; equivalently, by the Fermat principle applied to refraction, any ray coming from S follows the shortest optical path to reach P. If S is immersed in a medium with refractive index $n_1$ and P in a denser medium with refractive index $n_2$ (i.e., $n_2 > n_1$), we have the following relation:

$$l_o n_1 + l_i n_2 = p\,n_1 + q\,n_2 \tag{4.9}$$
where p is the distance of the point S (object) from the vertex V, q is the distance from V of the point P (the image of S), $l_o$ is the length of the segment SA, and $l_i$ is the length of the segment AP. These last two segments constitute the optical path of the ray drawn in the figure. Figure 4.8 shows how rays entering a medium with a higher index of refraction are refracted (according to Snell's law) toward the normal to the surface at the point of incidence and toward the optical axis. It is also assumed that any other incident ray having the same angle $\theta_i$ converges at the same point P, as shown in Fig. 4.9. The optical axis is given by SP, and the points S and P are called conjugate points: P can in turn become a source, and its image is then S. In general, given an optical system
4 Optical System
Fig. 4.8 Image formation by refraction on a spherical interface
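[Fig. 4.8 labels: object point S, image point P, vertex V, center of curvature C, point of incidence A, point B on the optical axis; segments l_o, l_i, h; distances p, q; radius R; angles α, β, φ, θ_i, θ_t; media n_1, n_2]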
Fig. 4.9 Incident rays with the same angle
the object point (in this case S) and the corresponding image point (in the example, P) produced by the optical system are called conjugate points. From the physical point of view, Eq. (4.9) is explained as follows. Light coming from the object S and passing through the lens, which is denser than air, travels at a lower speed than in air. Consider rays leaving S simultaneously toward the vertex V and toward the point A, both traveling in the same medium $n_1$: the ray reaches V earlier (traveling with speed $c/n_1$) and then propagates with the lower speed $c/n_2$ in the denser medium. The ray that arrives at A instead covers a longer path ($l_o + l_i > p + q$), with a longer stretch in the faster medium ($l_o > p$) and a shorter one in the denser medium, and converges at P at the same time. Once the values of p and q have been fixed, relation (4.9) becomes

$$l_o n_1 + l_i n_2 = \text{constant} = \text{stationary optical path} \tag{4.10}$$

It can be concluded that if a point source is positioned at point S, on the optical axis of a lens, the light rays converge at the conjugate point P, giving a sharp (in focus) image of the source S that can be observed by placing at P a screen perpendicular to the optical axis of the convex lens. The image of the source S obtained in this way is called the real image. The location of the image is determined by the quantities p and q, given the radius of curvature R of the curved refracting surface, and by the phenomenon of refraction according to Snell's law, which takes into account the refractive indices of the media in which the light propagates. A simpler way to derive the formula that locates the image is a procedure similar to that used for reflection with curved mirrors. We will derive the Descartes formula also for refraction with curved lenses. With reference to the graphical representation of Fig. 4.8, we now make the following consideration: for very small values (no more than 20°) of the
angles α and φ, i.e., for paraxial rays (point of incidence A very close to the vertex V), the image of S formed at P is still sharp. Under these hypotheses, we can approximate the sine and cosine of such angles with the first-order Maclaurin expansion (see Eq. 4.1). From Fig. 4.8 it is observed that the angle of incidence $\theta_i$ is exterior to the triangle SAC, which has interior angles α and φ, while for the triangle APC the angle φ is exterior and the interior angles are β and $\theta_t$. From the known relationship between the exterior and interior angles of a triangle we obtain

$$\theta_i = \alpha + \phi \qquad \phi = \beta + \theta_t \;\Rightarrow\; \theta_t = \phi - \beta \tag{4.11}$$
According to Snell's law, we have

$$n_1 \cdot \sin\theta_i = n_2 \cdot \sin\theta_t \tag{4.12}$$

Considering the triangles SAB, ABC, and ABP, we obtain the following:

$$\tan\alpha = \frac{AB}{SB} \qquad \tan\phi = \frac{AB}{BC} \qquad \tan\beta = \frac{AB}{BP} \tag{4.13}$$

By virtue of the approximations for small values of the angles α, β, φ, and considering relations (4.13), the following relationships can be considered valid:

$$\alpha \cong \tan\alpha = \frac{AB}{SB} \qquad \phi \cong \tan\phi = \frac{AB}{BC} \qquad \beta \cong \tan\beta = \frac{AB}{BP} \qquad \sin\theta_i \cong \theta_i \qquad \sin\theta_t \cong \theta_t \tag{4.14}$$

and, considering the distance VB negligible, we also have

$$SB \cong SV \cong p \qquad CB \cong CV \cong R \qquad BP \cong VP \cong q$$

Snell's law, rewritten, becomes

$$n_1 \cdot \theta_i = n_2 \cdot \theta_t \tag{4.15}$$

and substituting relations (4.11) into this last equation, we obtain

$$n_1 \cdot (\alpha + \phi) = n_2 \cdot (\phi - \beta) \;\Rightarrow\; n_1 \cdot \alpha + n_2 \cdot \beta = (n_2 - n_1) \cdot \phi \tag{4.16}$$

from which, recalling the approximation relations (4.14), we obtain

$$n_1 \cdot \frac{AB}{SB} + n_2 \cdot \frac{AB}{BP} = (n_2 - n_1)\,\frac{AB}{BC}$$

from which, simplifying and eliminating the common factors, we obtain the following relation:

$$\frac{n_1}{p} + \frac{n_2}{q} = \frac{n_2 - n_1}{R} \tag{4.17}$$
known as the Descartes formula for the refraction on a curved surface. It should be noted that this formula remains valid throughout the paraxial region around the optical axis of symmetry, regardless of the position of the point of incidence A. It also remains valid for concave refractive lenses in compliance with the sign convention
Table 4.1 Sign convention of optical magnitudes

| Magnitude | Sign + | Sign − |
|---|---|---|
| Distance p | Real object (in front of the lens) | Virtual object (behind the lens) |
| Distance q | Real image (to the right of the lens) | Virtual image (to the left of the lens) |
| Focus f | Convex lens | Concave lens |
| Radius R | Convex lens (C to the right of V) | Concave lens (C to the left of V) |
| Image orientation | Upright image (above optical axis) | Inverted image (below optical axis) |
Fig. 4.10 Refraction on curved surfaces: a Convex lens, rays emerging from the focal point of the object are refracted parallel to the optical axis; b Concave lens, parallel rays from an object at infinity are refracted and converge in the focus of the image; c Convex lens, parallel rays from an object at infinity and virtual image with rays emerging from the image focus; d Concave lens, virtual object with rays converging toward the object focus
adopted. Assuming rays incident on the spherical surfaces from left to right, the sign convention for the optical magnitudes is reported in Table 4.1. With reference to Descartes' formula for refraction with spherical lenses, we analyze three special cases. In the first case (see Fig. 4.10a), the image is formed at infinity, i.e., $q = \infty$. Since the term $n_2/\infty$ in Eq. (4.17) vanishes, the resulting object distance, called the first focal length or focal length of the object $f_o$, is

$$f_o = p = \frac{n_1}{n_2 - n_1}\,R \tag{4.18}$$

and the point S is called the first focus or focus of the object.
Fig. 4.11 Refraction on a spherical interface: a S at a distance greater than the focus of the object and its image P moves away from the vertex V (drawn in red); S reaches the focus of the object and its image P is formed at infinity (drawn in green); b S approaches the vertex V and the image P is real but opposite to the object
In the second case (see Fig. 4.10c), the object is considered at infinity. In Eq. (4.17), the first term $n_1/\infty$ vanishes and the image is formed at P at the distance $f_i$, called the focal length of the image, given by

$$f_i = q = \frac{n_2}{n_2 - n_1}\,R \tag{4.19}$$

and the point P on the optical axis is called the second focus or focus of the image. The same holds for the virtual image (the rays diverge from the focus of the image, as shown in Fig. 4.10b) and for the virtual object, where the rays converge toward the object's focus (see Fig. 4.10d). The virtual object is formed to the right of the vertex V and the distance p becomes negative, while the virtual image is formed to the left of V with negative radius R, as required by (4.18) with the focal distance $f_o$ negative.
In the third case, if a very large p is assumed, then for a fixed value of $(n_2 - n_1)/R$, Eq. (4.17) gives a correspondingly small value of q (close to $f_i$). If p decreases, the distance q increases, i.e., the image P of S moves away from the vertex V ($\theta_i$ and $\theta_t$ increase, see Fig. 4.8), until p becomes equal to $f_o$ and $q = \infty$. In this case, $n_1/p = (n_2 - n_1)/R$ and consequently, if p becomes even smaller, the image is formed to the left of V with a negative value of q, with (4.17) still valid (see Fig. 4.11).
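The three special cases can be explored numerically by solving the Descartes formula (4.17) for q. The sketch below is only illustrative: the function names are hypothetical, and the values n₁ = 1, n₂ = 1.5, R = 0.1 m are assumed for the example (distances follow the sign convention of Table 4.1).

```python
import math


def object_focal_length(n1: float, n2: float, radius: float) -> float:
    """First focal length f_o = n1 * R / (n2 - n1), Eq. (4.18)."""
    return n1 * radius / (n2 - n1)


def image_focal_length(n1: float, n2: float, radius: float) -> float:
    """Second focal length f_i = n2 * R / (n2 - n1), Eq. (4.19)."""
    return n2 * radius / (n2 - n1)


def image_distance(p: float, n1: float, n2: float, radius: float) -> float:
    """Solve n1/p + n2/q = (n2 - n1)/R for q.
    Returns math.inf when the image forms at infinity (p equal to f_o)."""
    object_term = 0.0 if math.isinf(p) else n1 / p
    remainder = (n2 - n1) / radius - object_term
    if abs(remainder) < 1e-12:
        return math.inf
    return n2 / remainder


n1, n2, radius = 1.0, 1.5, 0.1  # assumed air-glass surface, R = 10 cm
print("f_o =", object_focal_length(n1, n2, radius))
print("f_i =", image_focal_length(n1, n2, radius))
for p in (math.inf, 0.6, 0.2, 0.1):
    print(f"p = {p} m -> q = {image_distance(p, n1, n2, radius)} m")
```

With these assumed values, moving the object from infinity toward the surface pushes the image from f_i outward; at p = f_o the image is at infinity, and for p smaller than f_o the computed q becomes negative, i.e., the image lies to the left of V.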
4.4 Thin Lens
After examining the refraction of light for individual convex and concave surfaces with the derivation of the Descartes formula, let us now examine the refraction for thin lenses with double convex and concave surfaces shown in Fig. 4.6. The conjugated
Fig. 4.12 a Refraction in a biconvex spherical lens and b geometry of the conjugate points
points for a biconvex lens with index of refraction $n_l$ immersed in air, which has index of refraction $n_a$ (with $n_l > n_a$), can be located by still using Eq. (4.17). Figure 4.12 shows the refraction process when a ray crosses both spherical interfaces of the lens. A ray incident on the first face is refracted on entering the lens and bent toward the normal to the surface at the point of incidence; when it emerges from the second face it undergoes a further refraction and, since $n_l > n_a$, it is bent toward the optical axis, away from the normal to the second face. Under the hypothesis of paraxial rays crossing the first face from the source S located at the distance $p_1$ from the vertex $V_1$, the image is formed at $P_1$ at the distance $q_1$ from $V_1$, and the relation of the conjugate points becomes

$$\frac{n_a}{p_1} + \frac{n_l}{q_1} = \frac{n_l - n_a}{R_1} \tag{4.20}$$

The second surface sees the rays (propagating in the lens with refractive index $n_l$) as coming from $P_1$, which it treats as an object at the distance $p_2$ from the vertex $V_2$, forming the image at P at the distance $q_2$. Moreover, the rays arriving at the second surface travel in the medium with refractive index $n_l$, so the object space for this surface, which contains $P_1$, has refractive index $n_l$: the light rays coming from $P_1$ are straight, as if the first surface of the lens did not exist. In this context, the relation of the conjugate points (4.17), applied to the second interface with $P_1$ as the object, becomes

$$\frac{n_l}{p_2} + \frac{n_a}{q_2} = \frac{n_a - n_l}{R_2} \tag{4.21}$$

where $p_2 = -q_1 + d$, $R_2 < 0$ and $n_l > n_a$ (see Fig. 4.12). Eq. (4.21) remains valid considering that the distance $q_1$ is negative, since $P_1$ is to the left of $V_1$. Combining (4.20) and (4.21) and eliminating $p_2$, we obtain the relationship of the conjugate points for the lens:

$$\frac{n_a}{p_1} + \frac{n_a}{q_2} = (n_l - n_a)\left(\frac{1}{R_1} - \frac{1}{R_2}\right) + \frac{n_l\, d}{(q_1 - d)\, q_1} \tag{4.22}$$
For a thin lens, with negligible thickness (d → 0), the last term of (4.22) is negligible and, taking the refractive index of air as $n_a = 1$, the equation of the conjugated points for the thin lens becomes:
$$\frac{1}{p} + \frac{1}{q} = (n_l - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right) \tag{4.23}$$
For a thin lens the vertices $V_1$ and $V_2$ can be taken to coincide with V, the center of the lens, and the distances p and q are measured from V. Recall that, even for a thin lens, if the object is at infinity, the image is focused at the distance $f_i$ from the center of the lens (second focus), so we have
$$\text{for } p = \infty \;\Rightarrow\; \frac{1}{q} = (n_l - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right) = \frac{1}{f_i} \tag{4.24}$$
If instead the image is focused at infinity, the object lies at the distance $f_o$ from the center of the lens (first focus) and we have
$$\text{for } q = \infty \;\Rightarrow\; \frac{1}{p} = (n_l - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right) = \frac{1}{f_o} \tag{4.25}$$
From these last two equations, we deduce that $f_i = f_o$ and, therefore, we can define a single focal distance f (for the image and for the object) as
$$\frac{1}{f} = (n_l - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right) \tag{4.26}$$
The relationship of the conjugated points for the thin lens becomes
$$\frac{1}{p} + \frac{1}{q} = \frac{1}{f} \tag{4.27}$$
also known as the Gaussian lens formula.
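Equations (4.26) and (4.27) translate directly into two small functions. The sketch below is only an illustration: the function names and the sample lens ($n_l = 1.5$, $R_1 = 10$, $R_2 = -10$, arbitrary length units) are assumptions introduced here.

```python
import math


def focal_length(n_l, R1, R2):
    """Eq. (4.26): 1/f = (n_l - 1)(1/R1 - 1/R2), lens in air."""
    return 1.0 / ((n_l - 1.0) * (1.0 / R1 - 1.0 / R2))


def image_distance(p, f):
    """Eq. (4.27): 1/p + 1/q = 1/f. Returns inf when the object is at the focus."""
    if math.isclose(p, f):
        return math.inf
    return 1.0 / (1.0 / f - 1.0 / p)


f = focal_length(1.5, 10.0, -10.0)   # hypothetical symmetric biconvex lens
for p in (30.0, 20.0, 15.0, 10.0, 5.0):
    print(f"p = {p:5.1f}  ->  q = {image_distance(p, f):8.3f}")
```

A negative q in the output corresponds to an image on the same side as the object, i.e., a virtual image.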
4.4.1 Diagram of the Main Rays for Thin Lenses
In this section, we determine the position and size of the image of an object, including an extended one, by tracing the main rays, in analogy with what was done for mirrors. The Gaussian formula is valid under paraxial-ray conditions, and in deriving it we neglected the thickness of the lens. Before applying the ray-tracing procedure, we must establish how an incident ray is affected when it passes through the optical center of the lens. Figure 4.13 shows two points A and B chosen arbitrarily on the surfaces of the biconvex lens, but such that the two radii $C_2B$ and $C_1A$ are parallel. The tangents T to the lens surfaces at A and B are also drawn. It follows that the refracted paraxial ray that travels through the lens along AB enters at A with a given direction and leaves the lens at B in the same direction. In the figure, the incoming ray Q is refracted along AB and exits parallel to its direction of entry.
4 Optical System
Fig. 4.13 Optical center of a lens
From the similarity of the triangles $C_2VB$ and $C_1VA$, it follows that their sides are proportional:
$$\frac{C_2V}{C_1V} = \frac{C_2B}{C_1A} = \frac{|R_2|}{|R_1|} \;\Longrightarrow\; |R_1| \cdot C_2V = |R_2| \cdot C_1V$$
from which it can be deduced that the position of the center V of the lens is fixed and independent of the particular ray Q chosen, since the radii of the lens are constant ($C_2B = |R_2|$, $C_1A = |R_1|$). We have, therefore, shown that any incident ray passing through the optical center V of the lens emerges without undergoing angular deviation. It can be shown, however, that it undergoes a slight lateral displacement. We are now able to present the procedure of tracing the main rays to determine geometrically the location and size of the image of an object placed in various positions in front of a biconvex lens. Although we have considered point-like objects up to now, refraction under paraxial-ray conditions remains valid for extended objects, regarded as a set of point sources located in a plane (object plane) perpendicular to the optical axis at the distance p from the optical center of the lens. In this case, the image produced by the lens is formed in a plane perpendicular to the optical axis at the distance q given by the Gaussian formula. The image (image plane) is the result of the refraction of all the rays coming from the object (object plane). The main ray diagram shown in Fig. 4.14 is useful to follow the optical path of the light rays from the object (starting from the extreme vertical point of the object) to the observer, given the positions of the foci of the lens (two focal points, since the lens has two interfaces). Considering a biconvex convergent lens and incoming rays from left to right, the three rules of refraction for the three main rays are:
(a) Ray 1, this incident ray, parallel to the optical axis of the converging lens, passes through it, is refracted and passes through the focal point Fi localized behind the lens. (b) Ray 2, this ray first passes from the focal point Fo in front of the lens, then passes through it, and emerges refracted in a direction parallel to the optical axis.
Fig. 4.14 Optical path of the main rays to determine position, size, orientation of the image for a positive and negative lens
(c) Ray 3, this ray strikes the lens passing through the optical center and emerges in the same direction without undergoing any deviation. We repeat the tracing of the optical paths of the main rays refracted by a diverging lens, again with the incoming rays from left to right. The three rules of refraction for a biconcave lens, with appropriate clarifications, are the following: (a) Ray 1, this incident ray, parallel to the optical axis of the divergent lens, crosses it, is refracted and continues in a direction aligned with the focal point Fo (to the left of the lens), from which it seems to originate. (b) Ray 2, this incident ray is directed toward the focal point Fi to the right of the lens, crosses the lens, and emerges refracted in a direction parallel to the optical axis. The dashed line would be its path without the lens. (c) Ray 3, this ray strikes the lens passing through the optical center and emerges in the same direction without undergoing any deviation. Following these three rules, based on refraction (Snell's law), and tracing the paths of the principal rays, Fig. 4.14 shows that the intersection of the three rays locates the image of the extreme point of the object in the image plane. The image point is in fact already determined by the intersection of only two of the main rays. For the concave lens, the three refracted rays diverge and, to locate the image point from their intersection, it is necessary to extend them backward on the left side of the lens (the dashed part). As can be seen from Fig. 4.14, the three rays seem to diverge from the image point. This procedure is applicable to any point of the object to obtain the complete image. In this way, in addition to the position of the image, its orientation and size are also determined, as well as its position with respect to the optical axis.
In the figure, since the whole object is positioned above the optical axis (positive height), the image, in addition to being inverted, is located below the optical axis (negative height).
Fig. 4.15 Position of the object and image for a thin lens
The orientation, position, and size of the image are easily determined from the known parameters of the lens (focal distance f) and of the object (distance p from the lens and height $y_o$ with respect to the optical axis) by observing (see Fig. 4.15) that the triangles $AOF_i$ and $PQF_i$ are similar, which gives the relation:
$$\frac{y_o}{|y_i|} = \frac{f}{q - f} \tag{4.28}$$
From the similarity of the triangles SRO and PQO follows the relation:
$$\frac{y_o}{|y_i|} = \frac{p}{q} \tag{4.29}$$
From the relations (4.28) and (4.29), one easily obtains the Gaussian equation for thin lenses:
$$\frac{1}{f} = \frac{1}{p} + \frac{1}{q} \tag{4.30}$$
Finally, from the similarity of the triangles $SRF_o$ and $BOF_o$, we obtain the relation:
$$\frac{|y_i|}{y_o} = \frac{f}{p - f} \tag{4.31}$$
which, combined with (4.28) and introducing the distances $Z = p - f$ and $z = q - f$ from the foci, gives the following relation:
$$Z \cdot z = f^2 \tag{4.32}$$
from which it is possible to determine the distance of the plane of focus, $z = f^2/Z$. Equation (4.32) is Newton's formulation of the fundamental lens equation. By convention, Z is positive if the object is to the left of the focus $F_o$, while z is positive if the image is formed to the right of $F_i$ (see Fig. 4.15). Therefore, the object and the image must be on opposite sides of the respective focal points for the distances Z and z to be positive. The inversion of the image is indicated by the sign of $y_i$: a negative value means that the object appears upside down in the image.
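The equivalence between Newton's form (4.32) and the Gaussian form (4.30) can be checked numerically. The following sketch uses hypothetical function names and sample values (f = 10, p = 30, arbitrary length units) and assumes p ≠ f, so that Z ≠ 0.

```python
def newton_image_offset(Z, f):
    """Eq. (4.32): z = f^2 / Z, with Z = p - f and z = q - f."""
    return f * f / Z


def gaussian_image_distance(p, f):
    """Eq. (4.30): 1/f = 1/p + 1/q, solved for q."""
    return 1.0 / (1.0 / f - 1.0 / p)


f, p = 10.0, 30.0                      # illustrative values only
z = newton_image_offset(p - f, f)      # distance of the image from the focus Fi
q_newton = f + z                       # back to a distance from the lens center
q_gauss = gaussian_image_distance(p, f)
print(q_newton, q_gauss)
```

Both routes give the same image distance up to rounding, as expected from the derivation above.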
The ratio between the vertical distances of the image point P and of the object point S from the optical axis, indicated by PQ and SR, respectively, is called the magnification factor M introduced by the optical system (see Fig. 4.15), namely
$$M = \frac{y_i}{y_o} = \frac{PQ}{SR} \tag{4.33}$$
and, by (4.29), we obtain
$$M = -\frac{q}{p} \tag{4.34}$$
The minus sign indicates that the image is inverted. The distances p and q are both positive for real objects and real images, which implies that a real image formed by a thin lens is always inverted. Considering Eqs. (4.28), (4.30), (4.31), and (4.32), we can derive some relations useful for the analysis of the optical system:
$$M = -\frac{z}{f} \tag{4.35}$$
$$M = -\frac{f}{Z} \tag{4.36}$$
$$f = p\,\frac{M}{M + 1} \tag{4.37}$$
$$q = f \cdot (M + 1) \tag{4.38}$$
$$p = f\,\frac{M + 1}{M} \tag{4.39}$$
In (4.37) to (4.39), M denotes the magnitude q/p of the magnification. The magnification has unit magnitude (no enlargement) when the object and the image are at the same distance from the lens and, from Eq. (4.30) of the conjugated points, this occurs when p = q = 2f. In this configuration, the object and the image are separated by a distance of 4f, which is the smallest possible separation. Figure 4.16 shows the other possible configurations, obtained by varying the distance of the object from the thin lens. These equations are useful in various applications, and the one to use is chosen according to which optical parameters are known and which are to be calculated. Having defined with the previous equations the characteristic quantities of the image formation process (position of object and image, magnification factor, focal distances, orientation of object and image), we can now verify graphically, with the main ray diagrams, how the image of an object placed at different distances in front of a convex or concave lens is formed. With reference to Fig. 4.16, we begin by analyzing image formation with a biconvex lens in the different cases considered:
Fig. 4.16 Image formation with a thin convex lens: a a red tree at a distance greater than 2F and a green tree at a distance less than F; b tree of the same size as in a but at a distance 2F; c at a distance between 2F and F; d at a distance equal to F, where the image is formed at infinity (no image)
1. Object at distance p > 2F (Fig. 4.16a, red tree): the image is formed behind the lens at a distance between F and 2F; it is inverted ($y_i < 0$), reduced (|M| < 1), and real (q positive). The object would be projected on a screen placed in the image plane at the distance q. This configuration is typical of image formation in a camera.
2. Object at distance p = 2F (Fig. 4.16b): the image is formed behind the lens at the distance 2F; it is inverted and real, and keeps the dimensions of the object (|M| = 1). The object would be projected on a screen placed in the image plane at the distance 2F.
3. Object at a distance between 2F and F (Fig. 4.16c): the image is formed behind the lens at a distance greater than 2F; it is inverted, enlarged (|M| > 1), and real. The object would be projected on a screen placed in the image plane at the distance q. This configuration is typical of a slide projector, where the frame to be projected is generally very small.
4. Object at distance p = F (Fig. 4.16d): no image is formed. The rays refracted by the lens emerge parallel to one another and cannot form the image (z = ∞).
5. Object close to the lens with p < F (Fig. 4.16a, green tree): the image forms in front of the lens, on the object side and farther from the lens than the object; it is erect ($y_i > 0$; by Eq. (4.36), Z < 0 ⇒ M positive) and enlarged (by Eq. (4.36), |Z| < f ⇒ M > 1). The image is virtual (p < F and, from (4.35) with M > 0, z < 0): the refracted rays diverge, and the location of the image is found by tracing the divergent rays backward (behind the object); to an observer, the light seems to emerge from the plane of the image. Obviously, a screen placed in that plane would show no projection of the object, since no light rays arrive there. This optical configuration is typical of observing an object with a magnifying lens.

Table 4.2 Convex thin lens
Object          Image
Position p      Type        Position q      Orientation   Dimension
p > 2F          Real        F < q < 2F      Inverted      Reduced
p = 2F          Real        q = 2F          Inverted      Unchanged
F < p < 2F      Real        q > 2F          Inverted      Magnified
p = F           No image
p < F           Virtual     |q| > p         Erect         Magnified

Table 4.2 summarizes the optical quantities that characterize the image of an object placed at different distances in front of a thin convex lens. Figure 4.17 shows the main ray diagram for a biconcave lens, repeating the procedure used for convex lenses to calculate the position, size, and orientation of the image. From the examples considered, it emerges that for a divergent lens the image of an object is always virtual, reduced, erect, and located in front of the lens on the object side. Unlike converging lenses, divergent lenses do not change the type and orientation of the image as the distance of the object from the lens varies. The examples in Fig. 4.17 show that, as the object approaches the lens, the image grows (while always remaining smaller than the object) and also moves toward the lens. We deduce that an object observed through a divergent lens will always be seen minified and erect. In reality, there is no thin lens, but only convex and concave lenses with a very small thickness that act optically like a thin lens. On the other hand, commercial optical systems are composed of an assembled group of convex and concave lenses. An optical system with N lenses having focal lengths $f_1, f_2, \ldots, f_N$ can be seen as N thin lenses in contact, which constitute a single thick lens with an effective focal length f satisfying the following relationship:
$$\frac{1}{f} = \frac{1}{f_1} + \frac{1}{f_2} + \cdots + \frac{1}{f_N} \tag{4.40}$$
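The five cases listed for the convex lens and the contact-lens relation (4.40) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the dictionary layout, and the sample focal length f = 10 (arbitrary units) are hypothetical, and the classification uses the signed magnification M = −q/p of Eq. (4.34).

```python
import math


def describe_image(p, f):
    """Classify the image of a real object (p > 0) formed by a thin convex lens (f > 0)."""
    if math.isclose(p, f):
        return {"type": "none", "q": math.inf, "M": None,
                "orientation": None, "size": None}
    q = 1.0 / (1.0 / f - 1.0 / p)   # Gaussian formula, Eq. (4.30)
    M = -q / p                      # signed magnification, Eq. (4.34)
    if math.isclose(abs(M), 1.0):
        size = "unchanged"
    elif abs(M) > 1.0:
        size = "magnified"
    else:
        size = "reduced"
    return {"type": "real" if q > 0 else "virtual",
            "q": q, "M": M,
            "orientation": "inverted" if M < 0 else "erect",
            "size": size}


def combined_focal_length(focals):
    """Eq. (4.40): effective focal length of thin lenses in contact."""
    return 1.0 / sum(1.0 / f for f in focals)


for p in (30.0, 20.0, 15.0, 10.0, 5.0):   # p > 2F, p = 2F, F < p < 2F, p = F, p < F
    print(p, describe_image(p, 10.0))
print(combined_focal_length([10.0, -20.0]))  # converging + diverging lens in contact
```

The printed rows reproduce the pattern of the table of cases: real, inverted images for p > F, no image at p = F, and a virtual, erect, magnified image for p < F.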
4.4.2 Optical Magnification: Microscope and Telescope In the previous paragraph, we have examined the optical configuration of a convex lens with object in front of the lens at a distance less than F with the relative main ray diagram illustrated in Fig. 4.16a. This configuration, corresponding to case 5,
Fig. 4.17 Image formation with a concave thin lens for three different object–lens distances: a object placed at a distance greater than 2F; b object of the same size but at a distance 2F; c object of the same size but at a distance between 2F and F
is known as the magnifying glass or simple microscope. The virtual image is generated at the distance q, which can be evaluated with the equation of the conjugated points (4.27), given the focal length f of the lens and the distance p of the object, or with Eq. (4.37) if the magnification factor M is also known. This virtual image is visible to an observer if q corresponds to the minimum distance of distinct vision, which for a normal eye is about 25 cm. Figure 4.18a schematizes the optical configuration of the human eye, considering the crystalline lens equivalent to a convex lens with adaptive capacity (it changes its focal distance) that focuses on objects from infinity (far point, relaxed ciliary muscle) down to a minimum distance L (near point, 25 cm) while still ensuring a clear image. For distances below the near point, the image would be out of focus. The maximum magnification of the eye is given by
$$M = \frac{q_e}{p} = \frac{2}{25} = 0.08 \tag{4.41}$$
considering the geometry of a normal eye with $q_e = 2$ cm and the near point at p = 25 cm. Observing the same object through a convex lens, as shown in Fig. 4.18b, we have the configuration of a simple microscope, which yields an erect virtual image magnified 8 to 10 times (8X–10X) with respect to the naked eye, even if the object is closer to the eye than the near point. To calculate the magnification M in this case, we will not use the thin-lens equations, since the crystalline lens cannot be considered a thin lens. If we consider the angle α under which the naked eye would see the object placed at the near-point distance, and the angle β under which the eye sees the
Fig. 4.18 Optical eye-magnifying lens configuration: a Object observed with the naked eye; b Object viewed through a magnifying lens
Fig. 4.19 Diagram of a compound microscope
virtual image through the magnifying lens, then the lens introduces an angular magnification measured by the ratio
$$M_{ang} = \frac{\beta}{\alpha} \tag{4.42}$$
We can consider p almost equal to f (p ≈ f) and the angles α and β small, so that, to a good approximation,
$$\beta \approx \tan\beta = \frac{y_i}{q} \approx \frac{y_o}{p} \approx \frac{y_o}{f}, \qquad \alpha \approx \tan\alpha = \frac{y_o}{L} \tag{4.43}$$
With these approximations, Eq. (4.42) becomes
$$M_{ang} = \frac{\beta}{\alpha} \approx \frac{y_o}{f} \cdot \frac{L}{y_o} = \frac{L}{f} = \frac{25\ \text{cm}}{f} \tag{4.44}$$
which shows that the angular magnification $M_{ang}$ increases as the focal length f of the lens decreases. In practice, the focal length of the lens must remain comparable with the distance of distinct vision (25 cm) to limit the chromatic aberrations (introduced by the different spectral frequencies of light) that we will discuss in the following sections. Higher magnifications can be achieved by adding a further convex lens, thus creating a compound microscope (see Fig. 4.19). In this optical configuration, the object AB to be observed is placed in front of the added lens, called the objective lens, at a distance greater than its focal length $f_o$. The image ab supplied by this lens falls in
case 3 of those shown in Fig. 4.16c. In fact, as foreseen by the corresponding diagram of main rays, it is enlarged, inverted, and real. Moreover, the optical configuration is such that this image is formed in front of the other lens, called the eyepiece lens, at a distance from it smaller than its focal length $f_e$. The image ab becomes the object for the eyepiece, which forms the final virtual image $a_ib_i$, inverted and greatly enlarged with respect to the object AB. The configuration of the two lenses is controlled in such a way that the eyepiece is at a distance from the object equivalent to the distance of distinct vision (about 25 cm). It should be noted that the focal length of the objective lens is normally smaller than that of the eyepiece lens, and both focal lengths are much smaller than the distance between the lenses. Under the paraxial hypothesis, with M = −z/f (Eq. 4.35), the magnification $M_o$ of the image ab produced by the objective is given by
$$M_o = \frac{ab}{AB} = -\frac{L - f_o - f_e}{f_o} \approx -\frac{L}{f_o} \tag{4.45}$$
where the approximation is justified by the fact that $L \gg f_o, f_e$. The magnification $M_e$ due to the eyepiece is given by
$$M_e = \frac{a_ib_i}{ab} \approx \frac{25}{f_e} \tag{4.46}$$
The magnification $M_{MC}$ of the compound microscope is given by
$$M_{MC} = M_o \cdot M_e = \frac{a_ib_i}{AB} = -\frac{L \cdot 25\ \text{cm}}{f_o \cdot f_e} \tag{4.47}$$
where L and the focal distances are expressed in cm. The total magnification $M_{MC}$ is negative, the final image being inverted according to the sign rule adopted (incoming rays from left to right). With an optical configuration that includes an objective with a focal length of about 3.2 cm, an eyepiece with a focal length of 2.5 cm, and L = 16 cm, the total magnification has magnitude $|M_{MC}| = 5X \cdot 10X = 50X$. In Fig. 4.19, the object AB is far from the focal point $F_o$. In reality, if the microscope is configured to have the image ab of the object AB in the focal point of the eyepiece, it can be shown that the object AB must be placed near the focal point $F_o$ of the objective. In fact, applying the Gaussian equation to the objective, we have
$$\frac{1}{f_o} = \frac{1}{p_o} + \frac{1}{L - f_e}$$
from which, solving for the distance $p_o$ of the object from the objective, we obtain:
$$\frac{1}{p_o} = \frac{1}{f_o} - \frac{1}{L - f_e} = \frac{L - f_e - f_o}{f_o\,(L - f_e)} \approx \frac{L}{L\, f_o} = \frac{1}{f_o}$$
where the approximation holds for $L \gg f_o, f_e$; therefore, the object must be placed near the focal point $F_o$ of the objective lens, i.e., at the distance $p_o \approx f_o$. The real magnification of a compound microscope is limited by the resolving power (i.e., the ability to see two separate points of the object as distinct also in the image)
Fig. 4.20 Diagram of principal rays for a telescope. The black arrow AB is a representation of the image of the far object, the green arrow ab in the focal plane indicates the inverted real image of the far object acquired by the eyepiece, while the red arrow represents the virtual image which is formed on the retina of the observer, where the object will appear very enlarged by a factor expressed by Eq. (4.49)
of the optics in turn conditioned by the phenomenon of diffraction of light rays that will be examined in the next paragraphs. An optical telescope, sometimes also called a spyglass, unlike the microscope that magnifies small objects observed closely, has an optical configuration of the two lenses (objective and eyepiece) to observe objects from far away. Figure 4.20 shows the optical configuration of a refractive telescope or refractor consisting of an objective lens with a focal length that is normally very large. Given the considerable distance of the AB object, its real image ab is formed in the focal plane f o of the objective. The eyepiece convex lens has a much smaller focal f e and having as its object the image ab (formed by the objective lens) is positioned in such a way that the final image ai bi is created which is at a distance from the eyepiece equal to the minimum distance of the distinct vision. The eyepiece functions like a magnifying glass which allows the eye to see the far object very much enlarged. In fact, it is placed at a distance of its focal length f e from the image of the object ab formed by the objective lens, converting the divergent spherical wavefront of light into the retinal plane (observer). The eyepiece also increases the apparent angle of incidence (α → β), making the object appear larger in the retina (see Fig. 4.20). In this case, the adjustment is performed by moving only the eyepiece lens, as the objective lens cannot have any influence with its displacement (considering the object very far). For the telescope, linear magnification has a marginal value since the image is smaller than the object. The measure of the magnification of interest is the angular MTL given by the ratio between the angle β that is that with which the observer sees the entire object from the eyepiece (see in Fig. 4.20 the angle between
the two rays passing through the first focus $F_o$) and the angle α under which the observer sees the object with the naked eye. Therefore, we obtain
$$M_{TL} = \frac{\beta}{\alpha} \tag{4.48}$$
Given that the optics operate with paraxial rays, i.e., with very small angles, the following approximations are obtained by considering the triangles $F_{o1}CD$ and $F_{e2}EG$ and the tangent relations:
$$\alpha \approx \tan\alpha = \frac{CD}{f_o}, \qquad \beta \approx \tan\beta = \frac{EG}{f_e}$$
and, replacing in Eq. (4.48), with the segments CD and EG of equal length, we obtain
$$M_{TL} = -\frac{f_o}{f_e} \tag{4.49}$$
From this last equation, it is observed that a considerable magnification is obtained when the focal length of the objective lens is large with respect to that of the eyepiece. The performance of the telescope is also limited by the resolving power of the optics (i.e., the ability to see two separate points of the object as distinct also in the image), which in turn is limited by the diffraction of the light rays.
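The three magnification formulas of this section, (4.44) for the magnifying lens, (4.47) for the compound microscope, and (4.49) for the telescope, can be compared with a short Python sketch. The function names and all sample focal lengths below are illustrative assumptions chosen for round results; lengths are in cm.

```python
import math

NEAR_POINT_CM = 25.0  # minimum distance of distinct vision used in the text


def magnifier_angular_magnification(f_cm):
    """Eq. (4.44): M_ang = L / f with L = 25 cm."""
    return NEAR_POINT_CM / f_cm


def microscope_magnification(L_cm, f_o_cm, f_e_cm):
    """Eq. (4.47): M_MC = -(L * 25 cm) / (f_o * f_e)."""
    return -(L_cm * NEAR_POINT_CM) / (f_o_cm * f_e_cm)


def telescope_magnification(f_o_cm, f_e_cm):
    """Eq. (4.49): M_TL = -f_o / f_e."""
    return -f_o_cm / f_e_cm


# Hypothetical instruments (values are assumptions, not from the text).
print(magnifier_angular_magnification(2.5))       # simple magnifier
print(microscope_magnification(16.0, 3.2, 2.5))   # objective ~5X, eyepiece ~10X
print(telescope_magnification(100.0, 2.0))        # long objective, short eyepiece
```

The negative sign returned for the microscope and the telescope encodes the inversion of the final image, in line with the sign rule adopted in the text.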
4.5 Optical Aberrations
Optical systems made with a group of lenses produce images that are not perfect, because light propagating through thick lenses departs from the laws of propagation (reflection and refraction) assumed for a thin lens in the paraxial theory. To obtain a quantitative measure of the deformations that the optical system introduces in the image, these defects, called optical aberrations, are analyzed. We distinguish:
1. monochromatic aberrations, which occur for any type of light (colored or single frequency) and produce unclear images due to spherical aberration, astigmatism, coma, field curvature, and distortion;
2. chromatic aberrations, which are due to the different index of refraction of the lenses for the different monochromatic radiations that cross them (see Fig. 4.21).
Chromatic aberration occurs when white light, composed of various spectral components, passes through a lens and undergoes dispersion, i.e., each spectral component is refracted with a different angle of refraction. In Fig. 4.21a, we can observe the effect of the dispersion of light passing through a convex lens, which, like a prism, decomposes the various spectral components; these are refracted with different angles and do not converge in the focus expected from the thin-lens equation, which we rewrite here:
$$\frac{1}{f} = (n_l - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right)$$
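Since $n_l$ enters the thin-lens equation directly, a small change of the refractive index with wavelength shifts the focus. The sketch below evaluates the equation for two wavelengths. The refractive indices are approximate values typical of a common crown glass at blue (about 486 nm) and red (about 656 nm) wavelengths and, like the radii and the function name, are assumptions introduced only for illustration.

```python
def focal_length(n_l, R1, R2):
    """Thin-lens equation in air: 1/f = (n_l - 1)(1/R1 - 1/R2)."""
    return 1.0 / ((n_l - 1.0) * (1.0 / R1 - 1.0 / R2))


# Assumed, approximate refractive indices of a crown glass (illustrative only).
N_BLUE = 1.5224   # about 486 nm
N_RED = 1.5143    # about 656 nm

R1_MM, R2_MM = 100.0, -100.0   # hypothetical symmetric biconvex lens

f_blue = focal_length(N_BLUE, R1_MM, R2_MM)
f_red = focal_length(N_RED, R1_MM, R2_MM)
axial_shift = f_red - f_blue   # separation of the two foci along the axis

print(f"f_blue = {f_blue:.2f} mm, f_red = {f_red:.2f} mm")
print(f"axial separation of the foci = {axial_shift:.2f} mm")
```

With these assumed values the blue focus lies closer to the lens than the red one, and the two are separated by an amount of the order of a millimeter on a focal length of roughly 100 mm.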
Fig. 4.21 Chromatic aberration of the lenses. a Convex and concave lens; b axial and transverse aberration; c Chromatic aberration correction
In fact, the focal length f of a lens varies with the wavelength λ of the spectral components of the light passing through it, because the refractive index $n_l(\lambda)$ varies. The blue spectral component is refracted more than the red one, and the rays of each component intersect the optical axis at different points (the focus of each spectral component), spread over a range called axial chromatic aberration. In general, the components of the spectrum with shorter wavelength (toward the violet) are refracted more than those with longer wavelength (toward the red), causing blurring in the focal image plane. This aberration shows up as a fringe of colors on the contours of the image (image not in focus). Another source of chromatic aberration concerns the vertical displacement of the rays refracted at different wavelengths, which introduces slight variations in the dimensions of the images (Fig. 4.21b). To reduce chromatic aberration, optical systems are made up of converging and diverging lenses (made with different types of glass) that compensate the dispersion of the various refractions (as illustrated in Fig. 4.21c). In other words, one tries to build achromatic optics, which are particularly useful in photography. Spherical aberration occurs in optical systems with circular symmetry (spherical mirrors, normal lenses, etc.), for which the rays coming from the peripheral zones and from the central zone of a lens or mirror (the latter called the paraxial zone, near the axis) do not converge in one point; their envelope is a figure of revolution called caustic by refraction for lenses and caustic by reflection for mirrors. To mitigate the effects of spherical aberration, the optical system is normally stopped down with a suitable diaphragm (Fig. 4.22). Astigmatism occurs when the point-like object (or source) does not lie on the optical axis.
The object and the optical system do not have a rotational symmetry with respect to the optical axis, it follows that also the emerging beam of the optical system is no longer conical, but has only one plane of symmetry represented by the object and the optical axis. The beam sections are no longer circular,
Fig. 4.22 Spherical aberration of convex lens
Fig. 4.23 Aberration of astigmatism
but elliptical, becoming smaller and smaller with increasing distance from the optical system until they reduce to a segment. This phenomenon becomes evident when the circular diaphragm is suitably stopped down. Beyond the segment, the beam section grows again and its elliptical shape widens indefinitely (Fig. 4.23). The coma aberration (a particular form of astigmatism) occurs when the object and the diaphragm are displaced with respect to the optical axis. The sections of the emerging beam are not elliptical but assume the shape of a comet (see footnote 1). The aberration of field curvature generalizes astigmatism. It is revealed by an object that is no longer point-like but planar: for each point of the object there are two focal lines. The two corresponding envelope surfaces are curved and constitute the loci of the blurred images of the object points. The name field curvature derives from the curvature of these surfaces, which are not flat. To reduce this aberration, one tries to make the curved surfaces coincide with a plane (Fig. 4.24). Distortion is generated in optical systems whose linear transverse magnification is not constant but varies depending on the distance of the object
1 The term coma derives from the word comet.
Fig. 4.24 Aberration of field curvature
Fig. 4.25 Distortion aberration: pincushion (positive displacement; the visible effect is that lines that do not go through the center of the image are bowed inward, toward the center of the image) and barrel (negative displacement; the apparent effect is that of an image which has been mapped around a sphere or barrel)
from the optical axis. For example, the image of a straight segment (not passing through the optical axis) is distorted because each element of the segment undergoes a different magnification. Pincushion distortion occurs when the magnification increases with the distance of the object from the axis; conversely, barrel distortion occurs when the magnification decreases with the distance from the optical axis (see Fig. 4.25).
4.5.1 Parameters of an Optical System
The first two parameters, already discussed, that characterize an optical system are the focal length f and the diameter D of each lens. Together they determine the maximum amount of light that can reach the image plane (image brightness). The amount of light that passes through the optical system can be controlled by the aperture stop (AS), while the field stop (FS) limits the portion of the observed object that is imaged, i.e., the angle of view of the optical system (see Fig. 4.26). In an optical system with many lenses, other diaphragms (physical and virtual) are also considered, called entrance and exit pupils, which limit the passage of both paraxial and non-paraxial light rays.
Fig. 4.26 Magnitudes of the optical system. a Brightness of the image in relation to optical parameters; b Image light flux control diaphragms: AS aperture stop open and FS field stop
The amount of light striking the image plane depends on the luminous flux density emitted by the object's surface and on the characteristics of the optical system. Let us now see which quantities affect the brightness of the image. If we indicate with $E_O$ the luminous flux incident on an extended object with surface $A_O$ and luminance $L_O$, and with $E_I$ and $L_I$ the flux and the luminance of the image of area $A_I$, then, assuming that the optical system does not absorb luminous energy, we have the following equality of the luminances (see footnote 2):
$$L_O = L_I \;\Longrightarrow\; \frac{E_O}{\Omega_O} = \frac{E_I}{\Omega_I} \tag{4.50}$$
where $\Omega_O$ and $\Omega_I$ are the solid angles of the object and of the image, respectively, which define the luminous flux that passes through the optical system. As shown in Fig. 4.26, the light energy emitted by an element of the surface of the object illuminates the corresponding point in the image through the two solid angles, whose cones have as common base the stop opening (with area A), coinciding with the diameter D of the lens. Given the distance p of the object from the optical system and the focal length f, the solid angles are
$$\Omega_O = A/p^2 \qquad \text{and} \qquad \Omega_I = A/f^2,$$
respectively, for the object and image side. Considering the extension of the object (i.e., its area $A_O$) and the area of the image $A_I$ (through Eqs. (4.35) and (4.34)), and replacing in (4.50), we have
$$\frac{E_O}{\Omega_O \cdot A_O} = \frac{E_I}{\Omega_I \cdot A_I}$$
Under the hypothesis of luminance invariance, $E_O = E_I$, that is, the optical system (zero absorption of light) conveys to the image plane all the luminous flux emitted by the
2 We recall from Chap. 2 that the luminance is used to characterize emission or reflection of light from a surfaces. In particular, the luminance indicates how much luminous power will be detected by a sensor looking at the surface from a particular angle of view. Luminance is thus an indicator of how bright the surface will appear.
4.5 Optical Aberrations
207
object and replacing the solid angles, we have 1 A p2
· AO
=
1 A f2
· AI
;
⇒
p2 AO = ; f2 AI
⇒
p 2 · A I = f 2 · A O (4.51)
from which it emerges that the two solid angles, which share the base area of the cones, differ only in their distances: the refraction of the light rays has expanded the solid angle on the image side. The ratio between the solid angles is given only by the ratio of distances $p^2/f^2$, which expresses the power of the optical system, i.e., its ability to concentrate light on the image plane. If p becomes very large compared to the focal length f, as in the case of telescopes or binoculars, the solid angle on the image side becomes much wider than that on the object side, and the power of the optical system increases with decreasing focal length. The image area of an extended object is proportional to $f^2$, so the higher the optical power, the smaller the image.

In geometric optics, two quantities are introduced to characterize an optical system: the f/number (f/#) and the Numerical Aperture (NA). The f/number (also called f/stop, f/ratio, or simply aperture) is given by the ratio of the focal length f to the diameter D of the optical system:

$$f/\# = \frac{f}{D} \tag{4.52}$$

where D actually indicates the diameter of the entrance pupil (effective aperture). Most lenses have an adjustable diaphragm, which changes the size of the aperture stop and thus the size of the entrance pupil. The aperture stop limits the brightness of the image by restricting the entrance pupil size, while a field stop is intended to cut out light that would fall outside the desired field of view and might cause flare or other problems if not stopped. The diaphragm is a mechanism that limits the amount of light passing through the optical system to reach the image plane, where the sensitive elements are located (photoreceptors of a CCD camera or photographic emulsion). It consists of many lamellae hinged on a ring, which rotate in a synchronized manner and vary the size of the circular opening, thus limiting the passage of light.
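As a small numerical illustration of Eqs. (4.51) and (4.52), the following sketch computes the image area of an extended object and the f/number of a lens. The function names and the numerical values (a 1 m² object at 10 m, a 50 mm lens with a 25 mm entrance pupil) are assumptions chosen only for the example, not data from the text.

```python
def image_area(object_area, p, f):
    """Image area A_I from Eq. (4.51): p^2 * A_I = f^2 * A_O.

    p and f must be expressed in the same unit of length;
    the result has the same unit as object_area.
    """
    return object_area * (f / p) ** 2


def f_number(focal_length, pupil_diameter):
    """f/# = f / D, Eq. (4.52); both lengths in the same unit."""
    return focal_length / pupil_diameter


# Hypothetical example: 1 m^2 object at p = 10 m, focal length 50 mm,
# entrance pupil diameter 25 mm.
area_i = image_area(1.0, p=10.0, f=0.050)   # square metres
n = f_number(50.0, 25.0)                    # dimensionless
print(f"A_I = {area_i:.2e} m^2, f/# = {n:g}")
```

Halving the focal length in this sketch divides the image area by four, which is the sense in which a higher optical power gives a smaller image.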
The f/number varies from a minimum value, when the diaphragm is completely open (corresponding to the full diameter D of the optics), to a maximum value, obtained by closing the diaphragm as far as possible, i.e., minimizing its diameter. The scale of values of f/# follows a geometric progression of powers of $\sqrt{2}$, whose first value is $f/1 = f/(\sqrt{2})^0$, and continues with the values f/1.4, f/2, f/2.8, f/4, f/5.6, f/8, f/11, f/16, f/22, f/32, f/45, f/64, ..., in correspondence of which there is a luminance in the image plane (expressed in cd/m²) of E, E/2, E/4, E/8, E/16, ... This scale of values has been internationally standardized, and commercial optical systems (Fig. 4.27) carry the diaphragm scale indicated above. Objectives on the market show on their outer rings the focal length expressed in millimeters (for example, f = 50 mm) and the maximum aperture of the lens (for example, a = 2), which defines its brightness. The more expensive lenses
Fig. 4.27 On the left, an example of a diaphragm and its openings in terms of f/#. On the right, a 35 mm objective set to f/11, as indicated by the white dot. This objective has an aperture range from f/2.0 to f/22
with the same focal length have a higher brightness, i.e., a maximum aperture with a very small f/# value. The scale is motivated by the fact that, passing from one aperture value to the next (for example, starting from f/# equal to 2), the diaphragm should close so as to let through half the light of the current value. This is achieved by reducing the diameter of the aperture stop by a factor $\sqrt{2} \approx 1.4$, since the area of the circular opening is proportional to the square of its diameter.

Normally, the optical system is configured dynamically to project a certain amount of light onto the sensitive surface of an image acquisition system, regardless of the type of sensitive device used (film or photoreceptors of a camera). This is accomplished by varying the f/# of the diaphragm and compensating with the exposure time set by the shutter. For example, the same amount of light is obtained with f/2 and 1/1000 s, f/2.8 and 1/500 s, f/4 and 1/250 s, etc. Passing from one diaphragm value to the next halves the light passing through, while doubling the exposure time keeps constant the total amount of light projected onto the sensitive area. Depending on the type and dynamics of the scene, the diaphragm-exposure time pair is selected appropriately. A scene with moving objects is acquired with a very short exposure time if motion blur is to be avoided. Conversely, a large f/# (small aperture) is selected to increase the depth of field, i.e., the interval between the minimum and maximum distance of objects that are still in focus in the image plane of the acquisition system.

The numerical aperture NA is given by the following relation:

$$NA = n \cdot \sin\alpha \tag{4.53}$$

where n indicates the index of refraction of the medium containing the cone with half angle α formed by the marginal rays (with respect to the optical axis) between the object and the optical system. With the help of Fig. 4.26b, we can see that the angle α
depends on the position and diameter of the entrance pupil, which in turn determine the luminous power that crosses the optical system and arrives on the plane of the sensor. The entrance pupil is the image of the aperture stop AS formed by the optical components between the object plane and the aperture stop. Under the paraxial approximation (very small α), where $\alpha \cong \tan\alpha \cong \sin\alpha$, we have

$$f/\# = \frac{f}{nD} \cong \frac{1}{2n\tan\alpha} \cong \frac{1}{2n\alpha}, \qquad NA = n \cdot \sin\alpha \cong n \cdot \alpha$$

(for n = 1, i.e., in air, the first expression reduces to Eq. (4.52)), and putting in relation f/# and the numerical aperture NA, we obtain

$$NA = \frac{1}{2\, f/\#} \tag{4.54}$$
Finally, we can point out that the power of the optical system can be increased by decreasing the value of f/#, i.e., by increasing the diameter, or equivalently by increasing the numerical aperture NA, i.e., by widening the cone of rays entering the optical system or by using a medium with a higher index of refraction.
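The relations of this section between f/#, light passing through the diaphragm, exposure time, and numerical aperture can be sketched numerically as follows. This is only an illustrative sketch: the function names are assumptions, and the exact powers of $\sqrt{2}$ are used, whereas the values engraved on commercial lenses (1.4, 2.8, 5.6, 11, ...) are rounded.

```python
import math


def f_stop_scale(count):
    """Full-stop scale: f/# = (sqrt 2)^k for k = 0, 1, ..., count - 1."""
    return [math.sqrt(2) ** k for k in range(count)]


def relative_light(f_num, reference=1.0):
    """Light through the aperture relative to the reference stop.

    The aperture area scales as the square of its diameter,
    hence as 1 / (f/#)^2 at fixed focal length.
    """
    return (reference / f_num) ** 2


def equivalent_exposure_time(t_ref, n_ref, n_new):
    """Exposure time at f/# = n_new giving the same light as t_ref at n_ref."""
    return t_ref * (n_new / n_ref) ** 2


def numerical_aperture(f_num):
    """Paraxial relation NA = 1 / (2 f/#), Eq. (4.54)."""
    return 1.0 / (2.0 * f_num)


# Starting pair taken from the text: f/2 with 1/1000 s.
for n in f_stop_scale(7):
    t = equivalent_exposure_time(1 / 1000, 2.0, n)
    print(f"f/{n:4.1f}  light = {relative_light(n):.4f}  "
          f"t = {t:.5f} s  NA = {numerical_aperture(n):.3f}")
```

Each step of the scale halves the relative light and doubles the compensating exposure time, which reproduces the pairs f/2 and 1/1000 s, f/2.8 and 1/500 s, f/4 and 1/250 s quoted above.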
References
1. C. Velzel, A Course in Lens Design (Springer, 2014)
2. E. Hecht, Optics, 4th edn. (Addison-Wesley, 2002)
5 Digitization and Image Display
5.1 Introduction
So far we have analyzed the fundamental theories underlying the mechanisms of image formation: the paraxial theory of optical systems and the light–matter interaction for the radiometric and photometric aspects, including color theory. Together, these theories (discussed in the previous chapters) lead to a result that can be summarized in a few words: from the Energy to the Image. Figure 5.1 schematizes this result: the optical system, through the phenomenon of refraction or reflection, projects the light energy of the scene onto the image plane. The image thus obtained is called the Optical Image which, as we have seen, can be real or virtual. We are now interested not only in the visual observability of the real image, but also in acquiring it and saving it on different media (films) and electronic devices (magnetic media, solid-state memories, etc.), to be subsequently displayed and/or processed on a computer in order to extract information of interest from the observed scene (for example, object recognition). Basically, from the optical image, which contains the light intensity information of the scene, we intend to acquire an analog or digital version usable on the various electronic devices. This task is carried out by the image acquisition system which, through its sensing and storage components, converts the optical image into analog and digital formats. We will analyze some image acquisition systems, starting from those of nature and then moving to those developed over the years with the evolution of sensor, memory, and processing technologies.

Nature, through the visual systems of living beings, offers a broad spectrum of image acquisition systems with different levels of complexity. Almost all animals show a particular sensitivity to light radiation, even those without photoreceptor organs (Protozoa).
Visual organs range from the simplest forms, limited to distinguishing darkness from the presence of light, up to the most evolved ones, capable of perceiving the shape and movement of objects.
© Springer Nature Switzerland AG 2020 A. Distante and C. Distante, Handbook of Image Processing and Computer Vision, https://doi.org/10.1007/978-3-030-38148-6_5
Fig. 5.1 Pinhole camera geometry: with different values of the hole diameter, different degrees of image blur are obtained
For example, the visual organs of some animals determine only the intensity of light. In others (jellyfish), the visual organs are made up of multiple cells located in a cavity, which are stimulated at different times by external light due to the movement of an object. When there is a deeper cavity with a very narrow slit and no lens (pinhole model, see Fig. 5.1), an image can be formed on the opposite wall, as in the so-called darkroom (prototype of a camera). The size of the hole affects the quality of the image obtained. A hole larger than 0.5 mm produces a strongly blurred image, because the light rays do not converge, even though the depth of field is good. Similar blurring occurs for much smaller holes, due to the phenomenon of diffraction. This principle of image formation has been known since the time of Aristotle. Leonardo da Vinci describes it in his notes, and Giovanni della Porta describes it in detail in "Magia Naturalis".

An evolution of the visual organs occurs when dioptric means, i.e., a lens, are formed at the fissure, as found in vertebrates, thus obtaining a brighter and higher quality image. A further evolution is found in the compound eyes of Arthropods, consisting of a multiplicity of small eyes, each with its own dioptric means (cornea, crystalline cone) and photoreceptor elements (retina). Figure 5.2 shows a scheme of the elementary eye, called ommatidium; many ommatidia assembled together form the compound eye of many Arthropods. The ommatidium consists of a bi-convex corneal lens of transparent chitin, under which four cells form a second lens, the crystalline cone, homogeneous and transparent. It is followed by the retinula, consisting of 7–8 elongated and thin photoreceptor cells. Each of these cells transmits stimuli to the dendrite of the driver cell and, via the optic nerve, to the brain.

In vertebrates, the visual system becomes even more evolved. An ocular globe is found, with a retina, a robust sclera, a transparent cornea, a pigmented choroid, and a crystalline lens (converging lens). The supporting muscular structures are also developed, to guarantee the adaptation of the optical elements for each type of animal (fish, reptiles, birds, cetaceans, etc.). The biological visual systems existing on earth, even though they present different solutions for the optical system, share very similar electrochemical mechanisms of image transduction.
Fig. 5.2 Visual system of an insect (figure labels: corneal facets, lenses, crystal cones, transparent rod (rhabdom), ommatidia)
5.2 The Human Optical System
The human visual system can be thought of as a double convergent lens that forms a real image on the light-sensitive area (retina). The human visual system (see Sect. 3.2) comprises the following components: optical system, optic nerve, and cerebral cortex. In humans, the optical system (the eye) has a firm and elastic structure and is held in place by the muscles, bands, nerves, and vessels that penetrate it. On the front is the transparent cornea, behind the center of which are visible the iris, variously colored, and the pupil, the hole through which the light passes and the images are received. The pupil dilates or shrinks (2÷8 mm) depending on the lower or greater intensity of the light stimuli. The back of the spherical structure is formed by the sclera, which houses the convex component of the optical system. From this site of the sclera emerges the optic nerve, wrapped in fibrous material that prevents reflection and refraction of light rays.

Incident light rays penetrate the eye through the air–cornea interface, where they undergo the maximum deviation (the refractive index of the cornea is $n_c \cong 1.376$). In scuba diving, sight diminishes precisely because the light rays are not adequately refracted, the refractive index of water ($n_a \cong 1.33$) being very close to that of the cornea. After the cornea, the light enters a hollow space containing the aqueous humor ($n_a \cong 1.33$), a colorless and transparent liquid; it is only slightly deviated at the cornea–aqueous humor interface, as the latter has a refractive index almost identical to that of the cornea. The iris is also immersed in the aqueous humor and acts as the diaphragm that controls the amount of light entering the eye through the pupil. The radial and circular muscles of the eye allow the iris to open in very poor light conditions and to close in good light conditions.
Adjacent to the iris, toward the inside of the bulb, is the crystalline lens, 9 mm in diameter and 4 mm thick, consisting of a transparent multilayer fibrous mass (22,000 layers) wrapped in a thin and elastic membrane. We have already analyzed the biological architecture of the human visual system. Let us now analyze some of its features from the point of view of an optical image acquisition system. Unlike artificial lenses made of glass, the crystalline lens, owing to its lamellar structure, can increase in size and has a refractive
index that varies from 1.406, in the central nucleus, to about 1.386 in the less dense parts. The crystalline lens, by varying its shape and size, provides an excellent mechanism of fine focusing through the variation of its focal length. From the optical point of view, the cornea–crystalline system can be thought of as a two-lens optical system with the object focus about 15.6 mm in front of the cornea and the image focus about 24.3 mm behind it, on the retina. To simplify the optical system, a single-lens model is considered, with its optical center 17.1 mm in front of the retina.

After the crystalline lens come the vitreous body and the ciliary body. The vitreous body consists of a transparent and gelatinous mass that fills the space between the posterior interface of the crystalline lens and the inner membrane of the ocular globe. This mass is called vitreous humor: a transparent gelatinous liquid (refractive index $n_{uv} = 1.337$) made up of amorphous substance, fibers, and cells, held together by the hyaloid membrane. In the innermost area, adhering to the sclera, is a thin fibrous membrane called the choroid, vascularized and pigmented with melanin. A thin layer (0.1 ÷ 0.5 mm) of photoreceptor cells covers an extended area of the internal surface of the choroid and is called the retina (from the Latin rete, net). The focused light is absorbed through an electrochemical process in this multilayer structure. In Sect. 3.2, we highlighted that the human visual system has two types of photoreceptors: cones and rods. The retina is considered to be an expansion of the optic nerve which, like a membrane, is applied to the choroid up to the ora serrata. The area where the optic nerve exits the eye does not contain photoreceptors and is insensitive to light. This small area of contact between the optic nerve and the retina is called the optic papilla (optic disc).
The mechanism of focusing (power of accommodation) of the images on the retina is realized in the human visual system by the crystalline lens which, thanks to its elastic nature, can be more or less stretched by the ciliary muscles. The stronger this action, the greater the curvature of its faces becomes (the radius R of the lens decreases) and consequently the shorter its focal length (as foreseen by Eq. 4.26), with a corresponding increase of its dioptric power. The dioptric power of a lens is a measure of its ability to converge incident light rays. It is known that a wide-angle lens (with short focal length) converges incident rays more strongly than a lens with a large focal length. It follows that the dioptric power is inversely proportional to the focal length: it is defined as the inverse of the focal distance of a centered optical system. With reference to Eq. (4.26) for thin lenses with refractive index $n_l$ in air, the dioptric power of a lens is given by

$$P = \frac{1}{f} = (n_l - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right) \tag{5.1}$$

The focal distance is measured in meters and the convergence in diopters (D). A lens with a focal length f of 1 m has a power of one diopter, $1\,\mathrm{D} = 1\,\mathrm{m}^{-1}$. If a divergent lens has a focal length of −2 m, its power is −1/2 D. For a converging lens with f = 100 mm,
the power is 1/0.1 D, i.e., 10 D. Equation (5.1) is used to calculate the diopters of a lens, given the radii of curvature and the index of refraction. If there are several lenses, the convergence is given by the algebraic sum of the convergences of the single lenses. For example, the dioptric power of two adjacent lenses, one converging with power $P_c$ and the other diverging with power $P_d$, is $P = P_c + P_d$. The degree of myopia or presbyopia is measured in diopters, referring to the convergence of the corrective lenses. The crystalline lens of the human visual system, considered immersed in air, has a dioptric power of +19 D, while the cornea provides about +43 D, for a total of 59.6 D for the eye with relaxed muscles. A value of this order is obtained by considering the optical system as a compound lens (crystalline + cornea) whose center is 17.1 mm from the retina ($1/f = 1/0.0171\,\mathrm{m} \approx 58.5$ D).

Normally, when the eye muscles are relaxed, the crystalline lens does not undergo any stretching and operates with the maximum focal length, so that objects at infinity are well focused on the retina. As the object approaches the eye, the ciliary muscles contract and generate a strong stretch on the crystalline lens (accommodation process) to focus the nearby object on the retina. The minimum distance at which an object can be seen sharply varies with age. For a normal eye, the minimum distance is 70 mm for adolescents and 120 mm for young adults; with increasing age, it varies from 250 to 1000 mm. The progressive loss of accommodation power is called presbyopia. Many animals have the same capacity for accommodation as humans. Others only move the lens back and forth relative to the retina to focus the image, as is done in ordinary cameras.
Mollusks instead adapt by contracting or expanding the whole body of the eye, to move the lens away from or closer to the retina. Some birds of prey, which for their survival must keep their prey constantly in focus while moving quickly over considerable distances, have developed a very different mechanism of accommodation, based on changing the curvature of the cornea.

The optical system guarantees the process of image formation on the retina. The process of image perception (vision) is the result of the synergistic and joint activity of the eye and the brain. This activity develops in fractions of a second and includes the following phases:
1. Formation of the real image, reduced and inverted, on the retina, in analogy to what happens in an image acquisition device (camera, camcorder, etc.);
2. Stimulation of the photoreceptors located on the retina in relation to the luminous energy of the observed scene;
3. Transmission to the brain, through the optic nerve, of the impulses produced by the photoreceptors;
4. Reconstruction of the 3D scene starting from the 2D images produced upside down on the retina;
5. Interpretation of the objects of the scene.
In Sect. 3.2, the human visual system has been described together with its sensory capabilities, including aspects of color perception.
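Returning to the dioptric power of Eq. (5.1), the calculations quoted in this section can be checked with a few lines of code. The sketch below is only illustrative: the function names are assumptions, and the biconvex lens (index 1.5, radii of ±0.1 m) and the pair of adjacent lenses (+5 D and −2 D) are hypothetical examples. The sign convention assumed here takes $R_2$ negative for a biconvex lens.

```python
def lens_power(n_lens, r1, r2):
    """Dioptric power of a thin lens in air, Eq. (5.1).

    P = (n_l - 1) * (1/R1 - 1/R2), with the radii in metres.
    """
    return (n_lens - 1.0) * (1.0 / r1 - 1.0 / r2)


def power_from_focal_length(f_m):
    """P = 1/f, with f in metres and P in diopters."""
    return 1.0 / f_m


def combined_power(*powers):
    """Algebraic sum of the powers of adjacent thin lenses."""
    return sum(powers)


print(power_from_focal_length(0.100))    # converging lens, f = 100 mm
print(power_from_focal_length(-2.0))     # diverging lens, f = -2 m
print(power_from_focal_length(0.0171))   # reduced eye, centre 17.1 mm from retina
print(lens_power(1.5, 0.1, -0.1))        # hypothetical biconvex lens
print(combined_power(5.0, -2.0))         # hypothetical adjacent lenses
```

The first two calls reproduce the values given in the text (10 D and −1/2 D); the third evaluates the inverse of 17.1 mm used for the single-lens model of the eye.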
5.3 Image Acquisition Systems
There are several devices available to capture digital images. The characteristics of these devices vary depending on how the images are formed. The first image acquisition device, already known to Aristotle, is the darkroom. It consists of a lightproof environment, varying in size from that of a small box to that of a room, where a small hole (pinhole) is drilled in a wall. The light rays of the scene pass through the hole and are projected onto the opposite wall, where an inverted image of the objects of the scene is shown (see Fig. 5.1). Obviously, the quality of the image obtained is not good, because of the lack of an optical system. Leonardo da Vinci suggested replacing the pinhole with a lens to improve image quality in terms of clarity and brightness. For several centuries, it remained the only device for image acquisition. Galileo in 1609 was the first to produce optical images for scientific research (the study of celestial bodies), by making a telescope. Subsequently, the darkroom evolved: with a photosensitive surface placed on the projection wall and the introduction of good quality lenses, it became the prototype of modern cameras and, with the development of microelectronics, of camcorders. The first devices had optics with fixed focal length: to focus the image of the object, the optics must be moved closer to or further away from the image plane (sensitive area of the film or photoreceptors). Recall that this is governed by the parameters of the image acquisition system, related to each other by Eqs. (4.29), (4.30), and (4.33), already discussed in Sect. 4.4 Thin lenses. We rewrite the relation (4.27) of the conjugate points:

$$\frac{1}{f} = \frac{1}{p} + \frac{1}{q} \tag{5.2}$$

and the magnification relation (4.34):

$$M = \frac{\mathrm{Dim\_vert\_i}}{\mathrm{Dim\_vert\_o}} = -\frac{q}{p} \tag{5.3}$$
where f is the focal length of the optics, p is the distance of the object from the center of the lens, q is the distance of the image plane from the center of the lens, M is the factor of magnification or reduction of the image with respect to the object, and Dim_vert_i and Dim_vert_o are the heights, measured with respect to the optical axis, of the image and the object, respectively. The negative sign indicates that the image is upside down. Of the observed scene, the optical system acquires only a circular portion, defined by the angular field of view α. The observed scene, included in the angular field of view, is projected through the lens onto the 2D image plane, shown in Fig. 5.3 as the circular area. For an object at infinity, Eq. (5.2) gives an image formed at the distance q which coincides with the focal length. As the object approaches the lens, the image moves away from the lens and, to be in focus, forms at distances q greater than f, as explained in detail in Sect. 4.4. Normally, image acquisition devices have a rectangular sensitive area, and only a portion of this circular area is acquired. The dimensions of the sensitive area, where the image is uniformly focused, are geometrically characterized by the diagonal of the rectangular sensitive area.
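Equations (5.2) and (5.3) can be turned into a short computation of where the image forms and how much it is reduced. The following is a minimal sketch; the function names and the example values (a 50 mm lens and an object at 1 m) are assumptions made for illustration.

```python
import math


def image_distance(f, p):
    """Solve 1/f = 1/p + 1/q (Eq. 5.2) for q; f, p, q in the same unit."""
    if math.isinf(p):
        return f                      # object at infinity: image at the focal plane
    if p == f:
        raise ValueError("object in the focal plane: the image forms at infinity")
    return f * p / (p - f)


def magnification(p, q):
    """M = -q / p (Eq. 5.3); a negative value means an inverted image."""
    return -q / p


f = 50.0                              # focal length in mm (hypothetical)
for p in (math.inf, 5000.0, 1000.0, 200.0):
    q = image_distance(f, p)
    m = 0.0 if math.isinf(p) else magnification(p, q)
    print(f"p = {p:8.1f} mm  ->  q = {q:7.3f} mm,  M = {m:+.4f}")
```

As the object distance p decreases, q grows beyond f, which is why a fixed-focal-length optic has to be moved away from the image plane to focus nearby objects.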
Fig. 5.3 Angle of view of the optical system and image plane (figure labels: angle of view α, sensitive area, focal length f)
Typically, the diagonal of the sensitive area is chosen approximately equal to the focal length of the lens. It follows that $\alpha/2 \cong \arctan(1/2)$, so that α corresponds to about 53°. A so-called normal objective for a camera with a 24 × 36 mm sensitive area has a focal length of around 50 mm and a field angle of about 53°. It is called normal because the images created with these optics are very similar, in terms of perspective rendering, to those generated by the human optical system. With a shorter focal length, there is a wider angle of field, which from 50° can reach values up to 180° (fish-eye with f < 6 mm). These lenses are called wide-angle and, when pushed to the extreme, they can produce very distorted images (barrel distortion, described in Sect. 4.5 Optical Aberrations). Objectives with a focal length greater than 50 mm reduce the angle of view, down to a few degrees at focal lengths of ≈1000 mm (telephoto lenses). The quality of modern lenses is such as to produce excellent images for all focal lengths and brightness values, the latter expressed with the symbol f/# introduced in Sect. 4.5.1. The sensitive area of modern tv-cameras is normally 10 × 10 mm² and consequently the standard lenses have a focal length of around 15 mm.

The optical systems of the devices considered, camera and tv-camera, produce an optical image of the objects of the observed scene. In other words, the optical system produces a physical image, that is, the spatial distribution of the intensity of luminous energy. When the physical image is observable by humans, it is said to be an image in the visible. Physical images that are not in the visible may be, for example, images that represent magnitudes such as ground height, temperature, and humidity, normally generated by nonoptical acquisition systems. Without loss of generality, we consider among the physical images those generated by a camera or a tv-camera.
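The relation between sensor diagonal, focal length, and angle of view used above can be sketched as follows, assuming the geometry of Fig. 5.3 with the object at infinity, so that $\tan(\alpha/2) = d/(2f)$ where d is the diagonal of the sensitive area. The function name is an assumption; the sensor and focal length values are those quoted in the text for the tv-camera.

```python
import math


def angle_of_view_deg(diagonal, focal_length):
    """Angle of view alpha = 2 * arctan(d / (2 f)), in degrees.

    diagonal and focal_length must be in the same unit.
    """
    return math.degrees(2.0 * math.atan(diagonal / (2.0 * focal_length)))


# Diagonal equal to the focal length: the "normal" case discussed above.
print(f"d = f        : {angle_of_view_deg(1.0, 1.0):.1f} deg")

# tv-camera from the text: 10 x 10 mm sensitive area, 15 mm lens.
d_tv = math.hypot(10.0, 10.0)
print(f"10x10 mm, 15 mm : {angle_of_view_deg(d_tv, 15.0):.1f} deg")

# Longer focal lengths narrow the angle of view.
for f in (100.0, 300.0, 1000.0):
    print(f"d = 43.3 mm, f = {f:6.0f} mm : {angle_of_view_deg(43.3, f):.1f} deg")
```

With the diagonal equal to the focal length, the sketch returns the angle of about 53° mentioned in the text, and the angle drops to a few degrees for focal lengths of the order of 1000 mm.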
The image acquisition devices (camera and tv-camera) establish a correspondence between the objects of the scene and the physical images generated (Fig. 5.4). The film of a camera is the physical medium that captures the image corresponding to the optical image. In electron tube cameras, a photoconductive plate is the physical medium that electronically captures the image corresponding to the optical image of the camera. Unlike the film of the camera (on which a negative image is produced after development through a physical–chemical process), a tube camera produces an electronic (analog) image of the optical image formed.
Fig. 5.4 Components of an image acquisition and display system (figure labels: optical image, analog camera, capture card, frames, projection screen)
Until the introduction, in the late 1980s, of CCD (Charge-Coupled Device) and CMOS (Complementary Metal Oxide Semiconductor) sensors, which instead exploit the electrical properties of semiconductors, Vidicon tube cameras (together with the similar Orthicon and Plumbicon technologies) represented one of the main systems for transducing optical images into electrical video signals. We have already introduced the CCD and CMOS solid-state image sensors in Sect. 3.26 for the description of color reproduction technologies. In this context, it is worth highlighting that Vidicon cameras produce analog images, unlike the digital images produced by solid-state ones.

In Vidicon technology, a scanning electron beam analyzes the distribution of light intensity of the image that the optical system forms on the front face of a flat photoconductive layer, held at positive potential with respect to ground. On the opposite face, on which the electron beam (emitted by an electron gun) is focused, negative charges accumulate. In these conditions, the photoconductive layer behaves locally as a capacitor with a leakage resistance that depends on the local light intensity; there is therefore a local potential drop, which is then amplified by a video amplifier. Through the electronic scanning process, the optical image (available in the form of light energy), formed on the area called the target of the Vidicon tube, is transduced into voltage variations by electronic devices that produce a standard video signal (for example, RS170), so that the acquired image can be reproduced simultaneously on monitors (monochrome or color). The video signal contains the voltage variations obtained by the electronic scanning of the optical image projected onto the glass front plate of the Vidicon tube. As shown in Fig. 5.5, the video signal, generated line by line (with the appropriate horizontal and vertical synchronization signals to delimit the lines in the image), is the result of the transduction of the light intensity into voltage variations obtained during the scan of the optical image.
Fig. 5.5 Diagram of Vidicon Camera and photos of the Vidicon tube (figure labels: optical image plane, glass plate, transparent conductor layer, photoconductor film, electron beam, cathode, video output)
Recall that our primary goal, once the image is acquired, is to analyze the objects of the scene. We have seen how the human visual system processes the image acquired on the retina by propagating the light intensity information through the optic nerve and elaborating it with a neural model for the complete perception (recognition, localization, and dynamics) of the observed objects. In analogy to the human system, we need to process the information content of optical images acquired, for example, by the Vidicon camera and of those recorded on film. The available processing systems are digital computers, which are able to process only numeric data. It is, therefore, necessary to translate the information content of the scene, acquired through the process of image formation with the various available devices, into numerical format, that is, into digital format.

When considering the image recorded on a photographic film, a mechanical or electronic scanning process is required, in which a light spot is moved point by point over the transparent film and the transmitted or reflected light is measured as an estimate of the light intensity corresponding to the point of the scene observed during the image acquisition process. A photosensor measures the light transmitted or reflected point by point, and the electrical signal produced is transduced by the digitizing device, which converts the analog signal into numeric form (see Fig. 5.6). If we consider the image acquired by the Vidicon camera, a digital conversion of the video signal generated by the complete electronic scanning of the sensitive surface of the Vidicon is required. In particular, the European video standard (CCIR, the 625-line counterpart of the American RS-170) provides for the scanning of the entire image (frame) in 625 horizontal lines with a frequency of 25 frames per second. Figure 5.7 shows the analog (continuous) image acquired as a video frame and the analog video signal corresponding to a horizontal line (Fig. 5.7b).
Fig. 5.6 Electronic scanning image acquisition system (figure labels: illuminated transparency, object to acquire, mobile x-y scanning system, sensor, sensor controller, electronic scan of the image, capture card)

Fig. 5.7 Scheme of the RS170 standard video image acquisition. a Scan two fields of a video frame (green and red lines) at two different times; b analog signal associated with a video line (figure annotations: video image 640x480 monochrome, 760x576 color; 25 frames/s; 15.6 kHz line frequency; odd and even lines)

Fig. 5.8 Digital conversion of the video signal generated by an analog camera (figure annotations: the voltage values sampled along a video line are converted into light intensity values, e.g., 60 30 60 80 ... 75 37 35 37)

For the conversion of the analog video signal into a digital format that can be handled by the computer, a hardware device called a frame grabber is required. It converts the analog video signal into a digital image by sampling the continuous signal at a constant time interval. Figure 5.8 shows the sampling of a video signal line through the instantaneous measurement of the electric signal value at constant time intervals. The value of the electrical signal at each time interval is converted into a numeric value and stored. This value is related to the light intensity value of the image at that point. This process is repeated for each line of the video signal of the whole frame and, in the end, we obtain the digital image that can be processed directly by the computer.
The sampled light intensity values are also quantized into digital values to meet the required numerical accuracy. This accuracy depends on the number of bits assigned to represent the light intensity of each sampled point. Normally 8 bits are assigned, thus generating 256 levels of light intensity. The range of intensity levels is also called dynamic range; for 8-bit quantized digital images, the dynamic range goes from 0 to 255.

After processing, a digital image can be displayed. To do so, the digitization process must be reversed in order to reconstruct a continuous image starting from the numeric image previously digitized and processed. For this purpose, a digital-to-analog conversion hardware device is required. Figure 5.4 shows the complete image conversion scheme from digital to analog for image display. In the following paragraphs, the effects introduced by the digitization process will be analyzed to assess, qualitatively and quantitatively, any loss of information with respect to the original image. It can be anticipated that these effects are now minimal, given the good quality of the digitization devices available today.

The most recent cameras, the solid-state CCD and CMOS cameras mentioned above, are built with image sensors (a line or a matrix of photodiodes) whose scanning electronics are integrated in a single chip, including the electronics needed to measure the electric charge generated in the image plane. The sampling process takes place directly at the location of each sensor element in the 1D (linear) sensor or in the 2D sensor matrix (the optical image is projected onto the photodiode matrix of the image sensor).
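The sampling at constant time intervals and the 8-bit quantization described above can be sketched as follows. This is only an illustration: the function name `sample_and_quantize`, the sinusoidal test signal, the 64 µs line duration, and the normalization of the voltage to the interval [0, v_max] are assumptions made for the example, not details of a specific frame grabber.

```python
import numpy as np

def sample_and_quantize(signal, duration, n_samples, bits=8, v_max=1.0):
    """Sample a continuous signal at constant intervals and quantize it.

    signal    : callable t -> voltage in [0, v_max] (hypothetical analog line)
    duration  : length of the scan line in seconds
    n_samples : number of samples (pixels) taken along the line
    bits      : bits per sample; 8 bits give 2**8 = 256 levels
    """
    levels = 2 ** bits
    t = np.arange(n_samples) * (duration / n_samples)   # constant time step
    v = np.clip(signal(t), 0.0, v_max)                   # instantaneous voltages
    # Map [0, v_max] onto the integer levels 0 .. levels-1.
    q = np.minimum(np.floor(v / v_max * levels), levels - 1)
    return q.astype(np.int64)

# Hypothetical analog line: sinusoidal brightness variation around mid-gray.
line = lambda t: 0.5 + 0.4 * np.sin(2 * np.pi * 3 * t / 64e-6)
pixels = sample_and_quantize(line, duration=64e-6, n_samples=640)
print(pixels[:8], pixels.min(), pixels.max())
```

Changing `bits` shows directly how the number of available intensity levels, and hence the dynamic range, depends on the word length chosen for each sample.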
5.4 Representation of the Digital Image
In the previous paragraphs, we have described the fundamentals of the image formation process by analyzing how the light energy of the environment interacts with the optical system that generates a 2D physical image of the 3D objects of the observed scene. Starting from this continuous physical image, which contains the light intensity values of the corresponding points of the scene, a discrete digital image is obtained through the digitization process (see Fig. 5.9). This process takes place in two phases: sampling and quantization.

The continuous physical image f (x, y), generated by the optical system or in general by any other acquisition system (radar, X-ray, infrared, sonar, chemical, etc.), represents the luminous intensity (or gray levels) at each point (x, y) of the 2D optical plane. The 2D matrix g(i, j) represents the light intensity values derived from the continuous physical image f (x, y) at the nodes of the sampling grid (iΔx, jΔy), where Δx and Δy are the sampling intervals along the x axis and the y axis, respectively. The quantization process then generates the digital image I (i, j) from the sampled values g(i, j), one digital value for each element (i, j) of the image, called pixel (from picture element). The value of each pixel I (i, j) is the discrete digital element of the digitized image, and it represents the average light intensity over a grid cell with a rectangular surface of dimensions Δx × Δy. The set of pixels I (i, ∗) represents the i-th column of the digital image, while the j-th row is represented by the pixels I (∗, j).

Fig. 5.9 From the analog image to the digital image through the sampling and quantization process

The digital image I (i, j) is a good approximation of the original continuous image f (x, y) if the following parameters are chosen appropriately: the sampling intervals Δx and Δy, and the range of intensity values I assigned to each pixel in the quantization phase. Image quality, i.e., spatial resolution and radiometric resolution (of light intensity or color), depends on these parameters. A further parameter that defines the quality of the image is temporal resolution, the ability to capture the dynamics of the scene in the case of moving objects. In this case, the frame rate is considered, which defines the acquisition speed of one or more image sequences (number of frames per second).
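The two phases can be illustrated with a small sketch in which a continuous image f (x, y) is given as a Python function. The name `digitize`, the sub-sampling factor `sub` used to approximate the cell average, and the normalization of f to [0, 1] are assumptions of this example. Note that the NumPy array is indexed as `I[j, i]` (row first), whereas the text writes I (i, j) with i as the column index.

```python
import numpy as np

def digitize(f, width, height, nx, ny, bits=8, sub=4):
    """Sample f(x, y) on an nx-by-ny grid and quantize to 2**bits levels.

    Each pixel approximates the mean of f over its dx-by-dy cell by
    averaging sub*sub points inside the cell. f must return values in [0, 1].
    Returned arrays are indexed [j, i]: row j (y direction), column i (x).
    """
    dx, dy = width / nx, height / ny                 # sampling intervals
    # Centers of the sub*sub sub-cells inside every grid cell.
    xs = (np.arange(nx * sub) + 0.5) * dx / sub
    ys = (np.arange(ny * sub) + 0.5) * dy / sub
    X, Y = np.meshgrid(xs, ys)                       # shape (ny*sub, nx*sub)
    fine = f(X, Y)
    # Sampling: g holds the mean of f over each cell.
    g = fine.reshape(ny, sub, nx, sub).mean(axis=(1, 3))
    # Quantization: map [0, 1] onto the integer levels 0 .. 2**bits - 1.
    levels = 2 ** bits
    I = np.clip(np.floor(g * levels), 0, levels - 1).astype(np.int64)
    return g, I

# Hypothetical scene: a horizontal brightness ramp, digitized with 2 bits.
g, I = digitize(lambda X, Y: X, width=1.0, height=1.0, nx=4, ny=2, bits=2)
print(g)
print(I)
```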
5.5 Resolution and Spatial Frequency
The acquired digital image is the result of the digitization process applied to the optical image. The resolution of the digital image, which indicates how well the entire acquisition system can resolve the smallest details of the objects in the observed scene, depends on the various components of the overall acquisition system: environment (lighting), optical system, and digitization system. To simplify, we can state that the resolution of the digital image depends on the resolution of the optical image and on the resolution of the digitization system. Let us assume we have a good-quality optical image, in the sense that it resolves the spatial structures of the original scene well, and let us evaluate the effects of the digitization process on spatial resolution.

When we look at a good-quality photograph, we have an example of an image with continuous color levels: observed with the human visual system, the objects of the scene appear with good geometric resolution, with all their geometric details and without discontinuities. This happens for two reasons. The first is that the camera used and the paper on which the image has been reproduced have good spatial resolution and good chromatic reproduction. The second reason is the high spatial resolution of the human visual system, linked to the high number of photoreceptors (cones and rods) and to their small dimensions. In other words, the image formed on the retina has a high number of pixels with dot-like dimensions, producing the effect of a continuous image.

Fig. 5.10 Spatial resolution of the human visual system: relative sensitivity versus spatial frequency (cycles/degree) in good and soft lighting

In Fig. 5.10, the spatial resolution of the human visual system is represented in good and limited lighting conditions. The graph indicates the relative sensitivity of the human visual system as the lighting conditions and the spatial frequencies of the observed geometric structures vary. To make them independent of the observation distance, the spatial frequencies are expressed in cycles per degree of visual angle.

The choice of the spatial resolution of the pixel, and therefore of the entire digital image, is closely linked to the application. There are numerous acquisition systems whose image sensors are characterized by the size of the individual photoreceptors and by image sizes from 256 × 256 pixels up to 8K × 8K pixels; for special applications, the technology allows even larger dimensions. We now formulate the concept of resolution in the context of a digital image that must be processed and subsequently visualized (we assume that the monitor has very high resolution). In a digital image, the geometric details of the scene appear as variations of light intensity between adjacent pixels. The geometric structures of the scene will not appear in the image if the corresponding pixels do not show significant variations in level; in that case, we say that the spatial resolution of the image is inadequate.
The concept of spatial resolution is related to the concept of spatial frequency, which indicates how quickly the pixel values vary in space. For simplicity, consider Fig. 5.11, which shows different images containing only vertical black bars alternating with white bars. The pixel intensity values of each image row are zero for the black bars and maximum for the white bars, so that the pixels on each line vary cyclically from white to black. The frequency ν indicates the rate of spatial variation of the pixel values. Figure 5.11 shows bars with increasing spatial frequencies, which correspond to increasingly fine geometric structures.

Fig. 5.11 Frequencies in the spatial domain: a square wave generating the image P with low frequency ν = 2; b square wave generating image Q with frequency ν = 6; c image R with frequency ν = 12 and image T with high frequency ν = 24

In Fig. 5.12, a complex image is represented, and the light intensity profile of a row highlights areas of the geometric space with uniform structures, where the intensity varies little, and other zones where it varies more rapidly. In the areas between the objects, there are few details, hence little variability of the intensity and low spatial frequencies. In correspondence with geometric structures rich in details, we have high intensity variability and therefore high spatial frequencies.

Returning to the human visual system, it has been experimentally demonstrated that, in good light conditions (observing high-contrast monochromatic monitors), it perceives geometric structures with spatial frequencies up to 50 cycles per degree (cpg), with a peak of sensitivity around 4 cpg (see Fig. 5.10). For color images, regardless of intensity, spatial resolution decreases and varies in relation to the dominant color component (for example, yellow is poorly perceived by the human visual system). As for an artificial vision system, the spatial resolution of the human visual system is conditioned by the optical component (crystalline lens) and by the sensory component (dimension and sensitivity of the cone and rod photoreceptors), widely analyzed in Chap. 3 Color.

We are now interested in adequately defining the parameters of the digitization process of a vision system in order to obtain a digital image that faithfully reproduces all the spatial structures present in the original optical image. This is accomplished by modeling the imaging process with the theory of linear systems and then applying the sampling theorem to determine appropriately the parameters of the analog-to-digital conversion process (number of pixels required and sampling interval).
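The bar patterns of Fig. 5.11 can be reproduced with a short sketch that builds one image row for a given frequency ν and then recovers ν from the row itself. Here ν is interpreted as the number of black/white cycles across the row, which is an assumption of this example (the figure does not state the unit); the function names and the row length of 480 pixels are also hypothetical.

```python
import numpy as np

def bar_pattern(nu, n_pixels=480, i_max=255):
    """One image row with nu black/white bar cycles across the row."""
    k = np.arange(n_pixels)
    # Integer phase inside the current cycle: first half black, second half white.
    phase = (k * nu) % n_pixels
    return np.where(2 * phase < n_pixels, 0, i_max)

def dominant_frequency(row):
    """Cycles per row width of the strongest non-constant Fourier component."""
    spectrum = np.abs(np.fft.rfft(row - np.mean(row)))
    return int(np.argmax(spectrum))

for nu in (2, 6, 12, 24):
    row = bar_pattern(nu)
    print(nu, dominant_frequency(row))
```

The Fourier transform is used here only as a measuring tool: the index of the largest spectral component is the number of cycles per row, so finer bars show up as a peak at a higher index.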
Fig. 5.12 Color (RGB) and gray-level image with high and low spatial frequencies. The color and gray scales are shown, while below the images are represented the profiles of the intensity levels of the RGB bands and of the gray levels related to an image line (green)
5.6 Geometric Model of Image Formation
In the previous paragraphs, we discussed the importance of the optical system for optical image formation. The 3D objects in the scene are projected by the optical system onto the 2D image plane. We have already highlighted how the optical image produced is influenced by the effects of diffraction and by the aberrations of the optics. We can already state that the projection from 3D to 2D involves a loss of information about the observed scene. The simplest geometric model that projects a bright point of the object onto the image plane is the pinhole or perspective model. As shown in Fig. 5.13, the light point P(X , Y , Z) of the object generates a straight ray of light that passes through the center O of the optical system, assumed to be an infinitesimal hole, and produces a luminous point p located in (x, y, d_i) on the image plane. If the objects are referred to the reference system (X , Y , Z) with origin in the center of the optical system, and the image plane (with axes x, y) is perpendicular to the Z axis and located at the distance d_i, the relations between the spatial coordinates (X , Y , Z) of the objects and the 2D coordinates (x, y) in the image plane follow from geometrical considerations (see Sect. 3.7 Vol. II Perspective transformation):

$$x = -\frac{d_i X}{Z} = -sX, \qquad y = -\frac{d_i Y}{Z} = -sY \tag{5.4}$$
Fig. 5.13 Geometry of the object and the image in the pinhole model

Fig. 5.14 3D points aligned as shown in the picture through the projection center O are projected onto the image plane while retaining the alignment of the points. In the perspective projection of more complex 3D surfaces, points lying on the same ray passing through the center of projection are projected into the same image point (self-occlusion zones)
where the ratio s = d_i /Z is the scale factor of the perspective system. It can be observed that Eq. (5.4) is not linear (due to the presence of Z in the denominator), so the distance and the dimensions of the objects with respect to the origin of the optical system cannot be calculated from the image coordinates alone. All points of the object that lie on the same projection line are projected into the same location (x, y) in the image. For example, points P and P′ (see Fig. 5.13), located at different distances, lie on the same projection ray and are both projected in p, at the same location (x, y) of the image plane. A straight 3D segment is, however, projected into the image plane as a 2D segment, since all the rays emitted by the points of the 3D segment lie in the same plane (Fig. 5.14). From these considerations, it follows that the projection of an opaque and geometrically complex 3D object involves a loss of information in the 2D image plane, reducing the original space by one dimension.

For the inverse projection, i.e., for the 3D reconstruction of the scene, described by the gray-level function f (x, y, z), two 2D functions should be known: the image function I (x, y), representing the gray levels in the image plane, and the depth map function Z(x, y), representing the distance from the optical system of each point visible in the scene (projected in the point (x, y) of the image plane). In the perspective projection model, the one-to-one correspondence between a point (X , Y , Z) of the scene and its projection (x, y) in the image plane is guaranteed only if each ray projects a single point of the scene. For transparent objects and for opaque objects with concave surfaces (in the presence of occluded surfaces), this condition no longer holds, as several points are aligned on the same projection ray. The human visual system reconstructs the 3D scene (modeled by a brightness function g(X , Y , Z)) through binocular vision, acquiring two 2D images of the scene (stereo vision) and evaluating the depth map Z(x, y) through a neural process that merges the two images.
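A few lines of code make the loss of depth information in Eq. (5.4) concrete. The sketch below is only illustrative: the function name `project_pinhole` and the numeric values (an image plane at d_i = 0.05 and two points chosen on the same ray through O) are assumptions of the example.

```python
def project_pinhole(X, Y, Z, d_i):
    """Perspective projection of Eq. (5.4): x = -d_i*X/Z, y = -d_i*Y/Z."""
    if Z == 0:
        raise ValueError("Z must be nonzero (point lies in the pinhole plane)")
    s = d_i / Z                      # scale factor of the perspective system
    return (-s * X, -s * Y)

# Two points on the same ray through O: P' = 3 * P (illustrative values).
P = (0.2, 0.1, 2.0)
P_prime = (0.6, 0.3, 6.0)
p = project_pinhole(*P, d_i=0.05)
p_prime = project_pinhole(*P_prime, d_i=0.05)
print(p, p_prime)   # both points fall on the same image location
```

Because every point k · (X , Y , Z) with k > 0 gives the same (x, y), the function cannot be inverted without additional information such as the depth map Z(x, y).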
5.7 Image Formation with a Real Optical System
The image produced by the geometric model of pin-hole projection is independent of the distance of the objects from the ideal optical system and assumes that the process of image formation is free of any imperfection. A real optical system, for example, a simple thin convex lens, projects a luminous point of an object onto the image plane producing a 2D image (a luminous spot of variable intensity) normally called Point Spread Function (PSF). If the object is at infinity, the bright spot is completely in focus in the image plane, which is formed at the focal distance f, and the optical system behaves like the ideal pin-hole system. In reality, objects are at a finite distance from the optical system, and in the image generated by the projection of a luminous point, although still appreciably in focus, we observe the following:
1. The image of the bright spot becomes a less luminous and slightly blurred circular area, still of acceptable sharpness. This limited degradation holds only in paraxial conditions. In fact, from physical optics it is known that the wave nature of light crossing the optical system (an aperture) generates the phenomenon of diffraction even in the absence of optical aberrations.
2. The plane where the image is formed is no longer at the distance f from the lens but at a distance q > f, i.e., it moves away from the lens; as the distance p of the object from the optical system increases, the image plane moves back toward the focal plane (q → f ). Recall Eq. (4.30) of Sect. 4.4.1, i.e., the relationship that binds the optical parameters of the conjugate points:

$$\frac{1}{f} = \frac{1}{p} + \frac{1}{q}$$

where f is the focal length of the lens.
3. The geometric model of perspective projection (pin-hole) can still be considered valid if the dimensions of the disk generated as a projection of the light point are kept as small as possible; Eq. (5.4) then still holds, taking the coordinates of the disk center in place of the image point of the pin-hole model. For a light point F1 (X , Y ) in the object plane, the optical system produces (indicated with the symbol −→) a light point I1 (x, y) in the image plane, that is,

$$F_1(X, Y) \longrightarrow I_1(x, y)$$

and this mapping is the starting point for defining the system as linear.
4. If a second light point F2 (X , Y ) of the object plane produces a light point I2 (x, y) in the image plane, the system is defined as linear if, and only if,

$$F_1(X, Y) + F_2(X, Y) \longrightarrow I_1(x, y) + I_2(x, y)$$

In other words, a third luminous point, obtained as the sum of two luminous points, produces in the image plane a luminous spot that corresponds to the sum of the luminous spots already produced in the image plane.
5. The brightness of the disk in the image plane increases proportionally as the brightness of the point object increases. This condition suggests that the process of image formation satisfies the homogeneity condition of a linear system, i.e., if k · F(X , Y ) is the luminosity in the object plane, the luminosity produced in the image plane is k · I (x, y):

$$k \cdot F(X, Y) \longrightarrow k \cdot I(x, y)$$
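Returning to the conjugate-point relation recalled in item 2 above, a small numerical sketch shows how the image distance q behaves as the object distance p changes. The function name `image_distance` and the 50 mm focal length are assumptions chosen for illustration; the formula is simply Eq. (4.30) solved for q.

```python
def image_distance(f, p):
    """Solve 1/f = 1/p + 1/q for q (thin lens, object distance p > f)."""
    if p <= f:
        raise ValueError("a real image requires p > f")
    return f * p / (p - f)

f = 0.050                                   # 50 mm focal length (assumed)
for p in (0.5, 2.0, 10.0, 1000.0):          # object distances in meters
    q = image_distance(f, p)
    print(f"p = {p:7.1f} m  ->  q = {q * 1000:.3f} mm")
```

For every finite p the computed q is larger than f, and it approaches f as the object moves toward infinity, which is the limit in which the lens behaves like the ideal pin-hole system.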
These aspects of image formation, the PSF image and the diameter of the spot, begin to define quantitatively the spatial resolution of the generated optical image, which depends strictly on the characteristics of the optical system. Definitions 3, 4, and 5 state that the optical system is a 2D linear system, and it is therefore plausible to model the process of image formation with the theory of linear systems. We are then motivated to affirm the following:
(a) two bright points of the object produce an image (in the image plane) in which the effects of the luminous circular areas are superimposed (add up);
(b) considering points of light in different positions in the object plane (X , Y ), the image of the light spot moves to a new position according to Eq. (5.4) while presenting the same result; in other words, the optical system produces the same result whatever the position of the light spot in the object plane. In symbolic form, this property tells us that if a light point f (X , Y ) of the object plane produces a bright spot I (x, y) in the image plane (definition 3), and the bright point is moved in the object plane by (−ΔX , −ΔY ), then in the image plane the light spot is only moved, without undergoing brightness changes, as follows:

$$f(X - \Delta X, Y - \Delta Y) \longrightarrow I(x - \Delta x, y - \Delta y)$$

(c) an optimal optical system generates images of light spots of similar form if the distance from the optical axis is reasonably limited. This means that the PSF is spatially invariant.

In other words, statement (c) tells us that the optical system is stationary, i.e., changing the position of the luminous input point (source) in the object plane changes the position in the image plane (statement (b)) without changing the form of the PSF that describes the image of the source. If the previous conditions hold, the optical system of image formation can be characterized as spatially invariant and linear, with the PSF as its impulse response. The process of image formation for a bright 3D object can be thought of as the superposition of the single spot images, described by the PSF of the optical system, generated by all the luminous points that make up the 3D object. In essence, the overall image of the object is the spatial overlap of all the light spots described by the PSF corresponding to all the luminous points constituting the 3D object.

Figure 5.15 schematizes the process of image formation considering the photometric aspects. The image is generated by a surface element of the uniformly illuminated object with radiant emittance f (X , Y ) (W/m²). The radiant flux emitted by the surface element dX × dY of the object is given by f (X , Y ) dX dY . The optical system does not project this luminous energy onto a well-focused point (x, y) of the image plane; because of diffraction, it produces a slightly blurred light spot (see Fig. 5.16). This blurring effect, introduced by the optical system (circular aperture), is described by the PSF, modeled by the Airy disk, which represents the distribution of the irradiance in the image plane.
The Airy disk is the diffraction pattern of a circular aperture: the irradiance (flux density) is distributed with maximum brightness in the central disk, surrounded by a series of concentric circular rings whose size depends on the wavelength of the light and on the diameter of the circular aperture.
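The properties stated above (superposition, homogeneity, and spatial invariance, with the PSF as impulse response) can be checked numerically on a toy 1D system. In this sketch the optical system is replaced by a circular convolution with a Gaussian PSF; this choice, the function names, and all numeric values are assumptions made only to obtain a system that is linear and shift invariant by construction.

```python
import numpy as np

def gaussian_psf(n, sigma):
    """Unit-sum 1D Gaussian on n points, centered at index 0 (circular)."""
    x = np.arange(n)
    d = np.minimum(x, n - x)                 # circular distance from index 0
    h = np.exp(-d**2 / (2 * sigma**2))
    return h / h.sum()

def optical_system(f, h):
    """Hypothetical shift-invariant system: circular convolution of f with h."""
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)))

n = 128
h = gaussian_psf(n, sigma=2.0)
f1 = np.zeros(n); f1[30] = 1.0               # first light point
f2 = np.zeros(n); f2[70] = 0.5               # second light point
k = 3.0

# (a) superposition: the image of a sum is the sum of the images.
additive = np.allclose(optical_system(f1 + f2, h),
                       optical_system(f1, h) + optical_system(f2, h))
# Homogeneity: scaling the source scales the image by the same factor.
homogeneous = np.allclose(optical_system(k * f1, h), k * optical_system(f1, h))
# (b), (c) spatial invariance: shifting the source only shifts the image.
shift_invariant = np.allclose(optical_system(np.roll(f1, 10), h),
                              np.roll(optical_system(f1, h), 10))
print(additive, homogeneous, shift_invariant)
```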
Fig. 5.15 Image formation process. Optical system that projects in the image plane the luminous flux radiated by a surface element of the object

Fig. 5.16 Image formation process: the light energy radiated by a point source, through the optical system, generates in the image plane a blurred bright spot with variable brightness described by the point spread function (PSF)
In quantitative terms, the irradiance that arrives on an elementary region of the image plane centered at point (x, y) is given by the following relation:

$$I(x, y) = h(X, Y; x, y)\, f(X, Y)\, dX\, dY \tag{5.5}$$

where h(X , Y ; x, y) is the PSF, which describes how a luminous source point in (X , Y ) in the object plane is modified in the image plane around the location (x, y). For simplicity, the projection of the elementary light source is assumed to be one-to-one, i.e., without a magnification factor. The irradiance I (x, y) (expressed in W/m²) in the image plane, at each location (x, y), is given by the contribution of all the elementary areas of the object (thought of as elementary light sources) through a superposition process (statement (a) on linear systems) as follows:

$$I(x, y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(X, Y)\, h(X, Y; x, y)\, dX\, dY \tag{5.6}$$
We have already observed that, for a spatially invariant system, considering other bright areas in the object space (X , Y ) only shifts the light spot in the image plane, while the shape of the PSF, i.e., the way the source light spot is projected at different points of the image plane, remains unchanged. In reality, we know that this is not always the case, but if the image plane is partitioned into limited regions, the shape of the PSF can be considered invariant within each region. Figure 5.17 shows the 1D graphs of the PSF corresponding to two light points X′ and X″ of the object plane. The light intensity observed at a point x of the image plane derives from the overlapping contributions of the two PSFs. Figure 5.18 shows, again in 1D, a more complex situation, i.e., how the luminous intensity at each point x of the image plane results from the superposition of the intensities of all the points of the object, the latter modeled by scaled Dirac pulses and by a continuous brightness function, both modified by the Gaussian PSF (an approximation of the Airy disk).
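The superposition of Fig. 5.17 can be sketched in 1D by summing one PSF per light point, which is the discrete form of Eq. (5.6) for a spatially invariant system with unit magnification. The unit-area Gaussian used as PSF, the function names, and the positions and brightness values of the two sources are assumptions of this example.

```python
import numpy as np

def gaussian_psf(u, sigma):
    """1D Gaussian PSF h(u) with unit area (assumed stand-in for the Airy profile)."""
    return np.exp(-u**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def image_profile(x, sources, sigma):
    """I(x) = sum_k A_k * h(x - X_k): superposition of one PSF per light point."""
    I = np.zeros_like(x, dtype=float)
    for X_k, A_k in sources:
        I += A_k * gaussian_psf(x - X_k, sigma)
    return I

x = np.linspace(-10.0, 10.0, 2001)            # image plane coordinate
sources = [(-1.5, 1.0), (2.0, 0.6)]           # (position X_k, brightness A_k), illustrative
I = image_profile(x, sources, sigma=1.0)
dx = x[1] - x[0]
print(I.sum() * dx)                            # total energy, close to 1.0 + 0.6
```

Since each PSF has unit area, the blur redistributes the energy of the two points over the image plane without changing its total.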
Fig. 5.17 Application of the superposition principle for two bright points X′ and X″ of the object plane; the light intensity at a point x in the image plane is derived from the superposition of the PSF contributions of the two light points considered

Fig. 5.18 Light points in various positions on a line in the object plane, modeled by the Dirac function A · δ(x − x′) with variable intensity, projected in the image plane blurred by the Gaussian-shaped PSF
Under the conditions of a spatially invariant system, we can affirm that when the luminous point is shifted in the object plane (X , Y ), its image changes position in the image plane (x, y) according to Eq. (5.4) without being otherwise modified, i.e., the PSF does not alter the shape of the bright spot projected in the image plane. Consequently, the PSF does not change whatever the position (X , Y ) of the luminous point in the object plane. With reference to Fig. 5.18, the example shows the identical effect of the PSF (modeled by a Gaussian function) in the image plane for each luminous point of a line of the object plane: the distribution of the luminous intensity is always Gaussian and is only scaled in relation to the brightness of the luminous points of the object. The independence of the PSF shape from the position (X , Y ) of the luminous point in the object plane, and therefore the invariant behavior of the optical system, allows the spatial coordinates in the image plane to be referred only to the position in which the PSF is centered (x = X , y = Y ). From this it follows that

$$h(X, Y; x, y) = h(x - X, y - Y) \tag{5.7}$$
In Fig. 5.16, the PSF is centered in the image plane because the light point in the object plane is on the optical axis, with X = Y = 0. For a space-invariant optical system, using Eq. (5.7), we can rewrite Eq. (5.6) as follows:

$$I(x, y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(X, Y)\, h(x - X, y - Y)\, dX\, dY \tag{5.8}$$

which is known as the CONVOLUTION integral. More generally, if the effects of optical degradation are neglected, the projection of the object in the image plane can be considered as the combined process of geometric projection (pin-hole model) and convolution with the PSF characteristic of the optical system. The resulting image of the object is the distribution of the luminous energy radiated from the object, projected in the image plane, and modulated at each point P(x, y) of the image by weighting with the PSF the contributions of the image points included in a neighborhood of P (the domain of the PSF). The process that modifies a function f (X , Y ) at each point of its domain through another function h(x − X , y − Y ), producing a new function I (x, y), is called a convolution operator, and the function I is said to derive from f convolved with h. Besides modeling the process of image formation, convolution will play a fundamental role in the field of image processing. In the process of image formation, convolution has the effect of smoothing the intensity levels of the image and consequently reduces spatial resolution. Figure 5.19 shows how the intensity profile corresponding to a line of the image f (X ) is smoothed when it is convolved with the Gaussian PSF h(x − X ).

Let us now see what happens when the PSF is reduced to an impulse, i.e., when it corresponds to the delta function δ(x, y). With a mathematical artifice, we can turn a Gaussian function into an impulse through a succession of Gaussian functions δσ (x, y) that have ever higher peaks and become ever narrower around (x, y) = (0, 0). We define a generalized delta function as the limit of this succession of Gaussian functions:

$$\delta(x, y) = \lim_{\sigma \to 0} \delta_\sigma(x, y) = \lim_{\sigma \to 0} \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \tag{5.9}$$
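The behavior of the succession δσ in Eq. (5.9) can be verified numerically: as σ decreases, the peak 1/(2πσ²) grows while the volume under the surface stays equal to one. The sketch below estimates the volume with a simple Riemann sum on a square grid; the grid size, the integration window, and the function names are assumptions of the example.

```python
import numpy as np

def delta_sigma(x, y, sigma):
    """Gaussian approximation of the 2D delta function, Eq. (5.9)."""
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def volume(sigma, half_width=1.0, n=801):
    """Riemann-sum estimate of the integral of delta_sigma over a square."""
    t = np.linspace(-half_width, half_width, n)
    step = t[1] - t[0]
    X, Y = np.meshgrid(t, t)
    return delta_sigma(X, Y, sigma).sum() * step**2

for sigma in (0.2, 0.1, 0.05, 0.02):
    peak = delta_sigma(0.0, 0.0, sigma)
    print(f"sigma = {sigma:5.2f}  peak = {peak:9.2f}  volume = {volume(sigma):.6f}")
```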
Fig. 5.19 Example of a 1D convolution between the function f (X ), which represents the brightness of an object plane line (a), and the Gaussian PSF function h(x − X ) (b), which simulates the effect of the diffraction of an optical system, producing as a result (c) a smoothed image line I (x) of the object
A rigorous treatment of the delta function requires the theory of distributions, which does not fall within our scope. Here it is sufficient to use its properties, given by the following integrals:
1. δ(x, y) = 0 everywhere except at the origin (0, 0), where it assumes the value ∞. The first property of the δ function is given by the following integral:

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \delta(x, y)\, dx\, dy = \int_{-\epsilon}^{+\epsilon}\int_{-\epsilon}^{+\epsilon} \delta(x, y)\, dx\, dy = 1 \tag{5.10}$$

where ε is an arbitrary nonzero number. This property describes a pulse that has infinite amplitude and null width, with unit volume.
2. Sifting property. For a continuous function f (x, y), the following integral is valid:

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x, y)\, \delta(x, y)\, dx\, dy = f(0, 0) \tag{5.11}$$

which essentially extracts the value of the function f at the origin, where the pulse of unit intensity is centered.
3. Another important sifting property is obtained by moving the impulse from the origin to the coordinates (x0 , y0 ); in this case, δ(x − x0 , y − y0 ) = 0 everywhere except in (x0 , y0 ), where it assumes the value ∞. With this sifting property, it is possible to extract the value of f at (x0 , y0 ) with the following integral:

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x, y)\, \delta(x - x_0, y - y_0)\, dx\, dy = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x + x_0, y + y_0)\, \delta(x, y)\, dx\, dy = f(x_0, y_0) \tag{5.12}$$
Note the properties of the δ function, we return to its approximate Gaussian functions expressed by Eq. (5.9) that in the limit of this succession of functions would be valid the (5.11) rewritten as follows:
234
5 Digitization and Image Display
(a)
(b) f(X)
(c) I(x)
h(x-X)
0
X
0
x
0
x
Fig. 5.20 Example of an ideal monodimensional convolution between the function f (X ) which represents the brightness of a line of the object plane (a) and the ideal function PSF shaped by the impulse (b). The result (c) of the convolution is an intact copy of the object image
+∞ +∞ lim f (x, y)δσ (x, y)dxdy = f (0, 0)
σ →0 −∞ −∞
(5.13)
where the $\delta_\sigma$ are to be understood as approximating functions used in the limit process and not as the δ(x, y) pulse itself, which does not exist as a function. Each Gaussian $\delta_\sigma$ subtends a unit volume, and for σ → 0 the function becomes increasingly narrow around (0, 0) with an increasingly high peak. We can now state that, by approximating the δ pulse with $\delta_\sigma$ (Eq. 5.9) and taking it as the PSF in the convolution equation (5.8), the sifting property gives
$$I(x, y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(X, Y)\,\delta(x - X, y - Y)\,dX\,dY = f(x, y) \tag{5.14}$$
where the object plane is described by f(X, Y) and the image plane by I(x, y). Basically, Eq. (5.14) tells us that, for an optical system without aberrations and without the phenomenon of diffraction, an ideal PSF modeled by the Dirac function reproduces exactly the brightness of the object f(X, Y); that is, we would have an ideal optical system that projects a bright spot onto the image plane I(x, y) with zero noise. In other words, an exact replica of the object is produced as an image (see Fig. 5.20). If we model the object as an impulse centered in $(X_0, Y_0)$, that is, $f(X, Y) = K \cdot \delta(X - X_0, Y - Y_0)$, then by virtue of the sifting property (5.12) the convolution equation (5.8) becomes
$$I(x, y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} K \cdot \delta(X - X_0, Y - Y_0) \cdot h(x - X, y - Y)\,dX\,dY = K \cdot h(x - X_0, y - Y_0) \tag{5.15}$$
where K is a constant that carries the unit of measurement. In this case, the image we obtain is a replica of the PSF; for an optical system devoid of aberrations, we know that the PSF models the diffraction phenomenon and corresponds to the Airy disk (see Figs. 5.16 and 5.18).
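The limit process of Eq. (5.13) can be checked numerically. The following is a minimal sketch, assuming that the approximating functions of Eq. (5.9) are unit-volume 2D Gaussians; the test function `f`, the grid size, and the helper names are hypothetical choices made only for illustration.

```python
import numpy as np

def gaussian_delta(x, y, sigma):
    """Unit-volume 2D Gaussian (assumed form of the approximating functions)."""
    return np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)

def sift(func, sigma, half_width=1.0, n=1001):
    """Riemann-sum approximation of the double integral of func * delta_sigma."""
    axis = np.linspace(-half_width, half_width, n)
    step = axis[1] - axis[0]
    x, y = np.meshgrid(axis, axis)
    return float(np.sum(func(x, y) * gaussian_delta(x, y, sigma)) * step * step)

def f(x, y):
    # Hypothetical smooth test function with f(0, 0) = 3
    return 2.0 + np.cos(4.0 * y) + 2.0 * x + x * y

for sigma in (0.2, 0.05, 0.01):
    # The integral should approach f(0, 0) as sigma decreases
    print(sigma, sift(f, sigma))
```

As σ shrinks, the printed values should move toward f(0, 0), which is the behavior that the sifting property describes in the limit.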
5.7 Image Formation with a Real Optical System
Finally, let us remember that we have examined image formation under conditions of incoherent light, for which the image is formed through the superposition of the luminous contributions coming from the object, imagined as a set of elementary illuminating regions. In addition to diffraction, the convolution process makes it possible to consider other characteristics that influence image quality through the PSF. For example, if the object has spatial frequencies whose wavelength is greater than the extent of the PSF, the smoothing effects of the convolution are limited. Conversely, if the intensity levels are spatially repeated with high frequency (small wavelength), the smoothing effects on the image are significant and considerably reduce its spatial resolution. The mathematical tool that can quantify the effect of convolution is the Fourier transform, which, as we know, decomposes the intensity values of the image into periodic variations of the light intensity present in the image itself.
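The dependence of the smoothing effect on spatial frequency can be illustrated with a short 1D sketch. It assumes a Gaussian PSF (a stand-in chosen for simplicity, not the Airy pattern) and measures the contrast (max − min)/(max + min) of a sinusoidal pattern after convolution; the function names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_psf(sigma, radius):
    """Discrete 1D Gaussian PSF normalized to unit sum (an assumed PSF shape)."""
    x = np.arange(-radius, radius + 1, dtype=float)
    h = np.exp(-x**2 / (2.0 * sigma**2))
    return h / h.sum()

def contrast_after_blur(period, sigma=3.0, n=4096):
    """Contrast of a unit-contrast sinusoid of the given period after blurring."""
    x = np.arange(n, dtype=float)
    signal = 0.5 + 0.5 * np.sin(2.0 * np.pi * x / period)
    psf = gaussian_psf(sigma, radius=int(4 * sigma))
    blurred = np.convolve(signal, psf, mode="valid")  # discard border samples
    return (blurred.max() - blurred.min()) / (blurred.max() + blurred.min())

# Long wavelength (much wider than the PSF) versus short wavelength
print("period 200:", contrast_after_blur(200.0))
print("period   8:", contrast_after_blur(8.0))
```

With these illustrative values, the long-period pattern should keep almost all of its contrast, while the short-period pattern should be strongly attenuated.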
5.8 Resolution of the Optical System
The quality of an image acquisition system depends on its ability to faithfully reproduce the spatial structures of objects, that is, to preserve the brightness variations of the observed structures. Spatial and radiometric resolution quantify this notion of system quality. Spatial resolution is closely related to the concept of spatial frequency, expressed in cycles or in lines per unit of length (see Sect. 5.5). Radiometric resolution is linked to contrast and indicates the difference in brightness between different areas of an object and between different objects. The resolution of the image formation system depends on the combination of the two aspects, i.e., the contrast of the objects and the geometrical details at different scales. In the process of image formation, we have seen that an element of the object, treated as a point source of light, is projected by the optical system into the image plane as a circular light spot described by the PSF h(x, y). The overall image of the object is obtained at each point, through the convolution process, as a superposition of circular light spots that add up linearly. Even with a high-quality optical system devoid of aberrations, a bright point of the object is projected into the image plane as an extended light area because of diffraction, which limits the spatial resolution of the image. Figure 5.21 shows the effect of the wave nature of light: when light passes through a circular aperture, it is diffracted, transforming a luminous point of the object plane into a set of concentric bright and dark rings around a central disk on the image plane.
The central area with the highest brightness is called the Airy disk, named after the English astronomer who modeled the distribution of irradiance on the central disk (first order of diffraction) and on the successive concentric rings of lower brightness (second, third, ... diffraction orders, respectively).
[Fig. 5.21 labels: aperture diameter d, angle θ, Airy disk diameter 2δ, wavelength λ, focal length f]
Fig. 5.21 The phenomenon of diffraction through a circular aperture and distribution of the irradiance on the image plane modeled by the profile of the Airy disk—order zero—and by the concentric rings next—first, second, ..., diffraction order
To quantify the resolving power, that is, the maximum possible resolution of an optical system (assumed to have zero aberrations), we consider two light points of equal intensity on the object. We evaluate the ability to see the two light points as separate in the image plane, that is, the minimum separation distance between the centers of the respective circular light areas described by their Airy disks. For an optical system with focal length f (for f ≫ d), the first zero of the irradiance, i.e., the first dark ring surrounding the Airy disk (see Fig. 5.21), lies at the radius δ:
$$\delta = 1.22\,\frac{\lambda f}{d} \tag{5.16}$$
where d is the diameter of the lens, λ is the wavelength of light, and the numerical constant 1.22 is linked to the geometry of the aperture. In the visible, the diameter of the light disk (the Airy disk), expressed in micrometers, is approximately equal to the aperture f/# of the lens (introduced in Sect. 4.5.1). The radius δ corresponds to an angular measure θ given by
$$\theta = 1.22\,\frac{\lambda}{d}$$
since δ/f = sin θ ≅ θ (see Fig. 5.21). Note that 84% of the irradiance of the light spot arriving on the image plane falls on the central Airy disk; if the second ring is also considered, the irradiance reaches 91%. The disks corresponding to the two bright points of the object will be visible and separated in the image plane (see Fig. 5.22) if their angular separation Δϕ is much larger than θ. As the bright spots on the object are brought closer and closer to each other, the corresponding Airy disks in the image plane overlap and tend to merge into a single spot, i.e., they can no longer be discriminated (see Fig. 5.23). To minimize the subjectivity in the perception of two light points, we use the Rayleigh criterion, which suggests that
Fig. 5.22 Relationship between the angular separation Δϕ of two light points in the object plane and the corresponding angular measure θ of the Airy disks generated in the image plane
[Fig. 5.23 panels: Δϕ = 2θ, well separated; Δϕ = θ, just separated; Δϕ = θ/2, not separated]
Fig. 5.23 Distribution of the irradiance in the image plane generated by two luminous points positioned with different angular separations: as the angular separation decreases, the maxima of the Airy disks overlap more and more, until a single, brighter spot appears
two light points are still separated in the image plane if the center of one disk falls on the first minimum of the disk corresponding to the other light point. The minimum resolvable angular separation is given by
$$\Delta\phi_{min} = \theta = 1.22\,\frac{\lambda}{d} \tag{5.17}$$
while the minimum resolvable distance δ between the two disks in the image plane is calculated with (5.16) and is called the Rayleigh distance.
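Equations (5.16) and (5.17) translate directly into a few lines of code. The sketch below is only illustrative: the function names are hypothetical, all lengths are assumed to be in meters, and the numerical values (λ = 540 nm, d = 50 mm, f = 200 mm) are example inputs.

```python
def rayleigh_angle(wavelength, aperture):
    """Minimum resolvable angular separation, Eq. (5.17), in radians."""
    return 1.22 * wavelength / aperture

def rayleigh_distance(wavelength, focal_length, aperture):
    """Radius of the Airy disk (Rayleigh distance), Eq. (5.16)."""
    return 1.22 * wavelength * focal_length / aperture

def is_resolved(delta_phi, wavelength, aperture):
    """Rayleigh criterion: two points are resolved if delta_phi >= theta."""
    return delta_phi >= rayleigh_angle(wavelength, aperture)

# Example inputs, all in meters
wavelength, aperture, focal_length = 540e-9, 0.05, 0.20
theta = rayleigh_angle(wavelength, aperture)
delta = rayleigh_distance(wavelength, focal_length, aperture)
print(f"theta = {theta:.3e} rad, Airy disk diameter = {2 * delta * 1e6:.2f} um")
print("separation 2*theta resolved:", is_resolved(2 * theta, wavelength, aperture))
print("separation theta/2 resolved:", is_resolved(theta / 2, wavelength, aperture))
```

The three cases of Fig. 5.23 (Δϕ = 2θ, θ, θ/2) can be explored by passing the corresponding separations to `is_resolved`.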
The quality of a system can be evaluated by considering a flat test object bearing vertical and horizontal stripes, white and black, of different thicknesses and with variable spatial frequency. Such known geometric structures can be expressed in cycles per millimeter or in number of stripes (number of lines) per millimeter. To separate geometric structures with high spatial frequencies, the system must have a high resolving power. An alternative to the Rayleigh criterion for evaluating the resolution of an optical system is the Abbe distance $r_0$, which roughly expresses the diameter of the Airy disk considered to be acceptably in focus in the image plane at the distance q from the optical center:
$$r_0 \cong \frac{\lambda \cdot q}{d} \tag{5.18}$$
Note that $r_0$ varies inversely with the diameter of the optical system, so that a larger diameter increases the resolution. Likewise, a better resolution can be obtained with light of shorter wavelength λ, for example, by using ultraviolet rather than visible light, as is done in microscopy to highlight very small geometric structures. Summarizing, the analysis of diffraction shows that the parallel light rays of a point source passing through a slit (rectangular or circular aperture) generate, because of diffraction, a larger and slightly blurred light spot. An optical system, even one assumed to have zero aberrations, focuses the parallel rays of a point source in the focal plane, and the lens is in fact in the same physical conditions as an aperture crossed by light: it undergoes the same diffraction phenomenon, which produces an extended light spot, i.e., the irradiance distribution described by the Airy disk. Consequently, the best an optical system with zero aberrations can do is to focus the light rays emitted by points of the object within the Airy disk in the image plane; in this case, the optical system is said to be diffraction limited.
This is a physical limitation of optical systems, whose lenses have a finite diameter. For example, a lens with focal length f = 200 mm and diameter d = 50 mm, operating at a dominant wavelength of λ = 540 nm (close to green), would have an Airy disk with diameter:
$$2\delta = 2 \times 1.22\,\frac{\lambda f}{d} = \frac{2 \times 1.22 \times 0.540 \times 10^{-6}\ \text{m} \times 0.20\ \text{m}}{0.05\ \text{m}} \approx 5.3\ \mu\text{m}$$
Therefore, considering only the phenomenon of diffraction, the central Airy disk would have a diameter of 5.3 µm. Even if this is small enough to be treated as a point in geometric optics, in reality an optical system cannot be idealized without considering at least the diffraction limit. In real optical systems, the resolution gets worse when optical aberrations must also be considered. Even the human visual system is subject to diffraction. The diameter of the pupil is variable. Considering a diameter of 2 mm and λ = 540 nm, the minimum angular resolution is about 1 minute of arc (i.e., 1/60° ≈ 3 · 10⁻⁴ radians). If we assume a focal length of 20 mm, the resolution on the retina can be estimated at about 6.6 µm, almost twice the separation between two adjacent
photoreceptors on the retina (remembering that the cones and rods have a diameter between 3 and 6 µm). A quantitative measure of the resolving power of an image formation system is given by the inverse of the separation distance (5.16) or by the inverse of the angular separation (5.17). Normally, the linear resolving power, that is, the ability to reproduce the details of the object in the image plane, is expressed in lines per millimeter through the cutoff frequency $F_t$ (known in the literature as the optical cutoff frequency), given by
$$F_t = \frac{1}{r_0} = \frac{d}{\lambda \cdot q} \tag{5.19}$$
This determines the maximum frequency that the optical system can pass to contribute to the formation of the image. It implies that if significant spatial frequencies of the object are filtered out (cut off), its image is not completely resolved, and the resulting image has attenuated contours. In some contexts, if the magnification factor M is known, it is useful to apply (5.19) in the object plane, and in this case we have
$$F_t = \frac{1}{M}\,\frac{d}{\lambda \cdot q} \tag{5.20}$$
Note that the resolving power increases with the diameter of the optical system and decreases as the focal length increases. The cutoff frequency indicated is valid for incoherent light (for coherent light, it would be $F_t/2$).
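As a quick numerical illustration of Eqs. (5.18) and (5.19), the sketch below computes the Abbe distance and the optical cutoff frequency. The function names are hypothetical, all lengths are assumed to be in millimeters so that the result is in cycles per millimeter, and the input values (d = 50 mm, λ = 540 nm, q = 200 mm) are example inputs only.

```python
def abbe_distance(wavelength_mm, image_distance_mm, aperture_mm):
    """Abbe distance r0 = lambda * q / d, Eq. (5.18), in millimeters."""
    return wavelength_mm * image_distance_mm / aperture_mm

def cutoff_frequency(wavelength_mm, image_distance_mm, aperture_mm):
    """Optical cutoff frequency Ft = 1 / r0 = d / (lambda * q), Eq. (5.19)."""
    return 1.0 / abbe_distance(wavelength_mm, image_distance_mm, aperture_mm)

# Example inputs in millimeters (540 nm = 540e-6 mm)
wavelength_mm, q_mm, d_mm = 540e-6, 200.0, 50.0
r0 = abbe_distance(wavelength_mm, q_mm, d_mm)
ft = cutoff_frequency(wavelength_mm, q_mm, d_mm)
print(f"r0 = {r0 * 1000:.2f} um, Ft = {ft:.1f} cycles/mm")

# Doubling the aperture diameter doubles the cutoff frequency
print(cutoff_frequency(wavelength_mm, q_mm, 2 * d_mm) / ft)
```

The last line reflects the remark in the text that the resolving power grows with the diameter of the optical system.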
5.8.1 Contrast Modulation Function—MTF
An optical system should produce clear images not only in terms of spatial resolution but also in terms of contrast. This last aspect concerns the ability to reproduce the spatial distribution of the luminance of the object as faithfully as possible. In the previous paragraphs, we have seen how the optical system, in diffraction-limited conditions, reshapes the luminous energy of the object in the image plane and what the limits of reproducibility of the spatial structures are. We now want to evaluate quantitatively how the initial contrast of the object is modified in the process of image formation. This is achieved by evaluating, for the optical system, the transfer function of the contrast modulation, known in the literature as the Modulation Transfer Function (MTF). In practice, an approach is used that combines spatial and contrast resolution characteristics (that is, how the irradiance varies from point to point in the image). For this purpose, sample objects are used, namely transparent test targets showing horizontal and vertical bars with different spatial frequencies, alternating between light and dark or with gray levels modeled by sinusoidal functions. The contrast, or modulation of light intensity, is defined as follows:
$$M = \frac{I_{max} - I_{min}}{I_{max} + I_{min}} \tag{5.21}$$
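The definition (5.21) can be evaluated directly on a sampled intensity profile. The following is a minimal sketch; the function name and the sample profiles are hypothetical and serve only to show the range of values that M can take.

```python
def modulation(intensities):
    """Contrast modulation M = (Imax - Imin) / (Imax + Imin), Eq. (5.21)."""
    i_max, i_min = max(intensities), min(intensities)
    if i_max + i_min == 0:
        raise ValueError("modulation is undefined for an all-zero profile")
    return (i_max - i_min) / (i_max + i_min)

# Hypothetical intensity profiles along one line of a bar target
full_contrast = [0.0, 1.0, 0.0, 1.0]      # black and white bars
reduced_contrast = [0.25, 0.75, 0.25, 0.75]
flat = [0.5, 0.5, 0.5, 0.5]               # uniform gray, no modulation

print(modulation(full_contrast))     # 1.0
print(modulation(reduced_contrast))  # 0.5
print(modulation(flat))              # 0.0
```

M ranges from 0 (uniform intensity) to 1 (the minimum intensity is zero).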
$$\Delta x > \frac{1}{2u_c} \tag{5.39}$$
and
$$\Delta y > \frac{1}{2v_c} \tag{5.40}$$
5.9 Sampling
[Fig. 5.33 labels: frequency axes u and v; replica spacings 1/Δx and 1/Δy; sampling frequencies $u_s$ and $v_s$]
Fig. 5.33 Under-sampled image spectrum
If the relations (5.39) or (5.40) are satisfied, the periodic replicas of F(u, v) overlap (see Fig. 5.33), producing a distorted spectrum of I(x, y) from which it is not possible to fully recover the spectrum F(u, v); (5.36) then no longer holds with the equality sign. This type of distortion, introduced by incorrect sampling, is called the Moiré effect for images, while in the 1D case it is called an aliasing error (see Fig. 5.34). One way to remove this error is to filter the image with a lowpass filter, in order to cut the high frequencies and satisfy the Nyquist conditions. Figure 5.34a shows the effect of aliasing for two sinusoidal signals R and B with frequencies $F_R$ = 0.02 and $F_B$ = 0.13, respectively. Sampling both sinusoids with frequency $F_S$ = 0.14, the sinusoid B is sampled with $F_S = 0.14 < 0.26 = 2 \cdot F_B$, violating the Nyquist criterion. In these conditions, instead of a correct reconstruction of the sinusoid B, the sinusoid R is obtained as an aliasing effect. The latter would instead be correctly reconstructed, since the Nyquist criterion is respected: in fact, $F_S > 2 \cdot F_R = 2 \cdot 0.02 = 0.04$. In other words, if the original signal is the sinusoid B and we sample with $F_S$, we obtain the sinusoid R, that is, the effect of aliasing. If, however, the signal to be reconstructed is the sinusoid R, it is correctly reconstructed. Figure 5.34b, c shows instead the Moiré effect, simulating the sampling in 2D with two overlapping grids R and V of vertical lines, of which R has a constant frequency $F_R$ = 0.83 cycles/mm, whereas the grid V, with varying frequency, simulates the sampling.
The grid R to be reconstructed appears correctly only when superimposed on a grid V whose sampling frequency satisfies the relation $F_S = 1.78 > 2 \cdot F_R = 2 \cdot 0.83 = 1.66$, according to the Nyquist criterion; the criterion is instead violated in all the other examples, thus causing the Moiré effect (which manifests itself as light and dark vertical bands). Figure 5.34d shows instead the Moiré effect caused by the aliasing generated in the discrete sampling of a grid by overlapping it with another grid rotated by different angles, even though the parallel lines in both have the same frequency. In television and photography, this effect is normally seen when a person wears clothes with particular textures and weaves. This is due to
[Fig. 5.34 labels: (a) sinusoids R and B with $F_R$ = 0.02, $F_B$ = 0.13, $F_S$ = 0.14, Δx = 7 mm; (b), (c) grid R with $F_R$ = 0.83, Δx = 1.2 mm, and sampling grids with $F_S$ = 0.70, 0.73, 0.76, 0.81, 0.87, 0.91, 0.96, 1.02, 1.78; (d) grids rotated by 8° and 10°]
Fig. 5.34 Distortions introduced due to sampling effect which violates Nyquist’s conditions. a Aliasing effect for a sinusoidal signal. b and c The effect of Moiré simulated with grids of parallel lines, one of which with variable frequency. d The Moiré effect obtained with overlapping grids with different angles and the Moiré effect visible on an image acquired with a digital camera with inadequate spatial resolution
the process of scanning the interlaced video signal, which reconstructs from the sampled data a signal with geometrical structures different from the original ones. We summarize the fundamental aspects of the sampling theory:
1. The finite-sized image, seen as a 2D function, is assumed to have limited bandwidth (the Fourier transform is zero outside the range of frequencies considered).
2. Sampling is modeled as a replication of the original image spectrum. The sampled image is represented (see Fig. 5.29) by the samples f(kΔx, lΔy) obtained by sampling the original continuous image f(x, y) with a 2D grid of resolution (Δx, Δy).
3. The Nyquist conditions allow the original image to be faithfully reconstructed from the sampled image if the latter is band limited by $F_c$ (the maximum frequency present in the image) and Eq. (5.37) is respected, which binds the spatial sampling interval (Δx, Δy) to the limit frequency $u_c$ (band limit).
4. The Nyquist conditions in terms of frequency impose that the sampling frequency $(u_s, v_s)$ must be greater than or equal to twice the limit frequency $(u_c, v_c)$ of the band:
$$\frac{1}{\Delta x} = u_s \ge 2u_c \quad \text{and} \quad \frac{1}{\Delta y} = v_s \ge 2v_c$$
5. The aliasing phenomenon (Moiré effect in the 2D case) occurs if conditions 3 and 4 are not maintained, i.e., $u_s < 2u_c$ or $v_s < 2v_c$, and produces distortions in the sampled image.
6. The phenomenon of aliasing produces observable distortions in images (weaves and annoying textures not present in reality) that cannot be eliminated by subsequent filtering. It can be eliminated only if, before sampling, the image is filtered to attenuate the high frequencies and satisfy the condition of point 4.
7. The reconstruction of the original image f(x, y) is obtained through an interpolation process described by Eq. (5.38).
In reality, image acquisition systems (scanners, cameras, TV cameras, etc.) do not operate in the idealized conditions indicated above. The image cannot be described as a function with limited bandwidth. The sampling grid is not always regular, and this generates the aliasing phenomenon. For example, in a camera with CCD sensors that must sample the continuous optical image f(x, y) with a 2D grid of resolution (Δx, Δy), the electronic scanning mechanism that digitizes the image consists of a matrix of photoreceptors (for example, 1024 × 1024 as shown in Fig. 5.35, where each photoreceptor has dimensions 10 µm × 10 µm). Figure 5.35 shows that, for physical reasons, there is a gap between photoreceptors; consequently, each photoreceptor is smaller than the spatial sampling period, and the Nyquist conditions may be violated (Eqs. 5.39 or 5.40). This gives rise to the Moiré phenomenon, which distorts the high spatial frequencies present in the image.
The area occupied by each photoreceptor is centered on the nodes of the 2D sampling grid. Each photoreceptor transduces the irradiance incident on this area, generating a sampling signal modeled by a rectangular impulse (which expresses the sensitivity of the sensor) whose width is given by the photoreceptor dimensions. A sampling signal of finite width has the effect of reducing the high-frequency energy present in the image. We have already noted that this problem can be mitigated before sampling (before acquisition), for example, by defocusing the image, thus minimizing the Moiré effect. Some modern digital cameras mount an optical filter, called an anti-aliasing filter, directly on the sensor, which aims to attenuate the spatial frequencies above half the sampling frequency of the sensor itself (the Nyquist frequency). Obviously, this eliminates the annoying Moiré phenomenon from the image at the expense of the resolving power of the optical system and sensor. The technological challenge among manufacturers of digital cameras is, on the one hand, to increase the spatial resolution of the sensor in terms of photoreceptor density for a given format (for example, full frame); but at the same
[Fig. 5.35 labels: physical image f(x, y) with axes x and y; sampling grid with steps Δx and Δy; photoreceptors; rectangular pulses; sampled image indexed by (i, j)]
Fig. 5.35 Electronic scanning for digitization with a camera with CCD sensors
time, reducing the size of the photoreceptors (the current limit is around 5 µm) generates electronic noise, with the consequent need to attenuate it with image smoothing techniques. In other words, it is necessary to balance the size of the photoreceptors against the electronic noise, which are closely related: reducing the size increases the resolution, but correcting the noise reduces it. A technological solution consists in building large-format sensors with a high number of photoreceptors. Finally, remember that the resolution of a sensor is also linked to the technology that encodes the color information. For example, RGB color coding according to the Bayer scheme requires that a pixel of the image include a 2 × 2 matrix of photoreceptors (see Sect. 3.26).
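The aliasing mechanism summarized in this section can be sketched in 1D. The example below uses hypothetical frequencies (a signal at 0.9 cycles per unit sampled at 1.0 samples per unit), chosen only so that the arithmetic is easy to follow; they are not the values of Fig. 5.34. The helper names are also illustrative.

```python
import numpy as np

def satisfies_nyquist(f_max, f_sampling):
    """Nyquist condition: the sampling frequency is at least twice f_max."""
    return f_sampling >= 2.0 * f_max

def alias_frequency(f, f_sampling):
    """Apparent frequency of a sinusoid of frequency f sampled at f_sampling."""
    return abs(f - f_sampling * round(f / f_sampling))

# Hypothetical values: the signal frequency is well above f_sampling / 2
f_signal, f_sampling = 0.9, 1.0
f_alias = alias_frequency(f_signal, f_sampling)

# Sample the true signal and a sinusoid at the alias frequency at the same instants
t = np.arange(50) / f_sampling
samples_true = np.cos(2.0 * np.pi * f_signal * t)
samples_alias = np.cos(2.0 * np.pi * f_alias * t)

print("Nyquist satisfied:", satisfies_nyquist(f_signal, f_sampling))
print("alias frequency:", f_alias)
print("samples indistinguishable:", np.allclose(samples_true, samples_alias))
```

Because the two sets of samples coincide, no filtering applied after sampling can tell the two sinusoids apart, which is why the lowpass filter has to act before sampling.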
5.10 Quantization
While sampling defines the optimal spatial resolution of the image, completing the image digitization process (Fig. 5.9) requires defining the appropriate radiometric resolution (brightness levels), i.e., the numerical accuracy with which the pixels represent the luminous intensity of the original object. In Sect. 5.3, we anticipated that the analog signal generated by a camera has to be converted into a digital signal through the quantization process (see Fig. 5.8). The sampling theory has shown that it is possible, under certain conditions, to faithfully reconstruct a sampled signal without losing information. The quantization process, in contrast, always involves a loss of information and therefore an error on the signal. It converts the continuous intensity values, at each sampling point of the image, into a numerical value taken from a finite set whose size depends on the number of assigned bits. For a binary image (pixels represented with a value of 0 or 1), it is sufficient to assign 1 bit to each pixel. Normally, each pixel is assigned 8 bits, and in this way the input range of light intensity $(I_{min}, I_{max})$ can be converted to 256 levels ($2^8$ levels,
Fig. 5.36 Quantization of an image with different quantization levels: from the maximum value of 8 bits up to 1 bit corresponding to the binary image
normally between 0 and 255) by storing the pixel value in one byte of computer memory. Figure 5.36 shows the effect of quantizing an image at different light intensity resolutions: starting from the good-quality 8-bit image, the image quality degrades as the number of quantization bits (i.e., intensity levels) decreases. If the quantization scale is composed of intervals of equal amplitude, the quantization is called linear or uniform; otherwise it is called nonlinear. The quantization operation, being a discretization process, assigns approximate values, and the approximation is measured by the quantization error. Figure 5.37a shows graphically the step transformation function that quantizes the intensity levels of an image I with minimum intensity $I_{min}$ and maximum intensity $I_{max}$. The subdivision of the input range (on the abscissa axis, $s_1, s_2, \ldots, s_L, s_{L+1}$) and of the output range (on the ordinate axis, $q_1, q_2, \ldots, q_{L-1}, q_L$) establishes a first characteristic of the quantization function Q(I), that is, how the input values (light intensity) are quantized into output intervals. A second characteristic of the quantizer is the uniform division of the input values, which are subdivided into equal intervals in input and output. This means giving equal importance to all the intensity levels of the image. Basically, a quantization function Q(I), for each input value I that lies in the interval $(s_k, s_{k+1})$,
[Fig. 5.37 labels: (a) uniform quantizer, input levels I (decision thresholds s) versus output levels I′ (reconstruction levels); (b) nonuniform quantizer with the same axes]
Fig. 5.37 a Uniform step quantizer with 7 quantization levels and b nonuniform quantizer
maps a quantized (discrete) value $I_k$ associated with the k-th output interval $q_k$:
$$I_k = Q(I) \quad \text{if } I \in (s_k, s_{k+1}) \tag{5.41}$$
Normally, we assign to the quantized value $I_k$ the central value of the interval, $q_k = \frac{s_k + s_{k+1}}{2}$, imagining that the input value I falls at the center of the input interval $(s_k, s_{k+1})$. In this case, the quantization error, in the hypothesis of uniform quantization, is $\varepsilon = I_k - I$; its magnitude reaches the maximum value $(q_{k+1} - q_k)/2$, while it is zero if the input value corresponds to one of the output levels $q_k$. Note that the quantization process is irreversible, since there is no inverse function of Q(I) that would faithfully reproduce the original signal. Figure 5.37b shows a step quantizer with a nonuniform distribution of input and output intervals. The input thresholds $s_k$ define how to divide the input dynamics and are called decision (or transition) levels. The quantization values $q_k$ are called reconstruction (or significant) levels, and their number characterizes the extent of the loss of the original signal (fewer bits imply a more degraded discrete signal). For a B-bit quantizer, the number of reconstruction levels is $L = 2^B$. Figure 5.38 shows the decision and reconstruction levels for the 2-, 3-, and 4-bit uniform quantizers. It is easy to see that the quantization error becomes smaller as N, the number of bits used to discretize the output values, increases. An estimate of the quantization error can be obtained from the difference between the original image and the quantized one by calculating the mean square error (MSE) (described in Sect. 6.11). Let us now see how the input and output intervals are chosen, that is, how the decision and reconstruction levels are chosen, to keep the quantization error (also called quantization noise) minimal. Referring to Fig. 5.37, the step quantization function maps the decision thresholds $s_k$ (along the axis of the input values I) to the quantized output values (along the axis of the discrete values I′). If we consider the input values divided into equal intervals (with $\Delta = s_{k+1} - s_k$ for each k-th interval),
[Fig. 5.38 panels: uniform quantization functions with 4, 8, and 16 reconstruction levels (Bits = 2, 3, 4), each plotting output intervals against decision intervals over the input range 0 to 255; histogram of the non-quantized image (256 levels)]
Fig. 5.38 Step uniform quantizer with N = 2, 3, 4 bits. The estimated quantization error with the mean square error (MSE) is, respectively, MSE = 71.23, 34.08, 9.75 for N = 2, 3, 4 bits
the estimate of the mean quadratic error for a quantized value I′ of the input value I, assumed equally probable within the considered input range, is given by
$$\varepsilon = E\left\{(I - I')^2\right\} = \int_{I' - \Delta/2}^{I' + \Delta/2} (I - I')^2\, dI \tag{5.42}$$
In this case, the uniform division of the output intervals gives an optimal quantizer. If, however, the input values I within an interval $(s_k, s_{k+1})$ are not equally probable, then the quadratic error must be weighted by the probability density function p(I) for each interval. More generally, we can think of I as a random variable with probability density p(I); to evaluate the error introduced by the quantization process, we can use the Lloyd–Max method, which minimizes the mean quadratic error given by
$$\varepsilon = E\left\{(I - I')^2\right\} = \int_{I_{min}}^{I_{max}} (I - I')^2\, p(I)\, dI \tag{5.43}$$
For an L-level quantizer, Eq. (5.43) can be rewritten as follows:
$$\varepsilon = \frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\left[I(m, n) - I'(m, n)\right]^2 = \sum_{k=1}^{L}\int_{s_k}^{s_{k+1}} (I - q_k)^2\, p(I)\, dI \tag{5.44}$$
where $q_k$ is the k-th reconstruction (significant) level, $s_k$ is the lower bound of the interval associated with $q_k$, and the image I has size M × N. Assuming the input values I(m, n) to be independent and identically distributed random variables, the central limit theorem states that the distribution of the sum of all the values of I(m, n) converges asymptotically toward a normal random variable, also called a Gaussian distribution. With a Gaussian probability density function p(I), we have $s_1 = -\infty$ and $s_{L+1} = +\infty$. The conditions that minimize (5.44) are obtained by differentiating
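with respect to $s_k$ and to $q_k$, and then setting the derivatives equal to zero. Evaluating $\partial\varepsilon/\partial q_k = 0$, we obtain
$$\frac{\partial \varepsilon}{\partial q_k} = -2\int_{s_k}^{s_{k+1}} (I - q_k)\, p(I)\, dI = 0 \qquad \text{for } 1 \le k \le L$$
and solving with respect to $q_k$, we have
$$q_k = \frac{\int_{s_k}^{s_{k+1}} I \cdot p(I)\, dI}{\int_{s_k}^{s_{k+1}} p(I)\, dI} \tag{5.45}$$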
which is associated with the input interval $(s_k, s_{k+1})$. By differentiating (5.44) with respect to the decision threshold $s_k$ and setting the derivative equal to zero ($\partial\varepsilon/\partial s_k = 0$), we get
$$\frac{\partial \varepsilon}{\partial s_k} = (s_k - q_{k-1})^2\, p(s_k) - (s_k - q_k)^2\, p(s_k) = 0$$
and solving with respect to $s_k$, we have
$$s_k = \frac{q_{k-1} + q_k}{2} \tag{5.46}$$
considering that $s_{k-1} \le s_k$. Equations (5.45) and (5.46) indicate that the optimal decision thresholds $s_k$ lie exactly halfway between adjacent quantization levels $q_k$ (significant levels), which, in turn, lie at the center of mass of the probability density function between the decision thresholds. Figure 5.39a shows an optimal symmetric quantizer with 8 levels of quantization for data with Gaussian distribution, and Fig. 5.39b for data with uniform distribution. It can be noted that, to minimize the mean square error ε for the Gaussian distribution (i.e., a nonuniform distribution), the reconstruction levels $q_k$ are closer to each other in areas of higher probability, whereas their distance increases in areas of low probability. For a uniform distribution of the input data, the optimal quantizer results in a uniform division of the reconstruction levels.
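The two conditions (5.45) and (5.46) suggest an alternating procedure: place the thresholds midway between the levels, then move each level to the centroid of its interval, and repeat. The sketch below applies this idea to a finite set of samples, replacing the integrals with sample means (an empirical approximation, not the analytical solution); the function names, the number of iterations, and the synthetic Gaussian data are assumptions made for illustration.

```python
import numpy as np

def uniform_design(samples, levels):
    """Uniform quantizer over the sample range: thresholds s and levels q."""
    edges = np.linspace(samples.min(), samples.max(), levels + 1)
    return edges[1:-1], 0.5 * (edges[:-1] + edges[1:])

def quantize(samples, s, q):
    """Map each sample to the reconstruction level of its decision interval."""
    return q[np.searchsorted(s, samples)]

def mse(samples, s, q):
    return float(np.mean((samples - quantize(samples, s, q)) ** 2))

def lloyd_max(samples, levels, iterations=200):
    """Alternate the threshold rule (Eq. 5.46) and the centroid rule (Eq. 5.45)."""
    _, q = uniform_design(samples, levels)      # start from the uniform design
    for _ in range(iterations):
        s = 0.5 * (q[:-1] + q[1:])              # thresholds midway between levels
        index = np.searchsorted(s, samples)     # interval index of each sample
        for k in range(levels):
            members = samples[index == k]
            if members.size:                    # keep the old level if the interval is empty
                q[k] = members.mean()           # empirical center of mass
    return 0.5 * (q[:-1] + q[1:]), q

rng = np.random.default_rng(0)
data = rng.normal(size=20000)                   # synthetic Gaussian intensities
s_u, q_u = uniform_design(data, 8)
s_o, q_o = lloyd_max(data, 8)
print("uniform MSE:  ", mse(data, s_u, q_u))
print("Lloyd-Max MSE:", mse(data, s_o, q_o))
print("levels:", np.round(q_o, 3))
```

For Gaussian data, the resulting levels should be denser near the mean and sparser in the tails, in line with the behavior described for Fig. 5.39a.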
[Fig. 5.39 labels: panels (a) and (b), probability density p(I) versus intensity I, with interval width ΔI marked]
Fig. 5.39 Optimal symmetric quantizer for data with: a Gaussian distribution; b modeling with rectangular parts of the probability density function
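Assuming a Gaussian distribution p(I) with variance $\sigma_I^2$ and mean $\mu_I$, if the L quantization levels are uniformly spaced by $\Delta I = \frac{I_{max} - I_{min}}{L}$, the quantization error is uniformly distributed in the interval $\left(-\frac{\Delta I}{2}, \frac{\Delta I}{2}\right)$. Writing $e = I - I'$ for the error, the mean square error is given by
$$\varepsilon = \frac{1}{\Delta I}\int_{-\Delta I/2}^{\Delta I/2} e^2\, de = \frac{\Delta I^2}{12} \tag{5.47}$$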
If image I has a dynamic range $I_{max} - I_{min}$ and a uniform distribution, the variance $\sigma_I^2$ is given by $\frac{(I_{max} - I_{min})^2}{12}$. With a uniform B-bit quantizer, we have $\Delta I = \frac{I_{max} - I_{min}}{2^B}$. From this, it follows that
$$\frac{\varepsilon}{\sigma_I^2} = 2^{-2B} \;\Rightarrow\; SNR = 10 \log_{10} 2^{2B} \approx 6B\ \text{dB} \tag{5.48}$$
that is, the Signal-to-Noise Ratio (SNR) reached by an optimal mean-square quantizer for a uniform distribution is about 6 dB per bit. Information and coding theory, which in general relates the amount of error introduced to the number of bits assigned per pixel, does not provide the ideal quantization function but offers guidelines on the optimal conditions for minimizing the quantization error, which can never be zero. In general, the signal-to-noise ratio for quantization is given by
$$SNR = 10 \log_{10} \frac{\sigma_I^2}{\varepsilon} \tag{5.49}$$
In the digitization of images for the purpose of visualization, encoding the intensities with 8 bits is sufficient for a good-quality gray-level image, provided it is quantized with a logarithmic scale, which better matches the response of the human visual photoreceptors. In this way, a higher radiometric resolution is produced at lower light intensities, where the eye is sensitive to weak brightness variations, and a lower resolution at brighter pixels, where the eye is in any case less sensitive. In many vision applications, 8 bits are sufficient for image quantization. Other applications, for example, in astronomy, astrophysics, or biomedicine, require a higher radiometric resolution, with 12 or even 16 quantization bits. Modern digital cameras already have sensors that quantize at 12 bits to obtain better color rendering.
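The relations (5.47) and (5.48) can be verified on synthetic data. The sketch below quantizes uniformly distributed intensities in the range 0 to 255 with a B-bit uniform quantizer that uses the interval centers as reconstruction levels; the function names, the random data, and the sample size are assumptions made only for this check.

```python
import numpy as np

def uniform_quantize(image, bits, i_min=0.0, i_max=255.0):
    """B-bit uniform quantizer: each value is replaced by the center of its interval."""
    levels = 2 ** bits
    step = (i_max - i_min) / levels
    k = np.clip(np.floor((image - i_min) / step), 0, levels - 1)
    return i_min + (k + 0.5) * step

def snr_db(image, quantized):
    """Quantization SNR of Eq. (5.49): 10 log10(signal variance / MSE)."""
    noise = np.mean((image - quantized) ** 2)
    return 10.0 * np.log10(np.var(image) / noise)

rng = np.random.default_rng(1)
image = rng.uniform(0.0, 255.0, size=200000)    # synthetic, uniformly distributed

for bits in (2, 4, 8):
    step = 255.0 / 2 ** bits
    quantized = uniform_quantize(image, bits)
    error = np.mean((image - quantized) ** 2)
    print(bits, "bits: MSE =", error, " step^2/12 =", step**2 / 12,
          " SNR =", snr_db(image, quantized), "dB")
```

For each number of bits, the measured MSE should be close to ΔI²/12 and the SNR close to 6 dB per bit, as stated by Eqs. (5.47) and (5.48).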
5.11 Digital Image Acquisition Systems (DIAS)
In the three previous chapters, we analyzed the phenomenological aspects of the image formation process (light-matter interaction, refraction and diffraction of light, radiometric and geometric models) and mentioned some technological solutions for the various components (image sensor, optics, analog-to-digital conversion device) involved in a digital image acquisition system.
In this section, we will analyze the essential components of a generic DIAS system. The phenomenological aspects of the image formation process were treated in more depth than the component technologies, given how rapidly the latter evolve and change. For the image sensor, it was necessary to introduce the technological solutions adopted, which had a notable impact, in particular, on color management and on the image digitization mode. In defining the components of a DIAS system, the lighting of the scene should not be underestimated, considering the remarkable progress of illuminant technology for the various applications. We briefly summarize the components of a generic DIAS.
1. Optical System (OS). Usually made up of a group of lenses, it conveys the radiant energy of the scene (illuminated with light predominantly in the visible) and generates an optical image in the focal plane of the OS. The latter behaves like a linear space-invariant system that models the irradiance distribution in the image plane through the optical transfer function (OTF) and the contrast modulation transfer function (MTF). We know that even if an OS is perfect and free of external disturbances (e.g., atmospheric turbulence), the image quality can be affected by the phenomenon of diffraction, which can be modeled under diffraction-limited conditions (see Sect. 5.8).
2. Image sensor. The optical image, i.e., the irradiance incident on the image plane, is converted into an electrical signal by the image sensor through a matrix of small photoreceptor cells. Section 5.3 described CCD and CMOS image sensors, which generate a continuous electrical signal (video signal) by electronic scanning of the photoreceptors (for example, line by line), from which the voltage value (related to the intensity of the incident light) is read.
These types of sensors have totally replaced those based on electronic tubes (for example, Vidicon technology) and those based on chemical technology, i.e., classic photographic film, in which a layer of sensitive emulsion (gelatin containing silver halides) is spread on a triacetate support. Modern TV cameras and cameras based on solid-state image sensors are equipped with microprocessors that not only transfer the pixels to the frame grabber but also process the digital image to reduce thermal noise, pixel nonuniformity, and the noise caused by poor light conditions, which produces a high level of graininess in the image. They are basically equipped with firmware that preprocesses the image to improve its quality. In addition, to handle the RGB color components, solid-state image sensors place a filter with the Bayer mosaic, called the color filter array (CFA), over the photoreceptors. In this case (see Sect. 3.26), the pixel is represented by at least 2 × 2 photoreceptors that capture the RGB color information. A demosaicing algorithm retrieves the RGB pixel value through an interpolation process (image resampling).
3. Digitizer. The continuous electrical signal produced by the image sensor electronics is transformed into a digital signal by an electronic device called a frame grabber through the sampling and quantization process, thus generating the digital image I of N × M pixels, where each pixel I(i, j) contains luminance information. In particular, in the case of cameras with vidicon tubes (see Sect. 5.3), the sampling takes place during the electronic scanning of the optical image, which takes
place line by line, encoding the video signal according to the RS-170 standard (monochromatic signal) or RS-170A (for color), or according to the European CCIR standard, in which the video signal provides 625 interlaced lines, 768 pixels per line, and 25 images per second (frame rate). The analog-to-digital conversion of the video signal is performed by the frame grabber, which stores the digital image in a buffer memory (frame buffer) to be subsequently saved on mass storage, displayed on a monitor, or processed. In the case of a CCD or CMOS image sensor, instead, the N × M array of photoreceptors itself constitutes the sampling points of the optical image. For compatibility reasons, an analog video signal is nevertheless generated (RS-170, CCIR, ...), thus losing the spatial association of the original sampling. Modern TV cameras and cameras based on solid-state image sensors generate a digital video signal, transmitted pixel by pixel with various standard encodings developed specifically (RS-422, IEEE-1394, Camera Link, USB, ...). In this case, the frame grabber has suitable electrical controls to drive the camera, capture the whole digital image or a portion of it, capture synchronously or asynchronously, and operate at different capture frequencies (frame rates).
4. Image processor. With the development of digital image sensors (CCD and CMOS), this computing component, with firmware integrated into the body of the TV camera or camera, has become strategic. Its main basic functions are the processing of the raw data acquired by the sensor for color management (Bayer coding and demosaicing), the reduction of the electronic noise of the sensor, the conversion of color spaces, the correction of optical aberrations, image data compression, and the encoding of image data in JPEG and MPEG formats.
The most advanced versions are in fact miniaturized multiprocessors that manage not only conventional images and sequences of digital images but also the production of high-definition video (Full HD, 1920 × 1080) with a temporal resolution of 30 images per second and appropriate encoding of the footage (MPEG, H264, AVCHD, ...), with the possibility of generating an uncompressed digital video signal over HDMI (High-Definition Multimedia Interface).
We now analyze some characteristic parameters and significant differences of the main DIAS systems developed for various applications.
5.11.1 Field of View (FoV)
The field of view is the portion of physical space (of the scene: work area, landscape, etc.) framed by the optical system that can be acquired by a DIAS system. The FoV can be measured in angular terms, in which case it coincides with the angle of view of the optical system (see Sect. 4.5.1), or by defining the surface (work area) subtended by the solid angle covered by the optical system and projected onto the image plane (see Fig. 5.40a). The effective focal length of the optical system (to be defined in relation to the distance of the main object of the scene from the optics) and the size of the image sensor (chosen in relation to the application) define the
Fig. 5.40 (a) Field of view and work area; (b) relationship between angle of view and work area
FoV measurement. Assuming an optical system that generates images without spatial distortion (distortion may arise with small focal lengths), and considering that image sensors are normally rectangular (from the 4 × 6 mm format, PS, Point and Shoot, to the 24 × 36 mm format, FF, Full Frame), the angle of field α is defined as follows:

$$\alpha = 2 \arctan\left(\frac{Dim}{2q}\right) \qquad (5.50)$$
where Dim indicates the measurement of one side or of the diagonal of the image sensor, and q indicates the effective focal distance of the image plane from the optical system. Equation (5.50) is easily derived by considering, in Fig. 5.40b, the right triangle whose base is half the sensor diagonal, d/2, and whose height is the effective focal distance q. Applying the trigonometric relation, we obtain

$$\frac{d}{2} = q \cdot \tan\frac{\alpha}{2} \;\longrightarrow\; \tan\frac{\alpha}{2} = \frac{d}{2q} \qquad (5.51)$$
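For not very large angles α, the approximate field angle is given by

$$\frac{\alpha}{2} \approx \tan\frac{\alpha}{2} = \frac{d}{2q} \;\longrightarrow\; \alpha = \frac{d}{q} \ \text{(radians)}, \quad \alpha = \frac{180\,d}{\pi \cdot q} \ \text{(degrees)} \qquad (5.52)$$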
In the previous equations, the angle of field was calculated using the effective focal distance q, which coincides with the focal length f of the optical system when the object is at infinity. If instead the object is very close to the optical system (for example, in macro photography applications), the magnification factor M = q/p must be taken into account (see Sect. 4.4.1). Considering the focal length f of the optical system, the effective focal distance q is given by Eq. (4.38), which, substituted in Eq. (5.50), gives the following formula for the field angle:

$$\alpha = 2 \arctan\left(\frac{d}{2f \cdot (M + 1)}\right) \qquad (5.53)$$
In some applications, the FoV is constrained by the distance p of the object from the optical system and by the rectangular dimensions (H × W) of the object, with respect to which the other variables, focal length and sensor size, are then defined.
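The field angle relations of Eqs. (5.50), (5.52), and (5.53) can be sketched as follows. This is an illustrative sketch only: the function names are assumptions, all lengths are taken in millimetres, and the close-up case assumes q = f(M + 1), which is the relation implied by comparing Eqs. (5.50) and (5.53).

```python
import math

def field_angle_deg(dim_mm, q_mm):
    """Eq. (5.50): angle of field for a sensor dimension Dim and effective focal distance q."""
    return math.degrees(2.0 * math.atan(dim_mm / (2.0 * q_mm)))

def field_angle_small_deg(dim_mm, q_mm):
    """Eq. (5.52): small-angle approximation, alpha = d / q radians."""
    return math.degrees(dim_mm / q_mm)

def field_angle_macro_deg(dim_mm, f_mm, magnification):
    """Eq. (5.53): close-up case, assuming q = f (M + 1)."""
    return field_angle_deg(dim_mm, f_mm * (magnification + 1.0))

# Illustrative values: diagonal of a 24 x 36 mm sensor with a 50 mm lens focused at infinity
diagonal = math.hypot(24.0, 36.0)
print(round(field_angle_deg(diagonal, 50.0), 1))
print(round(field_angle_small_deg(diagonal, 50.0), 1))
print(round(field_angle_macro_deg(diagonal, 50.0, 1.0), 1))
```

Comparing the first two printed values shows how far the small-angle form drifts for a normal lens, while for long focal lengths the two forms practically coincide.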
5.11.2 Focal Length of the Optical System
The focal length of the optical system is the main feature of a DIAS system. Its choice is linked to the application context and to the availability of standard lenses with focal lengths compatible with the sensor. Figure 5.40b shows graphically the relationship between the rectangular dimension of the object (FoV), the sensor size, the effective focal length q, and the distance of the optics from the object to be acquired (work area). From the similar triangles that have as bases the half sensor dimension h/2 and the half work-area dimension H/2, and as respective heights the effective focal distance q and the distance p of the object from the optics, the following relationships are obtained:

$$\frac{h}{H} = \frac{q}{p} \;\longrightarrow\; q \approx f = \frac{h \cdot p}{H} \qquad (5.54)$$

Once the focal length has been estimated, the optical system can be chosen from those available by reviewing the other characteristic quantities (for example, the lens brightness) so as to obtain images that are in focus and of good quality. To minimize optical distortions, it is better to choose focal lengths that are not very small. It is convenient to increase the distance of the object from the optics in order to be able to choose lenses with longer focal lengths.
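Equation (5.54) can be used in both directions: to estimate the focal length for a given working distance, or to find the working distance required by a lens that is actually available. The sketch below is illustrative; the function names and the 25 mm lens are assumptions, and all lengths are in millimetres.

```python
def focal_length_mm(sensor_dim_mm, object_distance_mm, work_area_dim_mm):
    """Eq. (5.54): f ~ q = h * p / H."""
    return sensor_dim_mm * object_distance_mm / work_area_dim_mm

def object_distance_mm(focal_mm, work_area_dim_mm, sensor_dim_mm):
    """Eq. (5.54) solved for p: distance needed so that a given lens frames the work area."""
    return focal_mm * work_area_dim_mm / sensor_dim_mm

# Illustrative values: 4.8 mm sensor side, 200 mm work area, object at 1 m
f_est = focal_length_mm(4.8, 1000.0, 200.0)
print(round(f_est, 1))

# If only a (hypothetical) 25 mm standard lens is available, the object must be moved back
print(round(object_distance_mm(25.0, 200.0, 4.8), 1))
```

Solving for the distance rather than rounding the focal length keeps the whole work area inside the frame, which is in line with the advice to increase the object distance when a longer focal length is preferred.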
5.11.3 Spatial Resolution of Optics
The quality of the image acquired by a DIAS system depends on its different components and, in particular, on the optical system (described in Sect. 5.8). The resolution of the optical system defines the extent to which the geometric details (for example, black and white bars repeated at different spatial frequencies) and the contrast of the object are reproduced in the image plane. Rayleigh's criterion establishes the limit of the angular resolution at which two adjacent geometric elements of the object are considered distinguishable in the image. This criterion is linked to the numerical aperture of the optical system: the resolution improves as the aperture increases. A close correlation exists between the spatial resolution and the level of contrast between objects and background, both of which affect the quality of the image. A good-quality optical system operating under diffraction-limited conditions can be evaluated, in terms of the spatial variation of brightness, in the spatial frequency domain by means of its OTF (optical transfer function): OTF(u, v) = MTF(u, v) · PTF(u, v), where MTF is the contrast modulation transfer function and PTF is the phase transfer function. For a DIAS system, only the MTF, usually represented graphically, is effective in measuring how the optics reproduce the contrast as the spatial frequencies that characterize the object vary. In essence, the MTF measures the relationship between the spatial resolution and the contrast transferred by the optical system. The MTF curves provide the characteristics of the optical system that are fundamental for making an appropriate choice with respect to the type of application. In fact
from the MTF curves, one can decide whether to choose optics with poor resolution but good contrast reproduction, the opposite, or a balanced solution.
5.11.4 Spatial Size and Resolution of the Sensor
The size of the sensor, generally indicated by its diagonal d or by the dimension of its horizontal side h, is important for defining both the field of view (see Eq. (5.50)) and the magnification factor (generated by the optical system) between the object plane and the image plane. If H and h denote the horizontal dimensions of the object plane and of the sensor, respectively, the magnification factor is given by FING = h/H. The resolution of the sensor, together with the resolution of the optical system, determines how the details of the object are reproduced in the image. For a 2D monochromatic sensor with h × v photoreceptors, there is a one-to-one correspondence with the image resolution, which is h × v pixels. The real resolution limit of the sensor is given by the sampling theorem: according to the Nyquist limit, it is half of the physical resolution of the sensor. For example, with a sensor that has a horizontal dimension of 4.8 mm and 1024 pixels/line, the digital image is acquired with a sampling rate of about 213 pixels/mm. It follows that the Nyquist limit imposes a real resolution of half that value, i.e., about 107 lp/mm. The minimum resolution of the sensor RisMinSens can be estimated as follows:

$$RisMinSens = 2\,\frac{H}{MinDetail} \qquad (5.55)$$

where H is the horizontal dimension in mm of the object (that is, of the field of view) and MinDetail is the horizontal dimension of the smallest spatial detail to be detected. For example, consider acquiring a test target 200 mm wide that has white and black vertical bars, each 0.5 mm thick. The spatial detail to be acquired has a spatial frequency of 1 lp/mm. According to the Nyquist limit, the chosen sensor must sample the image of the object at twice that frequency, i.e., with at least 2 samples per mm of the object. Therefore, taking the 1 mm line pair as the minimum detail, the minimum resolution of the sensor must be RisMinSens = 2 · 200/1 = 400 pixels.
The sensor with 1024 pixels/line considered previously is more than enough. If the shooting distance p of the target is 1 m, Eq. (5.54) gives an estimate of the focal length of the optics to be used, f = 24 mm. The magnification factor achieved is FING = h/H = 4.8/200 = 0.024. The selected sensor has a pixel size of 4.8/1024 = 0.0047 mm = 4.7 µm and a real resolution of 2 · 4.7 = 9.4 µm. With the chosen sensor, we could therefore resolve spatial details of the object with a resolution of 9.4 µm/FING = 9.4/0.024 ≈ 392 µm, equivalent to FING · 107 lp/mm = 0.024 · 107 lp/mm ≈ 2.56 lp/mm. In the case of an RGB image sensor, the pixel size must be considered in relation to the Bayer coding and to the demosaicing method used to associate the RGB color information detected by the individual photoreceptors with the pixel.
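The worked example above (200 mm target, 4.8 mm sensor with 1024 pixels per line) can be reproduced with a few lines of code. This is a sketch under the same assumptions as the text; the function names are illustrative, and the unrounded results differ slightly from the rounded figures quoted in the example.

```python
def min_sensor_resolution(field_mm, min_detail_mm):
    """Eq. (5.55): minimum number of pixels along the field of view."""
    return 2.0 * field_mm / min_detail_mm

def nyquist_lp_per_mm(sensor_mm, pixels_per_line):
    """Real resolution on the sensor: half of the sampling rate (Nyquist limit)."""
    return (pixels_per_line / sensor_mm) / 2.0

def object_space_limits(sensor_mm, pixels_per_line, field_mm):
    """Return (magnification FING, resolvable detail in um, resolvable lp/mm) on the object."""
    fing = sensor_mm / field_mm
    pixel_um = 1000.0 * sensor_mm / pixels_per_line
    detail_um = 2.0 * pixel_um / fing
    lp_per_mm = fing * nyquist_lp_per_mm(sensor_mm, pixels_per_line)
    return fing, detail_um, lp_per_mm

print(min_sensor_resolution(200.0, 1.0))
print(round(nyquist_lp_per_mm(4.8, 1024), 1))
fing, detail_um, lp_per_mm = object_space_limits(4.8, 1024, 200.0)
print(round(fing, 3), round(detail_um, 1), round(lp_per_mm, 2))
```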
5.11.5 Time Resolution of the Sensor
To capture a dynamic scene, a DIAS system uses high-speed cameras, which are frequently employed to characterize events that happen too fast for a traditional camera. The frequency of the images that can be acquired depends on the type of sensor. Modern CCD and CMOS sensors can produce video signals by capturing image sequences at different frame rates (up to several hundred images per second) in combination with various spatial image resolutions.
5.11.6 Depth of Field and Focus
When you look at a picture or a movie, the main object is in perfect focus, while the other objects in the scene are still appreciably visible but become increasingly blurred as their distance from the main object increases. The spatial interval in which the objects of the scene are still acceptably sharp for the human observer is defined as the Depth of Field (DoF). For an image acquisition system, it is useful to quantify the depth of field, which varies with the application field (photography, astronomy, microscopy, etc.). An objective evaluation of the depth of field is obtained by analyzing the geometric optics of image formation together with the subjectivity of the observer in evaluating the level of blurring of objects in the image. Figure 5.41 shows, according to Gaussian optics, how the light points B and C, which are located away from the bright point A that is perfectly in focus, are projected onto the image plane. A is at the distance p from the lens and is projected onto the image plane at the distance q, the correct focusing distance according to the Gaussian thin lens equation. Tracing the light rays for the points B and C shows that they would be in focus if the focal plane were moved to qb and qc, respectively, with respect to the lens. It is also observed that
Fig. 5.41 Depth of field for a thin lens
the rays of B (the farthest point) diverge after converging at qb and, when they reach the image plane (sensor plane), generate a blurred disk of diameter ε. The rays of point C (closer to the lens), which would converge at the distance qc, similarly generate a blurred disk, also of diameter ε, in the image plane. Basically, the points B and C, unlike the bright point A in perfect focus, are reproduced by the optical system in the image plane of the sensor as blurred light disks whose diameters vary with their distances from the point A. These blurred light disks are called Circles of Confusion (CoC). If the diameter of the CoC remains within values compatible with the resolution of the human visual system, the objects of the scene are said to be reproduced in focus, and their distance is within the depth of field. The concept of CoC must not be confused with the effects of diffraction and optical aberrations which, as is known, contribute to reproducing a light point as a blurred spot even when it is perfectly focused.
5.11.6.1 Calculation of the Diameter of CoC
Before evaluating the DoF interval, it is necessary to define an approach that establishes the minimum acceptable CoC diameter for an image acquisition system, also considering the subjectivity and the limits of human vision. With reference to Fig. 5.41, we evaluate the diameter ε of the CoC for a symmetrical lens of diameter D and focal length f. A light point A of the object at the distance p from the lens is perfectly in focus and is reproduced in a in the image plane (sensor plane) at the distance q from the lens. The light points B and C of the object, at distances pb and pc from the lens, respectively, would be in focus at points b and c if the image planes were at the distances qb and qc, respectively. Both cones of light rays, the one emerging from b relative to point B and the one converging in c relative to point C, intersect the image plane of the sensor at distance q, each producing a blurred spot, the circle of confusion, of diameter ε. The CoC projected backward onto the object plane (the conjugate of the image plane) at the distance p is magnified in relation to the focal length of the lens. From the similarity of the right triangles (see Fig. 5.41) whose bases are the lens semidiameter and the semidiameter of the CoC in the object plane, indicated with εo, and whose heights are the distance pb and the difference of distances (pb − p), we obtain the following equality:

$$\frac{p_b - p}{p_b} = \frac{\varepsilon_o / 2}{D / 2} \qquad (5.56)$$

from which we derive

$$\varepsilon_o = D\,\frac{p_b - p}{p_b}$$

which, for a generic point K of the object, becomes

$$\varepsilon_o = D\,\frac{|p_k - p|}{p_k} \qquad (5.57)$$
From Eq. (4.34), we know that the magnification factor is given by M = −q/p (in this case, we will ignore the sign). It follows that the CoC in the image plane is calculated as

$$\varepsilon = \varepsilon_o \cdot M = \varepsilon_o\,\frac{q}{p} \qquad (5.58)$$

Solving the Gaussian lens equation (4.27) with respect to q, we obtain

$$q = \frac{f p}{p - f} \qquad (5.59)$$

Substituting in (5.58) the values of εo and q from Eqs. (5.57) and (5.59), respectively, we obtain the diameter ε of the CoC in the image plane:

$$\varepsilon = D \cdot \frac{f}{p - f} \cdot \frac{|p_k - p|}{p_k} \qquad (5.60)$$

It may be useful to express Eq. (5.60) in terms of the aperture f/# = f/D (defined in Sect. 4.5.1), which for simplicity is indicated here with N = f/D. Substituting, we obtain

$$\varepsilon = \frac{f^2}{N (p - f)} \cdot \frac{|p_k - p|}{p_k} \qquad (5.61)$$

Equation (5.61) is valid for paraxial optical systems in which the input and output pupils coincide, with diameter D. If the magnification factor M is known together with the diameter D of the input pupil, we can express M as a function of f and of the distance p of the focused point, recalling Eq. (4.39) (see Sect. 4.4.1), from which we obtain

$$M = \frac{f}{p - f} \qquad (5.62)$$

which, substituted in (5.61), gives a new expression for the calculation of the CoC:

$$\varepsilon = M \cdot D\,\frac{|p_k - p|}{p_k} \qquad (5.63)$$

If the focusing distance p is infinite, Eq. (5.60) becomes

$$\varepsilon = \frac{f D}{p_k} = \frac{f^2}{N p_k} \qquad (5.64)$$

Moreover, if the focusing distance p of the object is finite while pk is infinite, the CoC is given by

$$\varepsilon = \frac{f D}{p - f} = \frac{f^2}{N (p - f)} \qquad (5.65)$$

It should be noted that the diameter of the CoC depends on the parameters of the optical system, aperture N and focal length f, but also on the shooting distance, which we will define better below with the calculation of the depth of field DoF. Recall that the calculation of the CoC did not take into account the optical aberrations or diffraction, which in turn degrades the image as the aperture number N increases. The minimum acceptable value of the CoC varies with the application context and the type of image sensor technology. The main characteristic factors are the following:
(a) Visual acuity. When assessing the minimum value of the CoC, it is necessary to know the resolution limits of the human visual system. Under normal lighting conditions, the eye achieves its best visual acuity when the angle of view mainly involves the foveal zone of the retina, about 1.8 mm wide and corresponding to 5°; acuity then decreases as the angle of view increases. In conditions of photopic vision, with dominant light at 550 nm and the pupil open up to 3 mm (to limit the phenomenon of diffraction), the eye resolves 100 lp/mm, which corresponds to an angular resolution of 1′. Observing at a distance of 25 cm, the eye resolves up to 1′, corresponding to a separation of 75 µm. Resolution varies with the environmental conditions: color, radiance, and accommodation. The eye resolves alternating white and black vertical bars with a spatial frequency of 60 cpd (cycles per degree), corresponding to 2′. It resolves two black dots on a white background if they are separated by 1′, and the resolution is halved when the colors are inverted. Two single vertical bars isolated on a white background are resolved down to a separation of 10″ (Vernier or Nonius acuity).³ The minimum angle of resolution needed to appreciate the depth of an object is 5″. The eye resolves a single black spot on a white background with a diameter of 100 µm, corresponding to an angular resolution of 82″. In order to make the acceptable size of the CoC compatible with human visual resolution, the various manufacturers of image sensors use a heuristic criterion that consists in dividing the length of the sensor diagonal by a constant that takes values between 1000 and 1750. For example, for a sensor with dimensions of 24 × 16 mm, with a diagonal of 29 mm, a constant of 1500 gives a CoC diameter of about 0.019 mm; a 24 × 36 mm sensor, with a diagonal of 43 mm, gives a CoC of 0.029 mm.
(b) Observation distance and final image dimensions.
Another aspect in choosing the CoC is the distance from which the final image is observed and the size of that image. A heuristic criterion consists in calculating the observation distance by multiplying the focal length of the lens by the magnification factor between the acquired image and the final image to be displayed on a monitor or in a printout. Normally, for a final image viewed at a distance of 25 cm (angle of view of about 60°), a CoC of 0.2 mm is chosen. It can take larger values for large final images to be observed from distances greater than 25 cm. The CoC value of 0.2 mm can be considered adequate for the final image, being 8.4 times larger than the CoC of the original image acquired with a sensor of size 24 × 36 mm.
³ Nonius is a method of measuring an angle with higher precision. The measuring tool used in navigation and astronomy is named Nonius in honor of its inventor, Pedro Nunes (Latin: Petrus Nonius). This measuring system was later adapted into the Vernier scale in 1631 by the French mathematician Pierre Vernier.
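The CoC relations of Sect. 5.11.6.1, Eqs. (5.61) and (5.63), together with the diagonal-based heuristic for the acceptable CoC, can be sketched as follows. The function names and the numerical values are illustrative assumptions; all lengths are in millimetres and the thin-lens, paraxial conditions of the text are assumed.

```python
import math

def coc_image_plane(f, N, p, pk):
    """Eq. (5.61): CoC diameter in the image plane for a point at pk when focusing at p."""
    return (f * f) / (N * (p - f)) * abs(pk - p) / pk

def coc_from_magnification(M, D, p, pk):
    """Eq. (5.63): the same diameter from the magnification M and the pupil diameter D."""
    return M * D * abs(pk - p) / pk

def acceptable_coc(sensor_w, sensor_h, constant=1500.0):
    """Heuristic limit: sensor diagonal divided by a constant chosen between 1000 and 1750."""
    return math.hypot(sensor_w, sensor_h) / constant

# Illustrative values: 50 mm lens at f/2 focused at 1 m, point located at 2 m
f, N, p, pk = 50.0, 2.0, 1000.0, 2000.0
eps = coc_image_plane(f, N, p, pk)
limit = acceptable_coc(36.0, 24.0)
print(f"CoC = {eps:.3f} mm, limit = {limit:.3f} mm, acceptably sharp: {eps <= limit}")
```

Both formulas give the same diameter when M = f/(p − f) and D = f/N are used, which is a quick consistency check on Eqs. (5.61) to (5.63).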
5.11.7 Depth of Field Calculation
Having defined the circle of confusion (CoC) and its correlation with the resolution of the human visual system, we can now estimate the limits of the depth of field DoF. Returning to Fig. 5.41, we consider the tracing of the light rays of points A, B, and C on the object plane side. The figure shows that the distance db of the farther point B and the distance dc of the nearer point C are asymmetric. The sum of these two distances is precisely the DoF to be computed. According to Eq. (5.58), the diameter ε of the CoC in the image plane is that of its conjugate εo in the object plane multiplied by the factor M. To simplify the calculation of the DoF, we approximate the diameter εo as follows:

$$\varepsilon_o = \frac{\varepsilon}{M} \approx \frac{\varepsilon p}{f} \qquad (5.66)$$

having approximated M = q/p ≈ f/p, with A focused at distance p and aperture N = f/D. With this simplification, we can consider the triangles whose bases are the diameter εo of the CoC in the object plane and the diameter D = f/N of the lens, and whose heights are dc and p − dc, respectively (the latter being the height of the cone relative to point C, closer to the lens). From the similarity of the triangles, we have the following equality:

$$\frac{d_c}{\varepsilon p / f} = \frac{p - d_c}{f / N} \;\Longrightarrow\; \frac{d_c \cdot f}{\varepsilon \cdot p} = \frac{p - d_c}{f}\,N$$

from which, solving with respect to dc, we get

$$d_c = \frac{\varepsilon \cdot N \cdot p^2}{f^2 + \varepsilon \cdot N \cdot p} \qquad (5.67)$$

Similarly, db is calculated by considering the triangles relative to the farthest point B acceptably in focus. Again from the similarity of the triangles, we get

$$\frac{d_b}{\varepsilon p / f} = \frac{p + d_b}{f / N} \;\Longrightarrow\; \frac{d_b \cdot f}{\varepsilon \cdot p} = \frac{p + d_b}{f}\,N$$

from which, solving with respect to db, we get

$$d_b = \frac{\varepsilon \cdot N \cdot p^2}{f^2 - \varepsilon \cdot N \cdot p} \qquad (5.68)$$

The total distance dtot of the depth of field is given by

$$d_{tot} = d_b + d_c = \frac{2\varepsilon \cdot N \cdot p^2 f^2}{f^4 - N^2 \varepsilon^2 p^2} \qquad (5.69)$$

The term N²ε²p² can be ignored when the circle of confusion in the object plane is much smaller than the lens aperture (εo ≪ D = f/N), and therefore (5.69) becomes

$$d_{tot} \approx \frac{2 \cdot \varepsilon \cdot N \cdot p^2}{f^2} \qquad (5.70)$$

This highlights how the depth of field dtot depends linearly on the aperture N, explaining why a large aperture number, for example f/22, must be set to have a
large depth of field. On the other hand, the DoF varies with the square of the focusing distance p: this explains why, when focusing on more distant objects, a greater depth of field is obtained at the same aperture. Given the simplification introduced to derive Eq. (5.70), the calculation of the DoF is not accurate for applications involving macro shots. The DoF is inversely proportional to the square of the focal length f, which implies a strong reduction even for small increases of f and for large focal lengths. In order to keep the depth of field constant as the shooting distance varies, once the aperture N and the circle of confusion ε have been fixed, the focal length f must be changed (e.g., by zooming) by the same factor as the shooting distance p, so that the ratio p/f is unchanged. The DoF varies linearly with the CoC, which in turn depends on the size of the image sensor. Reducing the size of the sensor reduces the CoC, but to maintain the same angle of view the focal length must also be reduced, which produces a significant increase in the DoF, since it is inversely proportional to the square of f.
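The near and far limits of Eqs. (5.67) and (5.68) and the simplified total of Eq. (5.70) can be sketched as follows. The function names and numerical values are illustrative assumptions; lengths are in millimetres, and the sketch inherits the approximation M ≈ f/p, so it is not meant for macro distances. The far limit is returned as infinite when ε·N·p reaches f².

```python
import math

def dof_limits(f, N, eps, p):
    """Eqs. (5.67), (5.68): distances d_c (toward the lens) and d_b (away from it)."""
    x = eps * N * p
    d_near = x * p / (f * f + x)
    d_far = x * p / (f * f - x) if x < f * f else math.inf
    return d_near, d_far

def dof_total_approx(f, N, eps, p):
    """Eq. (5.70): simplified total depth of field."""
    return 2.0 * eps * N * p * p / (f * f)

# Illustrative values: 50 mm lens at f/8, CoC of 0.029 mm, focus at 3 m
f, N, eps, p = 50.0, 8.0, 0.029, 3000.0
d_near, d_far = dof_limits(f, N, eps, p)
print(f"in focus from {p - d_near:.0f} mm to {p + d_far:.0f} mm")
print(f"total {d_near + d_far:.0f} mm, simplified estimate {dof_total_approx(f, N, eps, p):.0f} mm")
```

Doubling both f and p in `dof_total_approx` leaves the result unchanged, which illustrates the dependence on the ratio p/f in the simplified formula.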
5.11.8 Calculation of Hyperfocal
Starting from Eq. (5.68), which gives the distance db of the farthest object acceptably in focus, if we impose that this distance be infinite (pb = ∞), the following relation must hold:

$$p \geq \frac{f^2}{N \varepsilon} \approx p_{iper} \qquad (5.71)$$

The distance $p_{iper}$ is called the hyperfocal distance, i.e., the distance at which the optical system must be focused to obtain the maximum possible depth of field for a given focal length f and a given aperture N. If in Eq. (5.67) the focus distance p is replaced with the hyperfocal distance (i.e., $p = p_{iper}$) given by Eq. (5.71), after appropriate simplifications, (5.67) reduces to

$$d_c = \frac{p_{iper}}{2} \qquad (5.72)$$

This means that if you focus at the hyperfocal distance, the depth of field extends from the minimum distance $p_{iper}/2$ to infinity. If we know the minimum distance pc that must be acceptably in focus, from (5.72) we obtain the hyperfocal distance $p_{iper} = 2p_c$, which, substituted in (5.71), allows us to estimate the aperture N:

$$N \approx \frac{f^2}{2 \varepsilon p_c} \qquad (5.73)$$

In conclusion, the depth of field is conditioned by the CoC, which in turn depends on the size of the image sensor: these are the structural features of the image acquisition system. The DoF can instead be adapted to the application by varying the shooting parameters: aperture, distance of the object in focus, and focal length.
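The hyperfocal relations of Eqs. (5.71) to (5.73) can be sketched in a few lines. The function names and the numerical values are illustrative assumptions, lengths are in millimetres, and the near limit reuses Eq. (5.67).

```python
def hyperfocal_distance(f, N, eps):
    """Eq. (5.71): focusing distance that extends the depth of field to infinity."""
    return f * f / (N * eps)

def near_limit_distance(f, N, eps, p):
    """Eq. (5.67): distance d_c between the focused plane and the nearest sharp point."""
    x = eps * N * p
    return x * p / (f * f + x)

def aperture_for_near_limit(f, eps, p_near):
    """Eq. (5.73): aperture N so that everything from p_near to infinity is acceptably sharp."""
    return f * f / (2.0 * eps * p_near)

# Illustrative values: 50 mm lens at f/8 with a CoC of 0.029 mm
f, N, eps = 50.0, 8.0, 0.029
p_h = hyperfocal_distance(f, N, eps)
print(f"hyperfocal distance: {p_h / 1000.0:.2f} m")
print(f"sharp from {(p_h - near_limit_distance(f, N, eps, p_h)) / 1000.0:.2f} m to infinity")
print(f"aperture needed for a 2 m near limit: f/{aperture_for_near_limit(f, eps, 2000.0):.1f}")
```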
5.11.9 Depth of Focus
Whereas the depth of field gives the minimum and maximum distances of the objects that are acceptably in focus, in the estimation of the depth of focus, once
Fig. 5.42 Depth of focus for a thin lens
an object has been focused, we calculate the minimum and maximum distances of the image plane (sensor plane) at which a point of the object is still acceptably in focus. In particular, it is verified that the diameter of the blurred spot does not exceed a certain value, called the diameter limit of the confusion circle (DLCoC). With reference to Fig. 5.42, we consider the point a, the projection in the image plane of a point A of the object in perfect focus, where the image sensor is positioned. The most distant and the closest points of the object would instead be in focus at the distances db and dc, respectively, from the sensor plane, where the image planes Ib and Ic are located. In the image planes Ib and Ic, the circle of confusion is shown with the limit diameter DLCoC, within which the most distant and the closest points are still considered in focus with respect to the point focused in a (plane of the sensor). From the similarity of the triangles (see Fig. 5.42) whose bases are the DLCoC ε and the diameter D of the lens, and whose heights are db and the distance q of the image plane from the lens, respectively, we obtain the following relation:

$$\frac{\varepsilon}{d_b} = \frac{D}{q} \qquad (5.74)$$
Recalling the equation that relates the magnification factor M to the distance q of the image plane (see Eq. (4.38) of Sect. 4.4.1), introducing the aperture N = f/D, substituting in (5.74), and solving with respect to the distance db, we obtain

$$d_b = \varepsilon N (1 + M) \qquad (5.75)$$
The distance dc of the points closest to the lens is equal to db, given the symmetry of the triangles (in particular for small values of db and dc). The total depth of focus is therefore

$$d_{tot} = 2\varepsilon \cdot N (1 + M) \qquad (5.76)$$

The advantage of (5.76) is that it calculates the depth of focus dtot through determinable quantities, such as the magnification factor M, the diameter ε of the confusion circle, and the aperture N, instead of directly using the distance db
which cannot be easily determined. In fact, from (5.74), we note that the calculation of the distance $d_b$ depends on q, which is also not easily measurable. The magnification factor M depends on the shooting distance p and the focal length f and cannot always be estimated. If the magnification factor cannot be determined but is known to be very small, (5.76) reduces to the following expression:

$$d_{tot} \approx 2\varepsilon \cdot N \tag{5.77}$$

The dimensions of the circle of confusion ε depend on the application context (photography, astronomy, etc.), as already highlighted above. Under the conditions of validity of Eqs. (5.76) and (5.77), it is useful to express the aperture N as a function of the total depth of focus $d_{tot}$, with or without the magnification factor M, as follows:

$$N \approx \frac{d_{tot}}{2\varepsilon}\,\frac{1}{1 + M} \qquad\qquad N \approx \frac{d_{tot}}{2\varepsilon} \tag{5.78}$$
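As a minimal numerical sketch of Eqs. (5.76) to (5.78), the following Python fragment evaluates the depth of focus and inverts it to recover the aperture. The function names and the sample values (ε = 30 µm, N = 8, M = 0.1) are illustrative assumptions, not values taken from the text.

```python
def depth_of_focus(eps, n, m=0.0):
    """Total depth of focus d_tot = 2*eps*N*(1+M), Eq. (5.76).

    With m = 0 this is the small-magnification form of Eq. (5.77).
    eps and the result share the same length unit.
    """
    return 2.0 * eps * n * (1.0 + m)


def aperture_for_depth_of_focus(d_tot, eps, m=0.0):
    """Aperture N that yields a given depth of focus, Eq. (5.78)."""
    return d_tot / (2.0 * eps * (1.0 + m))


# Illustrative (assumed) values: eps = 30 um, N = 8, M = 0.1
eps = 30e-6  # metres
d_tot = depth_of_focus(eps, 8, 0.1)
print(f"d_tot = {d_tot * 1e3:.3f} mm")
print(f"N recovered from d_tot = {aperture_for_depth_of_focus(d_tot, eps, 0.1):.1f}")
```

Setting `m=0.0` in both functions reproduces the simplified relations for very small magnification.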
5.11.10 Camera

The formation of the image has been known since the time of Aristotle. In fact, since ancient times a device called darkroom (in Latin camera obscura) has been known: a light-tight box with a small hole (called pinhole) on one face, through which light enters and projects an inverted image on the opposite face. Basically, the hole behaves like a lens with almost zero diameter. In the eleventh century, the darkroom was used by the Arab scientist Alhazen for the study of vision theory, later brought to Europe by the monk Vitellione with the work Opticae thesaurus Alhazeni arabis. Later, Giovanni della Porta wrote a treatise (Magia Naturalis) to explain the formation of the inverted image. In the following centuries, the darkroom was used by various scientists to observe solar eclipses and by artists (Canaletto), who built darkrooms the size of a room. In 1568, Daniele Barbaro described in the treatise Practice of the Perspective a darkroom with a lens to study perspective, then used by various artists to support their artistic compositions. With the use of a lens, and replacing the image display screen with a sensitive surface, the first permanent image was obtained (in 1827 by Joseph Nicephore Niepce). The camera obscura was the prototype of modern cameras, which are equipped with an optical system with control of the aperture and the focus. The digital image sensor now replaces photosensitive film. Exposure times are electronically controlled by a shutter. As the focusing distance changes, the lens group of the optical system is shifted relative to the sensor plane. Almost all modern cameras are digital, and the various components (optics, sensor, diaphragm, memory, shutter, exposure meter, etc.) are all controlled by internal microprocessors that allow the acquisition of image sequences and even video with different resolutions and in different formats, with the encoding of sensory data in formats (compressed or not) usable by external display and archiving media. A camera typically uses optical systems that operate in the condition in which the distance p of the objects and the distance q of the image plane from the lens
Fig. 5.43 Angle of field and size of image sensors
(Fig. 5.13) satisfy p ≫ q ≅ f, i.e., the object distance is normally much greater than the focal distance f, and consequently, by Eq. (4.34), the magnification factor is M ≤ 1. Modern cameras are now almost all made with digital technology, and the essential components, optical system and digital image sensor, together with the control electronics, are strongly integrated for the acquisition of images in both static and dynamic contexts, capturing sequences of images with different temporal and spatial resolutions. A scene can be acquired with a variety of cameras with different image sensor and optical system characteristics so as to obtain equivalent images, from which it would be difficult for an observer to trace the characteristics of the sensor and optics used. Figure 5.43 highlights how the same scene can be acquired with a given angle of view by varying the focal length f and the image sensor dimensions. The images are equivalent if the following are kept constant: the perspective conditions (same point of observation of the scene), shooting distance p (object-lens distance), angle of view, depth of field, effects of diffraction, exposure times and shooting speed, and the wavelength of the dominant light. We have already described these features and how they relate to each other. We summarize some essential aspects to be considered when selecting a camera (and in general an image acquisition system) in relation to the characteristic parameters of its components, in particular the image sensor and the optical system.

1. Field angle. From Fig. 5.43, we observe that the sensor size and the focal length must scale in direct proportion to each other if we want to keep the angle of field constant. If $f_{FF}$ is the focal length used with the 24 × 36 mm sensor (Full Frame, FF) and $f_{ss}$ is the focal length for a smaller sensor of 4 × 6 mm, to acquire an equivalent image the smaller sensor requires a focal length given by

$$f_{ss} = f_{FF} \cdot \frac{x(ss)}{x(FF)} \tag{5.79}$$

where x(ss) and x(FF) are the dimensions of one side or the diagonal of the sensor to be chosen and of the reference FF sensor, respectively. For a sensor with dimensions of 4 × 6 mm, the equivalent image requires a focal length of $f_{ss} = 4/24 \cdot f_{FF} \approx 0.17 \cdot f_{FF}$, i.e., scaled by a factor of 0.17 with respect to the focal length used with the FF sensor.
2. Depth of field. From Eq. (5.70), we know that the DoF is proportional to the aperture N of the lens, to the diameter ε of the CoC, and to the square of the distance p of the object to be focused, and is inversely proportional to the square of the focal length f. Once the sensor size is chosen, the CoC diameter is a structural constant of the camera. In addition, the maximum aperture can be estimated so as to control the phenomenon of diffraction, on which the quality of the image depends. With reference to Eq. (5.16), considering the dominant light with wavelength λ = 0.550 µm, a heuristic criterion for correlating the maximum aperture N with ε is given by the following relation:

$$\varepsilon = 2.44 \cdot \lambda \cdot N \approx \frac{d_{gs}}{1600} \tag{5.80}$$

where $d_{gs}$ is the length of the diagonal of the chosen sensor. From (5.80), the maximum aperture $N_{max}$ is thus estimated according to the sensor (with $d_{gs}$ and λ expressed in µm):

$$N_{max} \approx \frac{d_{gs}/1600}{2.44 \cdot \lambda} = \frac{d_{gs}/1600}{2.44 \cdot 0.550} \approx \frac{d_{gs}}{2112} \tag{5.81}$$

Selecting a Full Frame (FF) sensor, with $d_{gs}$ = 43400 µm, the maximum aperture value would be $N_{max}$ ≈ 43400/2112 ≈ 22 while the CoC diameter is ε = 43400/1600 ≈ 28 µm. For a sensor of 4 × 6 mm with a diagonal of 7180 µm, the maximum aperture value would be $N_{max}$ ≈ 7180/2112 ≈ 3.4 and the CoC diameter is ε = 7180/1600 ≈ 4.5 µm. Therefore, once the sensor has been chosen, the diameter ε of the CoC is estimated with (5.80), the maximum aperture $N_{max}$ with (5.81), and the focal length with (5.79) relative to the focal length $f_{FF}$. It remains to choose the shooting parameter, that is, the focusing distance p, to determine the DoF and, if necessary, an appropriate sensor sensitivity value.
If we focus on an object at the distance p = 3 m and use a focal length f = 50 mm with an FF sensor, we obtain a DoF of

$$d_{tot} \approx \frac{2\varepsilon N p^2}{f^2} = \frac{2 \cdot 28 \cdot 10^{-6} \cdot 22 \cdot 3^2}{0.05^2} = 4.43\ \text{m}$$

If instead we use a 4 × 6 mm sensor to obtain an identical DoF, we have seen that the maximum aperture (to limit the diffraction) is $N_{max}$ = 3.5 and the focal length must be scaled to f = 0.17 · $f_{FF}$ = 0.17 · 50 = 8.5 mm, so that the DoF is

$$d_{tot} \approx \frac{2\varepsilon N p^2}{f^2} = \frac{2 \cdot 4.8 \cdot 10^{-6} \cdot 3.5 \cdot 3^2}{0.0085^2} = 4.19\ \text{m}$$

As expected, an equivalent depth-of-field value is obtained using cameras with different sensors, provided that the focal length, the aperture, and the diameter of the confusion circle are scaled with respect to the FF sensor. It should be noted that the difference in aperture number, which is larger for the FF sensor, involves a different setting of the exposure time and sensor sensitivity parameters. The FF sensor must operate with a higher sensitivity to compensate for the higher aperture number (f/22 against f/3.5 for the smaller sensor). The latter, on the other hand, can operate with low sensitivity thanks to its wider opening (i.e., exposure with a larger amount of light) and consequently has the advantage of operating under conditions that reduce noise, i.e., that improve the signal-to-noise ratio.
3. Optical-sensor resolving power. Normally, the linear resolving power, i.e., the ability to reproduce the details of the object in the image plane, is expressed in lines per millimeter through the cutoff frequency $F_t$ (see Sect. 5.8). In photography, the object can be assumed to be very far from the camera, so the approximation q ≈ f holds and, considering the aperture parameter N = f/D, Eq. (5.19) for the resolving power of the optics becomes

$$F_t = \frac{D}{\lambda q} \approx \frac{1}{\lambda \cdot N} \tag{5.82}$$

Choosing an optic with maximum brightness N = 2.8 and a focal length of f = 100 mm, the resolving power, under the hypothesis of dominant light with λ = 550 nm, would be

$$F_t \approx \frac{1}{\lambda \cdot N} = \frac{1}{550 \cdot 10^{-6} \cdot 2.8} = 649\ \text{lines/mm}$$

The optics diameter would be D = f/N = 100/2.8 = 35.7 mm. A micro-camera with an optic of 1.2 mm in diameter and a focal length of 20 mm would have a resolving power of $F_t = D/(\lambda f) = 1.2/(20 \cdot 550 \cdot 10^{-6}) \approx 109$ lines/mm.
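The three selection criteria above can be chained in a short script. The following Python sketch applies Eqs. (5.79) to (5.82) and the approximate DoF expression $2\varepsilon N p^2/f^2$ to the two sensors discussed in the text. The function names are hypothetical, and the sketch works with unrounded values (sensor diagonal computed from the sides, $N_{max} = \varepsilon/(2.44\lambda)$ evaluated directly), so its figures differ slightly from the rounded ones quoted above.

```python
import math

LAMBDA_UM = 0.550  # dominant wavelength in micrometres, as in the text


def diagonal_um(width_mm, height_mm):
    """Sensor diagonal d_gs in micrometres."""
    return math.hypot(width_mm, height_mm) * 1000.0


def equivalent_focal_mm(f_ff_mm, side_mm, side_ff_mm=24.0):
    """Eq. (5.79): focal length giving the same field angle as f_FF."""
    return f_ff_mm * side_mm / side_ff_mm


def coc_um(d_gs_um):
    """Eq. (5.80): heuristic CoC diameter, eps ~ d_gs / 1600."""
    return d_gs_um / 1600.0


def max_aperture(d_gs_um, lam_um=LAMBDA_UM):
    """Eq. (5.81): N_max = eps / (2.44 * lambda)."""
    return coc_um(d_gs_um) / (2.44 * lam_um)


def dof_total_m(eps_um, n, p_m, f_mm):
    """Approximate depth of field, d_tot ~ 2 * eps * N * p^2 / f^2."""
    return 2.0 * eps_um * 1e-6 * n * p_m ** 2 / (f_mm * 1e-3) ** 2


def cutoff_lines_per_mm(n, lam_um=LAMBDA_UM):
    """Eq. (5.82): F_t ~ 1 / (lambda * N), with lambda converted to mm."""
    return 1.0 / (lam_um * 1e-3 * n)


sensors = {"FF 24x36": (24.0, 36.0), "small 4x6": (4.0, 6.0)}
for name, (w, h) in sensors.items():
    d = diagonal_um(w, h)
    eps, n_max = coc_um(d), max_aperture(d)
    f = equivalent_focal_mm(50.0, w)  # 50 mm is the FF reference focal length
    print(f"{name}: eps={eps:.1f} um, N_max={n_max:.1f}, f={f:.1f} mm, "
          f"DoF(p=3 m)={dof_total_m(eps, n_max, 3.0, f):.2f} m, "
          f"F_t={cutoff_lines_per_mm(n_max):.0f} lines/mm")
```

Because ε, $N_{max}$, and f all scale linearly with the sensor size (the two sensors share the same 2:3 aspect ratio), the unrounded DoF printed for the two sensors coincides, which is the equivalence argued in the text.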
5.11.11 Video Camera

These electronic devices acquire sequences of images with different temporal resolutions and in different spectral bands (from visible to thermal infrared). The first video cameras were based on vidicon technology, employing photosensitive tubes similar to the cathode ray tubes (CRT) used for television. The image was in analog format and consisted of a video signal sent to a monitor or to magnetic media for electronic image archiving. Modern video cameras instead use solid-state sensors based on CCD or CMOS technology, such as those used for still cameras. The light (the visible band of the electromagnetic spectrum) striking the sensitive area, integrated for the duration of the exposure, is passed to a set of signal amplifiers. In CCD technology, the charges generated by the photons are accumulated in a local photosensor register during the exposure time. Then, in the transfer phase, the charges are moved from register to register until they reach the signal amplifiers, which amplify the signal and pass it to an analog-to-digital converter (ADC). In CMOS technology, photons striking the photosensor directly modify its electrical conductivity (or gain), which directly determines the duration of exposure and locally amplifies the signal before it is read out through a multiplexed scheme. This latter technology, the most used in recent years because of its reduced cost of production, has spread to various applications while maintaining the same performance as CCD. Modern digital video cameras have 2D and linear sensors specialized to capture highly dynamic scene events (objects moving at high speed). Digital video cameras are widely used in industrial automation for the inspection and handling of manufactured goods, in remote surveillance, in medicine and biology (in particular, integrated with the microscope), and in many other scientific
fields such as robotics, artificial vision, television, and cinematography. Compared to digital still cameras, they are characterized by the high speed of sensory data transfer required by the temporal frequency of the images to be scanned: the number of images per second (frame rate). Another peculiarity is the real-time electronic scanning of the optical image, which is converted into a digital signal so that the sensory data are immediately available (with the inter-line transfer CCD architecture) for transfer and for snapshot processing of dynamic events. The data transfer architecture is linked to the type of application. High spatial resolution and high temporal resolution are conflicting requirements, even if cameras with both high spatial and high temporal resolution are available. For the television sector, the CCD architecture with interlaced inter-line transfer is widely used, i.e., the matrix sensor first reads the odd rows in one exposure cycle (television frame) and the even rows in the next cycle. In this way, the image on the screen is created by overlapping the two frames (odd and even lines), with a slight temporal offset between the two. In TV applications, this delay does not generate problems considering the moderate dynamics of the events captured (repetition of the image every 0.03 seconds). In sectors with highly dynamic events, the architecture used is a CCD sensor with progressive inter-line transfer. In this case, the optical image is acquired with the complete and simultaneous exposure of all the photoreceptors, and the sensory data of the 2D array are transferred to the frame grabber memory. In this way, the problems due to the dynamics of the scene are solved, since the whole CCD sensor matrix is exposed at once (image derived from a single frame). 
Another peculiarity of digital video cameras is the ability to transfer sensory data paced by a clock that sets the data acquisition frequency; at the same time, the synchronism, frequency, and portion of the image to be acquired can be controlled by an external computer according to the dynamics of the scene. Progressive architecture is used to build vision machines for general purposes (industrial, medical, and scientific applications), even if it has the disadvantage of not being compatible with television standards. An advantage of progressive architecture is the ability to manage sensors with very high spatial resolution and different aspect ratios. Modern cameras, even if based on progressive architecture, are equipped with internal processors capable of encoding sensory data in the standard video format (RS-170). The sensor dimensions of digital cameras are derived from those of the vidicon cameras that spread over the years for television. In fact, the vidicon tubes had diameters of 1", 2/3", and 1/2" (measured in inches), creating images with an aspect ratio of 4:3 according to the television standard RS-170A. CCD sensors built for digital cameras have inherited this format, with the addition of sensors in the high-definition format with a 16:9 aspect ratio. A good-resolution CCD camera uses photoreceptors of 5 µm × 5 µm in size. The size and format of the sensor vary with the application. The technology allows the realization of sensors smaller than the traditional 1/4" format, which has a sensitive area of about 4.6 × 4 mm², and larger than the Full Frame format. The resolution is the number of photoreceptors present on the chip, currently exceeding 4 megapixels and, for particular applications, even tens of megapixels. The
CCD line sensors exceed 8 megapixels. In addition to the 4:3 standard format, the new CCD and CMOS sensors are designed for high-definition television in the 16:9 HDTV format, with resolutions of 1920 × 1080 pixels (full HD), and for ultra-high-definition television (ultra HD) up to 7680 × 4320 pixels. A video camera, too, typically uses optical systems that operate in the condition p ≫ q ≅ f (Fig. 5.13), in which the distance p of the objects from the lens is normally much greater than the focal distance f, and consequently, by Eq. (4.34), the magnification factor is M ≤ 1. Consider a lens with f = 16 mm and a 1" sensor, that is, 12.8 × 9.6 mm with diagonal $d_{gs}$ = 16 mm, that focuses an object at the distance p = 2 m under dominant visible light with λ = 550 nm. Following the same procedure as for the camera in the preceding paragraph, the acquisition parameters are calculated as follows:

Calculation of the diameter ε of the CoC:

$$\varepsilon = 2.44 \cdot \lambda \cdot N \approx \frac{d_{gs}}{1600} = \frac{16}{1600} = 0.01\ \text{mm}$$

Calculation of the maximum aperture $N_{max}$ (with $d_{gs}$ = 16000 µm and λ = 0.550 µm):

$$N_{max} \approx \frac{d_{gs}/1600}{2.44 \cdot \lambda} = \frac{d_{gs}/1600}{2.44 \cdot 0.550} \approx \frac{d_{gs}}{2112} = \frac{16000}{2112} = 7.6$$

Depth of field calculation (Eqs. 5.67 and 5.68):

$$d_c = \frac{\varepsilon \cdot N \cdot p^2}{f^2 + \varepsilon \cdot N \cdot p} = \frac{0.01 \times 10^{-3} \cdot 7.6 \cdot 2^2}{0.016^2 + 0.01 \times 10^{-3} \cdot 7.6 \cdot 2} = 0.75\ \text{m}$$

$$d_b = \frac{\varepsilon \cdot N \cdot p^2}{f^2 - \varepsilon \cdot N \cdot p} = \frac{0.01 \times 10^{-3} \cdot 7.6 \cdot 2^2}{0.016^2 - 0.01 \times 10^{-3} \cdot 7.6 \cdot 2} = 2.9\ \text{m}$$
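The depth of field interval extends from 2 − 0.75 = 1.25 m to 2 + 2.9 = 4.9 m. If the object is focused at a distance of p = 5 m, the near limit is

$$d_c = \frac{0.01 \times 10^{-3} \cdot 7.6 \cdot 5^2}{0.016^2 + 0.01 \times 10^{-3} \cdot 7.6 \cdot 5} = 2.99\ \text{m}$$

For the far limit, the denominator $f^2 - \varepsilon \cdot N \cdot p = 0.016^2 - 0.01 \times 10^{-3} \cdot 7.6 \cdot 5$ is negative: the focusing distance exceeds the hyperfocal distance $f^2/(\varepsilon N) \approx 3.4$ m, so $d_b$ is unbounded and every point beyond 5 − 2.99 ≈ 2 m is acceptably in focus. It is observed that the depth of field increases significantly with the increase in the focusing distance.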
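The near and far limits used in this example can be sketched in Python as follows. The function name is hypothetical, and returning an infinite far limit when $f^2 - \varepsilon N p \le 0$ is an assumption of this sketch (the usual hyperfocal-distance interpretation of a non-positive denominator), not a rule stated in the text.

```python
import math


def dof_limits(eps_m, n, p_m, f_m):
    """Near (d_c) and far (d_b) depth-of-field extents around distance p.

    d_c = eps*N*p^2 / (f^2 + eps*N*p)
    d_b = eps*N*p^2 / (f^2 - eps*N*p)
    All lengths are in metres. A non-positive denominator for d_b is
    reported as math.inf (assumption: p is at or beyond the hyperfocal
    distance f^2 / (eps*N)).
    """
    k = eps_m * n * p_m
    numerator = k * p_m
    d_c = numerator / (f_m ** 2 + k)
    denominator = f_m ** 2 - k
    d_b = numerator / denominator if denominator > 0 else math.inf
    return d_c, d_b


# Values of the example: eps = 0.01 mm, N = 7.6, f = 16 mm
for p in (2.0, 5.0):
    d_c, d_b = dof_limits(0.01e-3, 7.6, p, 0.016)
    print(f"p = {p} m: in focus from {p - d_c:.2f} m to {p + d_b:.2f} m")
```

For p = 2 m the sketch reproduces the interval computed above; for p = 5 m the term εNp exceeds f², so the far limit is reported as unbounded.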
Fig. 5.44 Example of a thermographic image
5.11.12 Infrared Camera

Infrared energy is only a portion of the electromagnetic spectrum that bodies emit, and one for which considerable advances have been made in microelectronic technology for the development of image detectors. All objects emit a certain amount of electromagnetic radiation as a function of their temperature. An infrared camera⁴ can collect this radiation just as a conventional camera operates in the visible. It can work in total darkness since it is independent of ambient lighting. The acquired images are monochromatic, in that the sensor does not distinguish the different wavelengths in the infrared region. However, the images that most manufacturers provide are in color, or rather in pseudo-color, in order to make the monochromatic image more easily interpretable by the human eye.⁵ Generally, white refers to very hot bodies, blue to colder parts, and red and yellow to intermediate temperatures. A legend bar is nevertheless needed, as shown in Fig. 5.44, to tie the color to the temperature. The resolution of the cameras is generally 160 × 120 or 320 × 240 pixels for the economical ones, up to 1024 × 768 pixels with a pixel size of 17 µm. The technology employs microbolometers, special detectors that change their electrical properties (e.g., resistance) on receiving infrared radiation in the range 0.75–14 µm. Compared to other infrared radiation detection technologies, microbolometers do not require cooling.
5.11.13 Time-of-Flight Camera—ToF

So far we have considered passive 2D image acquisition systems, basically gray-level or RGB cameras, where the sensor captures the scene through the light reflected by objects well illuminated by natural or artificial light. We now introduce a type of camera based on active sensors able to obtain information about the scene, for example,
4 Also called thermographic camera or thermal imaging camera.
5 The color or pseudo-color representation of a thermogram is useful because, although man has a much wider dynamic range in detecting intensity than overall color, the ability to see fine intensity differences in light areas is quite limited.
Fig. 5.45 Partial classification of distance measurement methods including the Time-of-Flight methodology (diagram labels: non-contact distance measurement methods; passive; active; triangulation; coded light; interferometry; time of flight; pulsed light; continuous-wave modulation)
depth measurements, by first irradiating the scene with its own light and then measuring the radiation reflected by the objects of the scene. In this case, the acquisition takes place independently of the lighting conditions. ToF is the abbreviation of Time-of-Flight, which characterizes cameras based on remote sensing technology capable of producing depth images by illuminating the scene with laser or LED light and measuring the time taken by the light to travel from the emitter to the object and back, as reflected light, from the object to the receiving sensor. Different optical and electronic methods for measuring distance are known. The basic principles are: interferometry, based on the interference of two separate electromagnetic waves, which by the superposition principle produce a resulting wave that maintains the properties of the original ones; triangulation, a technique that calculates distances between points using the geometric properties of triangles; and ToF technology itself, which calculates the distance of an object by measuring the time elapsed between the emission and the return of the transmitted signal (see Fig. 5.45). In fact, ToF technology shares the operating principles of RADAR (Radio Detection And Ranging) and of LIDAR (Light Detection and Ranging), also known by the acronym LADAR (LAser Detection And Ranging). These technologies differ in that RADARs operate with radio waves while LIDARs use waves with much smaller wavelengths (in the visible, ultraviolet, or near infrared) and can detect distances at a finer resolution than RADAR.⁶ Basically, such sensors cannot resolve objects smaller in size than the wavelengths they use. With lasers, which have a very narrow light beam, it is possible to perform distance measurements with high precision. These systems perform distance measurements on individual points of the scene, albeit with high
6 These types of sensors (normally high-cost), based on RADAR and LIDAR technologies, were initially used (appropriately installed on rotating platforms) in satellite and aerial remote sensing applications for monitoring large areas of the Earth's surface or particular extensive areas of the territory.
Fig. 5.46 Methods for calculating the ToF: a method with pulsed light; b method with modulated light with continuous waves
precision. ToF cameras, on the other hand, use the same principles of operation as LIDARs but capture a 2D depth map of the scene, simultaneously acquiring the distance measurements of an array of points in the scene. In particular, cameras based on 3D ToF technology have sensors capable of acquiring depth images of points in the scene, illuminated simultaneously by a laser or LED source, and then analyze the reflected light to estimate distance measurements. Figure 5.46 shows the functional diagram of depth measurement with ToF technology. Normally, the scene is illuminated with a modulated infrared light source (about 850 nm, not visible to people) and the sensor determines, for each pixel, the measure of reflected IR light. The time τ (flight time) that the light takes to travel from the transmitter (illuminator) to the object and, once reflected, back to the sensor has a direct correspondence with the distance R and is given by

$$\tau = \frac{2R}{c} \tag{5.83}$$

where c is the speed of light (c = 3 × 10⁸ m/s). The calculation of the distance R, once the flight time τ is determined, would appear simple, were it not that the speed of light requires electronic control and synchronization of the signals at very high speed. From (5.83), for an object at a distance of 2 m from the camera, the light returns to the sensor with a delay of τ = 13.34 ns. Note that the maximum determinable distance is closely related to the duration of the lighting pulse. For a pulse width of Δt = 60 ns, a maximum distance of

$$R_{max} = \frac{c \cdot \Delta t}{2} = 9\ \text{m}$$

can be determined. This shows how short the times involved are, the criticality of the lighting pulse control electronics, and the need to use laser or LED light to generate very short pulses. The accuracy of the estimated distance depends on this temporal characteristic of the circuits that must determine the time of flight τ. The accurate measurement of τ can be addressed with two different lighting techniques, pulsed light and continuous wave (CW), as shown in Fig. 5.46.
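The orders of magnitude quoted above are easy to check numerically. The following Python sketch evaluates Eq. (5.83) and the maximum range for a given pulse width; the function names are illustrative, and the exact value of c is used instead of the rounded 3 × 10⁸ m/s.

```python
C = 299_792_458.0  # speed of light in m/s (exact SI value)


def flight_time(distance_m):
    """Round-trip time tau = 2R / c, Eq. (5.83)."""
    return 2.0 * distance_m / C


def range_from_time(tau_s):
    """Distance R = c * tau / 2, the inverse of Eq. (5.83)."""
    return C * tau_s / 2.0


def max_range(pulse_width_s):
    """Maximum determinable distance R_max = c * dt / 2 for pulse width dt."""
    return C * pulse_width_s / 2.0


print(f"tau for R = 2 m       : {flight_time(2.0) * 1e9:.2f} ns")
print(f"R_max for dt = 60 ns  : {max_range(60e-9):.2f} m")
print(f"time step for 1 mm    : {flight_time(1e-3) * 1e12:.2f} ps")
```

The last line shows why picosecond-level timing is needed: a depth step of 1 mm corresponds to a round-trip time difference of only a few picoseconds.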
(a) Pulsed light method. A light pulse $S_e(t)$ is transmitted at time zero and the round-trip time τ is measured directly upon receipt of the signal $S_r(t)$. Each pixel has
an electronic measuring device that typically activates a time counter as soon as the signal $S_e(t)$ is sent and stops it at the arrival of the reflected signal $S_r(t)$ (see Fig. 5.46a). The receiving device must accurately detect the reflected signal, whose amplitude and shape depend not only on the distance of the object point and its reflectivity properties, but also on the attenuation of the reflected signal in the air along the optical path, especially over long distances. This method also requires high-precision temporal pulse generators (some gigahertz), because to achieve an accuracy of 1 mm in the distance measurement the timing error must stay within a margin of about 7 ps.⁷ Therefore, the illumination takes place during this pulse, which has duration Δt, and at the same time a high-speed counter is activated to determine the delay time τ after which the reflected signal $S_r(t)$ arrives at the pixel. Given the very tight delay times, the sensor matrix of ToF cameras requires high-speed photoreceptors; SPAD⁸ photodiodes are usually used. Alternatively, the transmitter illuminates the scene for the duration Δt and, as soon as the reflected signal is detected, it is sampled in each pixel. Thus, two measurements $Q_1$ and $Q_2$ of electrical charge are obtained, accumulated in two time windows offset with respect to the radiated signal. As shown in Fig. 5.47a, the sampled charge $Q_1$ is obtained while the light is emitted and the sampled charge $Q_2$ when it is not emitted. Basically, the emitter is switched off as soon as the reflected signal is received, and the sampling of the reflected energy takes place in a time span of less than 2 · Δt. Once these charge measurements are determined, the distance is estimated as follows:

$$R = \frac{c\,\Delta t}{2} \cdot \frac{Q_2}{Q_1 + Q_2} \tag{5.84}$$

Each pixel of the camera has its own temporal synchronism, and it is therefore possible to determine the depth of each pixel associated with a 3D point of the scene in the angle of view of the camera. With a light source that illuminates the whole scene, the depth of all the points in the scene can be determined, thus acquiring the whole depth image.

(b) Continuous-wave modulated light method. Although based on the same principles, this method operates in an indirect way for the calculation of
7 In fact, a distance resolution of ΔR = 1 mm corresponds to a temporal resolution of Δt = 2ΔR/c = 6.7 ps, where 1 picosecond = 10⁻¹² s.
8 SPAD is the acronym of Single Photon Avalanche Diode, a photodiode able to detect reflected light even at low intensity (down to the single photon) and to report the reception times of photons with a resolution of a few tens of picoseconds. In addition, SPADs behave almost like digital devices, so the subsequent signal processing can be greatly simplified. The basic structure of the ToF camera detector array consists of SPADs linked to CMOS timing circuits. A SPAD generates an analog voltage pulse, which reflects the detection of a single photon and which can directly activate the CMOS circuit. The latter implements the conversion from analog to digital to calculate the delay between the emitted signal and the received signal, i.e., the arrival time of the photons in the individual pixels of the whole array.
Fig. 5.47 Demodulation of the reflected optical signal. a In the pulsed light method, the emitter illuminates the scene for the time period Δt and the signal received with delay τ is sampled for each pixel, simultaneously, using two samples $Q_1$ and $Q_2$ in the time window Δt. $Q_1$ is acquired while the illumination is active, whereas the emitter is blocked during the sampling of $Q_2$. b In the continuous wave method, the demodulation of the received signal occurs by sampling $S_r(t)$ in four windows equally spaced by π/2 in a modulation period. From the 4 samples, it is possible to estimate the phase difference Δφ, the amplitude A of the received signal, and the offset B due to the ambient light
ToF, since it is based on the calculation of the phase difference between the transmitted and received signals (Fig. 5.46b). The light is emitted continuously by a radio frequency (RF) generator, resulting in light modulated in frequency (FM) or amplitude (AM) with sinusoidal or rectangular waves. When the modulated light reaches an object in the visual field, the wave is shifted in phase in relation to the distance of the object points from the camera.⁹ The light is then reflected by the object, and a detector associated with each pixel is used to determine the phase shift. The depth of the whole scene can be determined with a single image acquisition. Once the electronics of each pixel has detected the phase shift Δφ between the emitted and the received signal, the depth between the camera and the object is calculated as follows:

$$R = \frac{c}{2} \cdot \frac{\Delta\varphi}{2\pi f} \tag{5.85}$$

where c is the speed of light and f is the modulation frequency. From (5.85), we observe that the distance calculation does not depend directly on the ToF but on the phase difference Δφ, which must be estimated, while the modulation frequency f is known (it varies between 10 and 100 MHz). Moreover, comparing (5.85) with (5.83), the relation between the phase difference Δφ and the flight time τ is obtained:

$$\Delta\varphi = 2\pi f \tau \tag{5.86}$$
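The two range estimators introduced so far, Eq. (5.84) for pulsed light and Eq. (5.85) for continuous waves, can be compared on a synthetic target. In the following Python sketch the function names are hypothetical, and the charge split of the pulsed method assumes an ideal rectangular pulse with no ambient light, so that $Q_1 \propto \Delta t - \tau$ and $Q_2 \propto \tau$; this idealization is an assumption of the sketch rather than a statement from the text.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def range_pulsed(q1, q2, pulse_width_s):
    """Eq. (5.84): R = (c * dt / 2) * Q2 / (Q1 + Q2)."""
    return 0.5 * C * pulse_width_s * q2 / (q1 + q2)


def phase_from_time(tau_s, f_mod_hz):
    """Eq. (5.86): phase difference = 2 * pi * f * tau."""
    return 2.0 * math.pi * f_mod_hz * tau_s


def range_cw(dphi_rad, f_mod_hz):
    """Eq. (5.85): R = (c / 2) * dphi / (2 * pi * f)."""
    return C * dphi_rad / (4.0 * math.pi * f_mod_hz)


# Synthetic target at 2 m
true_range = 2.0
tau = 2.0 * true_range / C

# Pulsed light: idealized charges for a rectangular pulse of 60 ns (assumption)
dt = 60e-9
q1, q2 = dt - tau, tau
print(f"pulsed estimate: {range_pulsed(q1, q2, dt):.3f} m")

# Continuous wave at 30 MHz modulation
f_mod = 30e6
dphi = phase_from_time(tau, f_mod)
print(f"CW estimate    : {range_cw(dphi, f_mod):.3f} m (phase {dphi:.3f} rad)")
```

Both estimators return the same range here because the synthetic data are noise-free; with real sensors the two methods differ in how noise, ambient light, and timing jitter affect the result.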
9 The phase shift of the reflected signal also depends on the energy absorbed by the object during the reflection of the incident light and on the attenuation of the signal during the return path, which depends on the square of the distance between object and camera. Basically, the propagation of the return signal is not instantaneous.

Let us now see how each pixel independently, through the process of demodulation of the received signal, recovers the amplitude, the phase difference, and
the contribution of the ambient lighting. The demodulation of the received modulated signal can be performed by correlation with the transmitted modulation signal (cross correlation) [1,2]. Let $S_e(t)$ and $S_r(t)$ be, respectively, the emitted signal and the received signal, modeled as simple harmonic functions defined as follows:

$$S_e(t) = 1 + a\cos(2\pi f t) \qquad\qquad S_r(t) = A\cos(2\pi f t - \Delta\varphi) + B \tag{5.87}$$

where a is the amplitude of the modulated emitted signal, A is the amplitude of the (attenuated) received signal, and B is an offset that takes into account the contribution of the ambient light to the received optical signal. The cross-correlation function between the two harmonic functions (see Sect. 6.10) is given by

$$C_{er}(x) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{+T/2} S_e(t)\,S_r(t - x)\,dt \tag{5.88}$$
Substituting the equations of the signals expressed by (5.87), after some simplifications [2], we obtain

$$C_{er}(x, \tau) = \frac{aA}{2}\cos(2\pi f (x + \tau)) + B \tag{5.89}$$

Using the notation of Δφ according to (5.86) and setting ψ = 2πfx, we have the solution:

$$C_{er}(\psi, \Delta\varphi) = \frac{aA}{2}\cos(\psi + \Delta\varphi) + B \tag{5.90}$$

In this equation, the most important unknown is the phase difference Δφ: once it is determined, the depth R can be calculated with Eq. (5.85). The amplitude A of the received signal (dependent on the reflectivity of objects and sensor sensitivity) and the coefficient B, which takes into account ambient lighting, are also estimated. In particular, A is inversely proportional to the square of R due to the dispersion of the received optical signal. Therefore, if we sample the correlation function at 4 points equally spaced over a modulation period [1], for example by changing the illumination phase in steps of 90° (see Fig. 5.47b), we obtain four values of $C_{er}(\psi, \Delta\varphi)$, denoted as follows:

$$C_0 = C_{er}(0, \Delta\varphi) \qquad C_1 = C_{er}(\pi/2, \Delta\varphi) \qquad C_2 = C_{er}(\pi, \Delta\varphi) \qquad C_3 = C_{er}(3\pi/2, \Delta\varphi) \tag{5.91}$$

From (5.90), these samples satisfy $C_3 - C_1 = aA\sin\Delta\varphi$ and $C_0 - C_2 = aA\cos\Delta\varphi$. From the sampled values of the signal modeled with the cross-correlation function $C_{er}(\psi, \Delta\varphi)$, we can therefore estimate the amplitude A, the offset B, and the phase difference Δφ [1]:

$$A = \frac{\sqrt{(C_3 - C_1)^2 + (C_0 - C_2)^2}}{a}$$

$$B = \frac{C_0 + C_1 + C_2 + C_3}{4}$$

$$\Delta\varphi = \arctan\frac{C_3 - C_1}{C_0 - C_2}$$
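The four-sample demodulation can be verified on synthetic data. The Python sketch below generates the samples of Eq. (5.91) from the correlation model (5.90) and then recovers amplitude, offset, and phase. With that model, $C_3 - C_1 = aA\sin\Delta\varphi$ and $C_0 - C_2 = aA\cos\Delta\varphi$, so the sketch divides the magnitude by a. The function names are hypothetical, and the use of `atan2` instead of a plain arctangent is a choice of this sketch, made to recover the phase over the full $[0, 2\pi)$ interval.

```python
import math


def correlation(psi, dphi, a, amp, offset):
    """Correlation model of Eq. (5.90): (a*A/2) * cos(psi + dphi) + B."""
    return 0.5 * a * amp * math.cos(psi + dphi) + offset


def demodulate(c0, c1, c2, c3, a=1.0):
    """Recover (A, B, dphi) from the four samples of Eq. (5.91)."""
    sin_term = c3 - c1  # equals a * A * sin(dphi) under Eq. (5.90)
    cos_term = c0 - c2  # equals a * A * cos(dphi) under Eq. (5.90)
    amp = math.hypot(sin_term, cos_term) / a
    offset = (c0 + c1 + c2 + c3) / 4.0
    dphi = math.atan2(sin_term, cos_term) % (2.0 * math.pi)
    return amp, offset, dphi


# Synthetic pixel (assumed values): a = 1, A = 0.4, B = 1.2, dphi = 1.0 rad
a, amp_true, offset_true, dphi_true = 1.0, 0.4, 1.2, 1.0
samples = [correlation(k * math.pi / 2.0, dphi_true, a, amp_true, offset_true)
           for k in range(4)]

amp, offset, dphi = demodulate(*samples, a=a)
print(f"A = {amp:.3f}, B = {offset:.3f}, dphi = {dphi:.3f} rad")
```

The recovered phase can then be converted to depth with Eq. (5.85).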
It should be noted that the phase difference $\varphi$ is defined only modulo $2\pi$; this is known as phase wrapping (also described in Sect. 5.2 Vol. III). As a consequence, the depth $R$ is determined with an ambiguity $R_a = c/(2f)$, which coincides with the maximum measurable distance (as shown by (5.85) when the maximum phase difference of $2\pi$ is set). For each pixel $(i,j)$, the depth affected by the wrapping ambiguity can be written as

$$R_a(i,j) = \left(\frac{\varphi(i,j)}{2\pi} + n(i,j)\right) R_{max} \tag{5.92}$$

where $R_{max} = c/(2f)$ and $n = 0, 1, 2, \ldots$ is the wrapping number. The previous equation can also be written in the following form:

$$R(i,j) = R_{ToF}(i,j) + n(i,j)\,R_{max} \tag{5.93}$$

where $R(i,j)$ is the estimated real depth value, while $R_{ToF}(i,j)$ is the measured one, considering that the wrapping number is not identical for each pixel. For a modulation frequency $f = 30$ MHz, the depth ambiguity varies from a minimum of zero to a maximum of $R_{max} = 5$ m. Resolving the ambiguity generated by phase wrapping is known in the literature as phase unwrapping (see footnote 10).

We also consider another aspect. The amplitude $A$ and the offset $B$ of the received signal have an impact on the accuracy of the depth measurement $R$. With the measurement noise approximated as Gaussian [3], the standard deviation $\sigma_R$ of the depth measurement is given by

$$\sigma_R = \frac{c}{4\sqrt{2}\,\pi f}\,\frac{\sqrt{B}}{A} \tag{5.94}$$

from which it emerges that the accuracy improves as the amplitude $A$ of the received signal increases (which happens at small distances) and as the interference $B$ of the ambient illumination with the reflected signal is reduced. Accuracy also improves with higher modulation frequencies; however, $f$ also determines the maximum measurable distance, which decreases as $f$ increases, as seen above with phase wrapping. Since the modulation frequency is adjustable, a compromise is chosen to adapt the operation of the camera to the application. An alternative to this trade-off in the choice of frequency, adopted by some manufacturers of ToF cameras, is to use multiple frequencies, which extends the operation to greater depths without reducing the modulation frequency.

In summary, two technologies are currently available, one based on pulsed light and the other on continuous wave modulation. Pulsed-light sensors directly measure the time of flight between the transmitted and the received pulse. This first category includes LIDAR cameras, which use rotating mirrors or a light diffuser (Flash LIDAR) to produce 2D depth maps. ToF cameras with continuous wave (CW) sensors measure the phase difference between the emitted and the received signals, estimated by demodulation. LIDAR cameras usually work outdoors and can reach ranges of up to a few kilometers. CW cameras usually work indoors and can measure distances up to about 10 m. The depth estimate based on the measurement of the phase difference is affected by an intrinsic ambiguity due to phase wrapping. The higher the modulation frequency, the more accurate the measurement, but the shorter the measurable depth range.

Footnote 10: In Sect. 5.2 Vol. III, we will return to the problem of wrapping and unwrapping in the context of remote sensing, for the acquisition of altitude maps of extended areas of the territory using microwave sensors on board satellites or airplanes.
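The trade-off between measurable range and accuracy can be made concrete with a small sketch of Eqs. (5.92) and (5.94). The function names and the amplitude and offset values are hypothetical and serve only to show how the quantities scale with the modulation frequency.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def max_unambiguous_range(f_mod):
    """Rmax = c / (2 f): the depth at which the phase wraps around."""
    return C_LIGHT / (2.0 * f_mod)


def unwrapped_depth(phi, n, f_mod):
    """Eq. (5.92): depth for a wrapped phase phi and wrapping number n."""
    return (phi / (2.0 * math.pi) + n) * max_unambiguous_range(f_mod)


def depth_std(amp, offset, f_mod):
    """Eq. (5.94): sigma_R = c / (4 sqrt(2) pi f) * sqrt(B) / A."""
    return C_LIGHT / (4.0 * math.sqrt(2.0) * math.pi * f_mod) * math.sqrt(offset) / amp


if __name__ == "__main__":
    for f_mod in (15e6, 30e6, 60e6):
        # A = 1.0 and B = 4.0 are illustrative values in arbitrary units.
        print(f_mod, max_unambiguous_range(f_mod), depth_std(1.0, 4.0, f_mod))
```

Doubling the frequency halves both the unambiguous range and the standard deviation, which is the compromise described above: about 5 m of range at 30 MHz and about 10 m at 15 MHz.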
5.11.13.1 CW-ToF Camera

A CW-ToF camera is characterized by its main component, the optical-electronic demodulation unit, which performs the following functions for each pixel:
• it processes the received optical signal, i.e., it converts the photons received from the light reflected by the object into electrical charges;
• it demodulates the received signal by correlation;
• it synchronizes the emitted and the received signals appropriately during the illumination, integration, and storage phases of the 4 sampled values $C_i(\psi,\varphi)$ of the received IR signal.

To obtain a real and accurate depth measurement, the integration and storage phases can be repeated over several periods (the number of periods depends on the integration time). The repetition of the measurements is strictly linked to the modulation frequency at which the camera operates and to the camera's frame rate (high-speed shutters are also needed to eliminate motion blur for moving objects). For example, with $f = 30$ MHz and 30 images acquired per second, $10^6$ integration periods are feasible. After the $n$ demodulation periods, each pixel sensor estimates the means $\bar{A}$ of the amplitude, $\bar{B}$ of the offset, and $\bar{\varphi}$ of the phase difference, and then calculates the depth and the 3D coordinates of the corresponding point of the scene.

Several CW-ToF cameras are currently available, also at low cost, with CCD/CMOS-based image sensors that produce depth maps and scene images based on amplitude and offset. For example, the SR4000/4500 cameras of Mesa Imaging, based on continuous wave technology, acquire images with a resolution of 176 × 144 pixels at a frame rate of up to 30 fps, with a depth accuracy of ±2 cm at operating distances from 0.8 to 5 m. The operating frequencies are 29, 30, and 31 MHz for maximum distances around 5 m, and 14.5, 15, and 15.5 MHz for distances around 10 m.
Multiple cameras, up to a maximum of six, can be used together in a synchronized way, either at the same frequency set by the user (for example, to build color maps) or at different operating frequencies. The light source is an array of 55 infrared LEDs (Light Emitting Diodes) with a wavelength of 850 nm, modulated at the six frequencies listed above. These cameras are affordable and eye-safe. Light at this wavelength is invisible to the human eye (see Fig. 5.48).

Fig. 5.48 The 850 nm light (infrared, invisible to the human eye) can be seen through the night-vision function of any consumer camera

The Microsoft RGB-D Kinect v2 camera, used mostly for computer games, is based on a ToF sensor and methodology [4,5]. Among the available ToF cameras, it is the one with the highest image resolution, 512 × 424 pixels (with the depth measurement encoded in 13 bits, 0–8191), and a temporal resolution of 30 fps. The RGB image instead has HD resolution at 30 fps. Moreover, it operates with multiple modulation frequencies in the range from 10 to 130 MHz, achieving a good compromise between depth accuracy and the ambiguity generated by phase wrapping. In [5], some evaluations of depth measurements in the range from 0.8 to 4.2 m are reported, with an accuracy of 0.5% of the measured distance. Given this good performance, these cameras have also been tested on board autonomous vehicles in structured and unstructured environments.

The DS311 and DS325 ToF cameras are built by SoftKinetic. The first works for limited depths, in the range from 0.15 to 1 m or from 1.5 to 4.5 m, at a resolution of 160 × 120 pixels, while the second operates in the range from 0.15 to 1 m at a resolution of 320 × 240 pixels. Both cameras acquire images at up to 60 fps. The depth accuracy is about 1.4 cm at a distance of 1 m. They are mainly used in applications for the recognition of hand postures. The described cameras are shown in Fig. 5.49.

Fig. 5.49 Models of some ToF cameras described in the text: SR4000/4500 from MESA Imaging, Microsoft's Kinect v2 RGB-D, and DS325 from SoftKinetic. The labels in the figure mark the IR emitters, the ToF photoreceptors, the protection filter for the LED emitters, the RGB camera, the ToF sensor, and the RGB and ToF camera optics

In conclusion, although very useful and efficient, ToF cameras also have some drawbacks:
(a) They have a rather limited resolution compared with the typical resolutions of standard digital cameras.
(b) The accuracy of depth measurements is maintained only within limited distances (about 5 m); as soon as the distance increases, the degradation becomes noticeable.
(c) Depth measurements are influenced by the surface properties of objects (mainly by reflectivity).
(d) The accuracy of the measurements is also affected by multiple reflections of the emitted infrared signal (this phenomenon often occurs when the observed scene has concave corners).
(e) Measurements can also be influenced by ambient light.

As a result of these drawbacks, depth maps acquired with ToF cameras are generally extremely noisy. However, these disadvantages should be put in perspective, since ToF cameras are among the few sensors that can deliver real-time 3D information at a high frame rate, regardless of the appearance of the observed scene. Considering also the low cost of some models and their increasing level of miniaturization, ToF technology is used for a wide range of applications, also in combination with classic passive RGB cameras. Among the most popular application sectors are automotive (support for autonomous driving and increased safety), industrial automation (to improve safety in automation cells where humans and robots interact in close proximity), and healthcare (recognition of postures and gestures).
5.12 Microscopy

In Sect. 4.4.2, we examined in detail the optical configuration of a compound microscope consisting of two convergent lenses, the objective and the eyepiece (or ocular), fixed at the ends of a metal tube. In practice, each of these lenses is in turn composed of several lenses to correct and minimize aberrations, although from the functional point of view we can imagine the microscope as consisting only of the two lenses: objective and ocular. The distance between the objective (which has a short focal length) and the object can be adjusted with fine (micrometric, for focusing) and coarse (macrometric) movements. In this way, the distance of the image plane can be adjusted to assume values $q \gg f$, while the distance of the object is almost equal to the focal length $f$ of the lens ($p \cong f$). Under these conditions, the magnification factor is $M = q/p \gg 1$, and the optics become fundamental to obtain highly enlarged, high-quality images. Galileo was the second to announce the invention of the microscope, after Zacharias Janssen.

The objective near the object, with focal length $f_o$, forms the real image, inverted and magnified (see Fig. 4.19). The real image is formed in the field stop plane of the ocular lens with focal length $f_e$. The light rays coming from this image plane pass through the ocular lens as a beam of rays parallel to the optical axis. This second lens (the ocular) forms a further enlarged virtual image that can be viewed by the observer or by a sensor. The final magnification power $M_{MC}$ is the product of the transverse magnification $M_o$ produced by the objective lens with focal length $f_o$ and the magnification $M_e$ of the ocular. Therefore, the magnification of the compound microscope is given by Eq. (4.47) described in Sect. 4.4.2, which we rewrite:

$$M_{MC} = M_o \cdot M_e = -\frac{L \cdot 250}{f_o \cdot f_e}$$

where the value of $L$ is normally 160 mm and the constant 250 mm is the distance of distinct vision (near point) associated with the human visual system. $M_{MC}$ is called the magnification power of a vision instrument, defined as the ratio between the size of the image on the retina when the object is observed through the instrument and its size when the object is observed with the naked eye. It is also observed that if the object is placed at the distance $f_o$ from the objective, the magnification $M_o = -q/p$ tends to infinity, i.e., the image is formed at infinity. If we consider a microscope with focal length $f_o = 30$ mm for the objective and $f_e = 25$ mm for the eyepiece, we obtain the following magnification:

$$M_{MC} = -\frac{L \cdot 250}{f_o \cdot f_e} = -\frac{160 \cdot 250}{30 \cdot 25} \approx -53$$

Normally, this magnification factor is indicated with a nominal value of about 50×, the negative sign being a reminder that the image is upside down. The peculiar characteristics of a vision system, and in particular of a compound microscope, are, in addition to magnification, the contrast and the resolving power. The contrast is related to the brightness of the objective, characterized by the parameter f/# (for convenience also indicated by $N$), and to the way the object is illuminated; the object is normally treated with dyes to highlight it better. For a microscope, it is more useful to characterize the objective in relation to the magnification power and brightness given by the numerical aperture NA, defined by Eqs. (4.53) and (4.54), from which we obtain

$$NA = n_o \sin\alpha \cong \frac{D}{2f} \tag{5.95}$$

where $n_o$ is the refractive index of the medium adjacent to the lens in which the object is immersed (oil, air, water, etc.), and $\alpha$ ($\alpha = \arctan[D/2p]$) is the half-angle of the maximum angular aperture of the light collected by the objective (see Sect. 4.5.1). For air, the refractive index is about 1. The approximation of Eq. (5.95) is justified for small values of the aperture, large values of magnification, and $p \approx f$ for the objective. The resolution and luminosity of the microscope are determined by the value of NA of the objective which, according to Eq. (5.95), increases with the diameter of the entrance pupil or with the refractive index $n_o$. Typical NA values for microscope objectives range from 0.1 for low-magnification objectives to 0.95 for dry objectives and 1.5 for oil immersion objectives. A dry objective is one that works with air between the object being examined and the objective. An immersion objective requires a liquid, usually a transparent oil with a refractive index typically of 1.5 (similar to that of glass), to occupy the space between the object and the front element of the objective. This solution is much cheaper than the use of an objective with a large numerical aperture NA.
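The worked magnification above and the definition of the numerical aperture can be reproduced with a short sketch. The function and parameter names are illustrative; lengths are in millimeters and the half-angle is in radians.

```python
import math


def compound_magnification(tube_length_mm, f_obj_mm, f_eye_mm, near_point_mm=250.0):
    """M_MC = -(L * 250) / (f_o * f_e), with the 250 mm near point as default."""
    return -(tube_length_mm * near_point_mm) / (f_obj_mm * f_eye_mm)


def numerical_aperture(n_medium, half_angle_rad):
    """Eq. (5.95), exact form: NA = n_o * sin(alpha)."""
    return n_medium * math.sin(half_angle_rad)


def numerical_aperture_approx(n_medium, aperture_diameter, focal_length):
    """Small-aperture approximation NA ~ n_o * D / (2 f); n_o = 1 gives D / (2 f)."""
    return n_medium * aperture_diameter / (2.0 * focal_length)


if __name__ == "__main__":
    print(compound_magnification(160.0, 30.0, 25.0))   # about -53.3
    print(numerical_aperture(1.0, math.radians(30)))   # dry objective, alpha = 30 degrees
    print(numerical_aperture(1.5, math.radians(30)))   # same geometry in immersion oil
```

For the same collection angle, replacing air with an oil of index 1.5 raises NA by a factor of 1.5, which is the gain an immersion objective exploits.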
For the microscope, the resolving power is better defined if one reasons in the object plane rather than in the image plane. In this case, the object is placed at a distance $p$ from the objective that is almost equal to $f$. Under these conditions, given the magnification factor $M_o$ of the microscope objective (in magnitude $M_o = q/p$), the cutoff frequency in the object plane is given by

$$F_t = M_o \frac{D}{\lambda q} = \frac{D}{\lambda p} \cong \frac{D}{\lambda f} \cong \frac{2\,NA}{\lambda} \tag{5.96}$$

and the diameter of the confusion disk calculated with the Rayleigh distance is given by

$$\varepsilon = 0.61\,\frac{\lambda}{NA} \tag{5.97}$$

If the magnification power is adequate, with a not very high angular aperture, the image is of good quality, with a confusion disk diameter in the range from 0.025 to 0.05 mm for photography and cameras. With a magnification power of 50× and a brightness of f/1, one can verify with the previous formulas that the depth of field is drastically reduced, to a few fractions of a micrometer.
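Equations (5.96) and (5.97) can be evaluated directly. The sketch below uses micrometers for the wavelength, so the cutoff frequency is in cycles per micrometer and the resolution in micrometers; the wavelength of 0.55 µm (green light) is an illustrative assumption.

```python
def object_plane_cutoff(na, wavelength):
    """Eq. (5.96), last form: F_t ~ 2 NA / lambda (cycles per unit length)."""
    return 2.0 * na / wavelength


def rayleigh_resolution(na, wavelength):
    """Eq. (5.97): epsilon = 0.61 lambda / NA (same unit as the wavelength)."""
    return 0.61 * wavelength / na


if __name__ == "__main__":
    wavelength_um = 0.55  # assumed wavelength, in micrometers
    for label, na in (("low magnification", 0.1), ("dry", 0.95), ("oil immersion", 1.5)):
        print(label, object_plane_cutoff(na, wavelength_um), rayleigh_resolution(na, wavelength_um))
```

The product of the two quantities is the constant 1.22, so a higher numerical aperture raises the cutoff frequency and shrinks the resolvable detail by the same factor.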
5.13 Telescopic

In Sect. 4.4.2, we examined in detail the optical configuration of a telescope. In contrast to microscopy, a telescope is used to observe a far object, which appears considerably enlarged. As shown in Fig. 4.20, the real image of the object, which is at a finite distance from the objective with focal length $f_o$, is formed immediately after the second focus of the objective itself. This image, which can be recorded, becomes the object of the second lens with focal length $f_e$, which is the eyepiece (or ocular). This last lens forms a magnified virtual image if the intermediate image lies at a distance from the eyepiece equal to or less than $f_e$. In essence, the position of the intermediate image is fixed, and only the eyepiece is moved to focus the telescope. The relation $d_o \gg d_i \cong f$ is valid for the telescope. The cutoff frequency $F_c$ (expressed in cycles/radian) in the angular coordinate system centered on the telescope is given by

$$F_c = \frac{D}{\lambda} \tag{5.98}$$

while the Airy disk has the diameter:

$$\delta = 2.44\,\frac{\lambda}{D} \tag{5.99}$$
The magnification of the telescope is given by the ratio between the angle θ under which the object is observed through the telescope and the angle under which the same object is seen without it. The magnification (see Eq. 4.49) is proportional to the focal length $f_o$ of the telescope objective and inversely proportional to that of the eyepiece:

$$M_T = -\frac{f_o}{f_e}$$

Normally $f_o$ is fixed, while eyepieces with different focal lengths are interchanged. For example, a telescope with $f_o = 10{,}000$ mm produces a magnification of 1000 if an eyepiece with $f_e = 10$ mm is used.
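The telescope relations lend themselves to a quick numerical check. In the sketch below the function names are illustrative, the Airy disk diameter of Eq. (5.99) is interpreted as an angular diameter in radians (an assumption consistent with the angular cutoff of Eq. (5.98)), and the aperture and wavelength values are hypothetical.

```python
def telescope_magnification(f_obj, f_eye):
    """M_T = -f_o / f_e (both focal lengths in the same unit)."""
    return -f_obj / f_eye


def angular_cutoff(aperture_diameter, wavelength):
    """Eq. (5.98): F_c = D / lambda, in cycles per radian."""
    return aperture_diameter / wavelength


def airy_disk_diameter(aperture_diameter, wavelength):
    """Eq. (5.99): delta = 2.44 lambda / D, taken here as an angle in radians."""
    return 2.44 * wavelength / aperture_diameter


if __name__ == "__main__":
    print(telescope_magnification(10_000.0, 10.0))  # example from the text: -1000
    d, lam = 0.10, 550e-9                           # assumed 10 cm aperture, green light (m)
    print(angular_cutoff(d, lam), airy_disk_diameter(d, lam))
```

Swapping the 10 mm eyepiece for a 25 mm one lowers the magnification to 400 without changing the cutoff frequency, which depends only on the aperture and the wavelength.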
Fig. 5.50 Image acquisition and restitution system: $h_k$ denotes the transfer functions of the optical system, digitization, image processing, and visualization. The diagram shows the chain: physical image f(x, y) → Digitization → digital image I(i, j) → Elaboration → elaborated image → Display
5.14 The MTF Function of an Image Acquisition System

A complete system (Fig. 5.50) for image capture, processing, and restitution can be considered as a sequential chain of linear space-invariant systems, whose overall effect can be modeled with a single PSF or with the corresponding single transfer function. The acquisition subsystem is described by the PSFs $h_1$ and $h_2$, corresponding respectively to the optical component and to the digitization component (for example, a camera) of the system. These two components are the ones that normally define the image quality in terms of spatial and radiometric resolution. In particular, the optical component is assumed to be of high quality, with negligible optical aberrations. The image quality is instead limited by diffraction, and the manufacturer of the optical system normally provides the characteristics of the PSF that models its diffraction-limited behavior. If the PSF of the optical system is unknown, it can be determined experimentally. The processing subsystem, described by p(x, y), applies to the image an operation modeled by $h_3(x, y)$, for example noise attenuation (in which case the associated PSF is a Gaussian function). The restitution subsystem, for example a monitor, is modeled by the function $h_4(x, y)$. The modulation transfer function (MTF) of the whole system is obtained by multiplying the $MTF_k$ functions of the individual components $k$ of the system. The $PSF_k$ or the transfer functions are known analytically, determined experimentally, or provided directly by the manufacturers of the individual components of the system. Figure 5.51 shows the PSFs and the transfer functions of the system components.

Fig. 5.51 Transfer functions of the various system components: optics, digitization, image processing (elaboration), and visualization, with the final result h(x) and H(u). Each panel shows the PSF as a function of x and the transfer function as a function of u

Recall that the optical image is described by the PSF through the process of convolution in the spatial domain. To better quantify how the optical system reproduces the details of the object in the image plane, the optical transfer function (OTF, described in Sect. 5.8.2) is calculated, which is the Fourier transform of the PSF. In the (Fourier) frequency space, the MTF (i.e., the modulus of the OTF), described in Sect. 5.8.1, and the PTF (i.e., the phase of the OTF) are defined. The latter is negligible for high-quality (paraxial) optical systems operating in incoherent light. Under these conditions, the OTF reduces to the MTF, and the two terms are used interchangeably. Moreover, if the PSF is symmetric (optical systems have circular symmetry), the OTF, being the Fourier transform of the PSF, is also symmetric and real-valued.

Returning to Fig. 5.50, the second component that conditions the quality of the image is the conversion from analog (the optical image) to digital (the digital image) through the process of sampling and quantization. Recalling the shape of the PSF, approximated by a Gaussian, the PSF must be as narrow as possible to maintain good image resolution. We know that its width can be controlled through the lens aperture (which should be as large as possible) and that it increases in proportion to the wavelength λ of the light. The quality of the image produced by the entire system is determined by the sequence of convolutions performed, in the spatial domain, starting from the object and convolving in sequence with the PSFs $h_k$, $k = 1, \ldots, N_{comp}$, where $N_{comp}$ indicates the number of components of the system. Alternatively, the transfer function (MTF) of the entire system is given by the product of the transfer functions $MTF_k(u, v)$ of the individual components. Normally the overall MTF is narrower than those of the components, while the corresponding PSF h(x, y) of the entire system is wider, which shows how the system introduces a slight blurring effect. A good image acquisition and visualization system must have a modulation transfer function (MTF) that attenuates the high frequencies
generally dominated by noise, while it should leave unchanged the frequencies corresponding to the details of the image, according to the Nyquist criterion, which defines the cutoff frequency (i.e., the highest frequency present in an image that is to be maintained), and according to the Rayleigh criterion, which fixes the sampling spacing at half the resolution diameter of the sampling spot. We conclude by recalling that the optical subsystem has been modeled as a linear space-invariant system, a model considered valid within certain limits, for system apertures above certain values. Under incoherent lighting, the optical system is considered a linear system. The PSF that describes the optical system is finite and tends to be broad because of optical aberrations and of diffraction, which is due to the wave nature of light propagation. A simplified way to model the characteristics (PSF and OTF) of the system is to consider the optical component as diffraction limited (without optical aberrations). With this assumption, the resolution of the system depends only on diffraction. A diffraction-limited optical system has the characteristic of transforming the divergent spherical waves entering it into spherical waves that leave the optical system converging. The light energy that passes through the optical system is characterized by the entrance pupil, which represents the plane containing the aperture of the optical system itself.
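The cascade rule (convolution of the PSFs in space, product of the MTFs in frequency) can be illustrated with a minimal one-dimensional sketch. It assumes, purely for illustration, that every component has a Gaussian PSF with standard deviation σ_k, whose MTF is exp(−2π²σ_k²u²); the σ values below are hypothetical and do not describe any real device.

```python
import math


def gaussian_mtf(u, sigma):
    """MTF of a unit-area Gaussian PSF with standard deviation sigma."""
    return math.exp(-2.0 * (math.pi * sigma * u) ** 2)


def system_mtf(u, sigmas):
    """Overall MTF as the product of the component MTFs."""
    result = 1.0
    for sigma in sigmas:
        result *= gaussian_mtf(u, sigma)
    return result


def equivalent_sigma(sigmas):
    """Width of the overall PSF: convolving Gaussians adds their variances."""
    return math.sqrt(sum(s * s for s in sigmas))


if __name__ == "__main__":
    # Hypothetical widths (in pixels) for optics, digitization, processing, display.
    sigmas = [0.6, 0.5, 1.0, 0.4]
    for u in (0.0, 0.1, 0.2, 0.3):
        print(u, system_mtf(u, sigmas))
    print(equivalent_sigma(sigmas))
```

Under this Gaussian assumption the overall PSF is wider than any single component PSF and the overall MTF lies below every component MTF, which is the blurring effect of the chain described in this section.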
References
1. R. Lange, P. Seitz, Solid-state time-of-flight range camera. IEEE J. Quantum Electron. 37(3), 390–397 (2001)
2. H. Radu, H. Miles, E. Georgios, M. Clément, An overview of depth cameras and range scanners based on time-of-flight technologies. Mach. Vis. Appl. 27(7), 1005–1020 (2016)
3. F. Mufti, R. Mahony, Statistical analysis of measurement processes for time of flight cameras, in SPIE the International Society for Optical Engineering, vol. 7447 (2009)
4. A. Mehta, B. Thompson, C.S. Bamji, D. Snow, H. Oshima, L. Prather, M. Fenton, L. Kordus, A. Payne, A. Daniel, et al., A 512 × 424 CMOS 3D time-of-flight image sensor with multi-frequency photo-demodulation up to 130 MHz and 2 GS/s ADC, in IEEE International Solid-State Circuits Conference Digest of Technical Papers (2014), pp. 134–135
5. T. Elkhatib, S. Mehta, B. Thompson, L.A. Prather, D. Snow, O.C. Akkaya, A. Daniel, D.A. Payne, S.C. Bamji, P. O'Connor, et al., A 0.13 µm CMOS system-on-chip for a 512 × 424 time-of-flight image sensor with multi-frequency photo-demodulation up to 130 MHz and 2 GS/s ADC. IEEE J. Solid State Circuits 50(1), 303–319 (2015)
6 Properties of the Digital Image
© Springer Nature Switzerland AG 2020. A. Distante and C. Distante, Handbook of Image Processing and Computer Vision, https://doi.org/10.1007/978-3-030-38148-6_6

6.1 Digital Binary Image

In many vision applications, digital images quantized to only two levels are used. A digital image B(i, j) is called a binary image if its pixels belong to two distinct classes, denoted respectively by $F$ (usually the pixels representing the object in the image, also called Foreground) and $\bar{F}$ (the complementary pixels, which represent the Background of the image). A binary image is obtained directly from a binary imaging device, or from a digital image I(i, j) quantized with N intensity levels through an image processing step that maps each pixel, with its N possible levels, to one of two distinct values, $F$ (foreground pixels often take the value 1) and $\bar{F}$ (background pixels take the value 0) [1–3]. If we know the threshold value S (an intensity level) of the monochrome image I(i, j) with which it is possible to distinguish the objects from the background, the binary image (see Fig. 6.1) is obtained as follows:

$$B(i,j) = \begin{cases} 1 & \text{if } I(i,j) \le S \\ 0 & \text{otherwise} \end{cases} \tag{6.1}$$

where it has been assumed that the objects are dark with respect to the background, which has intensity values higher than the threshold S. If the interval of intensity levels $(S_1, S_2)$ that distinguishes the object from the background is known, the binary image is given by

$$B(i,j) = \begin{cases} 1 & \text{if } S_1 \le I(i,j) \le S_2 \\ 0 & \text{otherwise} \end{cases} \tag{6.2}$$

Fig. 6.1 Example of a binary image. a Original monochromatic image; b Binary image obtained with threshold S = 100

After defining the binary image, it is useful to define some geometric properties in order to extract dimensional and shape characteristics of the objects in the image. To do this, let us examine some geometric and non-geometric relationships between pixels, from which the following information is extracted:
(a) Neighborhood
(b) Distance
(c) Connectivity
(d) Histogram
(e) Topological
(f) Correlation.
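The two thresholding rules (6.1) and (6.2) can be sketched in a few lines. The image is represented here as a list of rows of integer intensities, and the function names are illustrative.

```python
def threshold_dark_objects(image, s):
    """Eq. (6.1): 1 where I(i, j) <= S (dark objects), 0 otherwise."""
    return [[1 if value <= s else 0 for value in row] for row in image]


def threshold_band(image, s1, s2):
    """Eq. (6.2): 1 where S1 <= I(i, j) <= S2, 0 otherwise."""
    return [[1 if s1 <= value <= s2 else 0 for value in row] for row in image]


if __name__ == "__main__":
    image = [
        [10, 200, 220],
        [100, 101, 90],
        [250, 40, 180],
    ]
    print(threshold_dark_objects(image, 100))  # threshold S = 100 as in Fig. 6.1
    print(threshold_band(image, 50, 150))
```

Note that a pixel whose intensity equals the threshold is assigned to the foreground, because both rules use non-strict inequalities.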
6.2 Pixel Neighborhood

Discrete geometry suggests a neighborhood criterion that relates a generic pixel P(i, j) to the pixels considered adjacent to it horizontally and vertically. Given a pixel P(i, j) of a binary image, the adjacent (horizontal and vertical) pixels are defined by the coordinates:

(i + 1, j), (i − 1, j), (i, j + 1), (i, j − 1)

They determine a set $V_4(P)$ of pixels, the so-called 4-neighbors of P(i, j), i.e., the pixels 4-adjacent to it (see Fig. 6.2a). Figure 6.2b shows that the same pixel P(i, j) has four adjacent pixels arranged diagonally, defined by the coordinates:

(i + 1, j + 1), (i − 1, j − 1), (i − 1, j + 1), (i + 1, j − 1)

which determine the set $V_D(P)$ of the so-called 4-diagonal neighbors. Figure 6.2c instead highlights the set $V_8(P) = V_4(P) \cup V_D(P)$ of the pixels called 8-neighbors, consisting of the 4-neighbors $V_4(P)$ and the 4-diagonal neighbors of P(i, j); it identifies a 3 × 3 window centered on the pixel P(i, j) of a binary image. The sets of 4-neighbors, 4-diagonal neighbors, and 8-neighbors of each pixel P(i, j) of a binary image will be considered in the various image processing algorithms (metric, shape, connectivity, similarity, etc.), based on the relationships of spatial adjacency and of intensity values between pixels.
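The three neighbor sets can be written down directly as coordinate sets. In this sketch the function names are illustrative, and the optional clipping to the image bounds is an added convenience that the text does not discuss.

```python
def neighbors_4(i, j):
    """V4(P): horizontal and vertical neighbors of pixel (i, j)."""
    return {(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)}


def neighbors_diagonal(i, j):
    """VD(P): the four diagonal neighbors of pixel (i, j)."""
    return {(i + 1, j + 1), (i - 1, j - 1), (i - 1, j + 1), (i + 1, j - 1)}


def neighbors_8(i, j):
    """V8(P) = V4(P) union VD(P): the 3 x 3 window without its center."""
    return neighbors_4(i, j) | neighbors_diagonal(i, j)


def inside(coords, n_i, n_j):
    """Keep only coordinates that fall inside an image of size n_i x n_j."""
    return {(i, j) for (i, j) in coords if 0 <= i < n_i and 0 <= j < n_j}


if __name__ == "__main__":
    print(sorted(neighbors_8(2, 2)))
    print(sorted(inside(neighbors_8(0, 0), 5, 5)))  # a corner pixel keeps 3 neighbors
```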
Fig. 6.2 Neighbors between pixels: a 4-neighbors of a red pixel are its vertical and horizontal neighbors, denoted by $V_4(P)$; b 4-diagonal neighbors of a red pixel, denoted by $V_D(P)$; c 8-neighbors of a red pixel are its vertical, horizontal, and 4 diagonal neighbors, denoted by $V_8(P)$

Fig. 6.3 Examples of pixel distance metrics: a Euclidean, b city block, and c chessboard
6.3 Image Metric

Various image processing algorithms require an estimate of the distance between pixels of an image. This can be done with different metrics, using distance functions that estimate the distance value with different degrees of approximation. Given the pixels Q, R, and S, any distance function D must satisfy the following properties:
1. $D(Q, R) \ge 0$, and $D(Q, R) = 0$ if $Q = R$
2. $D(Q, R) = D(R, Q)$
3. $D(Q, S) \le D(Q, R) + D(R, S)$
In the following paragraphs, we will introduce the most used distance functions (Fig. 6.3) in computational geometry and image processing.
6.3.1 Euclidean Distance

For a digital image, defined as a 2D matrix, the Euclidean distance $D_E$ is a metric that provides a quantitative and meaningful measure of the distance between any two pixels of the image. If (i, j) and (k, l) are the coordinates of two pixels, the Euclidean distance $D_E$ is the classical geometric measure obtained from the well-known relation:

$$D_E[(i,j),(k,l)] = \sqrt{(i-k)^2 + (j-l)^2} \tag{6.3}$$

The Euclidean distance is a simple measure from the intuitive point of view, but it is computationally costly because of the square root operator and the non-integer values of $D_E$. The numeric matrix (6.4) below contains the distances, calculated with the Euclidean function, for the set of pixels included in a disk of radius 3 centered on a generic pixel of coordinates (i, j). Note the large number of square root operations needed to calculate the distances, and the good discretization of the shape of the disk [4,5].

$$\begin{array}{ccccccc}
 & & & 3 & & & \\
 & \sqrt{8} & \sqrt{5} & 2 & \sqrt{5} & \sqrt{8} & \\
 & \sqrt{5} & \sqrt{2} & 1 & \sqrt{2} & \sqrt{5} & \\
3 & 2 & 1 & 0 & 1 & 2 & 3 \\
 & \sqrt{5} & \sqrt{2} & 1 & \sqrt{2} & \sqrt{5} & \\
 & \sqrt{8} & \sqrt{5} & 2 & \sqrt{5} & \sqrt{8} & \\
 & & & 3 & & &
\end{array} \tag{6.4}$$
6.3.2 City Block Distance

An alternative approach for calculating the distance between two pixels is to count the minimum number of moves on the matrix grid needed to go from one pixel to the other. Considering only horizontal and vertical moves on the grid, we obtain a new measure of distance called city block or Manhattan distance (by analogy with the distance between two points of a city whose streets form a grid, as in Fig. 6.3b):

$$D_4[(i,j),(k,l)] = |i-k| + |j-l| \tag{6.5}$$

The numerical matrix (6.6) contains the distances, calculated with the city block function, for the set of pixels included in a disk of radius 3 centered on a generic pixel of coordinates (i, j). In this case, the calculation reduces to subtractions and sums of integers, but the disk is approximated by the shape of a diamond.

$$\begin{array}{ccccccc}
 & & & 3 & & & \\
 & & 3 & 2 & 3 & & \\
 & 3 & 2 & 1 & 2 & 3 & \\
3 & 2 & 1 & 0 & 1 & 2 & 3 \\
 & 3 & 2 & 1 & 2 & 3 & \\
 & & 3 & 2 & 3 & & \\
 & & & 3 & & &
\end{array} \tag{6.6}$$

It is also observed that the pixels at distance $D_4 = 1$ correspond to the 4-neighbors of the central pixel of coordinates (i, j).
6.3.3 Chessboard Distance

The distance $D_8$, also called chessboard distance (see Fig. 6.3c), is obtained from

$$D_8[(i,j),(k,l)] = \max(|i-k|, |j-l|) \tag{6.7}$$

The numerical matrix (6.8) contains the distances, calculated with the chessboard function, for the set of pixels included in a disk of radius 3 centered on a generic pixel of coordinates (i, j). As for the city block distance, the calculation reduces to subtractions of integers, and the disk is approximated by the shape of a square.

$$\begin{array}{ccccccc}
3 & 3 & 3 & 3 & 3 & 3 & 3 \\
3 & 2 & 2 & 2 & 2 & 2 & 3 \\
3 & 2 & 1 & 1 & 1 & 2 & 3 \\
3 & 2 & 1 & 0 & 1 & 2 & 3 \\
3 & 2 & 1 & 1 & 1 & 2 & 3 \\
3 & 2 & 2 & 2 & 2 & 2 & 3 \\
3 & 3 & 3 & 3 & 3 & 3 & 3
\end{array} \tag{6.8}$$

It is also observed that the pixels at distance $D_8 = 1$ correspond to the 8-neighbors of the central pixel of coordinates (i, j). In summary, the distances $D_4$ and $D_8$ are convenient alternatives to the Euclidean distance because of their computational simplicity. The set of pixels within a distance r of a given pixel (each pixel with $D_m \le r$; m = E, 4, 8) is called the disk of radius r. The geometric shape of this disk depends on the metric used to measure the distance. The Euclidean distance is closer to reality (the continuous image), even though it has the drawback of being more expensive to calculate.
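The three metrics and the shape of their disks can be compared with a short sketch. The function names are illustrative; `disk` enumerates the offsets from the center whose distance does not exceed the radius, which reproduces the supports of the matrices (6.4), (6.6), and (6.8).

```python
import math


def d_euclidean(p, q):
    """Eq. (6.3): square root of the sum of squared coordinate differences."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def d_city_block(p, q):
    """Eq. (6.5): sum of the absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def d_chessboard(p, q):
    """Eq. (6.7): maximum of the absolute coordinate differences."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def disk(radius, metric):
    """Offsets (di, dj) from the center (0, 0) with metric distance <= radius."""
    span = range(-radius, radius + 1)
    return {(di, dj) for di in span for dj in span if metric((0, 0), (di, dj)) <= radius}


if __name__ == "__main__":
    for name, metric in (("D_E", d_euclidean), ("D_4", d_city_block), ("D_8", d_chessboard)):
        print(name, len(disk(3, metric)))  # 29 (round), 25 (diamond), 49 (square) pixels
```

For any pair of pixels the three values are ordered as D8 ≤ DE ≤ D4, which is why the diamond is contained in the round disk and the round disk in the square.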
6.4 Distance Transform

The distance transform produces a map of the distances between each point and the nearest point of interest. For example, in the field of image processing, it is useful to determine the minimum distance between the internal points of a region and its boundary, or between a point and points of interest (features). The distance transform of a binary image, with the pixels of the objects labeled with the value 1 and those of the background with the value 0, produces a gray-level image in which the background pixels are unaltered, while the value of each object pixel is its distance from the nearest background pixel. In essence, the result of the transform is a gray-level image that looks similar to the input image, except that the gray-level intensities of the pixels inside the foreground regions are changed to show the distance from each pixel to the closest boundary.
The distance measure used between a pixel of the object and the nearest background pixel can be chosen among those described previously (Euclidean distance DE, city block D4, chessboard D8). Let R be the set of pixels of an object (represented in the image by a region), and I the digital image such that R ⊆ I. The general formula (in the discrete case) of the distance transform DT is given by
$$
DT_R(i,j) = \min_{(k,l)\in I}\bigl(D[(i,j),(k,l)] + W_R(k,l)\bigr) \tag{6.9}
$$
where WR(k, l) is an indicator function for the pixels of R, which takes the value 0 for the pixels (k, l) ∈ R and otherwise a very large distance value (normally indicated with ∞, useful in the initialization phase of the procedure for calculating the distance transform). Computing the distance transform by applying (6.9) to all the points of a binary image, even one of small dimensions, would have complexity O(mn), considering m pixels of R and n pixels of I.
A sequential and efficient heuristic approach (parallel algorithms also exist) for computing the distance transform is based on the use of local distance information. This approach essentially involves two steps: a direct scan of the image, proceeding from left to right and from top to bottom, and a backward scan, in which the pixels are examined from right to left and from bottom to top. Before the two scans, the image is initialized by setting the background pixels to 0 and the object pixels to a very large value (for example ∞).
(a) Step_1: assuming the D4 metric, in the direct scan the current value DT(i, j) of each pixel (i, j) is replaced as follows:
$$
DT(i,j) \leftarrow \min\{DT(i-1,j-1) + w_2,\; DT(i,j-1) + w_1,\; DT(i+1,j-1) + w_2,\; DT(i-1,j) + w_1,\; DT(i,j)\} \tag{6.10}
$$
where the values of the weights are w1 = 1 and w2 = 2 for the metric D4.
(b) Step_2: in the backward scan, the current value DT(i, j) of each pixel (i, j) is replaced as follows, with the same weights w1 = 1 and w2 = 2:
$$
DT(i,j) \leftarrow \min\{DT(i,j),\; DT(i+1,j) + w_1,\; DT(i-1,j+1) + w_2,\; DT(i,j+1) + w_1,\; DT(i+1,j+1) + w_2\} \tag{6.11}
$$
With the metric D8, using the same procedure, the weights in (6.10) and (6.11) are w1 = w2 = 1. If the D4 and D8 metrics are not satisfactory (the propagation of the distance is very approximate), the same procedure can be used with the Euclidean distance by assigning different weights to the diagonal, horizontal, and vertical steps. With the Euclidean metric applied to digital images, the diagonal step would require the weight w2 = √2, which entails floating-point calculations. A good integer approximation of the Euclidean distance DE is obtained with the weights w1 = 2 and w2 = 3, or w1 = 3 and w2 = 4 (the resulting distances are then scaled by the factor w1).
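The two-scan procedure of (6.10) and (6.11) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the image is a list of rows of 0/1 values, the index pair (r, c) plays the role of (j, i) in the text, and neighbors falling outside the image are simply ignored (treated as ∞). The name `distance_transform` is hypothetical.

```python
INF = float("inf")

def distance_transform(binary, w1=1, w2=2):
    """Two-scan distance transform of a binary image (list of rows of 0/1).

    w1 weighs horizontal/vertical steps and w2 diagonal steps:
    (1, 2) gives D4, (1, 1) gives D8, (3, 4) approximates D_E scaled by 3.
    """
    rows, cols = len(binary), len(binary[0])
    # Initialization: background pixels to 0, object pixels to infinity.
    dt = [[INF if v else 0 for v in row] for row in binary]

    def get(r, c):
        # Assumption: neighbors outside the image do not contribute.
        return dt[r][c] if 0 <= r < rows and 0 <= c < cols else INF

    # Step 1: direct scan, left to right and top to bottom (Eq. 6.10).
    for r in range(rows):
        for c in range(cols):
            dt[r][c] = min(dt[r][c],
                           get(r - 1, c - 1) + w2, get(r - 1, c) + w1,
                           get(r - 1, c + 1) + w2, get(r, c - 1) + w1)

    # Step 2: backward scan, right to left and bottom to top (Eq. 6.11).
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            dt[r][c] = min(dt[r][c],
                           get(r, c + 1) + w1, get(r + 1, c - 1) + w2,
                           get(r + 1, c) + w1, get(r + 1, c + 1) + w2)
    return dt

if __name__ == "__main__":
    image = [[0, 0, 0, 0, 0],
             [0, 1, 1, 1, 0],
             [0, 1, 1, 1, 0],
             [0, 1, 1, 1, 0],
             [0, 0, 0, 0, 0]]
    for row in distance_transform(image):
        print(row)
```

Inverting the roles of object and background (passing the complemented image) gives the distance map of the free space from an obstacle, as in the navigation example of Fig. 6.4.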
[Fig. 6.4 panels, in each of the two rows: Input binary image, Image initialization, Result after first step, Final result after second step]
Fig. 6.4 Results of the distance transform with D4 metric calculated with the 3 steps: initialization, direct scanning, and backward scanning. The first row shows the partial and final results of the procedure applied to characterize an object through the DT image in the object recognition context. In the second row, the same procedure is applied by inverting object-background (the object becomes the obstacle to be avoided and the background the free space) to extract the map of distances useful for the navigation of an autonomous vehicle
Figure 6.4 shows the results of the distance transform with the D4 metric, computed with the two steps above (direct scan and backward scan) preceded by the initialization. The first row of the figure shows the partial and final results of the procedure applied to characterize an object in the context of object recognition. The second row shows the results of the same procedure applied to the same test image, where the object is now considered an obstacle (the procedure treats it as the current background, for which no distances are calculated), while the former background becomes the free space (in which the procedure calculates the distances from the obstacle) where an autonomous vehicle moves with the support of the resulting distance map (context of path planning and obstacle detection). Figure 6.5 shows the results of the distance transform for the three metrics considered, applied to the same binary test image. Slight distortions introduced by the approximations of the D4 and D8 metrics can be observed with respect to the non-approximated metric DE, calculated with the weight w2 = √2. The distance transform is widely used as a basis for evaluating the similarity of binary images containing objects with complex shapes that can vary slightly in shape and position. The comparison between prototype images and the current image is made by evaluating a proximity measure (based on distance) rather than measuring the exact overlap. These proximity measures are the basis of the Chamfer and Hausdorff matching algorithms [6].
6.5 Path
The path between two pixels A and B is defined as the sequence of pixels S1, S2, ..., Sn where S1 = A, Sn = B, and the pixel Si+1 is a neighbor of the pixel Si, with i = 1, ..., n − 1.
Fig. 6.5 Results of the distance transform applied to the same binary test image, comparing the three metrics DE, D4, and D8 (panels: Original binary image, DT with Euclidean metric, DT with Cityblock metric, DT with Chessboard metric)
Fig. 6.6 Examples of pixel paths: a 4-Path and b 8-Path
A simple path is a path with no repeated pixels (except possibly the first and the last) in which no pixel has more than two neighbors (Fig. 6.6a). A closed path is a simple path in which the first pixel is adjacent to the last. According to the definition of path and of neighbors, two different paths can be defined, the 4-path and the 8-path, which derive from the concepts of 4-neighbors and 8-neighbors. A path in a binary image defines a curve. A closed curve defined by a path does not always divide the image into two parts. A certain ambiguity often arises with the 4-path and the 8-path with respect to our geometric intuition. The ambiguity about whether or not a pixel belongs to a curve arises both with the 4-neighbors criterion and with the 8-neighbors criterion. For instance, in Fig. 6.6b it is unclear to which part the pixels P and Q belong: the curve does not divide the region into two parts. A heuristic solution for binary images is the following: consider paths based on 8-neighbors for the objects and on 4-neighbors for the background (or vice versa).
6.6 Adjacency and Connectivity
Two pixels P and Q in an image I are connected if there exists a path between P and Q located entirely in I. Connectivity implies not only the adjacency between pixels but also an equivalence relation that defines some similarity criterion. For a binary image, the equivalence criterion between pixels is based on the two values 0 and 1. For a gray level image, the equivalence criterion between pixels is based on the possible intensity values L from 0 to 255. The intensity L can also represent a range of levels among these 256 values.
Fig. 6.7 Examples of the type of connectivity and removal of topological ambiguity with the introduction of M-connected: a 4-connected, b 8-connected, and c M-connected
Let P, Q, and R be three pixels of the image I. The connectivity relation has the following properties:
(a) Reflexivity: the pixel P is connected to P.
(b) Symmetry (commutativity): if P is connected to Q, then Q is connected to P.
(c) Transitivity: if P is connected to Q and Q is connected to R, then P is also connected to R.
The concept of connectivity is closely linked to the concept of path and consequently to that of adjacency which, as we know, involves the sets of 4-neighbors and 8-neighbors. For binary images, three types of connectivity are defined:
(a) 4-connected: two pixels P and Q, both belonging to the set of pixels with a value of 1 (foreground) or to the set with a value of 0 (background), are called 4-connected if Q belongs to the set V4(P).
(b) 8-connected: two pixels P and Q, both belonging to the set of pixels with a value of 1 (foreground) or to the set with a value of 0 (background), are called 8-connected if Q belongs to the set V8(P).
(c) M-connected: two pixels P and Q, both belonging to the set of pixels with a value of 1 (foreground) or to the set with a value of 0 (background), are called M-connected if:
1. Q belongs to the set V4(P), or
2. Q belongs to the set VD(P) and the set V4(P) ∩ V4(Q) = ∅, where the intersection is restricted to pixels having the same value as P and Q (i.e., P and Q have no common 4-neighbor of their own value).
M-connectivity is a modification of 8-connectivity that eliminates the topological ambiguities highlighted earlier (see Fig. 6.7). For grayscale images, if L = {i1, i2, ..., ik} represents a set of gray levels, it can be used as an equivalence criterion between pixels to define three types of adjacency (4-, 8-, M-adjacency), in the same way as the three types of connectivity were defined above for a binary image. Two parts E1 and E2 of a binary or gray level image are adjacent if they have at least one pair of adjacent pixels.
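The three adjacency tests can be sketched in Python as follows. The sketch assumes a binary image stored as a list of rows and pixels given as (row, column) pairs; the helper names (`n4`, `nd`, `value`, `adjacent4`, `adjacent8`, `adjacent_m`) are illustrative. For the M-connected case, the condition V4(P) ∩ V4(Q) = ∅ is read, as an assumption, as "no common 4-neighbor with the same value as P and Q".

```python
def n4(p):
    """4-neighbors V4(p) of pixel p = (row, col)."""
    r, c = p
    return {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)}

def nd(p):
    """Diagonal neighbors VD(p) of pixel p."""
    r, c = p
    return {(r - 1, c - 1), (r - 1, c + 1), (r + 1, c - 1), (r + 1, c + 1)}

def value(img, p):
    """Pixel value, or None when p lies outside the image."""
    r, c = p
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return None

def same_value(img, p, q):
    v = value(img, p)
    return v is not None and v == value(img, q)

def adjacent4(img, p, q):
    return same_value(img, p, q) and q in n4(p)

def adjacent8(img, p, q):
    return same_value(img, p, q) and q in (n4(p) | nd(p))

def adjacent_m(img, p, q):
    if not same_value(img, p, q):
        return False
    if q in n4(p):
        return True
    if q in nd(p):
        v = value(img, p)
        # Assumption: only shared 4-neighbors with the same value as p and q count.
        return all(value(img, s) != v for s in n4(p) & n4(q))
    return False

if __name__ == "__main__":
    img = [[0, 1, 1],
           [0, 1, 0],
           [0, 0, 1]]
    print(adjacent8(img, (1, 1), (0, 2)), adjacent_m(img, (1, 1), (0, 2)))
    print(adjacent8(img, (1, 1), (2, 2)), adjacent_m(img, (1, 1), (2, 2)))
```

In the small example, the pixels (1, 1) and (0, 2) are 8-adjacent but not M-adjacent, because they share the foreground 4-neighbor (0, 1); the pixels (1, 1) and (2, 2) are M-adjacent because their shared 4-neighbors are background.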
6.7 Region
The region R is defined as a subset of pixels of an image I in which a path can be defined between any pair of pixels of R. In this context, M-paths based on the M-connected criterion are also included.
6.7.1 Connected Component
We have already defined the concept of connectivity between two pixels P and Q, which holds when there is a path between them entirely contained in a subset C of the image. The connected component C is defined as a set of pixels in which every pixel is connected to every other pixel of C; that is, for each pixel pair P, Q ∈ C there is a path that passes through n-neighbors (with n = 4, 8) of pixels lying entirely in C. If the subset C contains only one connected component, then C is said to be a connected set. By virtue of the definition of connected set, we can redefine a region R as a connected set. It follows that connected component and region are interchangeable terms for a subset of connected pixels in a binary image (Fig. 6.8a). By definition, each pair of pixels of a connected component (or region) is connected. Two regions R1 and R2 are called adjacent regions if their union R1 ∪ R2 forms a connected set. If they are not adjacent, they are called disjoint regions. Recall that the concept of adjacency is closely related to the type of connectivity, 4-connected or 8-connected. Figure 6.8c shows how two regions become adjacent when 8-connectivity is used, whereas with 4-connectivity they would remain two distinct, nonadjacent regions, as there would be no 4-path between them.
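The dependence of the connected components on the type of connectivity can be illustrated with a small labeling sketch in Python. It uses a breadth-first flood fill, which is one common way to label components and is not necessarily the procedure the authors have in mind; the name `label_components` and its parameters are illustrative.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def label_components(img, connectivity=8, target=1):
    """Label the connected components of the pixels equal to `target`.

    Returns (labels, count): labels[r][c] is 0 for non-target pixels and
    1..count for the pixels of each component.
    """
    offsets = N8 if connectivity == 8 else N4
    rows, cols = len(img), len(img[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != target or labels[r][c]:
                continue
            # New component found: flood fill from this seed pixel.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and img[ny][nx] == target and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

if __name__ == "__main__":
    img = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]
    print(label_components(img, connectivity=4)[1])
    print(label_components(img, connectivity=8)[1])
```

The two blocks of the example touch only diagonally, so they are counted as two components with 4-connectivity and as a single component with 8-connectivity.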
6.7.2 Foreground, Background, and Holes
The foreground set F normally contains all the pixels with a value of 1 in a binary image. In terms of regions (connected components), the foreground F can be redefined as the union of the M disjoint regions Rj, j = 1, ..., M existing in the image, none of which includes the edge of the image. The background set $\bar{F}$ (complement of F) comprises the connected components that include pixels on the edge of the image; the other components of $\bar{F}$ are called holes. If there are no holes in a region of the image, it is called a simply connected region. A region that has holes is called a multiply connected region. Recall that, to eliminate ambiguity, we use 8-connectivity for F and 4-connectivity for $\bar{F}$ (Fig. 6.9).
Fig. 6.8 a Binary image with 4 connected components (b); c binary image with 4 components, two of which are adjacent: they remain 4 connected components if 4-connectivity is used (d), whereas they become 3 connected components if 8-connectivity is used (e)
Fig. 6.9 Example of foreground and background: a if we consider 4-connectivity for both foreground and background, we have 4 regions of 1 pixel and no hole; b if instead we use 8-connectivity for both, we have one region and no hole; c to avoid ambiguity, we use 8-connectivity for the foreground and 4-connectivity for the background, and in this case we obtain one region
6.7.3 Object
The concept of region uses only the connectivity property. For the interpretation of the image, it is useful to associate some secondary properties with the regions. It is customary to call some regions of the image objects. The procedure that processes a binary image to search for regions (that is, partitions it into regions) corresponding to isolated objects is called segmentation. An object is typically represented in the image by a connected component, although this need not always be the case. The holes are pixels not belonging to the object. Example: if we consider this sheet as an image, the white sheet is the background, the objects are all the individual black characters, and the holes are the white areas enclosed by the letters (in the letter “O”, the white area inside).
Fig. 6.10 a Binary image with the three components: b contour, internal and external part of an object
6.7.4 Contour
An important characteristic of a region R is its contour (also called border or boundary), which is of considerable importance in image analysis. The boundary is the set of pixels of the region R that have one or more neighbors outside R, that is, pixels adjacent to $\bar{R}$ (the complement of R). In other words, boundary pixels delimit all the pixels of a region and have at least one neighbor in the background $\bar{F}$. Denoting the boundary by R′, the set of remaining pixels coincides with the interior of R, i.e., R − R′ (Fig. 6.10), and R′ can be called the external boundary of R to distinguish it from any internal contours, whose pixels are adjacent to the holes. A region T includes a region R, i.e., T surrounds R (or, equivalently, R is included in T), if any 4-path from any pixel of R to the border of the image intersects T. It is also observed that a pixel may or may not belong to the contour (Fig. 6.10b) depending on the connectivity used. In the example, if 4-connectivity were used to define the contour, the pixel P would not belong to the region but would be part of the contour. It becomes instead part of the region if 8-connectivity is used (see Fig. 6.10b). Therefore, when searching for the boundary with border following algorithms, 4-connectivity must be used for the background and 8-connectivity for the contour, as shown in the figure.
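The definition of boundary as the set of region pixels with at least one neighbor outside the region can be sketched in Python as follows. The function names `boundary` and `interior` are illustrative, the region is assumed to consist of the pixels with value 1, and positions outside the image are treated as lying outside the region (an assumption the text does not spell out).

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def boundary(img, neighborhood=N4):
    """Region pixels (value 1) with at least one neighbor outside the region."""
    rows, cols = len(img), len(img[0])

    def outside(r, c):
        # Assumption: positions beyond the image edge count as outside the region.
        return not (0 <= r < rows and 0 <= c < cols) or img[r][c] == 0

    return {(r, c)
            for r in range(rows) for c in range(cols)
            if img[r][c] == 1
            and any(outside(r + dr, c + dc) for dr, dc in neighborhood)}

def interior(img, neighborhood=N4):
    """Region pixels that are not boundary pixels."""
    region = {(r, c)
              for r in range(len(img)) for c in range(len(img[0]))
              if img[r][c] == 1}
    return region - boundary(img, neighborhood)

if __name__ == "__main__":
    img = [[0, 0, 0, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 1, 1, 1, 0],
           [0, 1, 1, 0, 0],
           [0, 0, 0, 0, 0]]
    print(sorted(interior(img, N4)))
    print(sorted(interior(img, N8)))
```

Testing the background with the 4-neighborhood leaves the central pixel in the interior, whereas testing with the 8-neighborhood turns it into a boundary pixel, which illustrates how the contour depends on the connectivity used.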
6.7.5 Edges
While the boundary is a concept associated globally with a region, the edge is a local property of a pixel and its neighbors, and is characterized as a vector defined by magnitude and direction. The boundary is always a closed path, whereas the edges may present discontinuities. The edges, normally identified at the boundaries between homogeneous regions of an image, are fundamental for the human visual system, as they constitute the basic information for the perception of objects. Normally, they represent the strong geometric variations of the observed objects and are the pixels on which the greatest attention is focused for the 3D reconstruction of the objects themselves. Several local operators will be used to extract the edges starting from the gray level function of an image.
Fig. 6.11 Examples in which Euler’s number is a E = 0, b E = −1, and c E = 2
6.8 Topological Properties of the Image
The topological properties concern those characteristics of the shapes of geometric figures that are preserved under deformation, twisting, and stretching, regardless of size and absolute position. In this context, the topological properties are used to describe the regions present in the image. If the number of holes in a region is taken as the topological descriptor, this property is invariant to stretching or rotation. However, the number of holes may vary if the region is bent or altered by tearing. Stretching only affects distances, and this does not affect topological properties. In general, topological properties do not vary when an image undergoes a transformation that modifies its geometric shape. Imagine, for example, the deformation undergone by an image drawn on a balloon that, once deflated, loses its spherical shape. The deformation does not alter the homogeneity of the objects represented in the image, nor the possible presence of holes in the regions. The geometric transformation undergone by the image introduces variations of distance but does not compromise topological properties. For example, if there are regions with several holes in the image, their number remains unchanged if the image undergoes only a geometric transformation. In other words, topological properties remain invariant to distortion and stretching of objects that do not suffer tearing. For example, the image of a circle can be deformed into an ellipse and vice versa, or a sphere into an ellipsoid and vice versa.
6.8.1 Euler Number
The Euler number E is used as a feature of the object. It is defined as the difference between the number of connected components C (the regions representing objects) and the number of holes B (connected components that do not belong to the background) present in the image:
$$
E = C - B \tag{6.12}
$$
If the pixels of the objects (foreground) are indicated with F and those of the background with $\bar{F}$ (which also include the edge pixels of the image), the pixels associated with the holes form connected components not belonging to the background $\bar{F}$. Figure 6.11 shows some examples of Euler numbers. The Euler number is invariant with respect to translation, rotation, and scaling.
Fig. 6.12 Topological descriptions of an object: a object, b its convex hull, and c lakes and bays
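A possible way to compute the Euler number E = C − B of (6.12) on a binary image is sketched below in Python. The sketch assumes 8-connectivity for the foreground and 4-connectivity for the background, and counts as holes the background components that do not touch the edge of the image; the names `count_components` and `euler_number` are illustrative.

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(img, target, offsets, skip_border=False):
    """Count connected components of pixels equal to `target`.

    With skip_border=True, components touching the image edge are not counted.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != target or seen[r][c]:
                continue
            stack = [(r, c)]
            seen[r][c] = True
            touches_edge = False
            while stack:
                y, x = stack.pop()
                if y in (0, rows - 1) or x in (0, cols - 1):
                    touches_edge = True
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and img[ny][nx] == target and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if not (skip_border and touches_edge):
                count += 1
    return count

def euler_number(img):
    """E = C - B: foreground components (8-conn.) minus holes (4-conn.)."""
    components = count_components(img, 1, N8)
    holes = count_components(img, 0, N4, skip_border=True)
    return components - holes

if __name__ == "__main__":
    ring = [[0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 0, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]]
    print(euler_number(ring))
```

A ring (one component, one hole) gives E = 0, and a shape with two holes gives E = −1, in line with the kind of examples mentioned for Fig. 6.11.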
6.8.2 Convex Hull
The convex hull is the smallest region that contains an object such that, for any two points of the region, the segment joining them belongs to the region itself. Example: let R be an object that resembles the letter “R” (as in Fig. 6.12), and suppose we wrap a thin elastic band around R. The figure delimited by the elastic band is the convex hull. An object with an irregular shape can be represented by a collection of its topological components. The part of the convex hull that does not belong to the object is called the convexity deficit. This can be divided into two kinds of subregions: the first, called lakes, are completely surrounded by the object, while the second, called bays, are connected with the contour of the convex hull. The convex hull, lakes, and bays are sometimes used for the description of the object.
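As an illustration of the "elastic band" idea, the vertices of the convex hull of a set of pixel coordinates can be computed with Andrew's monotone chain algorithm. This algorithm is not described in the text and is used here only as one standard choice; the names `cross` and `convex_hull` are illustrative, and points are assumed to be (x, y) integer pairs.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Vertices of the convex hull in counterclockwise order.

    Andrew's monotone chain; collinear points on the hull edges are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Remove the last vertex while it does not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

if __name__ == "__main__":
    # Pixel coordinates of a small L-shaped object.
    shape = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]
    print(convex_hull(shape))
```

The pixels that fall inside the polygon returned by `convex_hull` but do not belong to the object form the convexity deficit, which can then be split into lakes and bays.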
6.8.3 Area, Perimeter and Compactness
The area and the perimeter are two other parameters that characterize the connected components S1, S2, ..., Sn present in the image. The area of each component Si is given by the number of pixels it contains. The perimeter of a connected component is defined as the number of pixels that make up the component's boundary. Other definitions will be introduced later. The area and perimeter depend on the geometric transformations performed on the image.
Fig. 6.13 Histogram algorithm
Compactness is another parameter of a connected geometric figure. It expresses a measure of the isoperimetric¹ inequality for the associated connected component:
$$
C = \frac{p^2}{A} \ge 4\pi \tag{6.13}
$$
where p and A are the perimeter and the area, respectively. A circular region is the figure with the minimum compactness value C (it reaches 4π). If a circle is inclined with respect to the observer, it takes the form of an ellipse. In this case, the area decreases in a greater proportion than the perimeter, which varies only slightly. It follows that the value of compactness increases. A square is more compact than a rectangle with the same perimeter.
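As a worked check of (6.13) in the continuous case, consider a circle of radius r, a square of side a, and, as an arbitrarily chosen example, a rectangle with sides a/2 and 3a/2, which has the same perimeter as the square:

```latex
\begin{align*}
\text{circle:}    \quad & p = 2\pi r, \quad A = \pi r^2
                  & \Rightarrow \quad C &= \frac{(2\pi r)^2}{\pi r^2} = 4\pi \approx 12.57 \\
\text{square:}    \quad & p = 4a, \quad A = a^2
                  & \Rightarrow \quad C &= \frac{(4a)^2}{a^2} = 16 \\
\text{rectangle:} \quad & p = 4a, \quad A = \tfrac{3}{4}a^2
                  & \Rightarrow \quad C &= \frac{(4a)^2}{\tfrac{3}{4}a^2} = \frac{64}{3} \approx 21.33
\end{align*}
```

The circle attains the lower bound 4π, while the square (C = 16) has a smaller value of C, and is therefore more compact, than the rectangle with the same perimeter (C ≈ 21.33).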
6.9 Property Independent of Pixel Position
As we know, an image I(i, j) obtained from the digitization process consists of a set of pixels whose value is actually a measure of the irradiance. This measurement is affected by the stability characteristics of the sensor, whose intrinsic variability can be modeled by estimating the probability density function HI(L), which indicates the probability of observing an irradiance value L (brightness level, or gray level for images). In other words, the discrete values L of each pixel are considered to be obtained from a random process and are treated as random values in the range 0 to LMax with probability HI(L). Normally, the probability function HI(L) can be calculated experimentally (see Fig. 6.13). Assuming that the irradiance variability is spatially invariant with respect to the position (i, j) of the pixels in the image, the probability function is approximated by the histogram. In this case, we speak of a homogeneous stochastic process.
1 In the classic isoperimetric problems, one is usually asked to identify the figure that, with the same perimeter and under certain constraints, maximizes the area.
Fig. 6.14 Images a and c with the respective histograms b and d
6.9.1 Histogram
The histogram HI(L) of an image I is a vector that provides the frequency of each gray level L present in the image, within the range Lmin ≤ L ≤ Lmax of the minimum and maximum gray levels (see Fig. 6.14). If the image is thought of as being produced by a stochastic process, the histogram represents an a posteriori estimate of the probability distribution of the gray levels. The histogram is the only global information available for the image (the spatial correspondence of the pixels is lost). The histogram will be used in many image processing algorithms with the purpose of
(a) changing gray levels,
(b) segmenting an image,
(c) extracting objects from the background,
(d) reducing the dimensionality or compressing,
(e) etc.
From the histogram, we estimate first-order statistical parameters such as the average m (or expected value):
$$
m = \langle I \rangle = \sum_{k=0}^{L_{max}} p(k)\, I(k) \tag{6.14}
$$
with p(k) given by
$$
p(k) = \frac{H(k)}{N \times M} \tag{6.15}
$$
where p(k) is the probability that the gray value k appears, while N × M is the number of pixels in the image. By definition,
$$
0 \le p(k) \le 1 \quad \forall k = 0, \ldots, L_{max}
\qquad \text{and} \qquad
\sum_{k=0}^{L_{max}} p(k) = 1 \tag{6.16}
$$
where Lmax indicates the maximum value of the gray levels.
It is also possible to derive the variance σ² (moment of order 2), given by
$$
\sigma^2 = \langle (I - \langle I \rangle)^2 \rangle = \sum_{k=0}^{L_{max}} p(k) \cdot [I(k) - m]^2 \tag{6.17}
$$
and the moments of order n, given by
$$
M_n = \sum_{k=0}^{L_{max}} p(k) \cdot [I(k) - m]^n \tag{6.18}
$$
The moment of order 3 represents a measure of the asymmetry of the probability function around the mean value (skewness). If the distribution function is symmetrical with respect to the mean value, the moment of order 3 and the other odd moments of higher order are zero. There are several probability density functions with which various stochastic physical processes can be described, but the most widespread is the Gaussian (or normal) distribution, which will be used in the following paragraphs.
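The first-order statistics of (6.14) to (6.18) can be sketched in Python as follows. The sketch assumes integer gray levels in the range 0 to Lmax and takes I(k) = k, i.e., the gray value associated with bin k is k itself; the function names are illustrative.

```python
def histogram(img, l_max):
    """H(k): number of pixels with gray level k, for k = 0..l_max."""
    h = [0] * (l_max + 1)
    for row in img:
        for level in row:
            h[level] += 1
    return h

def probabilities(h):
    """p(k) = H(k) / (N x M), as in Eq. (6.15)."""
    total = sum(h)
    return [count / total for count in h]

def mean(p):
    """m = sum_k p(k) k, as in Eq. (6.14) with I(k) = k (assumption)."""
    return sum(pk * k for k, pk in enumerate(p))

def central_moment(p, n):
    """M_n = sum_k p(k) (k - m)^n, as in Eq. (6.18); n = 2 gives the variance."""
    m = mean(p)
    return sum(pk * (k - m) ** n for k, pk in enumerate(p))

if __name__ == "__main__":
    img = [[0, 0, 2, 2],
           [1, 1, 1, 1]]
    p = probabilities(histogram(img, l_max=3))
    print(p, mean(p), central_moment(p, 2), central_moment(p, 3))
```

The small test image has a histogram that is symmetric around its mean, so its moment of order 3 vanishes, as stated above for symmetric distributions.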
6.10 Correlation-Dependent Property Between Pixels
6.10.1 The Image as a Stochastic Process → Random Field
The first-order statistics obtained from the histogram do not contain any information about the relationships between pixels. In fact, the histogram HI(L) can be identical for different images and contains no information about the number of objects, their size, or their position. In other words, we have assumed that the image is obtained from a deterministic process as far as the spatial distribution of the irradiance is concerned. In reality, we know that image formation is characterized by spatially unpredictable fluctuations in the pixel values, for which it is not possible to formulate a model of spatial variability. To take into account also the spatial arrangement of the gray levels, it is necessary to consider the statistics for each pixel position (i, j); in this case, an image is considered as a statistical quantity, also known as a Random Field. Since the image I(i, j) cannot be modeled as a deterministic 2D signal (acquiring several images under the same operating conditions would yield different 2D signals), we can record a finite number of images Ik(i, j), k = 1, 2, ..., where each image Ik(i, j) is considered a realization of the stochastic process. Essentially, the set of images Ik(i, j), k = 1, 2, ... forms a stochastic process consisting of a set of random variables spatially indexed by (i, j), the position of the pixels in the k-th realization of the stochastic process.
Fig. 6.15 Stochastic process diagram represented for the images. Each image Ik represents a realization of the process, and the set of pixels Ik(p, q), k = 1, 2, ... in the generic position (p, q) is considered a random variable
It follows that the matrix Ik(i, j) representing the k-th image consists of N × M random variables. This implies calculating the probability function p(i, j) for each pixel (i, j) of the image through the acquisition of Q images under the same operating conditions. Figure 6.15 shows schematically Q images forming the realizations (ensemble) of the stochastic process and the set of random values I1(i, j), I2(i, j), ..., IQ(i, j) for a fixed pixel at position (i, j). The aim is to characterize the stochastic process not only through the individual spatial random variables but also through the relationships and statistical dependencies between the random variables at different spatial positions (i.e., analysis of the statistics of the intensity values of spatially close pixels). The average for each pixel (i, j) would be calculated as follows:
$$
\langle I(i,j) \rangle = \sum_{k=0}^{L_{max}} p(k,i,j) \cdot I(k) \tag{6.19}
$$
whose estimate is obtained from the integrated average IT, which under the same acquisition conditions is
$$
\langle I_T \rangle = \frac{1}{Q} \sum_{k=1}^{Q} I_k \tag{6.20}
$$
relative to Q observations of the same pixel with gray level I(i, j). The variance is estimated as follows:
$$
\sigma_I^2 = \frac{1}{Q-1} \sum_{k=1}^{Q} (I_k - \langle I \rangle)^2 \tag{6.21}
$$
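The ensemble estimates (6.20) and (6.21) can be sketched in Python for a stack of Q images acquired under the same conditions. The sketch assumes that all images are lists of rows of equal size and that Q ≥ 2; the function names are illustrative.

```python
def ensemble_mean(images):
    """Per-pixel average over Q realizations, as in Eq. (6.20)."""
    q = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / q for c in range(cols)]
            for r in range(rows)]

def ensemble_variance(images):
    """Per-pixel variance with the 1/(Q-1) factor, as in Eq. (6.21)."""
    q = len(images)
    m = ensemble_mean(images)
    rows, cols = len(m), len(m[0])
    return [[sum((img[r][c] - m[r][c]) ** 2 for img in images) / (q - 1)
             for c in range(cols)]
            for r in range(rows)]

if __name__ == "__main__":
    # Three hypothetical acquisitions of the same 1 x 2 scene.
    stack = [[[10, 20]],
             [[12, 20]],
             [[14, 20]]]
    print(ensemble_mean(stack))
    print(ensemble_variance(stack))
```

Each pixel position is treated as a separate random variable: in the example, the first pixel fluctuates across the acquisitions while the second does not.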
6.10.2 Correlation Measurement
To relate pixels at different positions in the image, we use the correlation measure of gray levels, given as the product of the gray levels at the two positions considered. This is achieved by the autocorrelation function:
$$
R_{II}(i,j;k,l) = \langle I_{ij}, I_{kl} \rangle = \sum_{r=0}^{L_{max}-1} \sum_{s=0}^{L_{max}-1} I_r I_s \, p(r,s;i,j;k,l) \tag{6.22}
$$
where the probability function p has six parameters and expresses the probability that the pixel (i, j) has gray level r and, simultaneously, the pixel (k, l) has gray level s. The autocorrelation function has four dimensions, so in general this statistic is complicated to use. The problem is simplified if we assume that the statistics do not depend on the pixel position. In this case, the stochastic process is again homogeneous, and the mean value no longer depends on the position of each pixel but is constant over the whole image:
$$
\langle I \rangle = \text{constant} \tag{6.23}
$$
and the autocorrelation function becomes shift invariant, i.e., invariant to translation and therefore independent of the absolute position of the two pixels:
\begin{align}
R_{II}(i+n, j+m; k+n, l+m) &= R_{II}(i,j;k,l) \tag{6.24} \\
&= R_{II}(i-k, j-l; 0, 0) = R_{II}(0, 0; k-i, l-j) \tag{6.25}
\end{align}
The last identities are obtained by setting (n, m) = −(k, l) and (n, m) = −(i, j). In practice, the autocorrelation function R depends only on the displacement (distance) between the two pixels, and consequently the dimensionality of the function goes from 4 to 2, obtaining
$$
R_{II}(k,l) = \frac{1}{NM} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} I_{ij} \cdot I_{i+k,j+l} \tag{6.26}
$$
The autocorrelation thus obtained can be interpreted as a spatial mean. In fact, several stochastic processes are homogeneous, i.e., the average and the autocorrelation function do not depend on the pixel position but only on the distance between pixels. For the average and autocorrelation of the homogeneous stochastic process to be equal to the spatial average and autocorrelation of its realizations, the stochastic process must be assumed to be ergodic. The ergodicity assumption then allows the average and autocorrelation of the stochastic process to be estimated from the spatial average and autocorrelation of a single realization. The advantage of an ergodic process applied to images is immediate, since in some real contexts it is not possible to acquire several images of the same scene. The assumptions made can also be valid for processes that are not completely homogeneous and ergodic, such as the processes associated with the degradation of images, in particular by impulsive noise. The spectral characteristics of a stochastic process will be described in Chap. 9, where the autocorrelation function is analyzed in the frequency domain by introducing the definition of power spectrum based on the Fourier transform.
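The spatial autocorrelation of (6.26), estimated from a single realization, can be sketched in Python as follows. Equation (6.26) does not specify how the indices i + k and j + l are handled at the image borders; the sketch assumes a periodic (wrap-around) extension of the image, which is only one possible convention. The name `autocorrelation` is illustrative.

```python
def autocorrelation(img, k, l):
    """Spatial estimate of R_II(k, l) for an M x N image (list of rows).

    Assumption: indices beyond the borders wrap around (periodic image).
    """
    m, n = len(img), len(img[0])
    total = 0
    for i in range(m):
        for j in range(n):
            total += img[i][j] * img[(i + k) % m][(j + l) % n]
    return total / (m * n)

if __name__ == "__main__":
    img = [[1, 2],
           [3, 4]]
    # Autocorrelation for the displacements (0, 0), (0, 1), and (1, 0).
    print(autocorrelation(img, 0, 0),
          autocorrelation(img, 0, 1),
          autocorrelation(img, 1, 0))
```

For zero displacement the function returns the mean of the squared gray levels, which is the largest value the autocorrelation can take.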
6.11 Image Quality
During the various phases of acquisition, processing, and transmission, an image can be degraded. A measure of image quality can be used to estimate the level of degradation, depending on the field of application. Two kinds of evaluation methods can be distinguished: subjective and objective. The former are widely used in television and photography, where the aim is to improve the visual quality of images. The latter are quantitative methods that measure the image quality by comparing an image with a reference image (model image) [7,8]. Normally, actually acquired images that are well calibrated and whose radiometric and geometric characteristics are known are chosen as model images. Alternatively, in some applications, we are forced to use only synthetically generated model images. Given an image I(i, j), and indicating with g(i, j) the image degraded by some physical phenomenon or processing, a quantitative measure of the degradation or error level is defined by the correlation between the two images I and g, given by
$$
R_{Ig}(k,l) = \frac{1}{NM} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} I_{ij} \cdot g_{i+k,j+l} \tag{6.27}
$$
RIg, also called cross-correlation, represents a measure of similarity between the original image I and the degraded image g. Cross-correlation is an operation similar to convolution except for a sign: only the displacement and the product between pixels are needed. Normally, the degradation process is also stochastic and is often modeled with the Gaussian probability density function. A more general method could be to minimize a functional, if the characteristics of similarity between the two images I and g are known. For example, a functional F that minimizes the error based on the difference between the two images would be
$$
\iint F(g - I)\, dx\, dy = \text{minimum} \tag{6.28}
$$
Two distinct measures of the difference between images are obtained by estimating the Mean Square Error (MSE) and the Peak Signal-to-Noise Ratio (PSNR).
Let $g(i,j)$ be the image resulting from a transformation process (compression, transmission, processing, etc.) and let $I(i,j)$ be the original, defect-free image. The MSE is defined as

$$e_{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left[g(i,j) - I(i,j)\right]^2 \qquad (6.29)$$
The main problem with this measure is that it depends strongly on the scale of the image intensity. A mean square error of 100 for an 8-bit image (pixel values between 0 and 255) corresponds to a qualitatively poor image, while the same error for a 10-bit image (pixel values between 0 and 1023) corresponds to a better quality. The PSNR avoids this problem by scaling the MSE by the range of the image intensity:

$$PSNR = -10\log_{10}\frac{e_{MSE}}{S^2} \qquad (6.30)$$
where $S$ is the maximum light intensity value present in the image. The PSNR is measured in decibels (dB). It is not an ideal measure, but it is commonly used to estimate the quality of an image. It is typically used to compare the results of different restoration techniques applied to the same image, as we will see in Chap. 4 of Vol. II (Restoration).
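Equations (6.29) and (6.30) can be illustrated with a minimal sketch in plain Python. The function names `mse` and `psnr`, the representation of an image as a list of rows, and the `peak` parameter (playing the role of $S$) are assumptions made only for this illustration.

```python
import math

def mse(g, ref):
    """Mean square error between two equally sized images, Eq. (6.29)."""
    rows, cols = len(ref), len(ref[0])
    total = sum((g[i][j] - ref[i][j]) ** 2
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)

def psnr(g, ref, peak):
    """PSNR in dB, Eq. (6.30): -10 log10(e_MSE / S^2), with S = peak."""
    error = mse(g, ref)
    if error == 0:
        return math.inf          # identical images: no degradation
    return -10.0 * math.log10(error / peak ** 2)

reference = [[0, 0], [0, 0]]
degraded = [[10, 10], [10, 10]]  # every pixel is off by 10, so the MSE is 100

print(mse(degraded, reference))
print(round(psnr(degraded, reference, 255), 2))    # 8-bit intensity range
print(round(psnr(degraded, reference, 1023), 2))   # 10-bit intensity range
```

With the same MSE of 100, the sketch gives about 28.1 dB when the peak value is 255 and about 40.2 dB when it is 1023, which reflects the scale dependence discussed above.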
6.11.1 Image Noise
Real images are normally degraded by random errors introduced by the digitization process and during processing and transmission. This degradation is usually called noise, and it is customary to model it as a stochastic process. An ideal noise is white noise, which has a constant power spectrum, i.e., its intensity does not diminish as the frequency increases. A frequency analysis for modeling some types of image noise is presented in Chap. 9.
6.11.2 Gaussian Noise
Gaussian noise is another approximate model of noise: its probability density function is a normal distribution. Let $x$ be the random variable, i.e., the gray level describing the noise; its density is

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (6.31)$$

where $\mu$ is the mean, $\sigma$ the standard deviation, and $\sigma^2$ the variance of the random variable $x$. In image processing, the Gaussian model is a good approximation of the noise of Vidicon and CCD acquisition systems. This noise affects all gray levels.
Fig. 6.16 8-neighborhood masks used for removing salt-and-pepper noise
6.11.3 Salt-and-Pepper Noise
This type of noise is characterized by the presence of dark pixels in light regions or vice versa. It often arises when a binary image is produced with a thresholding operation. Salt corresponds to light pixels in a dark region, i.e., pixels that exceed the threshold and are therefore set to white, while pepper corresponds to dark pixels in a light region, i.e., pixels that fall below the threshold and are assigned zero (black). This noise may be due to classification errors resulting from lighting variations, from the characteristics of the surface of the material, or from noise introduced by the analog-to-digital conversion of the frame grabber. In some cases, however, isolated pixels are not classification errors but small details that contrast with their neighborhood, for example, the button of a shirt or a clearing in a forest, and may be exactly the detail required by the application. Figure 6.16 shows two different 8-neighborhood masks for removing this effect.
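As an illustration of this kind of removal, the following sketch flips isolated pixels of a binary image. The masks of Fig. 6.16 are not reproduced here, so the rule used is an assumption: a pixel is treated as isolated when all eight of its neighbors have the opposite value. The function name `remove_isolated_pixels` is hypothetical, and border pixels are simply left unchanged.

```python
def remove_isolated_pixels(img):
    """Flip every interior pixel of a binary image (values 0/1) whose
    8 neighbors all have the opposite value (assumed isolation rule)."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]            # work on a copy
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbors = [img[i + di][j + dj]
                         for di in (-1, 0, 1) for dj in (-1, 0, 1)
                         if (di, dj) != (0, 0)]
            if all(v != img[i][j] for v in neighbors):
                out[i][j] = 1 - img[i][j]    # salt -> 0, pepper -> 1
    return out

salt = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(remove_isolated_pixels(salt))
```

Since isolated pixels may be genuine details, as noted above, such a filter should be applied only when the application can tolerate losing single-pixel structures.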
6.11.4 Impulsive Noise
Impulsive noise is caused by the random occurrence of very high gray-level values (white), unlike the Gaussian model, which applies to all gray levels. This type of noise cannot be modeled with a stationary random process: it appears as strong, spatially localized variations in the brightness of the pixels. It can be mitigated by local operations that recompute pixel values, as described in Chap. 9.
6.11.5 Noise Management
The noise is intrinsically dependent on the signal itself and also occurs in the process of image formation. (Footnote 2: Voltage spikes in the Analog-to-Digital Converter (ADC) board can be a cause of noise in the circuit, where the voltage changes due to the physical properties of the circuit material create the most favorable conditions for this kind of noise.) The observed image $I_E$ is produced by a nonlinear transformation of the source image $I$, which is degraded by multiplicative noise $n$, modeled as follows:

$$I_E = I + n\cdot I = I\,(1+n) \cong I\cdot n \qquad (6.32)$$
where $n$ models the well-known speckle noise introduced by the image digitization equipment. Another type of noise, which is independent of the signal, occurs in transmitted images or arises from thermal errors in the CCD sensor. This type of noise is additive in the spatial domain and is modeled as follows:

$$I_E(i,j) = I(i,j) + n(i,j) \qquad (6.33)$$

where $I_E$ is the observed image, $I$ is the unknown original image, and $n$ is assumed to be independent and identically distributed (i.i.d.) white Gaussian noise with zero mean and finite variance. In Chap. 4 of Vol. II (Restoration), we will return to model different types of noise in more depth, in order to mitigate as far as possible the degradation they produce in the image [9,10].
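The two noise models can be simulated with a short sketch. It assumes that $n$ is drawn from a zero-mean Gaussian distribution in both cases (the text states this only for the additive model), uses the form $I(1+n)$ of Eq. (6.32) for the multiplicative case, and relies on hypothetical function names and a `seed` parameter introduced only to make the runs reproducible.

```python
import random

def add_additive_noise(img, sigma, seed=0):
    """Signal-independent noise, Eq. (6.33): I_E(i,j) = I(i,j) + n(i,j)."""
    rng = random.Random(seed)
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in img]

def add_multiplicative_noise(img, sigma, seed=0):
    """Signal-dependent noise, Eq. (6.32): I_E = I + n*I = I(1 + n)."""
    rng = random.Random(seed)
    return [[p * (1.0 + rng.gauss(0.0, sigma)) for p in row] for row in img]

image = [[0.0, 100.0], [50.0, 200.0]]
print(add_additive_noise(image, sigma=5.0))
print(add_multiplicative_noise(image, sigma=0.1))
```

In the multiplicative case a pixel of value 0 stays at 0 and brighter pixels receive larger perturbations, whereas the additive noise perturbs every pixel with the same statistics.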
6.12 Perceptual Information of the Image
The human visual system uses some psychophysical parameters for the perception of objects in the scene. Perception algorithms are developed by attempting to emulate some of the mechanisms of the human visual system. In human perception, objects are more easily localized and identified if they are well contrasted with respect to the background. In some contexts, the human visual system is also known to fail.
6.12.1 Contrast
Contrast describes a local change of light intensity and is defined as the ratio between the average brightness of an object and the brightness of the background. The human visual system responds logarithmically to brightness; consequently, to obtain the same perception, higher brightness values require higher contrasts. The apparent brightness depends greatly on the brightness of the local background (conditional contrast).
6.12.2 Acuteness
Acuteness expresses the ability to resolve details in an image, and depends on the optics of the system and on the distance between the object and the observer. The resolution of the image must be appropriate to the perceptive capacity of the vision system. The human visual system has a resolution of about 0.16 mm at a distance of about 250 mm under illumination of 500 lux (60 W lamp at 400 nm).
References
1. S.E. Umbaugh, Digital Image Processing and Analysis: Human and Computer Vision Applications with CVIPtools, 2nd edn. (CRC Press, 2010). ISBN 9-7814-3980-2052
2. A.K. Jain, Fundamentals of Digital Image Processing, 1st edn. (Prentice Hall, 1989). ISBN 0133361659
3. R.C. Gonzalez, R.E. Woods, Digital Image Processing, 2nd edn. (Prentice Hall, 2002). ISBN 0201180758
4. K.R. Castleman, Digital Image Processing, 1st edn. (Prentice Hall, 1996). ISBN 0-13-211467-4
5. Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
6. H. Barrow, J. Tenenbaum, R. Bolles, H. Wolf, Parametric correspondence and chamfer matching: two new techniques for image matching, in Proceedings of International Joint Conference of Artificial Intelligence (1977), pp. 659–663
7. B. Jähne, Digital Image Processing, 5th edn. (Springer, 2001). ISBN 3-540-67754-2
8. W.K. Pratt, Digital Image Processing, 2nd edn. (Wiley, 1991). ISBN 0-471-85766-1
9. W. Burger, M.J. Burge, Principles of Digital Image Processing: Core Algorithms, 1st edn. (Springer, 2009). ISBN 978-1-84800-194-7
10. M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis and Machine Vision, 3rd edn. (CL Engineering, 2007). ISBN 978-0495082521
7 Data Organization
7.1 Data in the Different Levels of Processing
In developing a computational vision application, two aspects are relevant:
1. The organization of the data, which must be appropriate to the functionality of the algorithm;
2. The choice of the algorithm, which must be simple and efficient.
The choice of the algorithm and the organization of the data are of fundamental importance for solving the complex functions of perception optimally (in almost real time). Any computational model of vision aims to extract, from one or more 2D images of the scene, information about movement, orientation, and position, in order to produce a symbolic description of the observed objects. These objectives are achieved in various stages at different computational levels. Starting from the acquired digitized images, different representations of the data are required:
Level I: 2D images, which represent the original input information (intensity, gray level, RGB, etc.); they are generally given in matrix format.
Level II: Primal Sketch, which is the data extracted by the first elementary processes of vision (for example, segments belonging to contours or edges, small regions belonging to parts of the surface of objects).
Level III: Geometric representation, which maintains the knowledge of 2D and 3D shapes. This representation is very useful when modeling the influence of lighting conditions and the movement of real objects.
Level IV: Relational models, which provide hypotheses for the interpretation of the scene (image understanding). These models allow data to be processed more efficiently at a high level of abstraction.
© Springer Nature Switzerland AG 2020 A. Distante and C. Distante, Handbook of Image Processing and Computer Vision, https://doi.org/10.1007/978-3-030-38148-6_7
In this context, it is very useful to have a priori general information about the scene to be interpreted. Such knowledge can be encoded using complex data structures (frames, semantics) typical of artificial intelligence (AI) techniques. For example, to interpret a scene showing the interior of a building, it is useful to encode in advance all the a priori knowledge available from the maps of that building, in order to simplify the subsequent recognition phase.
7.2 Data Structures
7.2.1 Matrix
The matrix is the most basic data structure, needed to hold all the low-level information of the digitized images and of the intermediate images produced to improve their radiometric and geometric quality. The matrix organization implicitly contains the spatial relationships (closeness relationships) between the various connected regions in the image. In the matrix organization we have the following types of images:
(a) Binary image: contains pixels with value 0 or 1 only.
(b) Grayscale image: each pixel value represents only an amount of light, that is, intensity information.
(c) Color image: consists of RGB or CMYK components, for color management with primary or secondary components, respectively.
(d) Multispectral image: consists of multiple images of the same size, each containing the spectral information of one band (visible, infrared, ultraviolet, etc.; see Fig. 7.1a).
(e) Hierarchical image data structures: defined by matrices representing the same image at different resolutions (Fig. 7.1b).
Fig. 7.1 Matrix organization. a Multi-component image; b image with different spatial resolutions in a pyramidal representation: the base contains the image at maximum resolution (N × N) and the vertex the image at minimum resolution (1 × 1)
7.2.2 Co-Occurrence Matrix
In the previous chapter, we defined the histogram $H_I(L)$, $L = 0, \ldots, N-1$, of an image $I$ with $N$ gray levels. The histogram can be considered the first level of global, 1D organization of image data: it provides the frequency of each gray level in the image. However, the histogram does not provide any information on the spatial arrangement of gray levels in the image. One way to incorporate spatial information together with the distribution of gray levels is a 2D data structure, the co-occurrence matrix, another example of global image information, often referred to as GLCM (Gray Level Co-occurrence Matrix). Consider an estimate of the joint probability associated with a pair of pixels $I(i,j)$ and $I(k,l)$ of an image $I$, with gray levels from 0 to $L_{max}$, linked by any geometric relationship, for example, a distance relation expressed in Cartesian coordinates $(x, y)$ or polar coordinates $(r, \theta)$. The co-occurrence matrix is the 2D histogram $P(L_1, L_2; i, j, k, l)$, considered as an estimate of the joint probability distribution that a pair of pixels have intensities $L_1$ and $L_2$, respectively, that is,

$$P(L_1, L_2) = \text{Prob}_{joint}\{I(i,j) = L_1 \text{ and } I(k,l) = L_2\} \qquad (7.1)$$
Each element of the 2D histogram $P(L_1, L_2)$, representing the joint probability, can be expressed in normalized form in the interval [0, 1] as follows:

$$p_R(L_1, L_2) = \frac{F(L_1, L_2)}{N_R} \qquad (7.2)$$

with the sum of all the probabilities equal to 1:

$$\sum_{L_1=0}^{L_{max}}\sum_{L_2=0}^{L_{max}} p(L_1, L_2) = 1 \qquad (7.3)$$
where $F(L_1, L_2)$ is the frequency (co-occurrence) with which the pair of gray levels $I(i,j) = L_1$ and $I(k,l) = L_2$, linked by a geometric relation $R$ of distance, for example $(k = i + \Delta x,\ l = j + \Delta y)$, occurs in the image; $L_{max}$ is the maximum gray level; and $N_R$ is the total number of pixel pairs in the image that satisfy the geometric relation $R$. For a given geometric relation $R$ between pairs of pixels of intensity $L_1$ and $L_2$, the co-occurrence matrix $P_R(L_1, L_2)$ is square, of dimension $N_{L_{max}} \times N_{L_{max}}$, where $N_{L_{max}}$ is the maximum number of gray levels present in the image.
7.2.2.1 Algorithm for the Calculation of the Co-Occurrence Matrix
Let $R$ be a generic relation (for example geometric, of proximity, etc.) between pairs of intensities $L_1$ and $L_2$. The algorithm is as follows:
Fig. 7.2 Calculation of the co-occurrence matrix. a Image of size 5 × 5 with 6 gray levels; b co-occurrence matrix of dimension 6 × 6 calculated for image (a) with spatial relation $(\Delta x, \Delta y) = (1, 1)$; c co-occurrence matrix of the same size for the same image but with spatial relation $(\Delta x, \Delta y) = (1, 0)$
1. Initialize $F_R(L_1, L_2) = 0$ for all $L_1, L_2 \in [0, L_{max}]$, where $L_{max}$ is the maximum intensity value of image $I$.
2. For each pixel $(i, j)$ in the image, determine the pixel $(k, l)$ that satisfies the relation $R$ with the pixel $(i, j)$ and update the co-occurrence matrix: $F_R(I(i,j), I(k,l)) = F_R(I(i,j), I(k,l)) + 1$.
Example. Consider an image of 5 × 5 pixels with intensity values from 0 to 5. The image has six gray levels; therefore, the co-occurrence matrix is 6 × 6. The geometric relation $R(1,1)$ (Fig. 7.2b) that links all the possible gray level pairs $(L_1, L_2)$ is of the type $(\Delta x, \Delta y) = (1, 1)$, i.e., each pixel is paired with the pixel one position to the right and one position below (pixels along the diagonal from top to bottom). Consider the element $F_R(1, 2) = 3$ of the co-occurrence matrix: there are three pairs of pixels with intensities $(L_1, L_2) = (1, 2)$ that satisfy the geometric relation $R(1, 1)$ (the pixel pairs are highlighted in Fig. 7.2a and the element $F(1, 2) = 3$ in Fig. 7.2b):
(i, j)    (k, l) = (i + 1, j + 1)
(1, 4)    (2, 5)
(3, 3)    (4, 4)
(4, 1)    (5, 2)
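The two steps of the algorithm can be sketched in plain Python as follows. The function names, the representation of the image as a list of rows, and the convention that $\Delta x$ moves along the columns and $\Delta y$ along the rows are assumptions of this illustration. A 6 × 6 chessboard pattern is used as test image, since its co-occurrence matrices are easy to check by hand.

```python
def cooccurrence(img, dx, dy, levels):
    """Return (F, N_R): the frequency matrix F[L1][L2] and the number of
    pixel pairs that satisfy the displacement relation R(dx, dy)."""
    rows, cols = len(img), len(img[0])
    F = [[0] * levels for _ in range(levels)]         # step 1: F_R = 0
    n_pairs = 0
    for i in range(rows):                             # step 2: scan pixels
        for j in range(cols):
            k, l = i + dy, j + dx                     # assumed convention
            if 0 <= k < rows and 0 <= l < cols:       # pair inside the image
                F[img[i][j]][img[k][l]] += 1
                n_pairs += 1
    return F, n_pairs

def normalize(F, n_pairs):
    """Joint probabilities p_R = F / N_R, Eq. (7.2)."""
    return [[f / n_pairs for f in row] for row in F]

chessboard = [[(i + j) % 2 for j in range(6)] for i in range(6)]
F11, n11 = cooccurrence(chessboard, 1, 1, levels=2)
F10, n10 = cooccurrence(chessboard, 1, 0, levels=2)
print(F11, n11)   # only F(0,0) and F(1,1) are nonzero
print(F10, n10)   # only F(0,1) and F(1,0) are nonzero
```

Pairs that would fall outside the image are skipped, so $N_R$ is smaller than the number of pixels; this boundary handling is a design choice of the sketch.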
The geometric relation $R(1,0)$ (Fig. 7.2c) that links all the possible gray level pairs $(L_1, L_2)$ is of the type $(\Delta x, \Delta y) = (1, 0)$, i.e., each pixel is paired with the horizontally adjacent pixel to its right. From the 2D histograms $F(L_1, L_2)$ shown in Fig. 7.2b, c, applying Eq. (7.2) yields the co-occurrence matrices $p_R$, i.e., the joint probabilities that express how often the pair of levels $(L_1, L_2)$ appears in the image among the pixel pairs satisfying the spatial relation $R$. For the spatial relation $R(1,1)$ we obtain

$$N_{R(1,1)} = \sum_{i,j=1}^{5} F(i,j) = 17$$

and, for the pair of gray levels with the highest frequency, the joint probability is

$$p(1,2) = \frac{F(1,2)}{N_{R(1,1)}} = \frac{3}{17} \approx 0.18$$

The co-occurrence matrix for the geometric relation $R(1,0)$ is computed analogously from the 2D histogram of Fig. 7.2c. Figure 7.3 shows other examples of co-occurrence matrices, calculated for binary images. The linear geometric structures (vertical and oblique) that characterize the two images are evident in their co-occurrence matrices. In particular, the example of Fig. 7.3d concerns the image of a chessboard, whose geometric texture is well highlighted in the co-occurrence matrices. In fact, only the frequencies $F(1,1)$ and $F(0,0)$ are nonzero in the matrix for the geometric relation $R(1,1)$, while only the frequencies $F(0,1)$ and $F(1,0)$ are nonzero in the matrix for the relation $R(1,0)$. The matrix $p_R(L_1, L_2)$ is not symmetric, because the number of pixel pairs with gray levels $(L_1, L_2)$ does not necessarily match the number of pairs with gray levels $(L_2, L_1)$. The elements of the main diagonal $p_R(L_i, L_i)$ represent the area of the regions in the image with gray level $L_i$; consequently, they correspond to the first-order histogram. The co-occurrence matrix contains the global information on the spatial distribution of the gray levels of an image. Computing $p_R$ is expensive because, to accumulate the 2D histogram, every pixel $(i,j)$ of the image must be processed and the relation $R$ applied to find the pixel to be paired with it. The calculation time can be reduced in two ways:
(a) Reduce the number of gray levels of the image;
(b) Reduce the influence domain of the relation $R$.
Fig. 7.3 a Binary image of size 6 × 6; b co-occurrence matrix $F_{R(1,1)}(L_1, L_2)$ of dimension 2 × 2 calculated for image (a) with spatial relation $(\Delta x, \Delta y) = (1, 1)$ between pairs of pixels; c matrix $F_{R(1,0)}(L_1, L_2)$ with spatial relation $(\Delta x, \Delta y) = (1, 0)$; d chessboard binary image; e and f the respective matrices $F_{R(1,1)}(L_1, L_2)$ and $F_{R(1,0)}(L_1, L_2)$
The first method involves a loss of accuracy in the intensity of the geometric structures (texture) present in the image, while the second, which reduces the domain of the relation $R$, introduces errors when extended geometric structures are present. A good compromise is achieved by using images with 16 gray levels and square windows of about 30–50 pixels. The co-occurrence matrix is used to describe the microstructures (texture) present in images. If the pixel pairs of an image are highly correlated, the elements of $p_R(L_1, L_2)$ on the main diagonal contain most of the information. Once the co-occurrence matrix has been calculated, different statistical measures that characterize the texture present in an image can be derived with the following formulas (defined by Haralick [6]).

Entropy. A characteristic that measures the randomness of the gray level distribution is the entropy, defined as

$$E = -\sum_{r=0}^{L_{max}}\sum_{s=0}^{L_{max}} p(L_r, L_s)\cdot\log_2 p(L_r, L_s) \qquad (7.4)$$

The entropy is highest when all elements of $p$ have the same value, which happens when no pair of gray levels is preferred under the specified relation $R$. Entropy measures the disorder of an image.

Contrast or Inertia.

$$C = \sum_{r=0}^{L_{max}}\sum_{s=0}^{L_{max}} (L_r - L_s)^2\cdot p(L_r, L_s) \qquad (7.5)$$

This statistical measure highlights the gray level differences present locally in the image. Low contrast values imply locally uniform gray levels (information concentrated on the main diagonal of $p_R$); conversely, strong local variations of the gray levels yield high contrast values.

Energy.

$$Energy = \sum_{r=0}^{L_{max}}\sum_{s=0}^{L_{max}} p^2(L_r, L_s) \qquad (7.6)$$

This statistical measure expresses the level of homogeneity (even local) of the image texture, depending on how uniformly the values of $p$ are distributed.

Homogeneity.

$$H = \sum_{r=0}^{L_{max}}\sum_{s=0}^{L_{max}} \frac{p(L_r, L_s)}{1 + |L_r - L_s|} \qquad (7.7)$$

The homogeneity measure expresses a characteristic inverse to the contrast. Homogeneity measures the uniformity of the pixel values and takes high values for small variations in intensity. Because the absolute difference of the gray values appears in the denominator, the elements of the matrix $p_R$ closest to the main diagonal, where the uniform gray levels are located, contribute most. It follows that high homogeneity values correspond to low contrast values.

Absolute value.

$$V = \sum_{r=0}^{L_{max}}\sum_{s=0}^{L_{max}} |L_r - L_s|\cdot p(L_r, L_s) \qquad (7.8)$$

For different applications, the co-occurrence matrix is calculated for the same image by varying the type of relation $R$ between the gray level pairs. The matrix $p$ that maximizes a given statistical measure is chosen for the image analysis, i.e., for the identification of microstructures. A good use of the co-occurrence matrix is the classification of the territory using multispectral satellite images. In Chap. 3 of Vol. III (Texture), the statistical measures above will be used to characterize the texture present in monochromatic, color, and multispectral images.
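The measures (7.4)–(7.8) can be computed from a normalized co-occurrence matrix with a short sketch such as the following. It assumes that the gray level $L_r$ coincides with the row index $r$ of the matrix (and $L_s$ with the column index $s$), and it skips zero entries in the entropy sum, using the usual convention $0 \cdot \log 0 = 0$; the function name `texture_measures` is hypothetical.

```python
import math

def texture_measures(p):
    """Entropy, contrast, energy, homogeneity and absolute value of a
    normalized co-occurrence matrix p (square list of lists summing to 1)."""
    n = len(p)
    cells = [(r, s, p[r][s]) for r in range(n) for s in range(n)]
    return {
        # Eq. (7.4); zero entries contribute nothing
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        # Eq. (7.5)
        "contrast": sum((r - s) ** 2 * v for r, s, v in cells),
        # Eq. (7.6)
        "energy": sum(v ** 2 for _, _, v in cells),
        # Eq. (7.7)
        "homogeneity": sum(v / (1 + abs(r - s)) for r, s, v in cells),
        # Eq. (7.8)
        "absolute_value": sum(abs(r - s) * v for r, s, v in cells),
    }

uniform = [[0.25, 0.25], [0.25, 0.25]]    # no preferred pair of levels
diagonal = [[0.5, 0.0], [0.0, 0.5]]       # only equal-level pairs occur
print(texture_measures(uniform))
print(texture_measures(diagonal))
```

For the uniform matrix the entropy is 2 bits, the maximum for four cells, while for the diagonal matrix the contrast is 0 and the homogeneity is 1, in line with the inverse behavior of the two measures described above.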
Fig. 7.4 An example of chain code with 8-neighbors (direction codes 0–7); the reference pixel is marked by the arrow. Chain code: 00007766555 5556600000064444444222 1111112234445652211
7.3 Contour Encoding (Chain Code)
A compact way to represent the connected components of a monochrome image is the chain code, an encoding suited to contours. The chain code (proposed by Freeman [2]) describes a contour as a list of straight segments. The code is a sequence of digits, each specifying the direction of one boundary segment in the list. There are normally eight possible directions, encoded with the digits 0 through 7. The contour is completely defined by the Cartesian coordinates $(x, y)$ of the reference contour segment, which locates the contour in the image plane, and by the sequence of codes, one for each unit segment of the contour, each indicating one of the predefined directions. Figure 7.4 shows an example of contour coding defined with 8-neighbors. Codes with 4-neighbors can also be defined. The data structure for this type of coding offers the following advantages:
(a) Significant data reduction: To store a contour present in an image, only 3 bits are needed to encode each contour segment (codes 0 to 7), instead of keeping the complete bitmap of the image. Compression factors greater than 10 can be achieved.
(b) Quick search: Sections of a contour with a particular direction can be searched for efficiently.
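A minimal sketch of decoding and encoding a chain code is given below. The direction table is an assumption (code 0 points along the positive x axis and the codes increase counter-clockwise in steps of 45°, with y pointing upward), since the compass of Fig. 7.4 is not reproduced here; the function names are hypothetical.

```python
# Assumed direction table: code -> (dx, dy), code 0 = east, counter-clockwise
STEPS = {0: (1, 0), 1: (1, 1), 2: (0, 1), 3: (-1, 1),
         4: (-1, 0), 5: (-1, -1), 6: (0, -1), 7: (1, -1)}

def decode_chain(start, codes):
    """Rebuild the contour points from the reference point and the codes."""
    x, y = start
    points = [(x, y)]
    for c in codes:
        dx, dy = STEPS[int(c)]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

def encode_chain(points):
    """Encode a sequence of 8-connected contour points as a chain code."""
    inverse = {step: code for code, step in STEPS.items()}
    return "".join(str(inverse[(x1 - x0, y1 - y0)])
                   for (x0, y0), (x1, y1) in zip(points, points[1:]))

square = decode_chain((0, 0), "0246")     # unit square, closed contour
print(square)
print(encode_chain(square))
print("0246".count("0"))                  # segments with direction 0
```

Searching for contour sections with a given direction reduces to a substring or count operation on the code string, which is the quick search advantage mentioned in point (b).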
7.4 Run-Length Encoding
Another method for the compact organization of monochromatic image data (and also of color data in the case of thematic images) is Run-Length Code (RLC or RLE) encoding. In the case of binary images, the encoding uses numbers that indicate the length of the strings (runs), that is, of the sequences of consecutive pixels with identical value 1 (1-strings) present in each row of the image, together with their position in the row. We present three different approaches used to organize the RLC-encoded image data.

(a) RLC_1. For each row of the image, a list of strings of the following type is produced:
$$S_0(P_0, L_{S_0}),\ S_1(P_1, L_{S_1}),\ \ldots,\ S_{n-1}(P_{n-1}, L_{S_{n-1}})$$
where $S_i$ indicates the $i$th 1-string, and $(P_i, L_{S_i})$ indicate the position of the string in the row and its length, respectively. An example is shown in Fig. 7.5a.
(b) RLC_2. For each row of the image, a list of 1-strings and 0-strings of the following type is produced:
$$L_1(V),\ L_2(V),\ \ldots,\ L_n(V)$$
where $L_i(V)$ is the length of the $i$th V-string, which is either a sequence of zeros (0-string) or a sequence of ones (1-string). An example is shown in Fig. 7.5b.
(c) RLC_3. For the whole image there is a single list consisting of sub-lists, organized as follows:
List = (Sub-list_1, Sub-list_2, ..., Sub-list_n)
Sub-list = (Single-string)
Single-string = row-number, 1-string-start_1, 1-string-end_1, ..., 1-string-start_n, 1-string-end_n
where the start and end of a 1-string indicate the first and last position in the row of each 1-string present (Fig. 7.6).

Fig. 7.5 Example of coding RLC_1 (a) and RLC_2 (b) for three rows of 22 pixels
Row 1: 1 1 1 0 0 0 1 1 0 0 0 1 1 1 1 0 1 1 0 1 1 1
Row 2: 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
Row 3: 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
a RLC_1. Row 1: (1,3)(7,2)(12,4)(17,2)(20,3); Row 2: (5,13)(19,4); Row 3: (1,3)(17,6)
b RLC_2 (strings of 1 and 0). Row 1: 3,3,2,3,4,1,2,1,3; Row 2: 0,4,13,1,4; Row 3: 3,13,6

Fig. 7.6 Example of coding RLC_3; the binary image is represented by the following strings: (1,13,77) (2,22,46) (4,13,56) (6,34,55)

With encodings 1 and 3, only the rows of the image that contain at least one 1-string are encoded. Run-Length coding is useful for representing all the regions that belong to the objects visible in the image; these regions are encoded with one of the three approaches indicated. With this coding, simple algorithms can be implemented for the intersection and union of the regions present in the image. The area of all the encoded regions is obtained by summing the lengths of all the 1-strings described in the list. If $L_{i,k}$ denotes the length of the $k$th string in the list of the $i$th row of the image, the area of all the regions of the image is

$$A_1 = \sum_{i=0}^{n-1}\sum_{k=0}^{m_i-1} L_{i,k}$$

$$A_2 = \sum_{i=0}^{n-1}\sum_{k=0}^{(m_i-1)/2} L_{i,2k+1}$$

where $m_i$ is the number of strings listed for the $i$th row of the image (for RLC_1 these are all 1-strings, while for RLC_2 the sum over the indices $2k+1$ selects the 1-strings, which occupy the odd-numbered positions of the list). $A_1$ is the area for the RLC_1 encoding and $A_2$ the area for the RLC_2 encoding. With this type of coding it is possible to obtain some useful information, such as
(a) Horizontal projections
(b) Vertical projections.
The first is obtained immediately, as indicated in Fig. 7.7. The vertical projection can be calculated without reconstructing the image, by processing the strings of each row appropriately. To extract more general information about the regions, the image must be decoded, which requires further calculation time. It should be noted that in this context RLC coding is used more as a way of organizing data than for data compression. Several versions of RLC encoding exist for lossless data compression. It is used for the transmission of data by fax, and a version based on RLC_2 coding is integrated into the JPEG standard for coding the cosine transform coefficients (described in Sect. 2.6 of Vol. II) and is also used in the encoding of the PDF (Portable Document Format) format.
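The RLC_1 and RLC_2 encodings, the area and the horizontal projection can be sketched as follows, using the three rows of Fig. 7.5 as data. The function names are hypothetical, positions are counted from 1 as in the figure, and the horizontal projection is assumed here to be the number of pixels with value 1 in each row.

```python
def rlc1(row):
    """RLC_1: list of (position, length) of each 1-string, positions from 1."""
    runs, j, n = [], 0, len(row)
    while j < n:
        if row[j] == 1:
            start = j
            while j < n and row[j] == 1:
                j += 1
            runs.append((start + 1, j - start))
        else:
            j += 1
    return runs

def rlc2(row):
    """RLC_2: alternating lengths of 1-strings and 0-strings; the list starts
    with a 1-string, of length 0 if the row begins with a 0."""
    lengths, current, count = [], 1, 0
    for v in row:
        if v == current:
            count += 1
        else:
            lengths.append(count)
            current, count = v, 1
    lengths.append(count)
    return lengths

rows = [
    [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1],
    [0] * 4 + [1] * 13 + [0] + [1] * 4,
    [1] * 3 + [0] * 13 + [1] * 6,
]
code1 = [rlc1(r) for r in rows]
code2 = [rlc2(r) for r in rows]

area1 = sum(length for runs in code1 for _, length in runs)     # A_1
area2 = sum(sum(lengths[0::2]) for lengths in code2)            # A_2
projection = [sum(length for _, length in runs) for runs in code1]

print(code1)
print(code2)
print(area1, area2, projection)
```

Both area computations work directly on the encoded lists, without decoding the image, and give the same value for the two encodings.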
7.4.1 Run-Length Code for Grayscale and Color Images
In this case, each row of the image is encoded using L-strings, i.e., groups of consecutive pixels of variable length that share the same value $L$. Therefore, the lists generated with the previous approaches are modified by adding to the description of each string the gray level value $L$, so that it represents a sequence of pixels of intensity $L$. RLC coding can be applied effectively to color and gray-level images of the thematic-map type, which have large regions of uniform value. In this case, for each row of the image, a list of C-strings of the following type is produced:
$$L_1(V)C_1,\ L_2(V)C_2,\ \ldots,\ L_n(V)C_n$$
where $C_i$ indicates the gray or color value of the string of length $L_i(V)$. An example is shown in Fig. 7.8. As with the matrix representation of images, the lists of the string encoding are organized in memory using dynamic memory allocation, given the considerable amount of data involved.

Fig. 7.7 Horizontal projections calculated with the Run-Length encoding RLC_1

Fig. 7.8 Example of RLC_2 encoding for color images, where the string of each row encodes the runs with a number representing the length of the run followed by the color value C = {W, B, R, G}, indicating white, black, red, and green, respectively
Row 1: 12W
Row 2: 12W
Row 3: 4W,4B,4W
Row 4: 4W,1B,3R,1B,3W
Row 5: 3W,1B,4R,1B,3W
Row 6: 3W,1B,1R,2G,2R,1B,2W
Row 7: 2W,1B,3R,1G,2R,1B,2W
Row 8: 2W,1B,5R,1B,3W
Row 9: 2W,1B,3R,2B,4W
Row 10: 3W,3B,6W
Row 11: 12W
Row 12: 12W
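The C-string list of a row can be produced with a few lines of Python. In this sketch a row is assumed to be given as a string of color labels (one character per pixel), and the function names are hypothetical; the example row reproduces the fourth row of Fig. 7.8.

```python
from itertools import groupby

def rlc_color(row):
    """Encode a row as a list of (run length, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(row)]

def format_runs(runs):
    """Write the runs in the notation of Fig. 7.8, e.g. 4W,1B,3R."""
    return ",".join(f"{length}{value}" for length, value in runs)

def decode_runs(runs):
    """Rebuild the row of labels from its runs (lossless)."""
    return "".join(value * length for length, value in runs)

row = "WWWWBRRRBWWW"          # 12 pixels: white, black, red, black, white
runs = rlc_color(row)
print(format_runs(runs))
print(decode_runs(runs) == row)
```

The same functions work for gray levels if the row is given as a list of integers, except for `decode_runs`, which is written for character labels.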
7.5 Topological Organization of Data-Graph
The topological structure of the data describes the image as a collection of objects and their relationships. The relationships between objects can be represented by graphs (or tree structures).
Graph: A graph $G = (V, E)$ is an algebraic structure consisting of a set of nodes (or vertices) $V = \{V_1, V_2, \ldots, V_n\}$ and a set of edges (or arcs) $E = \{e_1, e_2, \ldots, e_n\}$.
Arc: Each arc is defined by an unordered pair $(V_i, V_j)$ of nodes of $V$. Given an arc $e = (V_i, V_j)$, the arc $e$ is said to associate (or join) $V_i$ and $V_j$, which are its endpoints. The degree of a node is the number of arcs incident to it.
A graph that associates appropriate information about the objects with its nodes and arcs provides a hierarchical organization of the data that describes the relationships between the objects effectively.
7.5.1 Region Adjacency Graph (RAG)
The RAG is a relational organization of data used for image segmentation. It is a graph that describes the adjacency relations of the regions of an image: the nodes correspond to the regions, and the arcs represent the adjacency relations between regions. The data structure associated with each node also contains the most general characteristics of a region, for example, topological and geometric information. An image composed of a set of regions, each representing a particular property of the scene (the segmented image is a partition into regions), is represented entirely by a RAG. In Fig. 7.9 the regions are indicated with numbers from 1 to 8.
7.5.2 Features of RAG
Some regions of the image (e.g., regions 2 and 7 in Fig. 7.9) may also include other regions recursively. In the RAG representation, these regions can be easily separated by cutting the graph. Nodes of degree 1 represent simple holes. RAG arcs can include information about the relationships between adjacent regions. A RAG can be compared with previously stored RAGs in order to recognize objects from similar scenes.

Fig. 7.9 Segmented image and related graph of adjacent regions
7.5.3 Algorithm to Build RAG
The construction of the RAG starts from the segmented image, in which the value of each pixel is the label of the region it belongs to. The essential steps of the RAG construction algorithm are:
1. Scan each pixel $(i, j)$ of the image matrix $I$ and perform the following steps.
2. Let $R_1 = I(i, j)$.
3. Consider the neighbor pixels $(k, l)$ of the pixel $(i, j)$. For each neighbor, perform the next step.
4. Let $R_2 = I(k, l)$. If $R_1 \neq R_2$, add an arc between the nodes $R_1$ and $R_2$ in the RAG.
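The four steps can be sketched as follows, with the RAG stored as a dictionary that maps each region label to the set of labels of its adjacent regions. The function name `build_rag`, the `connectivity` parameter, and the choice of visiting only the "forward" neighbors (sufficient because arcs are unordered pairs) are assumptions of this illustration.

```python
def build_rag(labels, connectivity=4):
    """Build the region adjacency graph of a labeled (segmented) image."""
    rows, cols = len(labels), len(labels[0])
    # Forward neighbors only: each unordered pair of pixels is seen once
    offsets = [(0, 1), (1, 0)]
    if connectivity == 8:
        offsets += [(1, 1), (1, -1)]
    rag = {}
    for i in range(rows):                          # step 1: scan the pixels
        for j in range(cols):
            r1 = labels[i][j]                      # step 2
            rag.setdefault(r1, set())
            for di, dj in offsets:                 # step 3: neighbors
                k, l = i + di, j + dj
                if 0 <= k < rows and 0 <= l < cols:
                    r2 = labels[k][l]              # step 4
                    if r1 != r2:
                        rag[r1].add(r2)
                        rag.setdefault(r2, set()).add(r1)
    return rag

segmented = [[1, 1, 1, 1],
             [1, 2, 2, 1],
             [1, 2, 2, 1],
             [1, 1, 1, 1],
             [3, 3, 3, 3]]
rag = build_rag(segmented)
degree = {region: len(adjacent) for region, adjacent in rag.items()}
print(rag)
print(degree)
```

In this example region 2 is completely enclosed by region 1, so its node has degree 1, which corresponds to the simple hole case mentioned in Sect. 7.5.2. Region 3, a strip that touches only region 1, also has degree 1.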
7.5.4 Relational Organization
The relational organization of data and its symbolic representation play a fundamental role in image interpretation algorithms. A graph is again an adequate structure for describing even complex relationships between the objects in the scene. Compared with the topological organization, the graph can also contain semantic relations between objects or parts of them.
Example. For the scene described by the image of Fig. 7.9, it is useful to construct a relational table that contains the information characterizing the regions in terms of spatial position and of attributes such as color, adjacency, and inclusion. By analogy with the semantic analysis of language, the description of the scene from an image also uses relational functions between regions once the associated objects have been identified. In the example of Fig. 7.10, the relationships are: is a, is with, and observes.
7.6 Hierarchical Structure of Data
Many image processing algorithms require considerable computation time, both because of the complexity of the operations to be performed and because of the large amount of data to be processed. Where possible, the following aspects should therefore be taken into account in the implementation phase of an image processing system.
Fig. 7.10 Example of semantic relationships among the nodes of a graph (node labels: John, M. Grazia, doctor, Computer, Object 1; arc labels: is a, is with, observes)
Reactive system: For many applications, human-machine interaction must take place in a reasonable time or, better, the system must respond to requests in near real time.
Multiprocessor: One solution to these problems is to use computationally intensive systems, generally based on architectures with multiple processors.
Sequential to parallel: This solution, even if economically accessible, is not always practicable, since not all image processing algorithms are suited to a completely parallel computation model.
Hierarchical structure of data: One way to reduce the computation time is to organize the data in a hierarchical structure that allows them to be used in a dynamically variable quantity, from a minimum to a maximum, depending on when satisfactory results are achieved.
Resolution reduction: This involves a reduction of the data, for example, in terms of image size and/or compression, with a consequent loss of information that must be kept within acceptable limits. This is possible by representing images with different spatial resolutions.

The hierarchical representation strategy allows us to process a low-resolution image first; once the essential features of the image have been extracted, processing can continue on data at a higher resolution, but only for some areas of interest in the image. This strategy is used in different contexts of image processing (filtering, extraction of characteristic points or features, segmentation, textures, object recognition, etc.) and, in particular, in the transmission of images, where the image is first transmitted at coarse resolution and then images of increasing resolution are sent gradually until the user considers the result adequate. The most common hierarchical organizations are pyramids and quadtrees.
7.6.1 Pyramids
The pyramidal structure of an image of dimension N × N is represented by the image itself and by k reduced versions of the original image [1]. It is convenient to choose N as a power of 2 ($2^L = N$). This leads to building the reduced images of the pyramid with the following decreasing dimensions, down to 1 pixel:

Dimensions: $N \times N$, $\frac{N}{2} \times \frac{N}{2}$, $\frac{N}{4} \times \frac{N}{4}$, ..., $1 \times 1$
Level: $L$, $L-1$, $L-2$, ..., $0$
In this representation, the base of the pyramid coincides with the original image, which is at level L; the vertex of the pyramid is the coarsest image of 1 × 1 pixels; and any reduced image of level l is obtained by combining several pixels of the image of level l + 1. The level l varies from 0 to L. Moving from one level to the next coarser level means halving the resolution, which is equivalent to reducing the number of pixels by a factor of 4 and consequently reducing the computation time by a factor of about 4. In Fig. 7.1b, the pyramidal structure is obtained by calculating each pixel $P^{(l)}$ of the lth level as the average gray level of the corresponding 4 pixels of the finer level (l + 1):

$$P^{(l)} = \frac{1}{4}\sum_{j=1}^{4} P_j^{(l+1)}$$
For some image processing algorithms, the average may be inappropriate; instead of the average calculated on 2 × 2 windows, local pixel operations on larger image windows, or more complex processing such as the Gaussian operator, are used. Normally the pyramid is truncated before the apex level of size $2^0$.
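As a minimal sketch of the averaging pyramid described above, the following Python code builds all levels from the original image (level L) down to the 1 × 1 apex (level 0). The function name `mean_pyramid` and the list ordering are assumptions made for illustration; the image is assumed square with a side that is a power of 2.

```python
import numpy as np

def mean_pyramid(image):
    """Return [level L (original), level L-1, ..., level 0 (1 x 1)]."""
    level = np.asarray(image, dtype=float)
    n = level.shape[0]
    assert level.shape == (n, n) and n > 0 and (n & (n - 1)) == 0, \
        "the side N must be a power of 2"
    levels = [level]
    while level.shape[0] > 1:
        h = level.shape[0] // 2
        # Group the pixels into 2 x 2 blocks and average each block.
        level = level.reshape(h, 2, h, 2).mean(axis=(1, 3))
        levels.append(level)
    return levels

image = np.arange(16, dtype=float).reshape(4, 4)
pyramid = mean_pyramid(image)
```

Truncating the pyramid before the apex, as mentioned in the text, amounts to keeping only the first entries of the returned list.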
7.6.1.1 M-Pyramid (Matrix-Pyramid)
In a pyramidal structure of the M-pyramid type, level k is obtained by halving the resolution of level k + 1, i.e., by sampling every second pixel of every second row of level k + 1. The total number of pixels required to represent an M-pyramid structure is calculated as follows:

$$N^2\left(1 + \frac{1}{4} + \frac{1}{16} + \cdots\right) < \frac{4}{3}N^2 \approx 1.33\,N^2$$

where N is the side of the original square image (the image with the highest resolution).
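The memory bound can be checked numerically with a small sketch. The function name `m_pyramid` is hypothetical; the subsampling keeps every second pixel of every second row, as described above, and the image side is assumed to be a power of 2.

```python
import numpy as np

def m_pyramid(image):
    """Return the levels of an M-pyramid, from the original image to 1 x 1."""
    levels = [np.asarray(image)]
    while levels[-1].shape[0] > 1:
        # Keep every second pixel of every second row.
        levels.append(levels[-1][::2, ::2])
    return levels

n = 8
levels = m_pyramid(np.arange(n * n).reshape(n, n))
total_pixels = sum(level.size for level in levels)   # 64 + 16 + 4 + 1
bound = 4 * n * n / 3                                # (4/3) N^2
```

For N = 8 the pyramid stores 85 pixels, just below the bound of about 85.3.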
Fig. 7.11 Quadtree representation of (a) a region in the binary image; (b) decomposition into quadtree quadrants; and (c) the related tree structure
7.6.2 Quadtree
The hierarchical quadtree data structure is based on the principle of recursive decomposition, analogous to the divide and conquer methodology (see footnote 1). The quadtree structure is obtained by recursively decomposing the image into four square sub-images of equal size. If the pixels of a sub-image are all white or all black (homogeneous area), the recursive decomposition of that sub-image stops and the sub-image becomes a leaf node of the tree. If instead a sub-image contains both black and white pixels (nonhomogeneous area), it becomes a mixed node and the recursive decomposition continues by creating 4 new sub-images (see Fig. 7.11). A quadtree thus has three types of nodes: white, black, and mixed (also called gray). At the end of the recursive process of subdividing the image into square sub-images, a tree structure is obtained whose root coincides with the initial image. Each node of this structure has four children (the four quadrants of the hierarchical level with finer resolution), hence the name quadtree, with the exception of the leaf nodes, which represent quadrants whose pixels are all homogeneous with respect to a prefixed criterion. The quadtree structure has been widely used for the analysis, coding, and extraction of characteristics of binary images (Fig. 7.12).
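A minimal recursive sketch of this decomposition for a binary image is given below. The representation is an assumption chosen for brevity: a leaf is the string 'W' or 'B', a gray node is a list of four subtrees in the (assumed) order NW, NE, SW, SE, and the image is square with a side that is a power of 2. The second function illustrates how an area can be computed from the tree alone.

```python
import numpy as np

def build_quadtree(img):
    """Return 'W', 'B', or a list of four subtrees [NW, NE, SW, SE]."""
    img = np.asarray(img)
    if img.min() == img.max():            # homogeneous block: leaf node
        return 'B' if img.flat[0] else 'W'
    h = img.shape[0] // 2                 # mixed (gray) node: split in four
    return [build_quadtree(img[:h, :h]), build_quadtree(img[:h, h:]),
            build_quadtree(img[h:, :h]), build_quadtree(img[h:, h:])]

def count_black(tree, size):
    """Area (number of black pixels) of a tree covering a size x size block."""
    if tree == 'B':
        return size * size
    if tree == 'W':
        return 0
    return sum(count_black(child, size // 2) for child in tree)

binary = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [0, 1, 0, 0],
                   [1, 1, 0, 0]])
tree = build_quadtree(binary)
area = count_black(tree, binary.shape[0])
```

Only the SW quadrant of this example is mixed, so it is the only one that is decomposed further.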
7.6.2.1 Advantages of Quadtree
The quadtree structure depends not only on the principle of recursive spatial decomposition but also on the criteria used by the decomposition algorithm and on how many times the decomposition process can be triggered, which may vary with the type of data. It allows the implementation of simple algorithms for the calculation of areas, the sum of images, and the estimate of statistical parameters (moments).
Fig. 7.12 Example of quadtree decomposition (b) for a complex binary image (a)
Footnote 1: Divide and conquer indicates, in data processing, an approach for solving computational problems. It recursively divides a problem into two or more sub-problems until these become simple to solve. The solutions are then combined to obtain the solution of the given problem. This approach makes it possible to address, in a simple way, even very difficult problems, where the complexity is irreducible or nonlinear. Moreover, the nature of the division allows the computation to be parallelized, increasing its efficiency on distributed or multiprocessor systems. This type of approach is typically called top-down.
7.6.2.2 Disadvantages of the Quadtree
As for a pyramidal hierarchical representation, the quadtree has the disadvantage of depending on the position, orientation, and relative size of the objects. Images of the same objects obtained from slightly different observations produce different quadtree representations. These problems can be mitigated by normalizing the size of the regions; in this case, the quadtree is created not for the whole image but only for the regions of interest.
7.6.2.3 Differences Between Quadtree and Pyramid
In the pyramidal data structure, the data have a spatial multiresolution representation, with matrices each a quarter of the size of the previous one. The quadtree structure, on the contrary, has a variable resolution in the organization of the data. This difference characterizes the retrieval mode, in particular with regard to the spatial relationships among the data. With the quadtree it is possible to query the data on a spatial relational basis, in the sense that the search can proceed by navigating the tree structure (on descending and adjacent nodes) to explore the presence of a given object and of other similar objects nearby. In the quadtree, however, a search based on a feature becomes complex, because the features themselves are not spatially indexed. Such a search is simpler in a pyramid, where the data are analyzed by navigating the matrix structure from coarse to finer resolution, arriving, if necessary, at the final level of the entire image to verify the existence of the given feature. In some applications it is helpful, when possible, to consider both hierarchical structures.
7.6.3 T-Pyramid
In several applications, it is useful to have a data organization with several spatial resolutions available simultaneously, rather than a single image of the matrix pyramid. This type of data organization is realized with a pyramidal tree structure called the T-Pyramid, which decomposes the image at each level into four quadrants, as in the quadtree structure. Unlike the quadtree (an unbalanced structure), this pyramidal hierarchical structure is balanced, in the sense that the corresponding tree always decomposes the image independently of its content, and is therefore regular and symmetrical (Fig. 7.13). Another aspect concerns the interpretation of the values of individual nodes. The reduced versions of the image can be obtained by adopting different strategies to arrive at the terminal nodes. For example, for the nonbinary image shown in Fig. 7.14a, the truncated pyramidal structure is shown in Fig. 7.14b. The value of a nonterminal node in the truncated pyramid indicates whether a given feature is present in the subtree of that node. The memory required for a T-pyramid is similar to that of the M-pyramid, considering that, given the regular structure used to represent the various levels, the branches of the tree do not require pointers to the child nodes.
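The two properties mentioned above (node values summarizing the features present in the subtree, and no pointers to children) can be illustrated with a short sketch. It builds a complete, not truncated, T-pyramid in which each node stores the set of labels found in its subtree; the names `t_pyramid` and `children`, the set-based node values, and the label image are assumptions made for illustration.

```python
def t_pyramid(labels):
    """Level l is a 2^l x 2^l grid; each node holds the set of labels in its subtree."""
    n = len(labels)                      # side of the image, assumed a power of 2
    level = [[{labels[i][j]} for j in range(n)] for i in range(n)]
    levels = [level]
    while len(level) > 1:
        h = len(level) // 2
        # The value of a parent is the union of the values of its four children.
        level = [[level[2 * i][2 * j] | level[2 * i][2 * j + 1]
                  | level[2 * i + 1][2 * j] | level[2 * i + 1][2 * j + 1]
                  for j in range(h)] for i in range(h)]
        levels.append(level)
    return levels[::-1]                  # levels[0] is the root

def children(i, j):
    """Positions at level l + 1 of the children of node (i, j) of level l (no pointers needed)."""
    return [(2 * i, 2 * j), (2 * i, 2 * j + 1),
            (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]

# Hypothetical nonbinary (labeled) image.
labels = [['A', 'A', 'B', 'B'],
          ['A', 'A', 'B', 'C'],
          ['D', 'D', 'F', 'F'],
          ['D', 'E', 'F', 'F']]
pyr = t_pyramid(labels)
```

A query such as "is label C present in the NE quadrant?" is answered at level 1 by `'C' in pyr[1][0][1]`, without visiting the pixels.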
Fig. 7.13 T-Pyramid (Level 0: root node 0; Level 1: nodes 1, 2, 3, 4; Level 2: nodes 1.1 to 4.4)
Fig. 7.14 Hierarchical representation of nonbinary image (a), and its truncated T-Pyramid tree structure (b). Node values reported in the figure: root: A,B,C,D,E,F; node 2: A,F,B,C; node 3: F,B,A,C,E; node 3.2: B,F,A; node 4: D,E,A,C,F; node 4.3: D,A,C,F
7.6.4 Gaussian and Laplacian Pyramid
The so-called Gaussian and Laplacian pyramids are not actually a new hierarchical structure of image data. Instead, they are a multi-scale pyramidal data organization that is well suited to algorithms for the extraction and tracking of multi-scale image features. A Gaussian pyramid is obtained starting from the original image (level 0 of the pyramid), to which the following Gaussian filter is applied:

$$G(i, j, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(i^2 + j^2)/2\sigma^2}$$

The image obtained is then sub-sampled by a factor of 2: this is level 1 of the pyramid. The filtering operation is repeated (with an ever higher value of σ) on level 1 to obtain level 2, and so on until the desired number of levels k is reached. Given an image I(i, j), the Gaussian pyramid P(i, j, σ), a variable-scale hierarchical structure, is thus constructed from the convolution of the image with a Gaussian filter G(i, j, σ):

$$P(i, j, \sigma) = G(i, j, \sigma) * I(i, j)$$

where the parameter σ (the standard deviation of the Gaussian) increases at each successive level to create low-pass filtered versions of the original image (see Fig. 7.15a). A Laplacian pyramid is derived from the Gaussian pyramid (Fig. 7.15c). Recall that in the Gaussian pyramid thus realized, each level, being a low-pass filtered version of the previous level, contains redundant information representing the lower spatial frequencies. Thus, every level of the Gaussian pyramid can be seen as a coarser prediction of the previous, more detailed level. The Laplacian pyramid is intended to record, at different scales, the prediction error through a bandpass filter. The construction of the Laplacian pyramid L starts from the coarsest level, i.e., $L_n = P_n$, which coincides with the last level of the Gaussian pyramid P. The kth level is obtained as follows:

$$L_k = P_k - \text{Expansion}(P_{k+1})$$

with k = n − 1, n − 2, ..., 1, 0. The expansion operator oversamples by a factor of 2 (doubling the rows and columns) the Gaussian image $P_{k+1}$, which had previously been sub-sampled by the same factor. The values missing after rows and columns of zeros have been inserted by the expansion are estimated with a bilinear interpolation (see Fig. 7.15b). It is observed that the kth level of the Laplacian pyramid, obtained as the difference of Gaussian images

$$L_k = P_k(i, j, \sigma_k) - P_{k+1}(i, j, \sigma_{k+1}) \quad \text{with } \sigma_k < \sigma_{k+1}$$

actually corresponds to the Difference of Gaussian operator (DoG), which approximates the second-order differential operator Laplacian of Gaussian (LoG); the latter, as is known, is a bandpass filter that highlights the details (the high frequencies) present in the image. The LoG and DoG operators are described, respectively, in Sects. 1.13 and 1.14 of Vol. II.

Fig. 7.15 (a) Gaussian pyramid; (b) expansion of the levels of the Gaussian pyramid filtered with a larger value of σ; (c) Laplacian pyramid obtained from the difference of Gaussians belonging to adjacent levels, with $\sigma_k < \sigma_{k+1}$

From the Laplacian pyramid, it is possible to recover the original image. This is accomplished by a procedure (called synthesis) that is the inverse of the construction of the Laplacian pyramid L and that recovers the Gaussian pyramid P. We start from the coarsest level n by assigning $P_n = L_n$, that is, the Laplacian image is matched to the corresponding level of the Gaussian pyramid. Then we calculate the Gaussian images of the levels n − 1, n − 2, ..., 1, 0 as follows:

$$P_k = L_k + \text{Expansion}(P_{k+1})$$

The Gaussian and Laplacian pyramids are also used in the Fourier domain. The Gaussian and Laplacian pyramidal organization is effective for the extraction of spatial features (called keypoints) that are invariant when the scale and the observation point of the scene vary. They are also used for the correlation between images showing scale variations and small radiometric variations.
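The construction and the synthesis can be sketched with NumPy alone, under some simplifying assumptions that are not taken from the text: the Gaussian filter is replaced by the 5-tap binomial kernel [1, 4, 6, 4, 1]/16 applied at every level (so the effective σ grows through repeated filtering rather than through an explicit σ), borders are handled by reflection, and the expansion inserts zeros and then interpolates linearly along rows and columns. All function names are hypothetical.

```python
import numpy as np

KERNEL = np.array([1, 4, 6, 4, 1], dtype=float) / 16   # binomial approximation of a Gaussian
INTERP = np.array([0.5, 1.0, 0.5])                      # linear interpolation after zero insertion

def smooth(img, kernel):
    """Separable convolution along rows and columns, with reflected borders."""
    pad = len(kernel) // 2
    out = np.pad(img, pad, mode='reflect')
    out = np.apply_along_axis(np.convolve, 0, out, kernel, mode='valid')
    return np.apply_along_axis(np.convolve, 1, out, kernel, mode='valid')

def reduce_level(img):
    """Low-pass filter, then sub-sample by a factor of 2."""
    return smooth(img, KERNEL)[::2, ::2]

def expand_level(img):
    """Insert zero rows and columns, then fill them by bilinear interpolation."""
    up = np.zeros((2 * img.shape[0], 2 * img.shape[1]))
    up[::2, ::2] = img
    return smooth(up, INTERP)

def gaussian_pyramid(img, n):
    pyr = [np.asarray(img, dtype=float)]                 # P_0 is the original image
    for _ in range(n):
        pyr.append(reduce_level(pyr[-1]))
    return pyr

def laplacian_pyramid(gauss):
    lap = [gauss[k] - expand_level(gauss[k + 1]) for k in range(len(gauss) - 1)]
    return lap + [gauss[-1]]                             # L_n = P_n

def reconstruct(lap):
    img = lap[-1]                                        # P_n = L_n
    for level in reversed(lap[:-1]):
        img = level + expand_level(img)                  # P_k = L_k + Expansion(P_{k+1})
    return img

image = np.random.default_rng(0).random((16, 16))
gauss = gaussian_pyramid(image, 2)
lap = laplacian_pyramid(gauss)
restored = reconstruct(lap)
```

Because the synthesis uses the same expansion operator as the analysis, the reconstruction is exact up to floating-point error, whatever low-pass filter is chosen.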
Fig. 7.16 (a) Simple 3D object; (b) octree decomposition of the object; and (c) related tree structure
7.6.5 Octree
An octree data structure [3,5] is the 3D analog of the quadtree. The data structure is organized as follows. It starts with an image represented by a cube (where each elementary cube represents a voxel, i.e., the 3D pixel), and the cube is recursively divided into eight congruent and disjoint cubes (called octants, hence the name octree). If the current volumetric decomposition corresponds to a homogeneous characteristic (for example, it represents the same color or the same physical entity), or a certain level of decomposition has been reached, the process ends; otherwise the recursive decomposition continues. Figure 7.16 shows an example of a 3D object, its octree block representation, and its tree structure. The first level of the tree is the root (starting level 0). The complete structure of an octree provides up to $8^{N_L}$ terminal nodes (octants at the finest level), where $N_L$ represents the number of decomposition levels.
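The recursive decomposition, with both stopping criteria mentioned above, might be sketched as follows. The representation is an assumption: a homogeneous octant is stored as its voxel value, an octant that is still nonhomogeneous when the depth limit is reached is stored as the string 'mixed', and an internal node is a list of eight subtrees. The volume is assumed to be a cube whose side is a power of 2.

```python
import numpy as np

def build_octree(vol, depth=0, max_depth=8):
    """Return a voxel value (homogeneous leaf), 'mixed', or a list of 8 subtrees."""
    vol = np.asarray(vol)
    if vol.min() == vol.max():
        return int(vol.flat[0])            # homogeneous octant: leaf
    if depth == max_depth:
        return 'mixed'                     # decomposition limit reached
    h = vol.shape[0] // 2
    halves = (slice(0, h), slice(h, None))
    # Eight congruent, disjoint octants.
    return [build_octree(vol[z, y, x], depth + 1, max_depth)
            for z in halves for y in halves for x in halves]

volume = np.zeros((4, 4, 4), dtype=int)
volume[:2, :2, :2] = 1                     # one full octant belongs to the object
volume[3, 3, 3] = 1                        # plus a single voxel in the opposite octant
tree = build_octree(volume)
coarse = build_octree(volume, max_depth=1)
```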
7.6.6 Operations on Quadtree and Octree
Normally, when processing an image, the maximum spatial resolution needs to be known a priori. The advantage of a hierarchical organization of image data is that it allows the spatial resolution to be varied in relation to the complexity of the objects to be processed. The most widely used image decomposition is the quadtree with square sub-images; decompositions with rectangular or triangular regions are rarely used. Once the criterion for decomposing the sub-images has been established, the coding strategies for the entire tree structure are defined (nodes, pointers to the sub-images, coherence information of the sub-images, pointers to nodes representing adjacent regions, etc.), together with the way of visiting the tree structure (tree traversal, also known as tree search) for quadtree and octree. These topics are well detailed in Knuth [4]. Another important aspect concerns the typical operations on the hierarchical structures of the image data, for example, the intersection and union of two tree structures, achieved by traversing them simultaneously with the top-down approach.
Fig. 7.17 Union and intersection operations on quadtree
In the case of binary images, the union of two trees (quadtree or octree) is obtained (see Fig. 7.17) by considering the pairs of corresponding nodes of the two trees as follows:
1. If one of the two nodes is white, the other node is copied into the union tree (if it is a gray node, its four child quadrants are copied as well).
2. If one of the two nodes is black (remember that it represents parts of the object), this node is copied into the union tree (even if the other node is gray, indicating a subtree, that subtree need not be traversed).
3. If both nodes are gray, a gray node is added to the union tree and these steps are repeated for the corresponding children of the gray nodes.
The union operation is completed by visiting the resulting nodes and merging child quadrants that all have the same color into a single terminal node. Figure 7.17 shows an example of union between two images, together with the respective quadtrees. For the intersection of two trees (quadtree or octree) we proceed as for the union (see Fig. 7.17) but exchanging the roles of white and black. This is because the intersection of two images can be thought of as the union with respect to the white pixels. The intersection can therefore be computed by complementing the two binary images, applying the union algorithm, and complementing the result. The complement of a binary image is easily obtained by traversing the tree, for example in pre-order (the root is visited first and then its descendants), and inverting black and white at each terminal node. These operations on quadtree and octree structures assume aligned trees, i.e., their roots correspond to the same region of the image.
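The three rules and the final merging step can be sketched as follows for aligned quadtrees. The representation is an assumption made for illustration: a leaf is the string 'W' or 'B', and a gray node is a list of four subtrees in a fixed quadrant order. The intersection is obtained from the union through the complement, as described above.

```python
def merge(children):
    """Collapse four leaves of the same color into a single leaf."""
    first = children[0]
    if first in ('W', 'B') and all(c == first for c in children):
        return first
    return children

def union(a, b):
    if a == 'W':
        return b                  # rule 1: copy the other node (with its subtree)
    if b == 'W':
        return a
    if a == 'B' or b == 'B':
        return 'B'                # rule 2: black wins, a gray subtree is not traversed
    # rule 3: both nodes are gray, recurse on corresponding children
    return merge([union(x, y) for x, y in zip(a, b)])

def complement(t):
    if t == 'W':
        return 'B'
    if t == 'B':
        return 'W'
    return [complement(c) for c in t]

def intersection(a, b):
    return complement(union(complement(a), complement(b)))

# Two hypothetical quadtrees whose black regions do not overlap.
q1 = ['B', 'W', 'W', ['B', 'W', 'W', 'W']]
q2 = ['W', 'B', 'W', ['W', 'B', 'B', 'B']]
q_union = union(q1, q2)
q_inter = intersection(q1, q2)
```

In the example, the fourth quadrant is gray in both trees; its children become all black in the union and are merged into a single black leaf.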
References
1. P.J. Burt, E.H. Adelson, The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983)
2. H. Freeman, On the encoding of arbitrary geometric configurations. IRE Trans. Electron. Comput. EC-10, 260–268 (1961)
3. G.M. Hunter, K. Steiglitz, Operations on images using quad trees. IEEE Trans. Pattern Anal. Mach. Intell. 2, 145–153 (1979)
4. D.E. Knuth, Art of Computer Programming, vol. 1, 7th edn. (Addison-Wesley, Boston, 1997)
5. D.J. Meagher, Geometric modeling using octree encoding. Comput. Graph. Image Process. 19, 129–147 (1982)
6. I. Dinstein, R.M. Haralick, K. Shanmugam, Textural features for image classification. IEEE Trans. Syst. Man. Cybern. 3(6), 610–621 (1973)
8 Representation and Description of Forms
8.1 Introduction
The segmentation process leads to the partition of an image into homogeneous regions. In this context, it is assumed that these homogeneous regions represent objects, even if they may not correspond to a real object. Subsequently, these regions are given as input to the recognition process for the identification and location of the objects in the scene. For this purpose, it is necessary to define the modalities of representation and description of the shape of the homogeneous regions obtained from the segmentation process. A homogeneous region can be represented by considering the set of pixels that make up the contour of the region (representation by the contour) or by considering the pixels, as aggregates, that constitute the homogeneous set of pixels of the region itself (representation through the aggregation of pixels). For simplicity, a region represented by its contour is said to have an external representation, while a region represented as a pixel aggregation is said to have an internal representation. The external representation is useful when the shape is considered important, while the internal one is used to better characterize the information content of the pixels in radiometric terms (for example, color or texture). Once the internal or external representation scheme of a region has been chosen, it is necessary to define the modalities for describing the region itself, to characterize its form in a nonsubjective way. For example, if an external representation model is chosen, it is important to define the characteristics of the contour by extracting the most significant topological information such as perimeter, center of mass, area, number of concavities, etc. The description of a region plays a strategic role for the next step, the process of recognizing objects (see Fig. 8.1), which, starting from the shape measures of the regions, will have to produce a nonsubjective result. There are no exhaustive methods for the representation and description of the forms. Ad hoc solutions must be adopted for specific applications on the basis of the a priori knowledge available.
© Springer Nature Switzerland AG 2020 A. Distante and C. Distante, Handbook of Image Processing and Computer Vision, https://doi.org/10.1007/978-3-030-38148-6_8
Fig. 8.1 Process of representation and description of objects, inserted immediately after the image segmentation process, functional to subsequent modules for object recognition (blocks in the figure: pre-processed image, image segmentation, segmented image, representation and description of the shapes, representation and description of the features, object recognition, objects)
Recall that the forms to analyze are 2D projections of 3D objects that may appear different under different conditions of observation (changes of position and orientation between object and observer). This generates different 2D shapes in the image plane for the same object. From this emerges the need to develop methods of representation and description of the forms that ideally should be invariant with respect to changes in the scale, position, and orientation of the objects observed. Unfortunately, there are no methods of describing shapes that work perfectly in all conditions of object observation. However, there are several methods of representation and description that offer good results in describing 3D objects, starting from one or more of their 2D projections, on the basis of some geometrical and topological properties derived from the observed forms. In this chapter we will describe some modalities of external representation of objects, the internal representation with some topological properties (perimeter, area, compactness, ...) having already been described in Chap. 6, Properties of the digital image. Subsequently, the most well-known shape descriptors will be reported.
8.2 External Representation of Objects
The object contours are determined in the image by contour extraction algorithms, which extract a sequence of coordinates of the pixels that represent the contours. These contours can be described with Cartesian coordinates, with polar coordinates relative to the origin, with tangential coordinates that give the direction of the tangent at a point of the boundary, or with chain code sequences (together with the coordinates of the starting pixels) created with the 4-neighborhood or the 8-neighborhood. The chain code was introduced in Chap. 7, Sect. 7.3.
8.2.1 Chain Code
The contour description with the Freeman [1] method, also called Freeman chain code, considers a starting pixel of the contour, observes the direction of the next contour pixel in its neighborhood, moves to that pixel, looks again in the neighborhood for the direction of the next pixel, and so on until the starting pixel is reached again. The set of these directions is a sequence of numeric codes known as the chain code. Graphically, these numerical codes can be considered as unit segments whose direction is defined by the codes themselves.

Fig. 8.2 Boundary chain code in 4-neighbors defined with two different starting points (1000332212 and 2212100033), and in 8-neighbors for the same contour (S = 000664432)

Figure 8.2 shows how the pixel directions of the contour are coded in a square grid on the basis of the 4-neighbors (angular resolution of 90°) and of the 8-neighbors (angular resolution of 45°). The complete description of the pixels of a contour includes the coordinates (x, y) of the position of the initial pixel (normally the topmost left pixel) of the contour and a 4-path or 8-path, which is the ordered sequence of direction codes from 0 to 3 or from 0 to 7, respectively. The use of the 4-neighbors or of the 8-neighbors depends on the angular resolution with which the contour is to be described, or on how the adjacency of a pixel with its neighbors is defined (4-connected or 8-connected). Figure 8.2 also shows how the sequence of contour codes changes with the starting pixel of the contour. This is a serious problem for the analysis of forms based on the comparison of code sequences. The drawback can be solved by normalizing the sequence of contour codes as follows. The sequence of codes is considered as a circular sequence (made up of the numbers indicating the direction codes), which can be reconfigured by elementary operations of circular rotation of the direction numbers (the code coming out of one end of the sequence re-enters from the opposite end) until the sequence of numbers forms the integer with minimum value. For example, the sequence 1000332212 of the contour of Fig. 8.2 is normalized with a single circular shift of the codes to form the smallest number 0003322121. With the chain code, a contour S = (s1, s2, ...) can be rotated by n × 45° (or n × 90°) by replacing each code $s_i$ of the sequence S with $(s_i + n) \bmod 8$ (or $(s_i + n) \bmod 4$). Here the integer n can vary from 1 to 7 (or from 1 to 3). Figure 8.3 shows an example of counterclockwise rotation by 90° of the sequence S = 2212100033, which changes with $s'_i = (s_i + 1) \bmod 4$ (see footnote 1) into S′ = 3323211100.
Footnote 1: To avoid negative numbers, modulo 4 is used for adjacency of pixels with 4-connectivity or modulo 8 for adjacency with 8-connectivity.
Fig. 8.3 Anticlockwise rotation of 90° of a 4-neighborhood boundary by calculating the derivative of the original chain code (S = 2212100033; S′ = S rotated by 1 × 90° = 3323211100)
With respect to rotation, the code of a sequence is normalized (rotation invariant) if the first derivative of the encoding is used instead of the encoding itself. The first derivative is calculated by counting the number of direction changes (counterclockwise) that separate two adjacent codes of the sequence. The chain code derivative can be obtained from an ordinary chain code sequence by considering the first difference $(s_j - s_i)$ (mod 4 or mod 8) of each pair of consecutive codes ..., $s_i$, $s_j$, ... For example, for the contour 10103322, the first derivative with respect to a 4-path is:

Input code:           1 0 1 0 3 3 2 2
Difference:             3 1 3 3 0 3 0
Circular difference:  3 3 1 3 3 0 3 0
Shape number:         0 3 0 3 3 1 3 3
If the code is considered as a circular sequence, the missing element of the difference code is obtained by considering the directional distance between the last element of the input contour (in the example it is 2) and the first code (in the example it is 1). In the example considered, the circular first difference is also shown, obtained by adding the code 3 associated with the last transition of the circular sequence, 2 → 1 (last and first code of the input sequence). Finally, the circular first difference is normalized (to make it independent of the starting code), which gives what is known as the shape number. The first derivative of the chain code is, in fact, another sequence whose codes indicate the relative direction of the individual codes seen as unit directional segments. The 4-directional codes (0, 1, 2, 3) are normally defined with respect to the image seen as a pixel grid, i.e., right → 0, up ↑ 1, left ← 2, and down ↓ 3. In some contexts, it may be useful to define the codes $s_i$ in terms of directions relative to the direction of movement, so that the sequence is described with directions relative to the previous direction taken. For example, if we consider the sequence of absolute codes 3012, which represents a square contour (anticlockwise path starting from the top left corner), the sequence expressed with relative codes, calculated with the derivative, is 1111, which corresponds to the relative movements (left, left, left, left), having defined the codes: 0 for movement in the same direction as the previous segment, 1 for a left turn, 2 for reversing direction, and 3 for a right turn. Relative coding, in addition to being invariant to rotation, is also useful when one is interested in evaluating the directional variation of the contour locally. The choice of using 4 or 8 directions influences the adjacency of the contour pixels. In fact, a paradox occurs: if we assume an object with an 8-connected boundary, the background is 4-connected; vice versa, if we assume an object with a 4-connected boundary, the background is 8-connected. This implies that we cannot choose the same connectivity to describe the object and the background of an image. From Fig. 8.2 we observe that the outer contour of the object, which belongs to the background, can be encoded in 8-neighbors when the contour of the object is encoded in 4-neighbors, whereas the opposite occurs if the object is encoded in 8-neighbors. The chain code, like the shape number, is very sensitive to noise, image resolution, and object orientation in the image. Furthermore, the coding of the contour requires long sequences of codes. Scale invariance can be controlled by adjusting the size of the image sampling grid. An advantage of the chain code over the matrix representation, especially for binary objects, is the compactness of the coding. For example, for a circular object with a diameter of D pixels, the matrix representation would take about $D^2$ memory locations to store all the pixels of the object, while with the 8-direction chain code the object would be encoded in about πD codes, i.e., about 3πD bits (3 bits for each pixel of the contour). The larger D is, the more evident the advantage in compactness. Another advantage of this coding appears when the perimeter and the area of the form need to be calculated, as will be described in the next paragraphs.
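The normalization operations discussed in this section (starting point, rotation, first difference, shape number) can be sketched with a few list operations. The function names are hypothetical, chain codes are represented as lists of integers, and the circular first difference places the last-to-first transition at the beginning, as in the example above.

```python
def normalize_start(code):
    """Circular rotation of the sequence that forms the smallest integer."""
    return min(code[i:] + code[:i] for i in range(len(code)))

def rotate(code, n, directions=4):
    """Rotate the contour counterclockwise by n * (360 / directions) degrees."""
    return [(s + n) % directions for s in code]

def first_difference(code, directions=4):
    """Circular first difference; element 0 is the transition last -> first."""
    return [(code[i] - code[i - 1]) % directions for i in range(len(code))]

def shape_number(code, directions=4):
    """First difference made independent of the starting point."""
    return normalize_start(first_difference(code, directions))

contour = [1, 0, 1, 0, 3, 3, 2, 2]
diff = first_difference(contour)        # [3, 3, 1, 3, 3, 0, 3, 0]
shape = shape_number(contour)           # [0, 3, 0, 3, 3, 1, 3, 3]
rotated = rotate([2, 2, 1, 2, 1, 0, 0, 0, 3, 3], 1)
```

Since a rotation adds the same constant to every code, the differences, and therefore the shape number, do not change: `shape_number(rotate(contour, 1)) == shape_number(contour)`.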
8.2.2 Polygonal Approximation—Perimeter The contour of an object can be approximated by a closed polygonal consisting of a sequence of points x1 , x2 , . . . , xn , where each pair of consecutive points (xi , xj ) defines a segment. The contour will be closed if x1 = xn . The ends of each segment constitute the vertices of the polygon. The vertices are points normally belonging to the edges extracted with the Edge Detection algorithms described in Chap. 1 Vol.II. The contour is perfectly approximated when the number of segments of the polygon is equal to the number of pixels in the contour. The approximation level of a contour is controlled in relation to the number of segments used. In many applications, however, it is useful to approximate a contour using a polygon with an optimal minimum number of segments that best represent the shape of an object. Among the various approximation techniques [2–4], the simplest one is to calculate a polygonal with minimum perimeter value. Suppose we can define a closed contour with a path, that is, as short as possible. This is possible by thinking of
8 Representation and Description of Forms
Fig. 8.4 Polygonal approximation of a closed contour. a Original contour divided by the line $x_1x_2$ connecting its two farthest points. b The points $x_3$ and $x_4$ of the contour with the greatest perpendicular distance from the line $x_1x_2$ are found and joined to $x_1$ and $x_2$, giving a polygon with the first four vertices. c Iteration of the splitting process on the segment $x_1x_3$, whose ends are joined to the farthest point $x_5$ found. d Final result of the polygonal approximation by splitting
wrapping with a rubber band the outermost pixels belonging to the object, thus delimiting the homogeneous region as tightly as possible. This path determines a polygon with minimum perimeter that approximates the geometry of the object. As shown in Fig. 8.2, using an 8-path, the error committed between the ideal and the approximate boundary is at most $d\sqrt{2}$, where d is the spacing of the square sampling grid and therefore depends on the resolution of the grid.
8.2.3 Polygonal Approximation: Splitting
The goal is to find the vertices $x_i$ of the polygon that approximates the contour. Initially, a segment joining two points of the contour is considered, and subsequently the segment is split into two parts. The segments obtained can be further subdivided, so that the procedure is repeated recursively until a given criterion is satisfied. The phases of the splitting procedure are:
1. Find the two points $x_1$ and $x_2$ on the contour that are as far apart as possible and join them with the line $x_1x_2$. This line divides the contour into two parts, as shown in Fig. 8.4a.
2. Find a point $x_3$ on the upper side of the line and a point $x_4$ on the lower side such that the perpendicular distance of each point from the line $x_1x_2$ is the maximum possible (see Fig. 8.4b).
3. Trace the segments connecting the ends of the initial line $x_1x_2$ with the points $x_3$ and $x_4$ found. The polygon is thus obtained as shown in Fig. 8.4c.
4. Repeat steps 2 and 3 on each new segment until the maximum perpendicular distance is less than a certain threshold. Normally the default threshold is a fraction of the length of the initial line $x_1x_2$ (for example, 0.2).
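The four steps above can be sketched as a short recursive routine. This is a minimal sketch under stated assumptions: the contour is given as an ordered list of (x, y) tuples without repeating the first point, the farthest pair is found by brute force, and the threshold is applied from the first split onward. The names `split_polygon`, `_split`, and `point_line_distance` are hypothetical.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance of point p from the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def _split(points, first, last, tol, keep):
    """Steps 2-4: keep the farthest point of a chain, then recurse on both halves."""
    best_d, best_i = 0.0, None
    for i in range(first + 1, last):
        d = point_line_distance(points[i], points[first], points[last])
        if d > best_d:
            best_d, best_i = d, i
    if best_i is not None and best_d > tol:
        keep.add(best_i)
        _split(points, first, best_i, tol, keep)
        _split(points, best_i, last, tol, keep)


def split_polygon(contour, fraction=0.2):
    """Vertices of the splitting approximation of a closed contour."""
    n = len(contour)
    # Step 1: the two farthest contour points (brute force, O(n^2)).
    i1, i2 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: math.dist(contour[ij[0]], contour[ij[1]]))
    tol = fraction * math.dist(contour[i1], contour[i2])
    # Rotate the contour so that it starts at x1, and close it explicitly.
    pts = contour[i1:] + contour[:i1] + [contour[i1]]
    k = i2 - i1                      # index of x2 in the rotated contour
    keep = {0, k}
    _split(pts, 0, k, tol, keep)     # one side of the line x1 x2
    _split(pts, k, n, tol, keep)     # the other side
    return [pts[i] for i in sorted(keep)]


# Boundary of a 4 x 4 square sampled at unit steps.
square = ([(x, 0) for x in range(4)] + [(4, y) for y in range(4)] +
          [(x, 4) for x in range(4, 0, -1)] + [(0, y) for y in range(4, 0, -1)])
print(split_polygon(square))  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```

For the sampled square, the initial line is a diagonal and the two farthest points on either side are the remaining corners, so the 16 contour points reduce to the four vertices.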
8.2.4 Polygonal Approximation: Merging
The approximation polygon is generated by calculating its segments through a linear interpolation of the contour points. One method is based on the calculation
8.2 External Representation of Objects
of the minimum quadratic error between the original points of the contour and the approximating segments. Points of the boundary are added to the current segment (merging) as long as the error of the linear interpolation stays within a predefined threshold value. Interpolated segments that do not exceed the threshold value are saved. The process is repeated with a new interpolation until all points of the original contour have been processed (various approaches are described in [2]). The vertices of the polygon are calculated as the intersections of the lines described by the interpolated segments. The vertices do not always coincide with points of the original contour. The splitting and merging approaches can be combined to work cooperatively.
8.2.5 Contour Approximation with Curved Segments
A contour can be decomposed into curved segments with a constant radius of curvature or divided into segments that are interpolated by polynomials (normally second-degree curves, such as the ellipse, parabola, and circle). Moreover, the segments of a contour can represent the primitives of a syntactic classification process (see Sect. 1.18 Vol. III). In various applications of artificial vision, the contours are used to characterize the shapes of objects at various scales.
8.2.6 Signature
The signature is a 1D function that represents the contour of an object. Among the different functions that can be defined, the simplest is the distance r of each pixel of the boundary from the center of mass of the object, expressed as a function of the angle θ between the x axis and the distance vector r (see Fig. 8.5). In practice, we pass from a 2D representation of the boundary to a 1D representation r(θ), thus obtaining a simplified version of the contour. Another signature function is obtained by considering the angle formed between the tangent at each point of the contour and a horizontal reference line (see Fig. 8.6a). This function, the signature of the contour, is itself a representation of the shape. The horizontal segments of the function correspond to rectilinear segments of the contour, since the tangent angle remains constant there.
Fig. 8.5 Graph of the signature function for a circular and a square shape. In the first case the pixels of the contour are all at a constant distance from the center of mass, while for the square (A being half the side) the signature is given by r(θ) = A sec(θ) if 0 ≤ θ ≤ π/4 and by r(θ) = A csc(θ) if π/4 < θ ≤ π/2
Fig. 8.6 Signature function for shapes with complex geometry: a θ is the angle formed between the tangent at each point of the boundary and a horizontal line; b Slope density function
[Plots of Figs. 8.6 and 8.7: the signature functions r(θ), r_i(θ), and r_s(θ) plotted against θ, with the θ axis marked from 0 to 2π in steps of π/4 and amplitudes between A/2 and √2A.]
Fig. 8.7 Graph of the signature function of a square contour rotated by 45° and of the same contour scaled by a factor of 1/2. The rotation reduces to a translation of the signature equal to the angle of rotation, while the profile remains the same. The change of scale only reduces the amplitude of the signature by the scale factor
This method can be seen as a continuous representation of the chain code. A variant of this function is the contour representation by the slope density function (see Fig. 8.6b). This function is the histogram of the slopes (angles of the tangents) of the contour. Since a histogram measures how often each tangent angle occurs along the contour, the slope density function has peaks corresponding to the sections of the contour with constant tangent angle (straight or nearly straight segments) and valleys corresponding to the sections with strong angular variations (corners or other discontinuities). The signature function depends on the size of the shape, on its rotation, and on the starting pixel of the contour. Scale invariance can be obtained by normalizing the function r(θ) with respect to a predefined maximum value (normally r(θ) is normalized to values between 0 and 1). Invariance to the starting point is obtained by always starting from a known point of the contour, or by first calculating the chain code representation and then normalizing the code as already described (see Sect. 7.3). Invariance to rotation can be obtained by translating the signature function until its global minimum is at θ = 0, on the assumption that a unique minimum exists. Figure 8.7 shows the signature functions for the contour of a rotated and a scaled object. Through the signature function it is possible to evaluate whether these contours belong to the same object. In fact, the signature function $r_i(\theta)$ of the square rotated by 45° is identical to r(θ), that of the horizontal square, except for a translation along the θ axis equal to the angle of rotation. The signature function $r_s(\theta)$ of the contour scaled by a factor of 1/2 is identical to that of the original square except that its amplitude is reduced by the same scale factor.
The conditions of invariance to rotation and scale are not completely maintained if we consider the effects of the pixelation of digital images, particularly under magnification. Pixelation introduces noise into the signature function, which then cannot be used reliably to evaluate the similarity of shapes.
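The distance signature r(θ) and its scale normalization can be illustrated with the sketch below. It assumes that the contour is a list of (x, y) points and, as a simplification, takes the center of mass as the mean of the contour points rather than of the whole region; the function name `centroid_signature` is hypothetical.

```python
import math


def centroid_signature(contour, normalize=True):
    """Return the samples (theta, r) of the signature r(theta), sorted by theta.

    Assumption of this sketch: the center of mass is approximated by the
    mean of the contour points.
    """
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    samples = []
    for x, y in contour:
        theta = math.atan2(y - cy, x - cx) % (2 * math.pi)
        samples.append((theta, math.hypot(x - cx, y - cy)))
    samples.sort()
    if normalize:
        # Scale invariance: divide by the maximum distance, so 0 < r <= 1.
        r_max = max(r for _, r in samples)
        samples = [(t, r / r_max) for t, r in samples]
    return samples


# Boundary of a 4 x 4 square sampled at unit steps, and the same square doubled.
square = ([(x, 0) for x in range(4)] + [(4, y) for y in range(4)] +
          [(x, 4) for x in range(4, 0, -1)] + [(0, y) for y in range(4, 0, -1)])
doubled = [(2 * x, 2 * y) for x, y in square]
sig = centroid_signature(square)
sig2 = centroid_signature(doubled)
print(max(abs(a[1] - b[1]) for a, b in zip(sig, sig2)) < 1e-12)  # True
```

After normalization the two signatures coincide, which is the scale invariance discussed above; a rotation would instead appear as a shift of the samples along the θ axis.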
8.2.7 Representation by Convex Hull
Complex contours in which concavities are present cannot be adequately represented with the previous methods. In these cases, one method is to decompose the region into convex parts, using decomposition methods based on the minimum convex envelope. Recall that a set R (in this case a region) is said to be convex if, and only if, for any pair of points $P_i$ and $P_j$ belonging to R, the segment $P_iP_j$ is completely included in R (see Fig. 8.8). Instead of the segment $P_iP_j$, the connection can be expressed in terms of a path between $P_i$ and $P_j$, for example, with 8-neighbors. The convex hull H of an arbitrary set S (region) is the minimum convex set (region) containing S. The difference set H − S is called the convex deficiency D of the set S. In other words, the points belonging to the convex hull that are not part of the object form the convex deficiency of the object. In addition, the convex hull H has the smallest area and the smallest perimeter of all the possible convex polygons containing S. The contour of the region is partitioned by tracking the boundary points of S and labeling the transition points where the boundary enters or exits a part of the convex deficiency. With such labeled points, it is possible to characterize the shape of the contour well (see Fig. 8.9). In [5] the state of the art of the algorithms for the construction of the optimal convex hull that characterizes the shape of a region in image analysis applications is reported. In real applications, the contour is affected by spurious points due to noise (caused by the digitizing process) and to the contour extraction algorithms. In these cases,
Fig. 8.8 Examples of region a Convex and b Not convex
Fig. 8.9 Example of region S with a Convex deficiencies (colored) and b The partitioned boundary of the region
before partitioning the contour, which would otherwise produce insignificant convex deficiencies, it is better to smooth the contour or apply a suitable polygonal approximation; this also simplifies the algorithm that extracts the convex hull.
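As an illustration of the convex hull H and of the convex deficiency D = H − S, the sketch below computes the hull of a polygonal region and the area of its deficiency. The hull is built with Andrew's monotone chain algorithm, which is a choice of this sketch and not necessarily one of the methods surveyed in [5]; the region is assumed to be a simple polygon given by its ordered vertices, and the function names are hypothetical.

```python
def cross(o, a, b):
    """z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull (counterclockwise vertex list) by Andrew's monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon (shoelace formula)."""
    n = len(vertices)
    s = 0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2


# An L-shaped region S: a 4 x 4 square with a 2 x 2 corner removed.
S = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
H = convex_hull(S)
print(H)                                  # [(0, 0), (4, 0), (4, 2), (2, 4), (0, 4)]
print(polygon_area(H) - polygon_area(S))  # 2.0, the area of the convex deficiency
```

The only contour vertex that does not lie on the hull is the concave corner (2, 2); the two neighboring hull vertices (4, 2) and (2, 4) are the transition points where the boundary enters and leaves the convex deficiency.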
8.2.8 Representation by Means of Skeletonization
In many applications, the images represent objects dominated by linear structures of variable thickness with complex ramifications. Think, for example, of images for the inspection of printed circuits, of images obtained by digitizing elevation maps with their contour lines, or of those obtained by digitizing technical drawings. In these cases, it is convenient to use the so-called skeleton to represent the shape of the objects (elevation curves, printed circuit tracks, drawing sections). The process that generates the skeleton is called skeletonization or thinning. In the case of elevation maps, for example, the elevation curves are normally very thick after digitization and need to be thinned (see Fig. 8.10). The representation through skeletonization, therefore, requires a thinning process that satisfies the following requirements:
(a) the skeletonization must produce lines with a thickness of one pixel;
(b) the thinned structure should be located in the central area of the objects;
(c) the skeletonization should not introduce discontinuities or artificial ramifications and should be insensitive to noise;
(d) the skeletonization must converge to a final result after a certain number of iterations;
(e) the shape of the thinned objects must not be very different from the original.
Several skeletonization algorithms are available, but in general they satisfy only some of the above requirements. One skeletonization approach is based on the Medial Axis Transformation, MAT (also called Symmetry Axis Transform, SAT). An intuitive explanation of the MAT is given by imagining that we want to burn two meadows, one circular and one rectangular, as shown in Fig. 8.11.
Let us assume that the fire is started simultaneously at all the points of the boundary (circular and rectangular). Then, for the circular lawn, the fire proceeds up to the center of the lawn, and we can think that the front of the flames propagates like so
Fig. 8.10 Result obtained with a thinning algorithm
Fig. 8.11 Snapshot of the process of skeletonization, for a circular and a rectangular region, by the medial axis transform (MAT)
many concentric circles of decreasing radius, until it is reduced to a point when the fire goes out. Similarly, for the rectangular lawn, the front of the flames evolves as smaller and smaller rectangles until it is reduced to a line a moment before the fire goes out. For the rectangular lawn there are points, where the horizontal and vertical fire lines meet, which are equidistant from at least two nearest points of the contour. The set of points with this characteristic constitutes the medial axis (skeleton), which can be seen as an alternative way of representing the shape of objects. To each point P belonging to the medial axis we can associate a value f(P) given by the time taken by the fire front to reach P. Vice versa, knowing the points P belonging to the medial axis and the values f(P), it is possible to reconstruct the original shape of the object perfectly. On an intuitive level, the reconstruction of the object is equivalent to regenerating the lawn by reversing the propagation of the fire front, with a propagation time equal to f(P) for each point of the medial axis. More generally, we can define the MAT of a region R with boundary C as follows: a pixel of R belongs to the medial axis if its minimum distance from the contour C is attained at two or more points of C. In the case of digital images, the value f(P) of the pixels belonging to the medial axis corresponds to the distance of P from the contour. The medial axis can also be defined as the locus of the centers of the bitangent circles completely included in the region. The radius of each circle corresponds to the value of the MAT (see Fig. 8.12). The inverse problem, i.e., the reconstruction of the region from the MAT, is solved by taking the union of the bitangent circles centered at the points of the medial axis and having the value of the MAT as radius.
This highlights the compact representation of an object provided by the MAT and the shape information that can be derived from it. The closest distance depends on the definition of distance between pixels, which can be chosen in different ways and consequently influences the results of the skeletonization. Normally the Euclidean distance is used. The skeletonization based on the medial axis transform consists precisely in the determination of the pixels belonging to the medial axis, starting from the already segmented input image. Each object has its own medial axis, as shown in Fig. 8.12.

Fig. 8.12 Medial axis of some geometric shapes

The direct calculation of the MAT requires prohibitive computing times because, for each pixel of the region, it is necessary to calculate the distance from each point of the contour. A simple skeletonization is obtained by applying the distance transform (DT) (see Sect. 6.4 of Chap. 6), which measures, in this case, the minimum distance d(P, C) of each pixel P of the object R from the contour C. The distance d is a local maximum at P if d(P, C) ≥ d(Q, C) for each pixel Q in the neighborhood of P. The set of pixels R* in R whose distance d from the contour C is a local maximum is called the skeleton, or medial axis, or symmetry axis of R. In a binary image, the object is represented with pixels of value 1, and the background, which includes the contour, is represented with 0. The distance value is calculated by operating with a 3 × 3 window. By positioning the window on each pixel of the binary input image f(P), the distance g(P) of the pixel at the center of the window is calculated by adding to the value of the current pixel f(P) the minimum value of its 4-neighbors (or 8-neighbors). The process is repeated by reapplying the same procedure to the transformed image g, i.e., by adding to the value f(P) of the pixel under examination the minimum of the distances g obtained in the previous step, thus obtaining a new transformed image. The process continues until there are no further changes in the transformed image $g^m(P)$. The latter contains, for each pixel P of the object, the minimum horizontal or vertical (or diagonal) distance from the contour (or background).
More generally, the value of the distance for the pixel P(i, j) at the k-th iteration is given by
$$g^0(i,j) = f(i,j), \qquad g^k(i,j) = g^0(i,j) + \min_{(u,v):\, D_x[(i,j),(u,v)] \le 1} g^{k-1}(u,v), \quad k = 1, 2, \ldots \tag{8.1}$$
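[Fig. 8.13, panel (a): iterations of the distance transform on a 7 × 5 rectangular region, followed by the skeleton.]

Image        First iteration   Second iteration   Skeleton
1 1 1 1 1    1 1 1 1 1         1 1 1 1 1          1 . . . 1
1 1 1 1 1    1 2 2 2 1         1 2 2 2 1          . 2 . 2 .
1 1 1 1 1    1 2 2 2 1         1 2 3 2 1          . . 3 . .
1 1 1 1 1    1 2 2 2 1         1 2 3 2 1          . . 3 . .
1 1 1 1 1    1 2 2 2 1         1 2 3 2 1          . . 3 . .
1 1 1 1 1    1 2 2 2 1         1 2 2 2 1          . 2 . 2 .
1 1 1 1 1    1 1 1 1 1         1 1 1 1 1          1 . . . 1

[Fig. 8.13, panel (b): Image, Distance Transform DT, Skeleton (medial axis).]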
Fig. 8.13 a Result of the iterative process for calculating the skeleton based on the medial axis transform and b results obtained with (8.1), taking the skeleton as the locus of the local maxima of the distance transform, for a rectangular region
where (u, v) ranges over the pixel (i, j) itself and the pixel locations in its neighborhood. The neighbors of each pixel (i, j), over which the minimum is taken, are the pixels for which $D_8[(i,j),(u,v)] = 1$ for the 8-distance and $D_4[(i,j),(u,v)] = 1$ for the 4-distance. The set of local-maximum pixels of the image $g^k$ obtained with the distance transform represents the skeleton (medial axis), and the MAT values correspond exactly to the local maxima. The latter are found by looking for a first local maximum in $g^k$ and then searching for the other local maxima among the pixels connected to it with 4-neighbors or 8-neighbors. Figure 8.13 shows the results of this procedure for a rectangular region. It is observed (see Fig. 8.13b) that the distance transform of the binary image produces a gray-level image in which the pixels of the background remain at zero, while the internal pixels of the region take values corresponding to their distance (in terms of 4- or 8-neighbors) from the background. On the image of local maxima obtained from the distance transform, one can draw the associated circles and reconstruct the original shape of the object, as done with the bitangent circles centered on the medial axis. The presence of noise in the original image can generate significant artifacts in the extracted skeleton, compromising the representation of the shape of the object.
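The iterative scheme (8.1) and the extraction of the local maxima can be sketched as follows. The sketch assumes a binary image stored as a list of lists, treats everything outside the image as background, and takes the minimum over each pixel and its neighbors; the function names `distance_transform` and `skeleton` are hypothetical.

```python
def _offsets(neighbors):
    """Offsets of the pixel itself plus its 4- or 8-neighbors."""
    if neighbors == 4:
        return [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    return [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]


def distance_transform(f, neighbors=4):
    """Iterate g^k = f + min(g^(k-1) over the neighborhood) until g is stable."""
    rows, cols = len(f), len(f[0])
    offsets = _offsets(neighbors)

    def value(img, i, j):
        # Pixels outside the image are treated as background (assumption).
        return img[i][j] if 0 <= i < rows and 0 <= j < cols else 0

    g = [row[:] for row in f]
    while True:
        new = [[f[i][j] + min(value(g, i + di, j + dj) for di, dj in offsets)
                for j in range(cols)] for i in range(rows)]
        if new == g:
            return g
        g = new


def skeleton(g, neighbors=4):
    """Object pixels whose distance is a local maximum (not less than any neighbor)."""
    rows, cols = len(g), len(g[0])
    offsets = _offsets(neighbors)

    def value(i, j):
        return g[i][j] if 0 <= i < rows and 0 <= j < cols else 0

    return {(i, j) for i in range(rows) for j in range(cols)
            if g[i][j] > 0
            and all(g[i][j] >= value(i + di, j + dj) for di, dj in offsets)}


# The 7 x 5 rectangular region of Fig. 8.13.
image = [[1] * 5 for _ in range(7)]
dt = distance_transform(image)
for row in dt:
    print(row)
print(sorted(skeleton(dt)))
```

For the 7 × 5 region the iteration stops after two effective passes, with the value 3 on the three central pixels, and the local maxima form the diagonal branches from the corners joined by the central vertical segment.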
8.2.8.1 Thinning Algorithm
The best performing skeletonization algorithms are those based on the thinning approach. In the literature there are several thinning algorithms, developed for various application contexts, whose critical aspects are discussed in [6]. These algorithms gradually thin the object by deleting contour pixels while connectivity is maintained. The thinning process stops when no more pixels can be erased without breaking the region. Pixels whose deletion would generate breaks are visited more than once while traversing the boundary. Consequently, thinning algorithms need to traverse the region boundary, identify the pixels that are candidates for deletion
(not producing interruptions) and record how many times they have been visited. Subsequently, among the candidate pixels, only those that have been visited exactly once are deleted. The process continues as long as pixels can still be erased. A thinning algorithm is presented in [7]; it operates on a binary image in which the background has value 0 and the pixels of the region have value 1. The procedure involves several iterations, each consisting of two steps applied to the candidate contour pixels of the region, where a boundary pixel is defined as any pixel that has value 1 and has at least one of its 8-neighbors at value 0. Each boundary pixel is examined with a 3 × 3 window in which the pixel $P_1$ and its 8-neighbors are arranged as follows:
$$\begin{matrix} P_9 & P_2 & P_3 \\ P_8 & P_1 & P_4 \\ P_7 & P_6 & P_5 \end{matrix} \tag{8.2}$$
where $P_1$ indicates the pixel being processed and $P_2, \ldots, P_9$ indicate its 8-neighbors. Each iteration involves two steps that operate as follows:
1. A pixel $P_1$ of the contour is labeled for deletion if it meets all the following conditions:
$$2 \le NZ(P_1) \le 6 \tag{8.3}$$
$$Z(P_1) = 1 \tag{8.4}$$
$$P_2 \cdot P_4 \cdot P_6 = 0 \tag{8.5}$$
$$P_4 \cdot P_6 \cdot P_8 = 0 \tag{8.6}$$
where $NZ(P_1)$ indicates the number of nonzero pixels among the 8-neighbors of $P_1$, and $Z(P_1)$ is the number of transitions from 0 to 1 in the circular sequence $\{P_2, P_3, \ldots, P_8, P_9, P_2\}$.
2. All contour pixels labeled for deletion in step 1 are deleted, and the remaining boundary pixels are examined and labeled for deletion if they meet the following conditions:
$$2 \le NZ(P_1) \le 6, \qquad Z(P_1) = 1$$
$$P_2 \cdot P_4 \cdot P_8 = 0 \tag{8.7}$$
$$P_2 \cdot P_6 \cdot P_8 = 0 \tag{8.8}$$
The pixels labeled in this step are deleted in turn.
The procedure is repeated until no further deletion occurs. Figure 8.14 shows the results of the thinning algorithm described above. The analysis of the procedure highlights the following:
(a) The first step processes each contour pixel of the region; if any of the conditions is violated, the value of the pixel under examination is not changed. If all the conditions are satisfied, the pixel is labeled and then deleted at the beginning of the second step.
Fig. 8.14 Results of the described thinning algorithm applied to circular regions of variable size
(b) The constraint (8.3) guarantees that the pixel to be deleted is not an end point of a skeleton line (which is the case if NZ(P1) = 1) and that its removal does not erode the region: if NZ(P1) = 7 and P1 were deleted, the region would be eroded.
(c) The constraint (8.4) guarantees the connectivity of the skeleton, preventing the division of the region.
(d) The constraints (8.5) and (8.6) are both satisfied if P4 = 0, or P6 = 0, or simultaneously P2 = P8 = 0. According to the window (8.2), and given the first two constraints, these situations occur when the pixel under test P1 is a contour pixel belonging to the east or south side, or is a north-west corner. In these cases, P1 does not belong to the skeleton and is deleted.
(e) The second step works in a similar way to the first, but with the constraints (8.7) and (8.8), which are both satisfied if P2 = 0, or P8 = 0, or simultaneously P4 = P6 = 0. This occurs when the pixel under consideration belongs to the west or north side of the contour, or is a south-east corner.
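The two-step procedure with the conditions (8.3)–(8.8) can be sketched as below. This is an illustrative implementation under stated assumptions: the image is a list of lists of 0/1 values surrounded by at least one row and column of background, and the function name `thin` is hypothetical.

```python
def thin(image):
    """Two-step iterative thinning using the conditions (8.3)-(8.8).

    Assumes a binary image (lists of 0/1) whose outermost rows and
    columns are background, so that every object pixel has 8 neighbors.
    """
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])

    def neighbors(i, j):
        # P2..P9 of window (8.2): clockwise, starting from the north pixel.
        return [img[i - 1][j], img[i - 1][j + 1], img[i][j + 1],
                img[i + 1][j + 1], img[i + 1][j], img[i + 1][j - 1],
                img[i][j - 1], img[i - 1][j - 1]]

    changed = True
    while changed:
        changed = False
        for step in (1, 2):
            labeled = []
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if img[i][j] != 1:
                        continue
                    p = neighbors(i, j)          # p[0] = P2, ..., p[7] = P9
                    nz = sum(p)                  # NZ(P1)
                    # Z(P1): 0 -> 1 transitions in P2, P3, ..., P9, P2.
                    z = sum(p[k] == 0 and p[(k + 1) % 8] == 1 for k in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 1:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= nz <= 6 and z == 1 and ok:
                        labeled.append((i, j))
            # Deletion is deferred until all pixels of the step are examined.
            for i, j in labeled:
                img[i][j] = 0
            changed = changed or bool(labeled)
    return img


# A bar 3 pixels thick and 7 pixels long inside a 5 x 9 image.
bar = [[1 if 1 <= i <= 3 and 1 <= j <= 7 else 0 for j in range(9)]
       for i in range(5)]
for row in thin(bar):
    print("".join(".#"[v] for v in row))
```

On the bar, the first step removes the south and east sides and the north-west corner, the second step removes the north and west sides, and the remaining one-pixel line is stable because its interior pixels have Z(P1) = 2 and its end points have NZ(P1) = 1.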
8.3 Description of the Forms
The step that follows segmentation concerns the recognition of objects (see Fig. 8.1), i.e., their identification starting from the homogeneous regions obtained by segmentation. The recognition process uses the representation schemes of the regions discussed above. Let us now see how it is possible to extract quantitative measures that characterize the shapes of the regions, in order to simplify the process of object recognition. Shape measurements can be extracted by analyzing the contour pixels or all the pixels of the regions themselves. We also recall that it is necessary to define shape measures that are invariant with respect to the position of the
object projected in the image plane and to the different possible projections of the same object. This leads to the choice of shape descriptors that are invariant with respect to rotation, translation, and distance between object and observer. However, it is also useful to extract simple measurements of a region, such as area, length, and curvature, reducing the description to simple scalar values, while being aware that these global measures do not completely capture the information of an object. In many applications, such scalar measures, used as additional features, may be appropriate for the recognition phase.
8.3.1 Shape Elementary Descriptors
8.3.1.1 Perimeter
The perimeter is a simple geometrical measure that can be calculated from the representation of the contour of an object. It is given by the total length of the contour, calculated directly from the contour pixels as the sum of the center-to-center distances between consecutive pixels of the contour. From a chain code representation of a region, the perimeter is calculated directly from the boundary codes. With 8-direction coding, the perimeter is given by the contribution of the moves in a vertical or horizontal direction, encoded with even numbers, and by the contribution of the moves in a diagonal direction (at 45° with respect to the main axes), encoded with odd numbers (see Fig. 8.15). Assuming square pixels (ratio of width to height 1:1), the perimeter P of an object encoded with 8-neighbors is given by:
$$P = n_P + \sqrt{2} \cdot n_D \tag{8.9}$$
Fig. 8.15 Calculating the perimeter of an object whose boundary is represented by the chain code. a With 4-neighbors connectivity the perimeter is given by the number of horizontal and vertical moves along the boundary (in this case P = 32 pixels). b For the same object with 8-neighbors connectivity, each diagonal segment contributes a length of $\sqrt{2}$, which gives a better approximation of the perimeter: $P = 11 \cdot \sqrt{2} + 10 \cdot 1 \approx 25.56$
where $n_P$ and $n_D$ represent the number of boundary pixels with even and odd code, respectively. If the boundary is encoded with 4 directions, the perimeter is derived directly from the number of boundary pixels (there is no longer the factor $\sqrt{2}$ for the diagonal pixels), and the measured perimeter tends to be longer. The perimeter is very sensitive to the noise present in the image and is influenced by the segmentation algorithms used for the extraction of the homogeneous regions. Great care must be taken when the perimeter is used as a shape descriptor evaluated on different images. The perimeter is not always invariant with respect to the rotation of the object, while the change of scale can be controlled by normalizing with respect to a reference image.
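Equation (8.9) translates directly into code. The sketch below assumes an 8-direction chain code in which even codes are horizontal or vertical moves and odd codes are diagonal moves, with square pixels of unit side; the function name `chain_code_perimeter` is hypothetical.

```python
import math


def chain_code_perimeter(codes, n_directions=8):
    """Perimeter from a chain code, following Eq. (8.9) for 8 directions."""
    if n_directions == 4:
        # Every move is horizontal or vertical and has unit length.
        return float(len(codes))
    n_p = sum(1 for c in codes if c % 2 == 0)   # even codes: unit moves
    n_d = len(codes) - n_p                      # odd codes: diagonal moves
    return n_p + math.sqrt(2) * n_d


# Counts taken from Fig. 8.15b: 10 unit moves and 11 diagonal moves.
codes = [0] * 10 + [1] * 11
print(round(chain_code_perimeter(codes), 2))   # 25.56
```

Only the counts of even and odd codes matter for the result, so the order of the codes in the example is arbitrary.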
8.3.1.2 Area
The area of an object is a simple geometric measure that, by itself, carries little information about shape. In the case of a matrix representation, the area is obtained directly by counting the number of pixels of the region. If the object is represented by its contour, the area measurement is more complex; it is more easily calculated if the contour is represented with the chain code (see Fig. 8.16), thanks to the compactness of this encoding. The procedure that calculates the area from the encoding of the contour operates in a similar way to numerical integration. The calculation of the area is independent of the position of the object; therefore, in the integration process we can consider any reference system x−y in the image plane. For simplicity, we assume that the first code of the contour corresponds to the pixel with the maximum value of the coordinate y, from which the integration of the object area starts. The procedure processes each code of the contour sequence in turn, and for the corresponding pixel the ordinate y must be known. The area is calculated by summing the positive or negative contributions of the areas of the rectangles generated for each code of the sequence, which have a base one pixel wide and a height equal to the corresponding ordinate y. In particular, in the case of 8-direction coding, the codes 0, 1, and 7 with direction to
Fig. 8.16 Calculation of the area of an object whose contour is represented by the chain code. The area is given by subtracting the area subtended by the lower boundary tract (drawn in yellow) from the area underneath the upper boundary (drawn in red)
the right (upper section of the contour) produce increments of the area in proportion to the ordinate y of each pixel. Codes 3, 4, and 5, which indicate directions to the left (lower part of the contour), produce decrements of the area, again in proportion to the ordinate y of the corresponding lower pixels of the contour, while the codes 2 and 6, with vertical direction, produce no effect. Summing up, indicating with A the area and with $S_1, S_2, \ldots, S_n$ the sequence of the boundary codes, the area is calculated with Algorithm 1.

Algorithm 1 Pseudo-code of the algorithm for calculating the area of a region
Input: Sequence of the boundary codes S1, S2, ..., Sn (8 directions), for which the image coordinates (x, y) of each pixel are known.
Output: Area A of the region enclosed by the contour
A ← 0
for i ← 1 to n do
    switch (Si)
        case 0, 1, 7: A ← A + y(Si)
        case 3, 4, 5: A ← A − y(Si)
        case 2, 6: (no contribution)
    end switch
end for
return A
It can be observed that the procedure for calculating the area from the boundary code is efficient: it requires only one addition or subtraction for each pixel of the contour and does not depend directly on the number of pixels of the region itself. In the 4-direction encoding, the area A is incremented by the code 0 (right direction) and decremented by the code 2 (left direction), while the codes 1 and 3 (vertical, up and down) have no effect.
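A runnable variant of Algorithm 1 is sketched below. It rests on several assumptions that are not stated in the text: the codes are numbered counterclockwise from 0 = east with the y axis pointing upward, the contour is traversed clockwise from the starting pixel, and each move contributes the mean of the ordinates of its two end pixels. The last point reduces to the rule of Algorithm 1 for horizontal moves and adds a half-pixel correction for diagonal moves, which is an addition of this sketch. The result is the area of the polygon through the pixel centers; the names `MOVES` and `chain_code_area` are hypothetical.

```python
# (dx, dy) of each 8-direction code, with y pointing upward (assumption).
MOVES = {0: (1, 0), 1: (1, 1), 2: (0, 1), 3: (-1, 1),
         4: (-1, 0), 5: (-1, -1), 6: (0, -1), 7: (1, -1)}


def chain_code_area(codes, y_start):
    """Area enclosed by a closed 8-direction chain code traversed clockwise.

    Moves to the right (codes 0, 1, 7) add the ordinate, moves to the left
    (codes 3, 4, 5) subtract it, vertical moves (codes 2, 6) contribute nothing.
    """
    area = 0.0
    y = y_start
    for c in codes:
        dx, dy = MOVES[c]
        # Mean ordinate of the move: y for horizontal moves, y +/- 0.5 for diagonals.
        area += dx * (y + dy / 2)
        y += dy
    return area


# Square of side 4 traversed clockwise from its top-left corner (y = 4).
square = [0] * 4 + [6] * 4 + [4] * 4 + [2] * 4
print(chain_code_area(square, y_start=4))    # 16.0
# Diamond with both diagonals of length 4, starting from its top vertex (y = 2).
diamond = [7, 7, 5, 5, 3, 3, 1, 1]
print(chain_code_area(diamond, y_start=2))   # 8.0
```

As in Algorithm 1, each code costs one addition or subtraction, so the running time depends on the length of the contour and not on the number of pixels of the region.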
Fig. 8.17 Examples of very compact and noncompact objects (compact shapes with c = 1 and c > 1, and a noncompact shape with c >> 1)

8.3.1.3 Compactness
The geometrical measurements of perimeter and area are both of little significance when an object is observed from different distances. In this case, it is necessary to define shape measures that are invariant to the change of scale of the object in the image plane. The compactness C is a simple (nondimensional) measure of shape, defined as follows:
$$C = \frac{P^2}{A} \tag{8.10}$$
where P and A are the perimeter and area of the region, respectively. The circle is the most compact figure in the Euclidean plane, with the minimum value of compactness equal to $4\pi \approx 12.57$. For the square the compactness is 16, and for an equilateral triangle it is $36/\sqrt{3} \approx 20.78$. It may be useful to define the normalized compactness c with respect to the circle as $c = C/4\pi$, so that the minimum possible value of the compactness is 1. The compactness becomes a significant feature when the objects are not very compact, i.e., have very high values of c (see Fig. 8.17). Area, perimeter, and compactness are, under appropriate conditions, shape measurements invariant to the rotation of the object in the image plane. A measure of the roundness of an object is obtained by considering the reciprocal 1/c of the normalized compactness.
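The values quoted above can be checked numerically with the short sketch below, which evaluates Eq. (8.10) and the normalized compactness on ideal (continuous) shapes; the function names are hypothetical.

```python
import math


def compactness(perimeter, area):
    """Compactness C = P^2 / A of Eq. (8.10)."""
    return perimeter ** 2 / area


def normalized_compactness(perimeter, area):
    """Compactness normalized with respect to the circle, c = C / (4 pi)."""
    return compactness(perimeter, area) / (4 * math.pi)


r = 3.0                                   # circle of radius r
print(round(compactness(2 * math.pi * r, math.pi * r ** 2), 2))   # 12.57
a = 2.0                                   # square of side a
print(compactness(4 * a, a ** 2))                                 # 16.0
s = 1.0                                   # equilateral triangle of side s
print(round(compactness(3 * s, math.sqrt(3) / 4 * s ** 2), 2))    # 20.78
```

Because C depends only on the ratio $P^2/A$, the values do not change with the size chosen for each shape, which is the scale invariance that motivates the measure.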
8.3.1.4 Major Axis
The major axis of an object is defined as the chord of maximum length drawn between two points of the contour of the object itself (see Fig. 8.18a). If we indicate with $P_i$ and $P_j$ two pixels of the contour C and with $d(P_i, P_j)$ the distance between them, the length of the major axis $D_1$ is given by
$$D_1 = \max_{(P_i, P_j) \in C} d(P_i, P_j) = \max_{(P_i, P_j) \in C} \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2} \tag{8.11}$$
where (xi , yi ) and (xj , yj ) are the coordinates of the pixels Pi and Pj , respectively. The angle θ formed between the major axis and the positive axis of x defines the
Fig. 8.18 Object with the graphical representation a of the major axis, b of the minor axis and c of the basic rectangle
orientation of the object in the image plane and is calculated by
$$\theta = \arctan\left(\frac{y_j - y_i}{x_j - x_i}\right) \tag{8.12}$$
The length of the major axis D1 and the orientation angle θ are two useful characteristics for the description of the contour.
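Equations (8.11) and (8.12) can be sketched in Python as follows. This is an illustrative brute-force search over all pairs of contour points, not an optimized method; the function name `major_axis` and the sample contour are assumptions. `math.atan2` is used in place of the plain arctangent so that a vertical major axis does not cause a division by zero; as a consequence the returned angle depends on the order of the two end points.

```python
import math
from itertools import combinations


def major_axis(contour):
    """Return (D1, theta, Pi, Pj) for a list of (x, y) contour pixels.

    D1 is the largest pairwise distance, Eq. (8.11); theta is the angle of
    the chord Pi -> Pj with the positive x axis, Eq. (8.12).
    Requires Python 3.8+ for math.dist.
    """
    p_i, p_j = max(combinations(contour, 2),
                   key=lambda pair: math.dist(pair[0], pair[1]))
    d1 = math.dist(p_i, p_j)
    theta = math.atan2(p_j[1] - p_i[1], p_j[0] - p_i[0])
    return d1, theta, p_i, p_j


# Hypothetical contour: the four corners of a parallelogram
contour = [(0, 0), (4, 0), (5, 3), (1, 3)]
d1, theta, p_i, p_j = major_axis(contour)
print(d1, math.degrees(theta), p_i, p_j)
```

The search costs O(N²) distance evaluations, which is acceptable for short contours.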
8.3.1.5 Minor Axis
The minor axis of an object is defined as the chord of maximum length that can be drawn in a direction perpendicular to the major axis. The length $D_2$ of the minor axis is calculated analogously to that of the major axis, considering, in this case, the pixels $P_l$ and $P_k$ of the contour corresponding to the ends of the minor axis (see Fig. 8.18b).

8.3.1.6 Basic Rectangle
The rectangle with sides parallel to the major axis $D_1$ and to the minor axis $D_2$ which completely includes the object is called the basic rectangle (see Fig. 8.18c). The basic rectangle corresponds to the smallest rectangle that completely includes the object and has an orientation that coincides with that of the object itself. The basic rectangle can be calculated if the orientation angle θ of the object is known. The coordinates (x, y) of each pixel P of the contour are transformed by the geometric transformation equations for the rotation, given by

$$x' = x\cos\theta + y\sin\theta \qquad y' = -x\sin\theta + y\cos\theta \qquad (8.13)$$

From the transformed coordinates, those with minimum and maximum value along the x′ and y′ axes are selected, thus identifying on the contour the corresponding four pixels $P_1$, $P_2$, $P_3$ and $P_4$ that generate the basic rectangle, with base $a = (x'_{max} - x'_{min})$ and height $h = (y'_{max} - y'_{min})$. It follows that the area of the basic rectangle is given by $A_{rb} = a \cdot h$.
8.3.1.7 Rectangularity
The measure of rectangularity of an object is defined by the ratio $A/A_{rb}$ between the area A of the object and the area $A_{rb}$ of the basic rectangle. This ratio becomes 1 if the object is a rectangle that coincides with the basic rectangle.
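The construction of the basic rectangle with the rotation (8.13) and the rectangularity ratio can be sketched as below. This is only an illustration under stated assumptions: the object is given as a list of polygon vertices, its area A is computed with the shoelace formula rather than by counting pixels, and the names `basic_rectangle` and `polygon_area` are hypothetical.

```python
import math


def basic_rectangle(contour, theta):
    """Return base a, height h and area A_rb of the basic rectangle.

    Every contour point is rotated with Eq. (8.13); the extremes of the
    rotated coordinates give the sides of the enclosing rectangle.
    """
    c, s = math.cos(theta), math.sin(theta)
    xr = [x * c + y * s for x, y in contour]
    yr = [-x * s + y * c for x, y in contour]
    a = max(xr) - min(xr)
    h = max(yr) - min(yr)
    return a, h, a * h


def polygon_area(contour):
    """Area enclosed by a closed polygon (shoelace formula)."""
    n = len(contour)
    return 0.5 * abs(sum(contour[i][0] * contour[(i + 1) % n][1]
                         - contour[(i + 1) % n][0] * contour[i][1]
                         for i in range(n)))


# Hypothetical test object: a 4 x 2 rectangle rotated by 30 degrees
theta = math.radians(30)
rect = [(0, 0), (4, 0), (4, 2), (0, 2)]
rotated = [(x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta)) for x, y in rect]

a, h, area_rb = basic_rectangle(rotated, theta)
rectangularity = polygon_area(rotated) / area_rb
print(a, h, rectangularity)
```

Since the test object is itself a rectangle aligned with its own orientation, the basic rectangle coincides with it and the rectangularity is 1 up to rounding.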
8.3.1.8 Eccentricity
The simplest measure of the eccentricity of an object is obtained from the ratio $D_2/D_1$ between the length of the minor axis and that of the major axis.
8.3.1.9 Elongation
The elongation (also known as aspect ratio²) of an object is defined by the ratio h/a between the height h and the base a of the basic rectangle associated with the object. The elongation thus defined is not useful when the region represents an object with a horseshoe shape. In these cases it is convenient to define the elongation as $elong = A/s^2$, that is, as the ratio between the area A of the region and the square of the maximum thickness s of the object. The thickness s can be determined using the morphological erosion operator, based on the number of steps necessary to make the region disappear.
8.3.1.10 Summary of Elementary Shape Descriptors
The elementary descriptors introduced can be used in various applications as features to characterize various types of objects. For example, in microscope images characterized by spherical objects of various sizes, the area is a useful feature. Eccentricity is useful for discriminating circular and elliptical regions of varying sizes. Major and minor axes, along with the orientation angles, are very useful features for analyzing images with dominant elliptical structures of various sizes and orientations. In robotic cell applications for the automatic gripping of objects with polygonal and circular contours, rectangularity and compactness are very useful for characterizing these shapes in the recognition phase and for determining their location in the image plane. For this last aspect, we highlight the descriptors (compactness, eccentricity, rectangularity, elongation) whose shape measurement is invariant with respect to position in the image and to change of scale. Table 8.1 reports all the elementary descriptors presented in this paragraph.
² The aspect ratio is also used to characterize the format of images to be displayed or acquired with the various available technologies. In this case, it indicates the ratio between the width and height of the image (for example, 4:3, 16:9, ...).
Table 8.1 Elementary shape descriptors

| Descriptor | Symbol and formula | Invariance |
|---|---|---|
| Perimeter | P | – |
| Area | A | – |
| Compactness | C = P²/A | Scale and position |
| Normalized compactness | c = C/4π | Scale and position |
| Major axis | D1 = √((xj − xi)² + (yj − yi)²) | – |
| Minor axis | D2 | Position |
| Basic rectangle | Eq. (8.13) and Arb = h · a | – |
| Rectangularity | A/Arb | Scale and position |
| Eccentricity | D2/D1 | Scale and position |
| Elongation | h/a | Scale and position |
| Elongation 2 | elong = A/s² | Scale and position |
8.3.2 Statistical Moments
The concept of moment derives from physics, and the moments of a function have since been used in mathematics and probability theory [8]. The contours and the shape of a region can be characterized by the properties of the moments. The set of moments (also called raw moments) $\{M_{m,n}\}$ of a bounded function f(x, y) of two variables is given by

$$M_{m,n} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^m y^n f(x, y)\,dx\,dy \qquad m, n = 0, 1, 2, \ldots \qquad (8.14)$$

where m and n take nonnegative values and m + n indicates the order of the moment. The moments calculated with (8.14), where the basis functions are given by $x^m y^n$, are the simplest moments associated with the function f(x, y) and are called geometric moments. By virtue of a uniqueness theorem [8], if f(x, y) is piecewise continuous and has nonzero values only in a finite region of the xy plane, then there exists a unique set of moments $\{M_{m,n}\}$ for the function f(x, y), and only f(x, y) has this particular set of moments. It follows that f(x, y) is completely described by the moments $\{M_{m,n}\}$ of lower order. The moment concept can be adapted to digital images by recalling the statistical nature of the image formation process (see Sect. 6.9). In fact, the gray level associated with a pixel is not characterized by a unique value but by a probability density function p(z(x, y)) that indicates how frequently we observe the gray level z(x, y) in the image. Therefore, in the context of digital images, we can consider the set of moments of the probability density function (pdf) to characterize the shape of an object in the image and the gray-level image itself.
For a digital image f(x, y) the moment of order m + n is defined as follows:

$$M_{m,n} = \sum_x \sum_y x^m y^n f(x, y) \qquad m, n = 0, 1, 2, \ldots \qquad (8.15)$$

where x, y are the coordinates of the pixels over the whole image. The moments expressed by (8.15) are used to define a variety of physical measures of an object, and a limited number of moments can univocally characterize the shape of an object under conditions of invariance with respect to position, orientation and change of scale. The zero-order moment $M_{0,0}$ by definition represents the total image intensity. If f(x, y) is a binary image with the pixels of the object represented by f(x, y) = 1 on a background of zero value, the only geometric moment of zero order is

$$M_{0,0} = A = \sum_x \sum_y f(x, y) \qquad (8.16)$$

which corresponds to the area of the object. In this case, f(x, y) intrinsically represents the shape of the object independently of the gray levels (assumed uniform and of value 1), and this shape can be uniquely described by a set of moments. The moments of the first order $M_{1,0}$ and $M_{0,1}$ provide the axial moments relative to the x and y axes, respectively.
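A direct, unoptimized transcription of (8.15) in Python is sketched below. The convention that `image[y][x]` stores f(x, y), with the origin at the top-left pixel, and the small test image are assumptions made for the example.

```python
def raw_moment(image, m, n):
    """Geometric moment M_{m,n} of Eq. (8.15).

    `image` is a list of rows; image[y][x] is the gray level f(x, y).
    """
    return sum((x ** m) * (y ** n) * f
               for y, row in enumerate(image)
               for x, f in enumerate(row))


# Hypothetical binary image: a 3 x 2 block of ones on a zero background
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

area = raw_moment(image, 0, 0)  # zero-order moment, Eq. (8.16)
m10 = raw_moment(image, 1, 0)   # first-order moment in x
m01 = raw_moment(image, 0, 1)   # first-order moment in y
print(area, m10, m01)
```

For the binary block the zero-order moment counts the six object pixels, in agreement with (8.16).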
8.3.2.1 Translation Invariance
As we will see in the following, it is useful to define moments with the origin of the reference system positioned at the centroid of the image intensity. By the theorem of (static) moments, the moment of a system that concentrates the whole intensity $M_{0,0}$ in a single point $(x_c, y_c)$ (the center of the system) is equal to the sum of the moments of the individual pixels $f(x_i, y_i)$, calculated with respect to the same coordinate axis. It follows that

$$M_{0,0}\,x_c = \sum_x \sum_y x f(x, y) \qquad M_{0,0}\,y_c = \sum_x \sum_y y f(x, y) \qquad (8.17)$$

which allow us to calculate the coordinates $x_c$ and $y_c$:

$$x_c = \frac{M_{1,0}}{M_{0,0}} = \frac{\sum_x \sum_y x f(x, y)}{\sum_x \sum_y f(x, y)} \qquad y_c = \frac{M_{0,1}}{M_{0,0}} = \frac{\sum_x \sum_y y f(x, y)}{\sum_x \sum_y f(x, y)} \qquad (8.18)$$
The coordinates (xc , yc ) define the centroid (according to the terminology of classical mechanics also known as the center of mass or center of gravity) of the image system.
The moments can be redefined with respect to the centroid $(x_c, y_c)$, thus obtaining the central moments $\mu_{m,n}$, invariant to translation, given by

$$\mu_{m,n} = \sum_x \sum_y (x - x_c)^m (y - y_c)^n f(x, y) \qquad m, n = 0, 1, 2, \ldots \qquad (8.19)$$
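The translation invariance of the central moments (8.19) can be verified numerically with a short sketch. The two test images below contain the same hypothetical L-shaped object, shifted by two pixels in both directions; the function names and the `image[y][x]` convention are assumptions.

```python
def centroid(image):
    """Centroid (xc, yc) of Eq. (8.18); image[y][x] holds f(x, y)."""
    m00 = sum(f for row in image for f in row)
    m10 = sum(x * f for row in image for x, f in enumerate(row))
    m01 = sum(y * f for y, row in enumerate(image) for f in row)
    return m10 / m00, m01 / m00


def central_moment(image, m, n):
    """Central moment mu_{m,n} of Eq. (8.19)."""
    xc, yc = centroid(image)
    return sum((x - xc) ** m * (y - yc) ** n * f
               for y, row in enumerate(image)
               for x, f in enumerate(row))


shape_a = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
shape_b = [  # same object translated by (+2, +2)
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
]

for m, n in [(2, 0), (0, 2), (1, 1), (3, 0), (2, 1)]:
    print((m, n), central_moment(shape_a, m, n), central_moment(shape_b, m, n))
```

The centroid moves with the object, while every central moment keeps the same value in the two images.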
8.3.2.2 Scale Invariance
In many applications there is a need to observe objects from different distances. Shape measurements must therefore be invariant with respect to the change in scale of the observed object. Under these conditions we can consider that the coordinates of the object are transformed into $x' = \alpha x$ and $y' = \alpha y$; consequently, the object is transformed by a scale factor α, with the central moments modified as

$$\mu'_{m,n} = \frac{\mu_{m,n}}{\alpha^{m+n+2}} \qquad m, n = 0, 1, 2, \ldots \qquad (8.20)$$

The central moments may be normalized with respect to the zero-order moment $\mu'_{0,0}$ as follows:

$$\eta_{m,n} = \frac{\mu'_{m,n}}{(\mu'_{0,0})^{\gamma}} \qquad \text{with } \gamma = \frac{m + n + 2}{2} \qquad m, n = 0, 1, 2, \ldots \qquad (8.21)$$

where $\eta_{m,n}$ are the normalized central moments, invariant with respect to the change in scale. The normalized unscaled central moments are instead:

$$\vartheta_{m,n} = \frac{\mu_{m,n}}{(\mu_{0,0})^{\gamma}} \qquad m + n = 2, 3, \ldots \qquad (8.22)$$
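The effect of the normalization by $(\mu_{0,0})^{\gamma}$ can be checked with the sketch below, which compares a rectangle with a copy enlarged by a factor of 2. On a pixel grid the invariance holds only approximately, because of discretization, so the example uses fairly large hypothetical rectangles; the function names are assumptions.

```python
def central_moment(image, m, n):
    """Central moment mu_{m,n}, Eq. (8.19); image[y][x] holds f(x, y)."""
    m00 = sum(f for row in image for f in row)
    xc = sum(x * f for row in image for x, f in enumerate(row)) / m00
    yc = sum(y * f for y, row in enumerate(image) for f in row) / m00
    return sum((x - xc) ** m * (y - yc) ** n * f
               for y, row in enumerate(image)
               for x, f in enumerate(row))


def normalized_central_moment(image, m, n):
    """mu_{m,n} / mu_{0,0}^gamma with gamma = (m + n + 2) / 2."""
    gamma = (m + n + 2) / 2
    return central_moment(image, m, n) / central_moment(image, 0, 0) ** gamma


small = [[1] * 30 for _ in range(20)]  # 30 x 20 binary rectangle
large = [[1] * 60 for _ in range(40)]  # same rectangle scaled by 2

mu_ratio = central_moment(large, 2, 0) / central_moment(small, 2, 0)
eta_small = normalized_central_moment(small, 2, 0)
eta_large = normalized_central_moment(large, 2, 0)
print(mu_ratio, eta_small, eta_large)
```

The second-order central moment changes by a factor close to 2⁴ = 16 between the two images, whereas the normalized value stays close to 0.125 in both.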
8.3.2.3 Object Orientation Invariance
From the results obtained so far, the zero-order moment given by (8.16) represents the area for binary images, while for gray-level images f(x, y) the moment $M_{0,0}$ is the integrated optical density (i.e., the total mass of the image). Therefore, for a gray-level image, moments can represent, in addition to the shape of an object, also its density distribution; consequently, the invariant moments can characterize different objects. Returning to the central moments of the first order, we observe that they have zero value. In fact, we have

$$\mu_{1,0} = \sum_x \sum_y (x - x_c)^1 (y - y_c)^0 f(x, y) = \sum_x \sum_y x f(x, y) - x_c \sum_x \sum_y f(x, y) = M_{1,0} - \frac{M_{1,0}}{M_{0,0}} M_{0,0} = 0$$

and similarly we have that $\mu_{0,1} = 0$. Let us now analyze the meaning of the three central moments of the second order, given by

$$\begin{aligned}
\mu_{2,0} &= \sum_x \sum_y (x - x_c)^2 f(x, y) \\
\mu_{0,2} &= \sum_x \sum_y (y - y_c)^2 f(x, y) \\
\mu_{1,1} &= \sum_x \sum_y (x - x_c)^1 (y - y_c)^1 f(x, y)
\end{aligned} \qquad (8.23)$$
In Eq. (8.23) there are terms in which f(x, y), the gray level of the pixels (which, in analogy to classical mechanics, represents the density of the object), is multiplied by the square of the distance from the center of mass $(x_c, y_c)$. From the statistical point of view, the central moments of the second order, $\mu_{2,0}$ and $\mu_{0,2}$, express a measure of the variance, that is, of the distribution of intensity with respect to the origin or to the respective mean values. The moment $\mu_{1,1}$ represents the measure of covariance. From the mechanical point of view, the central moments $\mu_{2,0}$ and $\mu_{0,2}$ represent the moments of inertia³ of the object with respect to the x and y axes, respectively. Let us now see how the central moments of the second order vary as the direction of the coordinate axes around the center of mass changes. Suppose the origin of the coordinate axes x and y is in the center of mass $(x_c, y_c)$ and, for simplicity, let us calculate the central moments of the second order for generic pixels in the positions $(x_i, y_i)$, of mass $m_i$, belonging to the object (see Fig. 8.19a). By rotating the coordinate axes by an angle θ, the new pixel coordinates $(x'_i, y'_i)$ in the new reference system are expressed by the transformation equations for the rotation (8.13). Substituting them in the expressions of the moment of inertia, we can calculate the central moment of inertia $\hat\mu_{2,0}$ with respect to the new x′ axis as a function of the central moments of inertia $\mu_{2,0}$, $\mu_{1,1}$ and $\mu_{0,2}$. The central moment of inertia for generic masses $m_i$ relative to the x′ axis rotated by θ is given by the relation

$$\begin{aligned}
\hat\mu_{2,0}(\theta) &= \sum_i m_i (y'_i)^2 \\
&= \sum_i m_i (y_i\cos\theta - x_i\sin\theta)^2 \\
&= \sum_i m_i (y_i^2\cos^2\theta + x_i^2\sin^2\theta - 2x_i y_i\cos\theta\sin\theta) \\
&= \mu_{2,0}\cos^2\theta - 2\mu_{1,1}\cos\theta\sin\theta + \mu_{0,2}\sin^2\theta
\end{aligned} \qquad (8.24)$$

³ The moment of inertia is a measure of the inertia of the body when its rotational velocity changes, that is, a physical quantity useful to describe the dynamics of an object rotating around an axis. This quantity is defined as the second moment of mass with respect to the position. For a system consisting of n point-like objects with masses $m_i$ at distances $d_i$ from a fixed axis of rotation z, the moment of inertia of the system with respect to the z axis is defined by $I_z = \sum_{i=1}^{n} m_i d_i^2$.
Proceeding in a similar way, we obtain the moment of inertia $\hat\mu_{0,2}$ calculated with respect to the y′ axis and the moment of inertia $\hat\mu_{1,1}$ calculated with respect to both the new axes. These moments are:

$$\hat\mu_{0,2}(\theta) = \mu_{2,0}\sin^2\theta + 2\mu_{1,1}\cos\theta\sin\theta + \mu_{0,2}\cos^2\theta \qquad (8.25)$$

$$\hat\mu_{1,1}(\theta) = \frac{1}{2}(\mu_{2,0} - \mu_{0,2})\sin 2\theta - \mu_{1,1}\cos 2\theta \qquad (8.26)$$

where we have used the trigonometric double-angle formulas $\cos 2\theta = \cos^2\theta - \sin^2\theta$ and $\sin 2\theta = 2\sin\theta\cos\theta$.
Now let us highlight the following. Adding Eqs. (8.24) and (8.25) term by term, i.e., the moments of inertia with respect to the coordinate axes, we obtain:

$$\hat\mu_{2,0}(\theta) + \hat\mu_{0,2}(\theta) = \mu_{2,0} + \mu_{0,2} = \text{constant} \qquad (8.27)$$

Equation (8.27) tells us that the sum of the moments, calculated before and after the rotation, does not change (invariance to rotation). This is explained by the fact that the sum represents the polar moment with respect to the origin of the coordinate system, which does not change as the rotation θ varies. Setting instead $\hat\mu_{1,1}(\theta) = 0$ in (8.26) and dividing both members by $\cos 2\theta$, we find the value of θ for which the moment of inertia $\hat\mu_{1,1}(\theta)$ vanishes; it satisfies the following relation:

$$\tan 2\theta = \frac{2\mu_{1,1}}{\mu_{2,0} - \mu_{0,2}} \Longrightarrow \theta = \frac{1}{2}\arctan\left(\frac{2\mu_{1,1}}{\mu_{2,0} - \mu_{0,2}}\right) \qquad (8.28)$$

In essence, if the central moments of the second order are calculated at the angle of rotation θ defined by (8.28), the principal central axes of inertia are identified, with respect to which the moment $\hat\mu_{1,1}$ is zero [9]. The corresponding moments of inertia $\hat\mu_{2,0}(\theta)$ and $\hat\mu_{0,2}(\theta)$ represent the extreme values (maximum and minimum) of the inertia function of the object with respect to the axes passing through the center of mass $(x_c, y_c)$. In Eq. (8.28), θ is the orientation of the principal axis with respect to the x axis, with values in the interval (−π/4, π/4). The angle θ does not guarantee the uniqueness of the orientation, since there is an indetermination of π/2. This ambiguity can be resolved by imposing the constraint $\mu_{3,0} > 0$ on the third-order moment. If the object has an elongated shape with a well-defined principal axis (see Fig. 8.19b), it is possible to calculate the orientation angle θ of the object as the inclination of the axis of minimum inertia with respect to the x axis. In other words, the axis of minimum inertia coincides with the principal axis of the object, and this generally makes it possible to find the orientation of the object itself with the central moments of the second order.

Fig. 8.19 Geometrical moments invariant to rotation: a Rotation of the coordinate axes with origin in the center of gravity; b Binary image and corresponding central ellipse of inertia; c Object represented by the contour and approximation with central ellipse of inertia

An intrinsically symmetrical object has principal axes of inertia that always coincide with the symmetry axes. If the axes originate in the center of mass and the axis of symmetry coincides with the coordinate axis y, the points of the object are equidistant from it on both sides, with coordinates of equal value and opposite sign. This explains the cancelation of the moment $\hat\mu_{1,1}$ and the coincidence between principal axes of inertia and axes of symmetry. The values of the principal central moments of inertia $\hat\mu_{2,0}(\theta)$ and $\hat\mu_{0,2}(\theta)$ can be obtained as a function of the second-order central moments, in correspondence with the angle θ given by (8.28) for which $\hat\mu_{1,1}$ is null. Starting from Eqs. (8.24), (8.25) and (8.28), through appropriate steps (using the trigonometric double-angle and power-reduction formulas), we obtain the following expressions:

$$\begin{aligned}
\hat\mu_{2,0} &= \frac{\mu_{2,0} + \mu_{0,2}}{2} + \frac{1}{2}\sqrt{(\mu_{2,0} - \mu_{0,2})^2 + 4\mu_{1,1}^2} \\
\hat\mu_{0,2} &= \frac{\mu_{2,0} + \mu_{0,2}}{2} - \frac{1}{2}\sqrt{(\mu_{2,0} - \mu_{0,2})^2 + 4\mu_{1,1}^2}
\end{aligned} \qquad (8.29)$$
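A minimal numerical sketch of (8.28) and (8.29) follows. It assumes the second-order central moments are defined as in (8.23), i.e., $\mu_{2,0}$ is the spread along x, and it replaces the plain arctangent with `math.atan2`, which keeps the correct quadrant of 2θ and therefore returns θ in (−π/2, π/2]. The function names and the test object (unit-mass points on an inclined segment) are hypothetical.

```python
import math


def second_order_central_moments(points):
    """mu20, mu02, mu11 of unit-mass points, as in Eq. (8.23)."""
    n = len(points)
    xc = sum(x for x, _ in points) / n
    yc = sum(y for _, y in points) / n
    mu20 = sum((x - xc) ** 2 for x, _ in points)
    mu02 = sum((y - yc) ** 2 for _, y in points)
    mu11 = sum((x - xc) * (y - yc) for x, y in points)
    return mu20, mu02, mu11


def orientation(mu20, mu02, mu11):
    """Angle of Eq. (8.28), computed with atan2 instead of arctan."""
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


def principal_moments(mu20, mu02, mu11):
    """Maximum and minimum values given by Eq. (8.29)."""
    mean = 0.5 * (mu20 + mu02)
    root = 0.5 * math.sqrt((mu20 - mu02) ** 2 + 4.0 * mu11 ** 2)
    return mean + root, mean - root


# Hypothetical object: 21 points on a segment inclined by 30 degrees
angle = math.radians(30)
points = [(t * math.cos(angle), t * math.sin(angle)) for t in range(-10, 11)]

mu20, mu02, mu11 = second_order_central_moments(points)
theta = orientation(mu20, mu02, mu11)
p_max, p_min = principal_moments(mu20, mu02, mu11)
print(math.degrees(theta), p_max, p_min)
```

For this degenerate object the recovered angle is the inclination of the segment, the smaller principal moment is zero up to rounding, and the sum of the two principal moments equals $\mu_{2,0} + \mu_{0,2}$, as stated by (8.27).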
Another property of the moments of inertia⁴ concerns the correspondence between the central radii of inertia and the semiaxes of an ellipse, known as the central ellipse of inertia, useful to represent the shape of an object (see Fig. 8.19c). This property is used in image processing [10] to approximate the original (elongated) shape of an object with an elliptical region, with pixels of constant intensity, having mass and second-order moments equivalent to those of the original shape. Indicating with $\rho_x$ and $\rho_y$ the major and minor semiaxes of the ellipse, respectively, corresponding to the principal central radii of inertia, according to the definition of radius of inertia (note 4), these semiaxes are determined by the second-order moments with the following formulas:

$$\rho_x = \sqrt{\frac{\hat\mu_{2,0}}{\mu_{0,0}}} \qquad \rho_y = \sqrt{\frac{\hat\mu_{0,2}}{\mu_{0,0}}} \qquad (8.30)$$

The equation of the central inertia ellipse is given by

$$\frac{x^2}{\rho_x^2} + \frac{y^2}{\rho_y^2} = 1 \qquad (8.31)$$

⁴ Also called radius of gyration or gyradius and indicated with $\rho_x$ and $\rho_y$, the radii of inertia are the ideal distances at which one can think of concentrating the intensity of the image to obtain the same moment of inertia as the elementary components (the pixels) located in their original positions. In fact, calculating with respect to the x and y axes we have $\mu_{2,0} = \mu_{0,0}\rho_x^2$ and $\mu_{0,2} = \mu_{0,0}\rho_y^2$, where in this case $\mu_{0,0}$ indicates the total intensity, or the area of the object in the case of a binary image.
If, for example, we consider an object (in a binary image) of rectangular shape with base b and height h, with the barycentric axes parallel to the principal axes of inertia, we have the area and the moments of inertia as follows:

$$\mu_{0,0} = bh; \qquad \mu_{2,0} = \hat\mu_{2,0} = \frac{bh^3}{12} \qquad \text{and} \qquad \mu_{0,2} = \hat\mu_{0,2} = \frac{b^3h}{12}$$

In this case, the moments of inertia and the principal moments of inertia coincide, and we can calculate the radii of inertia directly with Eq. (8.30), obtaining:

$$\rho_x = \sqrt{\frac{\hat\mu_{2,0}}{\mu_{0,0}}} = \sqrt{\frac{bh^3}{12bh}} = \frac{h}{\sqrt{12}} \qquad \rho_y = \sqrt{\frac{\hat\mu_{0,2}}{\mu_{0,0}}} = \sqrt{\frac{b^3h}{12bh}} = \frac{b}{\sqrt{12}}$$

which correspond to the semiaxes of the central inertia ellipse. Since the axes of symmetry coincide with the principal axes, by definition the moment $\mu_{1,1} = 0$, and applying (8.28) we have

$$\tan 2\theta = \frac{2\mu_{1,1}}{\mu_{2,0} - \mu_{0,2}} = \frac{2 \cdot 0}{\frac{bh^3 - b^3h}{12}} = 0,$$

therefore, the orientation of the principal axes x and y results as follows:

2θ = 0° ⟹ θ = 0° (x axis)
2θ = 180° ⟹ θ = 90° (y axis)

Note that the shape of the object can be characterized by the parameters of the central ellipse of inertia (semiaxes $\rho_x$, $\rho_y$, position $(x_c, y_c)$ in the image and orientation θ) without performing operations of rotation and resampling of the image (see Fig. 8.20).

Fig. 8.20 Parameter calculation (center of mass, eccentricity, major and minor axis, and orientation) of the central inertia ellipse to approximate objects with different shapes (pencil, oval shape, pincer, adjustable wrench). Values reported in the figure: pencil, orientation 0°, eccentricity 0.95, max axis 236.5, min axis 8.5; pincer, orientation 25°, eccentricity 0.95, max axis 258, min axis 84; oval, orientation −90°, eccentricity 0.6, max axis 82, min axis 64; adjustable wrench, orientation 4°, eccentricity 0.97, max axis 278, min axis 48

In addition to the elementary descriptors given in Sect. 8.3.1, other descriptors can be derived, or reformulated, based on the moments of the second order. For example, using the principal moments of inertia, a measure of expansion (spreadness) of the shape and a new measure of the elongation are given, respectively, by the following expressions:

$$\text{Spreadness} = \frac{\hat\mu_{2,0} + \hat\mu_{0,2}}{\mu_{0,0}^2} \qquad \text{Elongation} = \frac{\hat\mu_{0,2} - \hat\mu_{2,0}}{\hat\mu_{2,0} + \hat\mu_{0,2}}$$
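where $\mu_{0,0}$ represents the area of the shape. New eccentricity measurements of an object can be expressed in terms of the central moments of the second order and of the semiaxes of the inertia ellipse as follows:

$$\text{Eccentricity} = \frac{\sqrt{\rho_x^2 - \rho_y^2}}{\rho_x} \qquad \text{or} \qquad \frac{(\mu_{2,0} - \mu_{0,2})^2 + 4\mu_{1,1}}{\mu_{0,0}} \qquad \text{or} \qquad \frac{\hat\mu_{0,2}}{\hat\mu_{2,0}} \qquad (8.32)$$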
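The rectangle example can be worked through numerically. The sketch below uses the closed-form moments of a b × h rectangle quoted in the text, where $\mu_{2,0}$ is the moment of inertia about the x axis, and evaluates the radii of inertia of (8.30) together with the spreadness and elongation expressions given above. The function name and the sample dimensions are assumptions.

```python
import math


def rectangle_inertia_descriptors(b, h):
    """Radii of inertia, spreadness and elongation of a b x h rectangle."""
    mu00 = b * h
    mu20 = b * h ** 3 / 12.0  # moment of inertia about the x axis
    mu02 = b ** 3 * h / 12.0  # moment of inertia about the y axis
    rho_x = math.sqrt(mu20 / mu00)  # Eq. (8.30)
    rho_y = math.sqrt(mu02 / mu00)
    spreadness = (mu20 + mu02) / mu00 ** 2
    elongation = (mu02 - mu20) / (mu20 + mu02)
    return rho_x, rho_y, spreadness, elongation


# Hypothetical rectangle with base 8 and height 2
rho_x, rho_y, spreadness, elongation = rectangle_inertia_descriptors(8.0, 2.0)
print(rho_x, rho_y, spreadness, elongation)
```

The radii reduce to h/√12 and b/√12, and the elongation to (b² − h²)/(b² + h²), so a square gives an elongation of zero.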
8.3.2.4 Invariance for Translation, Rotation, and Scale
The moments of order higher than the second, considered individually, are not significant for deriving shape descriptors. The central moments of the third order deriving from (8.19) are:

$$\begin{aligned}
\mu_{3,0} &= M_{3,0} - 3x_c M_{2,0} + 2x_c^2 M_{1,0} \\
\mu_{0,3} &= M_{0,3} - 3y_c M_{0,2} + 2y_c^2 M_{0,1} \\
\mu_{1,2} &= M_{1,2} - 2y_c M_{1,1} - x_c M_{0,2} + 2y_c^2 M_{1,0} \\
\mu_{2,1} &= M_{2,1} - 2x_c M_{1,1} - y_c M_{2,0} + 2x_c^2 M_{0,1}
\end{aligned} \qquad (8.33)$$

From the statistical analysis of the moments $\mu_{3,0}$ and $\mu_{0,3}$ it is possible to analyze the projections of the image to extract shape information about the symmetry (with respect to the mean axis, $\mu_{3,0} \to 0$) and about the skewness, which indicates the degree of drift with respect to the mean axis. Dimensionless measurements $S_x$ and $S_y$ of this drift can be estimated by normalizing with the positive moments of the second order, with the following expressions:

$$S_x = \frac{\mu_{3,0}}{(\mu_{2,0})^{3/2}} \qquad S_y = \frac{\mu_{0,3}}{(\mu_{0,2})^{3/2}} \qquad (8.34)$$
Another statistical measure, known as kurtosis, can be considered with the fourth-order moments $\mu_{4,0}$ and $\mu_{0,4}$. This measure indicates the level of horizontal or vertical expansion of a distribution and, at the same time, how much it deviates from the normal distribution. Its best-known measure is based on the Pearson index, given by the ratio between the moment of the fourth order and the square of the variance. The kurtosis coefficients, with respect to the x and y coordinate axes, are calculated with the following expressions:

$$K_x = \frac{\mu_{4,0}}{(\mu_{2,0})^2} - 3 \qquad K_y = \frac{\mu_{0,4}}{(\mu_{0,2})^2} - 3 \qquad (8.35)$$
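For a normal distribution the coefficients assume the value zero; for values greater than zero the distribution is more sharply peaked than the normal, while for values less than zero it tends to be flatter. The fundamental theorem of the moments was first proposed by Hu [11], based on the theory of algebraic invariants. Starting from functions of the normalized central moments given by Eq. (8.22), seven descriptors $\{\phi_i\}_{i=1,7}$ were defined which are simultaneously invariant to translation, rotation, and scale change. These descriptors are:

$$\phi_1 = \vartheta_{2,0} + \vartheta_{0,2} \qquad (8.36)$$

$$\phi_2 = (\vartheta_{2,0} - \vartheta_{0,2})^2 + 4\vartheta_{1,1}^2 \qquad (8.37)$$

$$\phi_3 = (\vartheta_{3,0} - 3\vartheta_{1,2})^2 + (3\vartheta_{2,1} - \vartheta_{0,3})^2 \qquad (8.38)$$

$$\phi_4 = (\vartheta_{3,0} + \vartheta_{1,2})^2 + (\vartheta_{2,1} + \vartheta_{0,3})^2 \qquad (8.39)$$

$$\begin{aligned}
\phi_5 = {} & (\vartheta_{3,0} - 3\vartheta_{1,2})(\vartheta_{3,0} + \vartheta_{1,2})[(\vartheta_{3,0} + \vartheta_{1,2})^2 - 3(\vartheta_{2,1} + \vartheta_{0,3})^2] \\
& + (3\vartheta_{2,1} - \vartheta_{0,3})(\vartheta_{2,1} + \vartheta_{0,3})[3(\vartheta_{3,0} + \vartheta_{1,2})^2 - (\vartheta_{2,1} + \vartheta_{0,3})^2]
\end{aligned} \qquad (8.40)$$

$$\begin{aligned}
\phi_6 = {} & (\vartheta_{2,0} - \vartheta_{0,2})[(\vartheta_{3,0} + \vartheta_{1,2})^2 - (\vartheta_{2,1} + \vartheta_{0,3})^2] \\
& + 4\vartheta_{1,1}(\vartheta_{3,0} + \vartheta_{1,2})(\vartheta_{2,1} + \vartheta_{0,3})
\end{aligned} \qquad (8.41)$$

$$\begin{aligned}
\phi_7 = {} & (3\vartheta_{2,1} - \vartheta_{0,3})(\vartheta_{3,0} + \vartheta_{1,2})[(\vartheta_{3,0} + \vartheta_{1,2})^2 - 3(\vartheta_{2,1} + \vartheta_{0,3})^2] \\
& - (\vartheta_{3,0} - 3\vartheta_{1,2})(\vartheta_{2,1} + \vartheta_{0,3})[3(\vartheta_{3,0} + \vartheta_{1,2})^2 - (\vartheta_{2,1} + \vartheta_{0,3})^2]
\end{aligned} \qquad (8.42)$$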
These descriptors have been widely used in character recognition applications and for complex shapes, for example to distinguish different types of aircraft. They are computationally expensive. They are also used to distinguish gray-level images subject to change of scale, rotation and translation. Figure 8.21 shows the results of the invariance of the Hu moments applied to a gray-level test image rotated, scaled and reflected. In addition, it is observed that there is no invariance under affine distortions. Finally, the first six moments are also invariant to reflection, while the seventh moment changes sign under reflection and can therefore be used to discriminate between a reflected and a nonreflected image.

Fig. 8.21 Test on a gray level image of the invariance of the 7 moments of Hu calculated on the original image, reflected horizontally, reflected horizontally and vertically, reduced by 1/4, rotated by −45°, reduced by 1/4 and rotated by −45°, and reduced by 1/2 + rotated by −45° + sheared horizontally by 25°

Figure 8.22 shows the results of the Hu moments applied to binary images to describe the shape of different objects. In real applications, these descriptors present some problems: they are very sensitive to noise (also due to discretization) and to occlusions (a common problem for all descriptors), especially the high-order moments; they present considerable information redundancy, since the basis functions are not orthogonal; moreover, since the basis functions are powers of order m and n, the calculated high-order moments take on values with a very wide dynamic range of several orders of magnitude, causing strong numerical instability, especially for large images. Several versions [12–14] of moment-based descriptors have been proposed in the literature to improve performance from a computational point of view and to make them invariant, in a more general context, with respect to affine transformations [13], rotation (orthogonal Zernike moments [15] and radial Chebyshev moments [16]), variation of the gray levels of images [17] and color [18].
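The seven invariants of Eqs. (8.36)–(8.42) can be sketched in plain Python as follows, using the normalized central moments of (8.22). This is a didactic, unoptimized version and not the implementation used to produce Figs. 8.21 and 8.22. The function name, the small F-shaped test image and the `image[y][x]` convention are assumptions, and the last invariant is written with the minus sign of Hu's original formulation.

```python
def hu_invariants(image):
    """Seven Hu invariants computed from normalized central moments."""
    pix = [(x, y, f) for y, row in enumerate(image)
           for x, f in enumerate(row) if f]
    m00 = sum(f for _, _, f in pix)
    xc = sum(x * f for x, _, f in pix) / m00
    yc = sum(y * f for _, y, f in pix) / m00

    def t(m, n):
        """Normalized central moment of order m + n, Eq. (8.22)."""
        mu = sum((x - xc) ** m * (y - yc) ** n * f for x, y, f in pix)
        return mu / m00 ** ((m + n + 2) / 2)

    t20, t02, t11 = t(2, 0), t(0, 2), t(1, 1)
    t30, t03, t21, t12 = t(3, 0), t(0, 3), t(2, 1), t(1, 2)
    a, b = t30 + t12, t21 + t03          # recurring sums
    c, d = t30 - 3 * t12, 3 * t21 - t03  # recurring differences
    return [
        t20 + t02,
        (t20 - t02) ** 2 + 4 * t11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (t20 - t02) * (a ** 2 - b ** 2) + 4 * t11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


# Hypothetical asymmetric binary shape (letter "F")
shape = [
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
rotated = [list(r) for r in zip(*shape[::-1])]  # rotation by 90 degrees
mirrored = [row[::-1] for row in shape]         # horizontal reflection

phi = hu_invariants(shape)
phi_rot = hu_invariants(rotated)
phi_mir = hu_invariants(mirrored)
print(phi)
```

A rotation by 90° maps the pixel grid onto itself, so the seven values are reproduced up to rounding; under reflection the first six values are unchanged and the seventh changes sign.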
8.3.2.5 Contour Moments
Shape descriptors can be defined by considering only the pixels of the contour of an object. In this case, a closed contour characterizes the shape of an object by knowing only the Euclidean distance z(i) from the center of mass of each of the N pixels $(x_i, y_i)$ of the discretized contour. The moments $M_r$ and the central moments $\mu_r$ of order r are estimated by the following 1D functions:

$$M_r = \frac{1}{N}\sum_{i=1}^{N} [z(i)]^r \qquad (8.43)$$

$$\mu_r = \frac{1}{N}\sum_{i=1}^{N} [z(i) - M_1]^r \qquad (8.44)$$

The normalized moments $\bar{M}_r$ and the normalized central moments $\bar\mu_r$ are defined as:

$$\bar{M}_r = \frac{M_r}{(\mu_2)^{r/2}} = \frac{\frac{1}{N}\sum_{i=1}^{N} [z(i)]^r}{\left\{\frac{1}{N}\sum_{i=1}^{N} [z(i) - M_1]^2\right\}^{r/2}} \qquad (8.45)$$

$$\bar\mu_r = \frac{\mu_r}{(\mu_2)^{r/2}} = \frac{\frac{1}{N}\sum_{i=1}^{N} [z(i) - M_1]^r}{\left\{\frac{1}{N}\sum_{i=1}^{N} [z(i) - M_1]^2\right\}^{r/2}} \qquad (8.46)$$

Fig. 8.22 Descriptors based on 7 Hu moments applied on binary images to characterize 4 objects of different shapes
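The normalized moments $\bar{M}_r$ and $\bar\mu_r$ are invariant to translation, rotation and scale. Other shape descriptors $\{F_j\}_{j=1,4}$, based on the moments and less sensitive to the noise of the contour, are proposed in [19]:

$$F_1 = \frac{(\mu_2)^{1/2}}{M_1} = \frac{\left\{\frac{1}{N}\sum_{i=1}^{N} [z(i) - M_1]^2\right\}^{1/2}}{\frac{1}{N}\sum_{i=1}^{N} z(i)} \qquad (8.47)$$

$$F_2 = \frac{\mu_3}{(\mu_2)^{3/2}} = \frac{\frac{1}{N}\sum_{i=1}^{N} [z(i) - M_1]^3}{\left\{\frac{1}{N}\sum_{i=1}^{N} [z(i) - M_1]^2\right\}^{3/2}} \qquad (8.48)$$

$$F_3 = \frac{\mu_4}{(\mu_2)^{2}} = \frac{\frac{1}{N}\sum_{i=1}^{N} [z(i) - M_1]^4}{\left\{\frac{1}{N}\sum_{i=1}^{N} [z(i) - M_1]^2\right\}^{2}} \qquad (8.49)$$

$$F_4 = \bar\mu_5 \qquad (8.50)$$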
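The descriptors (8.47)–(8.50) can be computed from the distances z(i) as sketched below. As an assumption of this example, the center of mass is taken as the mean of the contour points, and the test contour is a sampled ellipse; the function name is hypothetical. The second contour is a rotated, scaled and translated copy of the first, used to check the invariance.

```python
import math


def contour_descriptors(contour):
    """Return (F1, F2, F3, F4) of Eqs. (8.47)-(8.50) for (x, y) points."""
    n = len(contour)
    xc = sum(x for x, _ in contour) / n
    yc = sum(y for _, y in contour) / n
    z = [math.hypot(x - xc, y - yc) for x, y in contour]
    m1 = sum(z) / n  # first moment M_1, Eq. (8.43)

    def mu(r):
        """Central moment of order r of the distances, Eq. (8.44)."""
        return sum((zi - m1) ** r for zi in z) / n

    mu2 = mu(2)
    f1 = math.sqrt(mu2) / m1
    f2 = mu(3) / mu2 ** 1.5
    f3 = mu(4) / mu2 ** 2
    f4 = mu(5) / mu2 ** 2.5  # normalized central moment of order 5
    return f1, f2, f3, f4


# Hypothetical contour: ellipse with semiaxes 5 and 2, sampled at 64 points
ellipse = [(5 * math.cos(2 * math.pi * k / 64), 2 * math.sin(2 * math.pi * k / 64))
           for k in range(64)]

# Same contour rotated by 0.7 rad, scaled by 3 and translated by (10, -4)
phi, scale = 0.7, 3.0
moved = [(scale * (x * math.cos(phi) - y * math.sin(phi)) + 10.0,
          scale * (x * math.sin(phi) + y * math.cos(phi)) - 4.0)
         for x, y in ellipse]

f_ref = contour_descriptors(ellipse)
f_moved = contour_descriptors(moved)
print(f_ref)
print(f_moved)
```

A perfectly circular contour has constant z(i), hence $\mu_2 = 0$, and the ratios are undefined; a practical implementation would need to handle that case separately.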
In analogy to the representation of an object as a region in a binary image, also in the representation with the contour, considering the coordinates of the pixels as a statistical distribution, it is possible to calculate the principal central axes of the contour and to approximate the shape of the contour by calculating the central ellipse of inertia. We have previously described the method of the principal central axes, which requires the cancellation of the moment $\hat\mu_{1,1}$ calculated with respect to the principal central axes, which are rotated by the angle θ with respect to the reference axes x−y of the contour (see Fig. 8.19c). For a closed contour, with pixels $\{x_i, y_i\}_{i=0,N-1}$ of intensity $f(x_i, y_i) = 1$, the center of mass is calculated as follows:

$$\begin{aligned}
x_c &= \frac{1}{6A}\sum_{i=0}^{N-1} (x_i + x_{i+1})(x_i y_{i+1} - x_{i+1} y_i) \\
y_c &= \frac{1}{6A}\sum_{i=0}^{N-1} (y_i + y_{i+1})(x_i y_{i+1} - x_{i+1} y_i)
\end{aligned} \qquad (8.51)$$

where A is the area of the contour given by

$$A = \frac{1}{2}\sum_{i=0}^{N-1} (x_i y_{i+1} - x_{i+1} y_i) \qquad (8.52)$$
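Equations (8.51) and (8.52) translate directly into code. The sketch below assumes that the contour is closed by taking the indices modulo N, so that the point following the last one is the first, and that the vertices are listed counterclockwise, which makes the signed area positive. The function name and the test polygons are hypothetical.

```python
def polygon_area_centroid(contour):
    """Signed area A, Eq. (8.52), and centroid (xc, yc), Eq. (8.51)."""
    n = len(contour)
    # Cross terms x_i * y_{i+1} - x_{i+1} * y_i, with index i+1 taken modulo N
    cross = [contour[i][0] * contour[(i + 1) % n][1]
             - contour[(i + 1) % n][0] * contour[i][1]
             for i in range(n)]
    area = 0.5 * sum(cross)
    xc = sum((contour[i][0] + contour[(i + 1) % n][0]) * cross[i]
             for i in range(n)) / (6.0 * area)
    yc = sum((contour[i][1] + contour[(i + 1) % n][1]) * cross[i]
             for i in range(n)) / (6.0 * area)
    return area, xc, yc


triangle = [(0, 0), (3, 0), (0, 3)]           # counterclockwise
rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]  # counterclockwise

tri = polygon_area_centroid(triangle)
rec = polygon_area_centroid(rectangle)
print(tri, rec)
```

With clockwise vertices the area changes sign, but the centroid is unaffected because the sign cancels in the ratio.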
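In this case, the second-order central moments of inertia of the contour, according to (8.19), are defined by:

$$\begin{aligned}
\mu_{2,0} &= \frac{1}{N}\sum_{i=0}^{N-1} (x_i - x_c)^2 \\
\mu_{0,2} &= \frac{1}{N}\sum_{i=0}^{N-1} (y_i - y_c)^2 \\
\mu_{1,1} &= \frac{1}{N}\sum_{i=0}^{N-1} (x_i - x_c)(y_i - y_c)
\end{aligned} \qquad (8.53)$$

whose inertia matrix J (corresponding to the covariance matrix C) is

$$\mathbf{J} = \begin{bmatrix} \mu_{2,0} & \mu_{1,1} \\ \mu_{1,1} & \mu_{0,2} \end{bmatrix} \qquad (8.54)$$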
With the eigenvalue method we know (see Sect. 2.10 Vol. II) that, given a symmetric square matrix, it is possible to find a scalar λ and a vector v such that

$$\mathbf{J}\mathbf{v} = \lambda\mathbf{v} \qquad (8.55)$$

where λ is an eigenvalue of J and v is the associated eigenvector. In general, if n × n is the dimension of the square matrix, there are n eigenvalues $\lambda_i$ with n associated eigenvectors. For the inertia matrix J given by (8.54), the eigenvalue problem (8.55) can be reformulated using the identity matrix I as follows:

$$(\mathbf{J} - \lambda\mathbf{I})\mathbf{v} = 0 \qquad (8.56)$$
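which represents a linear system of two equations in the two unknowns $v_1$ and $v_2$, with coefficient matrix (J − λI). Since the system is homogeneous, it admits the trivial solution v = 0. The interesting solutions are those obtained for the values of λ for which the matrix (J − λI) is singular, i.e., its determinant is null:

$$\det(\mathbf{J} - \lambda\mathbf{I}) = \begin{vmatrix} \mu_{2,0} - \lambda & \mu_{1,1} \\ \mu_{1,1} & \mu_{0,2} - \lambda \end{vmatrix} = 0 \qquad (8.57)$$

Expanding the determinant (8.57) leads to the following characteristic equation:

$$\lambda^2 - (\mu_{2,0} + \mu_{0,2})\lambda + (\mu_{2,0}\mu_{0,2} - \mu_{1,1}^2) = 0 \qquad (8.58)$$

from which, solving, we obtain the following solutions for the two eigenvalues:

$$\begin{aligned}
\lambda_1 &= \frac{1}{2}\left[(\mu_{2,0} + \mu_{0,2}) + \sqrt{(\mu_{2,0} - \mu_{0,2})^2 + 4\mu_{1,1}^2}\right] \\
\lambda_2 &= \frac{1}{2}\left[(\mu_{2,0} + \mu_{0,2}) - \sqrt{(\mu_{2,0} - \mu_{0,2})^2 + 4\mu_{1,1}^2}\right]
\end{aligned} \qquad (8.59)$$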
These eigenvalues capture the intrinsic information (characteristic or principal values) expressed by the matrix $J$. In our case the eigenvalues given by (8.59) coincide precisely with the principal moments of inertia, whose directions are given by the associated eigenvectors $v$. In fact, substituting the eigenvalues $\lambda_1$ and $\lambda_2$ in the eigenvalue equation (8.55), we obtain the equations⁵ for calculating the components of each eigenvector $v_i$ associated with an eigenvalue $\lambda_i$. The eigenvalue equation (8.55) further tells us that the effect of the inertia matrix on the vectors is to generate, in this case, the eigenspace $x' - y'$ (whose axes are defined by the eigenvectors $v_1$ and $v_2$, respectively), rotated with respect to the original $x - y$, in which the 3 central moments of inertia $\hat{\mu}_{2,0}$, $\hat{\mu}_{0,2}$ and $\hat{\mu}_{1,1}$ were calculated. In other words, with the eigenvalue approach, the eigenvalues coincide with the values of the principal central moments of inertia $\hat{\mu}_{2,0}$, $\hat{\mu}_{0,2}$ and $\hat{\mu}_{1,1}$, calculated in the rotated reference system whose direction is determined by the eigenvectors. The eigenvectors essentially give the direction to the eigenvalues (i.e., to the moments), because the product $\lambda_i \cdot v_i$ generates a vector parallel to the eigenvector $v_i$, $\lambda_i$ being a scalar. Since the inertia matrix $J$ is symmetric, the principal axes determined by the eigenvectors are guaranteed to be orthogonal. The eigenvector $v_1$, which is associated with the eigenvalue $\lambda_1$, is inclined at the angle $\theta$ relative to the $x$ axis, as determined by Eq. (8.28).

⁵ The eigenvectors can be normalized to obtain unique solutions. In fact, it can be verified that if a certain eigenvector $v_i$ is a solution of the equation $J v_i = \lambda_i v_i$, then $k v_i$ is also a solution of the same equation for any value of $k$.
With the eigenvalue approach, we have substantially diagonalized the inertia matrix $J$ to derive the principal axes of inertia, which are obtained by imposing the constraint $\hat{\mu}_{1,1} = 0$. As described in Sect. 2.10 Vol. II, the diagonalization of a square and symmetric matrix $J$ is obtained with the following expression:

\[
\hat{J} = V^T J V =
\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} =
\begin{bmatrix} \hat{\mu}_{2,0} & 0 \\ 0 & \hat{\mu}_{0,2} \end{bmatrix} \tag{8.60}
\]

where $V$ is the orthogonal matrix whose columns are the eigenvectors $v_i$ of the inertia matrix $J$, and $\hat{J}$ is the resulting principal central inertia matrix. The shape of the contour can be described by the principal inertia ellipse, which is centered on the principal axes and has semimajor axis $\rho_x = \sqrt{\lambda_1 / \mu_{0,0}}$ and semiminor axis $\rho_y = \sqrt{\lambda_2 / \mu_{0,0}}$. Objects represented by the contour, with different shapes, can be characterized by different descriptors based on the moments of the principal inertia ellipse calculated in the previous paragraph.
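The eigenvalue solution (8.59) and the diagonalization (8.60) can be checked numerically. The following is a minimal sketch, assuming NumPy and three purely illustrative central moments; the function name `principal_moments` is hypothetical and not part of the text.

```python
import numpy as np

def principal_moments(mu20, mu02, mu11):
    """Closed-form eigenvalues of J = [[mu20, mu11], [mu11, mu02]], as in Eq. (8.59)."""
    trace = mu20 + mu02
    root = np.sqrt((mu20 - mu02) ** 2 + 4.0 * mu11 ** 2)
    return 0.5 * (trace + root), 0.5 * (trace - root)

# Hypothetical central moments, chosen only for illustration.
mu20, mu02, mu11 = 5.0, 2.0, 1.5
J = np.array([[mu20, mu11],
              [mu11, mu02]])

lam1, lam2 = principal_moments(mu20, mu02, mu11)

# numpy.linalg.eigh handles symmetric matrices: eigenvalues are returned in
# ascending order and the orthonormal eigenvectors are the columns of V.
eigvals, V = np.linalg.eigh(J)

# Diagonalization of Eq. (8.60); with eigh's ordering the diagonal is (lam2, lam1).
J_hat = V.T @ J @ V

# Orientation of the eigenvector associated with the largest eigenvalue.
# Its sign is arbitrary, so the angle is defined only modulo pi.
v1 = V[:, np.argmax(eigvals)]
theta = np.arctan2(v1[1], v1[0])
```

The product of the two eigenvalues equals $\det J = \mu_{2,0}\mu_{0,2} - \mu_{1,1}^2$ and their sum equals the trace $\mu_{2,0} + \mu_{0,2}$, which gives a quick sanity check on any implementation.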
8.3.3 Moments Based on Orthogonal Basis Functions

The normal moments described in the preceding paragraph use the set of basis functions $x^m y^n$, which is not orthogonal. As the order grows, the computational load at high precision increases considerably, given the high dynamic range of the moment values. Furthermore, these moments capture redundant global information. Moments defined on a set of orthogonal basis functions require less computation and lower numerical precision, while keeping the level of characterization of the shape of an object unchanged. The Zernike moments (ZM) [10,20] project the image onto a set of orthogonal complex basis functions. The complex Zernike polynomials are:

\[
V_{nm}(x, y) = V_{nm}(r\cos\theta, r\sin\theta) = R_{nm}(r)\, e^{jm\theta} \tag{8.61}
\]

where $R_{nm}(r)$ is the orthogonal radial polynomial:

\[
R_{nm}(r) =
\begin{cases}
\displaystyle\sum_{s=0}^{(n-|m|)/2} (-1)^s \frac{(n-s)!}{s!\,\left[\frac{1}{2}(n+|m|)-s\right]!\,\left[\frac{1}{2}(n-|m|)-s\right]!}\, r^{\,n-2s} & \text{if } n-|m| \text{ is even} \\[2ex]
0 & \text{if } n-|m| \text{ is odd}
\end{cases} \tag{8.62}
\]

with $n = 0, 1, 2, \ldots$ and $0 \le |m| \le n$. Here $r$ is the distance of the pixel $(x, y)$ from the centroid of the region and $\theta$ is the angle between $r$ and the $x$ axis. Zernike polynomials are a set of complex orthogonal polynomials defined on the unit disk, i.e., $x^2 + y^2 \le 1$. The complex Zernike moments of order $n$ and repetition $m$ are defined as follows:

\[
Z_{nm} = \frac{n+1}{\pi} \sum_{r} \sum_{\theta} f(r\cos\theta, r\sin\theta) \cdot R_{nm}(r) \cdot e^{jm\theta}, \qquad r \le 1 \tag{8.63}
\]
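The radial polynomial of Eq. (8.62) translates directly into a short routine. The following is a minimal sketch, assuming NumPy; the function name `zernike_radial` is hypothetical and chosen for illustration only.

```python
from math import factorial

import numpy as np

def zernike_radial(n, m, r):
    """Radial polynomial R_nm(r) of Eq. (8.62); identically zero when n - |m| is odd."""
    m = abs(m)
    r = np.asarray(r, dtype=float)
    if m > n or (n - m) % 2 != 0:
        return np.zeros_like(r)
    total = np.zeros_like(r)
    for s in range((n - m) // 2 + 1):
        # Coefficient of r^(n - 2s) in the finite sum of Eq. (8.62).
        coeff = ((-1) ** s * factorial(n - s)
                 / (factorial(s)
                    * factorial((n + m) // 2 - s)
                    * factorial((n - m) // 2 - s)))
        total += coeff * r ** (n - 2 * s)
    return total

# Sample radii inside the unit disk.
r = np.linspace(0.0, 1.0, 11)
R20 = zernike_radial(2, 0, r)   # expands to 2 r^2 - 1
R31 = zernike_radial(3, 1, r)   # expands to 3 r^3 - 2 r
R40 = zernike_radial(4, 0, r)   # expands to 6 r^4 - 6 r^2 + 1
```

For large orders the factorials grow quickly, so a practical implementation would typically switch to a recurrence relation; the direct sum is used here only because it mirrors the formula.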
The magnitude of the Zernike moments is invariant to rotation. To make them invariant to translation, before the ZM calculation the original image $f(x, y)$ is translated so that its origin lies at the centroid of the region, as done for the normal moments, and the pixel coordinates are mapped into the unit disk. Scale invariance is obtained by normalizing the pixel coordinates to $(x'/a, y'/a)$, where $(x', y')$ are the coordinates translated with respect to the centroid and $a = \sqrt{\beta / M_{0,0}}$ is the scale normalization factor with respect to a predetermined value $\beta$ ($M_{0,0}$ is the normal moment of order zero) [21]. Essentially, after translating the image to the centroid, scale invariance is achieved by enlarging or reducing the region in such a way that the normal zero-order moment $M_{0,0}$ becomes equal to the predetermined reference value $\beta$. Thanks to their orthogonal basis functions, the ZM capture information with minimal redundancy. In addition, the ZM reproduce the shape details of the region with high resolution and reconstruct the original image better than the normal moments. Orthogonality also makes it possible to analyze the contribution of each single moment. The image is reconstructed by adding the contributions of the individual moments of each order as follows:

\[
\hat{f}(r, \theta) = \sum_{n=0}^{n_{max}} \sum_{m} Z_{nm} V_{nm}(r, \theta) \tag{8.64}
\]
Once the reconstructed image has been obtained, a metric (Euclidean distance, Hamming distance, ...) can be used to evaluate how well it approximates the original and to determine the minimum order $n$ at which the reconstructed image $\hat{f}_n$ is acceptably accurate. Figure 8.23 shows an example of image reconstruction obtained by applying the complex Zernike moments with different orders $n = 5, 10, 15, 20, 30, 40$. In the field of image processing, further orthogonal moments have been proposed with the aim of improving the discriminating capacity and minimizing the redundancy of the information captured by the set of moments. In other words, the various orthogonal moments proposed are not based on an advancement of the theory but on the favorable numerical properties of the various orthogonal polynomials used (Legendre, Chebyshev, Zernike, pseudo-Zernike, ...), from which the names of the moments derive. Among the best known we mention the Chebyshev moments [22], the radial Chebyshev moments [23] and the pseudo-Zernike moments [20,24].

Fig. 8.23 The two test images are reconstructed using the complex moments of Zernike with different values of the order n (panels: original images, n = 5, 10, 15, 20, 30, 40)
8.3.4 Fourier Descriptors

Fourier descriptors (FD) are an alternative approach for describing the complex shape of an object through its closed contour. They are also known as spectral descriptors because the description of the contour or of a region takes place in the spectral domain (as for the wavelet descriptors, WD). Let us now see how to derive a 1D function [25], normally called shape signature, from the coordinates of the points of the contour $C = \{x(n), y(n)\}_{n=0,N-1}$. Imagine a closed contour (see Fig. 8.24a) in a complex plane that is traversed counterclockwise at a constant speed. The dynamics of the path can be described as a function of time $t$ by a complex function $z(t)$. If the velocity is such that the entire path takes a time of $2\pi$, then this complex function is periodic with period $2\pi$ when the contour is traversed several times. With this artifice we have obtained a function $z(t)$ that can be represented by a Fourier series expansion. To describe the shape of the contour it is more useful to take the $x$ axis as the real axis and the $y$ axis as the imaginary axis of the complex plane. In this way, the pixels of the contour $C = \{x(n), y(n)\}_{n=0,N-1}$ are projected into the complex plane and represented by the discrete complex periodic function

\[
z(n) = x(n) + j\,y(n), \qquad n = 0, 1, \ldots, N-1 \tag{8.65}
\]
Fig. 8.24 Fourier descriptors using parametric contour representation: a Complex plane derived from Cartesian coordinates (x(n), y(n)) of the closed contour, with starting point (x(0), y(0)) and last point (x(N-1), y(N-1)); b Configuration of the contour with a different starting point (x(m), y(m)), shifted by m pixels with respect to the initial reference
where the contour is closed. The discrete Fourier transform (DFT) (see Sect. 9.11.1) of the 1D function $z(n)$ is given by

\[
Z(u) = \sum_{n=0}^{N-1} z(n)\, e^{-j2\pi \frac{nu}{N}}, \qquad u = 0, 1, \ldots, N-1 \tag{8.66}
\]
The complex coefficients $Z(u)$ (a discrete complex spectrum characterized by the magnitude and phase associated with the frequency $u$) are called complex Fourier descriptors (CFD). Their value depends on the shape of the contour $C$ and on the starting point from which it was discretized. Low-pass filtering can be applied in the spectral domain to reduce noise without altering the shape characteristics of the contour. The low frequencies of the spectrum (i.e., the low-order coefficients) characterize the basic shape of the object. The effect of this filtering on the CFD descriptors can be verified by rebuilding the contour with the inverse discrete Fourier transform, which, with the forward transform defined as in (8.66) without a normalization factor, is given by

\[
z(n) = \frac{1}{N} \sum_{u=0}^{N-1} Z(u)\, e^{j2\pi \frac{nu}{N}}, \qquad n = 0, 1, \ldots, N-1 \tag{8.67}
\]
where the complex exponential term represents the oscillatory component of the whole signal at frequency $u$, that is, the basis function. A more compact representation of the contour is obtained by considering only the first $M$ coefficients, provided they are sufficient to describe a good approximation of the contour. From the Fourier descriptors (even from a limited number, for example, 10 coefficients) it is possible to perform the inverse transformation to recover the coordinates $(x, y)$ of the contour pixels and verify whether the shape of the object is reconstructed correctly with respect to the original. By comparing Fourier descriptors, we can verify whether two objects have the same shape. For this purpose, the Fourier descriptors must be invariant with respect to translation, rotation, scaling and the starting point of the digitization of the contour. This is achieved by normalizing the CFDs obtained from (8.66) through various methods. The first method is based on the properties of the Fourier transform [26,27], so that the invariance characteristics of the CFD descriptors can be easily verified.
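The transform pair and the low-pass reconstruction described above can be sketched as follows. This is a minimal illustration, assuming NumPy's FFT (whose sign convention matches (8.66), with the $1/N$ factor applied by the inverse) and a hypothetical elliptical test contour; the function names are chosen for illustration. In the FFT output the low frequencies sit at both ends of the array (indices near 0 and near N), so the sketch keeps coefficients symmetrically around $u = 0$, which is one possible reading of "the first M coefficients".

```python
import numpy as np

def fourier_descriptors(x, y):
    """Complex Fourier descriptors Z(u) of the contour z(n) = x(n) + j y(n)."""
    z = np.asarray(x, dtype=float) + 1j * np.asarray(y, dtype=float)
    return np.fft.fft(z)  # forward DFT without 1/N factor, as in Eq. (8.66)

def lowpass_reconstruct(Z, half_width):
    """Rebuild the contour keeping only u = 0, +-1, ..., +-half_width."""
    N = len(Z)
    keep = np.zeros(N, dtype=bool)
    keep[:half_width + 1] = True          # u = 0, 1, ..., half_width
    if half_width > 0:
        keep[N - half_width:] = True      # u = N - half_width, ..., N - 1
    z_hat = np.fft.ifft(np.where(keep, Z, 0.0))  # inverse DFT applies 1/N
    return z_hat.real, z_hat.imag

# Hypothetical test contour: an ellipse sampled uniformly at N points.
N = 128
t = 2.0 * np.pi * np.arange(N) / N
x = 3.0 * np.cos(t)
y = 1.5 * np.sin(t)

Z = fourier_descriptors(x, y)
x1, y1 = lowpass_reconstruct(Z, half_width=1)  # three coefficients in total
```

An ellipse is fully described by the coefficients at $u = 1$ and $u = N - 1$, so this truncation already reproduces it exactly; contours with more detail need a larger `half_width`.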
8.3.4.1 Translation Invariance

It can be easily verified that a translation $T = \Delta x + j\,\Delta y$ of the contour in the complex plane, $z(n) + T$, produces in the spectral domain only a modification of the first coefficient, $\hat{Z}(0) \leftarrow Z(0) + N \cdot T$, leaving all the others unchanged, $\hat{Z}(u) = Z(u)$ for $u \neq 0$. In fact, applying the translation $T$ to the contour coordinates in (8.66), we obtain

\[
\hat{Z}(u) = \sum_{n=0}^{N-1} \left(z(n) + T\right) e^{-j2\pi \frac{nu}{N}}
= \underbrace{\sum_{n=0}^{N-1} z(n)\, e^{-j2\pi \frac{nu}{N}}}_{Z(u)}
+ \underbrace{\sum_{n=0}^{N-1} T\, e^{-j2\pi \frac{nu}{N}}}_{\text{effect of } T} \tag{8.68}
\]

from which it follows that

\[
\hat{Z}(0) = Z(0) + \sum_{n=0}^{N-1} T\, e^{-j2\pi \frac{n \cdot 0}{N}} = Z(0) + N \cdot T, \qquad
\hat{Z}(u) = Z(u) \ \text{ with } u \neq 0 \tag{8.69}
\]

that is, for $u = 0$ the translation $T$ affects only the first CFD $\hat{Z}(0)$, while for $u \neq 0$ the second summation of (8.68) is always zero, being the sum of a periodic complex exponential over a whole number of periods, so the original CFDs remain unchanged.
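The translation property of Eqs. (8.68) and (8.69) is easy to confirm numerically. The following is a minimal sketch, assuming NumPy; the contour and the translation value are hypothetical and serve only as a test case.

```python
import numpy as np

# Hypothetical closed contour sampled at N points, written as z(n) = x(n) + j y(n).
N = 64
t = 2.0 * np.pi * np.arange(N) / N
z = (2.0 * np.cos(t) + 0.3 * np.cos(3.0 * t)) + 1j * np.sin(t)

# Hypothetical translation T in the complex plane.
T = 5.0 - 2.0j

Z = np.fft.fft(z)              # descriptors of the original contour
Z_shifted = np.fft.fft(z + T)  # descriptors of the translated contour

# Only the coefficient at u = 0 is expected to change, by N * T.
delta = Z_shifted - Z
```

Discarding $Z(0)$, or computing the descriptors relative to the centroid, is therefore enough to remove the dependence on the position of the object.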
8.3.4.2 Scale Invariance

A change of scale of the contour by a factor $\alpha$, i.e., $\alpha z(n)$, produces the same change of scale on the original CFD coefficients

\[
\hat{Z}(u) \leftarrow \alpha Z(u) \tag{8.70}
\]

as is easy to verify by substituting $z(n)$ with $\alpha z(n)$ in (8.66). As a result of linearity, the inverse transform will produce the coordinates of the contour pixels multiplied by the same factor $\alpha$. To make the CFDs independent of scale, it is useful to normalize them so that the largest descriptor is equal to 1. The ratios between the Fourier descriptors of two similar objects are constant real values equal to the ratio of the dimensions of the objects.
8.3.4.3 Rotation Invariance

If a contour is rotated in the spatial domain by an angle $\theta$ with respect to the $x$ axis (i.e., $e^{j\theta} z(n)$), then, by the linearity of the Fourier transform, the CFD descriptors are multiplied by the constant phase factor $e^{j\theta}$:

\[
\hat{Z}(u) \leftarrow Z(u)\, e^{j\theta} \tag{8.71}
\]

This is analogous to the property of the scale change, where the coefficients are multiplied by the constant scaling factor.
8.3.4.4 Invariance with Respect to the Starting Point

Different starting points (see Fig. 8.24b) change the order of the pixels in the contour $C = \{x(n), y(n)\}_{n=0,N-1}$ and consequently modify the order of the coordinates in the periodic function $z(n)$ and the associated descriptors $Z(u)$. If the starting point is moved clockwise by $m$ pixels, then, by the shift property of the Fourier transform, the following modification of the descriptors occurs in the spectral domain:

\[
\hat{Z}(u) \leftarrow Z(u)\, e^{j2\pi \frac{um}{N}} \tag{8.72}
\]

In fact, applying the Fourier transform (8.66) to the periodic function $z(n + m)$ and setting $k = n + m$, we obtain

\[
\begin{aligned}
\mathcal{F}\{z(n+m)\} &= \sum_{n=0}^{N-1} z(n+m)\, e^{-j2\pi \frac{nu}{N}}
= \sum_{k=0}^{N-1} z(k)\, e^{-j2\pi \frac{(k-m)u}{N}} \\
&= e^{j2\pi \frac{mu}{N}} \sum_{k=0}^{N-1} z(k)\, e^{-j2\pi \frac{ku}{N}}
= e^{j2\pi \frac{mu}{N}} Z(u)
\end{aligned} \tag{8.73}
\]

where $\mathcal{F}$ indicates the Fourier transform operator and the sum over $k$ can again be taken from $0$ to $N-1$ because $z$ is periodic. It is observed that for $m = 0$ the coefficients become identical. Finally, it is pointed out that a change of the starting point does not influence the magnitude of the Fourier spectrum; it only alters the distribution of energy between its real and imaginary parts, that is, it modifies the phase, which is shifted proportionally to the size of the displacement.
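The scale, rotation, and starting-point properties of Eqs. (8.70) to (8.72) can be verified together on a sampled contour. The following is a minimal sketch, assuming NumPy; the contour and the values of `alpha`, `theta`, and `m` are hypothetical. The shifted sequence $z(n + m)$ is obtained with `np.roll(z, -m)`.

```python
import numpy as np

# Hypothetical closed contour z(n) = x(n) + j y(n) sampled at N points.
N = 64
u = np.arange(N)
t = 2.0 * np.pi * np.arange(N) / N
z = (2.0 * np.cos(t) + 0.3 * np.cos(3.0 * t)) + 1j * np.sin(t)
Z = np.fft.fft(z)

# Hypothetical transformation parameters.
alpha = 1.5                 # scale factor
theta = np.deg2rad(30.0)    # rotation angle
m = 5                       # starting-point shift in pixels

Z_scaled = np.fft.fft(alpha * z)                # expected: alpha * Z(u)
Z_rotated = np.fft.fft(np.exp(1j * theta) * z)  # expected: exp(j theta) * Z(u)

# np.roll(z, -m)[n] equals z[(n + m) % N], i.e. the contour read from a new start.
Z_start = np.fft.fft(np.roll(z, -m))            # expected: exp(j 2 pi u m / N) * Z(u)
```

In all three cases the change is a multiplicative factor; for rotation and starting point it has unit magnitude, which is why taking the magnitude of the descriptors removes both dependencies.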
8.3.4.5 FD Descriptors with Distance Function

An alternative way to obtain Fourier descriptors that are simultaneously invariant to translation and rotation is to apply the Fourier transform to the distance function from the centroid [28–31]. This 1D function $r(n)$ is given by the distance of each pixel of the contour from the centroid $(x_c, y_c)$:

\[
r(n) = \sqrt{[x(n) - x_c]^2 + [y(n) - y_c]^2} \tag{8.74}
\]

where

\[
x_c = \frac{1}{N} \sum_{n=0}^{N-1} x(n), \qquad y_c = \frac{1}{N} \sum_{n=0}^{N-1} y(n) \tag{8.75}
\]

The centroid can also be calculated with Eqs. (8.51) and (8.52). The function $r(n)$, being defined by subtracting the coordinates of the centroid from those of the contour pixels, is invariant to translation. The Fourier transform of the distance function $r(n)$ is

\[
R(u) = \sum_{n=0}^{N-1} r(n)\, e^{-j2\pi \frac{nu}{N}}, \qquad u = 0, 1, \ldots, N-1 \tag{8.76}
\]
The descriptors $R(u)$ resulting from the distance function $r(n)$ are invariant only to translation and rotation. They must, therefore, be further processed to make them invariant also to the change of scale and of the starting point.
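The centroid distance signature of Eqs. (8.74) to (8.76) and its invariance to translation and rotation, but not to scale, can be checked with a few lines. This is a minimal sketch, assuming NumPy; the function name `centroid_distance` and the test contour are hypothetical.

```python
import numpy as np

def centroid_distance(x, y):
    """Shape signature r(n): distance of each contour point from the centroid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x.mean(), y.mean()          # centroid, Eq. (8.75)
    return np.hypot(x - xc, y - yc)      # distance, Eq. (8.74)

# Hypothetical closed contour sampled at N points.
N = 64
t = 2.0 * np.pi * np.arange(N) / N
x = 2.0 * np.cos(t) + 0.3 * np.cos(3.0 * t)
y = np.sin(t)
R = np.fft.fft(centroid_distance(x, y))  # descriptors of Eq. (8.76)

# Same contour rotated by 40 degrees and translated by (7, -3).
theta = np.deg2rad(40.0)
x_rt = np.cos(theta) * x - np.sin(theta) * y + 7.0
y_rt = np.sin(theta) * x + np.cos(theta) * y - 3.0
R_rt = np.fft.fft(centroid_distance(x_rt, y_rt))

# Same contour enlarged by a factor 2: the descriptors scale by the same factor.
R_scaled = np.fft.fft(centroid_distance(2.0 * x, 2.0 * y))
```

Because `R_scaled` equals `2 * R`, a further normalization step is needed for scale invariance, as the text goes on to describe.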
8.3.4.6 Fourier Descriptors Normalization

In the preceding paragraphs we analyzed the properties of the FD descriptors. At this point we can define a general formulation that makes them simultaneously invariant to translation, rotation, scale and change of the starting point with respect to the contour taken as reference (original). By combining the properties of the FD descriptors previously analyzed and indicating with $Z_o(u)$ the descriptors of the original contour, the general formula of the Fourier descriptors $\hat{Z}(u)$ of the transformed contour is given by

\[
\hat{Z}(u) = e^{jum} \cdot e^{j\theta} \cdot \alpha \cdot Z_o(u) \tag{8.77}
\]

where $m$ is the number of pixels by which the starting point has been shifted, $\theta$ is the angle of rotation of the contour about the centroid, and $\alpha$ is the scale factor. A normalized version $NFD(u)$ of the descriptors expressed by (8.77) is obtained by dividing them by the first descriptor; since for $u = 0$ the starting-point factor $e^{jum}$ is equal to 1, we have

\[
NFD(u) = \frac{\hat{Z}(u)}{\hat{Z}(0)}
= \frac{e^{jum} \cdot e^{j\theta} \cdot \alpha \cdot Z_o(u)}{e^{j\theta} \cdot \alpha \cdot Z_o(0)}
= \frac{Z_o(u)}{Z_o(0)}\, e^{jum}
= NFD_o(u)\, e^{jum} \tag{8.78}
\]

where $NFD_o(u)$ are the normalized descriptors of the original contour. The normalized descriptors $NFD(u)$ of the transformed contour (in terms of translation, scale, rotation, and starting point) and those of the original (reference) contour $NFD_o(u)$ differ only by the exponential factor $e^{jum}$, which has unit magnitude. If the phase information is ignored and only the magnitude of the descriptors is considered, then $|NFD(u)|$ and $|NFD_o(u)|$ are identical. It follows that the $|NFD(u)|$ are invariant to translation, scale, rotation, and change of the starting point. The final descriptors used to characterize the shape of an object are the magnitudes of the normalized descriptors, represented by the set $\{|NFD(u)|,\ 0 < u < N\}$. The similarity between the shape of a reference object and that of the object under examination can be evaluated using the Euclidean metric, i.e., by calculating the distance $d$ between the magnitudes of the respective normalized descriptors. Since the distance function $r(n)$ is real, only $N/2$ distinct NFD descriptors need to be considered. Compared with moments, the Fourier NFD descriptors are calculated more efficiently and allow a more compact coding of the object. In fact, the contour can be reconstructed from only a few descriptors (10 to $N/2$ NFDs are sufficient to approximate the contour), depending also on the geometric accuracy to be achieved. Most of the energy of the contour is concentrated in the low-frequency descriptors. Compared with moments, they seem more effective in discriminating the shape of objects and in overcoming the sensitivity to noise. The best results are obtained using the centroid distance function. The NFDs are not normally used to recognize occluded objects; an approach for this case is described in [32]. Several variants of Fourier descriptors are reported in the literature to better describe the geometric properties of objects [33] and the local information of the contour. For the latter aspect, [34,35] report the elliptic properties of the Fourier coefficients, which make it possible to describe a closed contour analytically through the elliptic Fourier descriptors (EFD).
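The normalization and the shape comparison described above can be sketched as follows. This is a minimal illustration, assuming NumPy, the centroid distance signature, and normalization by the magnitude of the coefficient at $u = 0$ (taken here as "the first descriptor", which is an assumption of this sketch); the function name `normalized_fd`, the test contours, and the transformation parameters are hypothetical.

```python
import numpy as np

def normalized_fd(x, y, n_desc=8):
    """Magnitudes |R(u)| / |R(0)| for u = 1..n_desc of the centroid distance signature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.hypot(x - x.mean(), y - y.mean())   # centroid distance r(n)
    R = np.fft.fft(r)
    return np.abs(R[1:n_desc + 1]) / np.abs(R[0])

# Hypothetical reference contour.
N = 128
t = 2.0 * np.pi * np.arange(N) / N
z_ref = (2.0 * np.cos(t) + 0.3 * np.cos(3.0 * t)) + 1j * np.sin(t)

# Same shape after a starting-point shift, rotation, scaling, and translation.
z_tr = 1.5 * np.exp(1j * np.deg2rad(30.0)) * np.roll(z_ref, -17) + (4.0 - 3.0j)

# A different shape: a circle of radius 2.
z_circle = 2.0 * np.exp(1j * t)

nfd_ref = normalized_fd(z_ref.real, z_ref.imag)
nfd_tr = normalized_fd(z_tr.real, z_tr.imag)
nfd_circle = normalized_fd(z_circle.real, z_circle.imag)

# Euclidean distance d between the magnitudes of the normalized descriptors.
d_same = np.linalg.norm(nfd_ref - nfd_tr)
d_other = np.linalg.norm(nfd_ref - nfd_circle)
```

With exactly resampled synthetic contours `d_same` vanishes up to rounding error, while `d_other` is clearly larger; on digitized contours the distance between instances of the same object is small but not zero, because resampling and quantization perturb the signature.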
8.3.4.7 Region-Based Fourier Descriptors

The Fourier descriptors developed to characterize the shape of a region are called Generic Fourier Descriptors (GFD). The GFD descriptors derive from the application to an image of the Modified Polar Fourier Transform (MPFT) [36,37]. In essence, to implement the MPFT, the polar image (defined in the polar space) is treated as a normal rectangular image in the Cartesian space. Therefore, the shape of an object in the image $f(x, y)$ is described by calculating the MPFT defined by:

\[
PF(\rho, \phi) = \sum_{r} \sum_{i} f(r, \theta_i) \exp\left[ j2\pi \left( \frac{r}{R}\rho + \frac{2\pi i}{T}\phi \right) \right] \tag{8.79}
\]

where $(r, \theta_i)$ are the polar coordinates expressed relative to the center of mass $(x_c, y_c)$ of the region (calculated with (8.75)), and $R$ and $T$ are, respectively, the radial and angular resolutions. In particular, $0 \le r < R$ (with $r$ calculated with (8.74)), $\theta_i = i(2\pi/T)$ with $0 \le i < T$, $0 \le \rho < R$ and $0 \le \phi < T$. The physical meaning of $\rho$ and $\phi$ is analogous to that of the spectral variables $u$ and $v$ in the 2D Fourier transform, that is, they represent, respectively, the radial frequency and the angular frequency. The low-frequency values of $\rho$ and $\phi$ capture the shape characteristics of the region. The descriptors calculated by (8.79) are invariant to translation because the coordinates are defined relative to the centroid, which is the origin of the polar space. To derive GFD descriptors that are invariant to rotation and scale, the PF coefficients are normalized as follows:

\[
GFD = \left\{ \frac{|PF(0,0)|}{A}, \frac{|PF(0,1)|}{|PF(0,0)|}, \ldots, \frac{|PF(0,n)|}{|PF(0,0)|}, \ldots, \frac{|PF(m,0)|}{|PF(0,0)|}, \ldots, \frac{|PF(m,n)|}{|PF(0,0)|} \right\} \tag{8.80}
\]

where $A$ is the area of the circular boundary within which the object resides in the polar image, $m$ indicates the maximum number of radial frequencies and $n$ the maximum number of selected angular frequencies. The maximum values of $m$ and $n$ define the resolution with which we want to characterize the region. It is observed that the first coefficient is normalized by the area $A$, while all the other coefficients are normalized by the ratio between their magnitude and that of the first coefficient. Since shape characteristics are acquired in both the radial and the circular directions, the reconstructed shapes are more accurate. The calculation of the GFD descriptors is simpler than that of the descriptors derived from the Zernike moments, considering that the latter are defined on a unit disk.
8.3.4.8 Implementation Considerations

The contour of the object must be digitized uniformly with a constant step to approximate its geometric structure more accurately. After digitization, the contour does not always reproduce the shape of an object accurately. When the Fourier transform is computed with the FFT (in its radix-2 form), the number of points of the contour must be a power of 2. The normalization process can introduce coefficient quantization errors that can significantly influence the phase. A possible error in the orientation and in the starting point, due to digitization and normalization, introduces significant variations in the reconstruction of the shape of the object. This problem can occur despite the normalization of the coefficients with respect to the one with the highest value. Figure 8.25 shows the normalized Fourier descriptors used for the recognition of three different objects. The contour considered to characterize the objects is the outermost one (the contours of the details inside the objects have been neglected). For the adjustable wrench object, in addition to the reference NFD descriptors, the 8 NFD descriptors were also calculated by acquiring the contour of the same object first reduced by a factor of 0.5, then rotated by 30° and enlarged by a factor of 1.5, and finally rotated by the same angle and reduced by 0.5. The NFD descriptors obtained are equivalent to the reference ones, demonstrating the invariance to translation, rotation, scale change, and starting point.

Fig. 8.25 Example of object recognition with Fourier descriptors. The contours of three objects (adjustable wrench, pincer, and T shape) are digitized in 702, 1116 and 1252 pixels, respectively, and the first 8 normalized Fourier coefficients NFD (Normalized Fourier Descriptors) are calculated. The NFDs of the first object are then calculated again with the object first reduced by 0.5, then rotated by 30° and magnified by a factor of 1.5, and then rotated by 30° and reduced by 0.5, obtaining descriptors equivalent to the reference ones. This demonstrates the invariance to translation, rotation, change of scale and starting point, and highlights the diversity with respect to the NFDs of the other two objects, pincer and T shape (NFDs reported in Table 8.2)
Table 8.2 8 Fourier descriptors (missing the first and last, not significant) for the objects of Fig. 8.25. The last column indicates the Euclidean distance between the coefficients of the first row (the adjustable wrench object) and the coefficients of the other rows, which highlights the equivalence of the descriptors for the same object and gives a measure of the diversity with respect to the other two objects. It is observed that the NFDs show the greatest diversity between the adjustable wrench object and the very different T-shaped one

Object               NFD2     NFD3     NFD4     NFD5     NFD6     NFD7     Diff
A.wrench             0.1533   0.0922   0.0439   0.0376   0.1935   0.6547   –
A.wrench 30°         0.1564   0.0122   0.0129   0.0591   0.1780   0.6784   0.0086
A.wrench 30° M1.5    0.1694   0.0080   0.0085   0.0556   0.1889   0.6766   0.0094
A.wrench 30° R0.5    0.1421   0.0113   0.0181   0.0620   0.1676   0.6769   0.0091
A.wrench R0.5        0.2028   0.0167   0.0184   0.0496   0.2173   0.6745   0.0099
Pincer               0.0702   0.0905   0.0344   0.0276   0.0830   0.4943   0.0422
T-shape              0.0315   0.1290   0.0693   0.0162   0.2402   0.0698   0.3616
The example also highlights the diversity with respect to the NFDs of the other two objects, pincer and T shape, as shown in Table 8.2. In the same table, the last column indicates the Euclidean distance between the coefficients of the first row (the adjustable wrench object) and the coefficients of the other rows. For the rows relative to the different geometric transformations of the first object, it highlights the equivalence of the descriptors (distance of the order of $10^{-3}$); for the other two objects it highlights the diversity, with values of 0.04 for the pincer and 0.36 for the T shape. The table does not report the first coefficient, which has value 1 (normalization of the scale), nor the last NFD, which is not very significant.
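The last column of Table 8.2 can be recomputed from the six tabulated coefficients. The following is a minimal sketch, assuming NumPy and using only the values printed in the table. Recomputing them suggests that the reported figures correspond to the sum of squared differences (the squared Euclidean distance) rather than to its square root. This is an observation drawn from the numbers, not a statement made in the text, and for the pincer row the recomputed value is about 0.045 against the tabulated 0.0422.

```python
import numpy as np

# NFD2..NFD7 of the reference object (adjustable wrench), copied from Table 8.2.
reference = np.array([0.1533, 0.0922, 0.0439, 0.0376, 0.1935, 0.6547])

# Remaining rows of Table 8.2 (NFD2..NFD7).
rows = {
    "A.wrench 30deg":      [0.1564, 0.0122, 0.0129, 0.0591, 0.1780, 0.6784],
    "A.wrench 30deg M1.5": [0.1694, 0.0080, 0.0085, 0.0556, 0.1889, 0.6766],
    "A.wrench 30deg R0.5": [0.1421, 0.0113, 0.0181, 0.0620, 0.1676, 0.6769],
    "A.wrench R0.5":       [0.2028, 0.0167, 0.0184, 0.0496, 0.2173, 0.6745],
    "Pincer":              [0.0702, 0.0905, 0.0344, 0.0276, 0.0830, 0.4943],
    "T-shape":             [0.0315, 0.1290, 0.0693, 0.0162, 0.2402, 0.0698],
}

# Sum of squared differences with respect to the reference row.
squared_distance = {
    name: float(np.sum((reference - np.array(values)) ** 2))
    for name, values in rows.items()
}

for name, d2 in squared_distance.items():
    print(f"{name:22s} squared distance = {d2:.4f}   distance = {np.sqrt(d2):.4f}")
```

Whichever of the two quantities is used, the ordering is the same: the transformed versions of the wrench stay close to the reference, the pincer is further away, and the T shape is the most distant.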
References

1. H. Freeman, On the encoding of arbitrary geometric configurations. IRE Trans. Electron. Comput. EC 10, 260–268 (1961)
2. H.Y. Feng, T. Pavlidis, Decomposition of polygons into simpler components. IEEE Trans. Comput. 24, 636–650 (1975)
3. M.W. Koch, R.L. Kashyap, Using polygons to recognize and locate partially occluded objects. IEEE Trans. Pattern Anal. Mach. Intell. 9(4), 483–494 (1987)
4. J. Matas, J. Kittler, Junction detection using probabilistic relaxation. Image Vis. Comput. 11, 197–202 (1993)
5. M.A. Jayaram, H. Fleyeh, A convex hulls in image processing: a scoping review. Am. J. Intell. Syst. 6(2), 48–58 (2016)
6. E.R. Davies, A.P.N. Plummer, Thinning algorithms: a critique and a new methodology. Pattern Recognit. 14, 53–63 (1982)
7. T.Y. Zhang, C.Y. Suen, A fast parallel algorithm for thinning digital patterns. Commun. ACM 27(3), 236–239 (1984)
8. S.U. Pillai, A. Papoulis, Probability, Random Variables, and Stochastic Processes (Tata McGraw-Hill, 2002). ISBN 0070486581
9. G.L. Cash, M. Hatamian, Optical character recognition by the method of moments. Comput. Vis., Graph., Image Process. 39, 291–310 (1987)
10. M.R. Teague, Image analysis via the general theory of moments. J. Opt. Soc. Am. 70(8), 920–930 (1980)
11. M.K. Hu, Visual pattern recognition by moment invariants. IRE Trans. Inf. Theory 8(2), 179–187 (1962)
12. J. Flusser, Moment invariants in image analysis. Int. J. Comput., Electr., Autom., Control. Inf. Eng. 1(11), 3708–37013 (2007)
13. T. Suk, J. Flusser, B. Zitová, Moments and Moment Invariants in Pattern Recognition (Wiley, 2009). ISBN 978-0-470-69987-4
14. T.H. Reiss, Recognizing Planar Objects Using Invariant Image Features. Lecture Notes in Computer Science (1993). ISBN 978-3-540-56713-4
15. A. Wallin, O. Kubler, Complete sets of complex zernike moment invariants and the role of the pseudoinvariants. IEEE Trans. Pattern Anal. Mach. Intell. 17, 1106–1110 (1995)
16. R. Mukundan, A new class of rotational invariants using discrete orthogonal moments, in Proceedings of the Sixth IASTED International Conference on Signal and Image Processing (2004), pp. 80–84
17. T. Moons, L. van Gool, D. Ungureanu, Affine/photometric invariants for planar intensity patterns, in Proceedings of the 4th ECCV’961, vol. LNCS 1064 (1996), pp. 642–651
18. T. Moons, F. Mindru, L. van Gool, Recognizing color patterns irrespective of viewpoint and illumination, in Proceeding of the IEEE Conference Computer Vision Pattern Recognition CVPR’99. vol. 1 (1999), pp. 368–373
19. L. Gupta, M.D. Srinath, Contour sequence moments for the classification of closed planar shapes. Pattern Recognit. 20(3), 267–272 (1987)
20. C. Teh, R.T. Chin, On image analysis by the method of moments. IEEE Trans. Pattern Anal. Mach. Intell. 10(4), 496–513 (1988)
21. A. Khotanzad, Y.H. Hong, Invariant image recognition by zernike moments. IEEE Trans. Pattern Anal. Mach. Intell. 12(5), 489–497 (1990)
22. P.A. Lee, R. Mukundan, S.H. Ong, Image analysis by tchebichef moments. IEEE Trans. Image Process. 10(7), 1357–1364 (2001)
23. R. Mukundan, Radial tchebichef invariants for pattern recognition, in Proceedings of the IEEE TENCON Conference (2005), pp. 2098–2103
24. M.S. Kankanhalli, B.M. Mehtre, W.F. Lee, Shape measures for content based image retrieval: a comparison. Pattern Recognit. 33(3), 319–337 (1997)
25. A.C. Kak, A. Rosenfeld, Digital Picture Processing, vol. 2 (Academic Press, London, 1982)
26. B. Jähne, Digital Image Processing, 5th edn. (Springer, 2001). ISBN 3-540-67754-2
27. R.E. Woods, R.C. Gonzalez, Digital Image Processing, 2nd edn. (Prentice Hall, 2002). ISBN 0201180758
28. T. Pavlidis, Algorithms for shape analysis of contours and waveforms. IEEE Trans. Pattern Anal. Mach. Intell. 2(4), 301–312 (1980)
29. E. Persoon, K.S. Fu, Shape discrimination using fourier descriptors. IEEE Trans. Syst., Man Cybern. 7, 170–179 (1977)
30. P.A. Wintz, T.P. Wallace, An efficient three-dimensional aircraft recognition algorithm using normalized fourier descriptors. Comput. Graph. Image Process. 13, 99–126 (1980)
31. D. Zhang, G. Lu, A comparative study of curvature scale space and fourier descriptors for shape-based image retrieval. Vis. Commun. Image Represent. 14(1), 39–57 (2003)
32. C.C. Lin, R. Chellappa, Classification of partial 2D shapes using fourier descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 9(5), 686–6908 (1987)
33. N. Kiryati, D. Maydan, Calculating geometric properties from fourier representation. Pattern Recognit. 22(5), 469–475 (1989)
34. F. Kuhl, C. Giardina, Elliptical fourier features of a closed contour. Comput. Graph. Image Process. 18, 236–258 (1982)
35. L.H. Staib, J.S. Duncan, Boundary finding with parametrically deformable models. IEEE Trans. Pattern Anal. Mach. Intell. 14(11), 1061–1075 (1992)
36. A.K. Gupta, R.B. Yadava, N.K. Nishchala, V.K. Rastogi, Retrieval and classification of shape-based objects using fourier, generic fourier, and wavelet-fourier descriptors technique: a comparative study. Opt. Lasers Eng. 45(6), 695–708 (2007)
37. D.S. Zhang, G. Lu, Enhanced generic fourier descriptors for object-based image retrieval, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (2002), pp. 3668–3671
9 Image Enhancement Techniques

9.1 Introduction to Computational Levels

The digital processing of one or more images, through the different computational levels, aims to locate and identify the objects present in the scene. The information associated with the different computational levels varies according to the purpose and the type of algorithms used which, starting from the source data (data acquired with one or more sensors), can produce intermediate results (new images or characteristic data of the objects) up to the complete location and identification of the objects in the scene. In relation to the computational level, the operations performed on the data can be classified into three different categories:

(a) Point operators. These comprise all the algorithms that perform elementary operations on each pixel of the image without depending on the neighboring pixels. Given an input image $I_I$, the point operator $O_{point}$ performs only a gray-level transformation of each pixel, producing a new image $I_O$ (see Fig. 9.1a):

\[
I_O(i, j) = O_{point}[I_I(i, j)] \tag{9.1}
\]
(b) Local operators. These comprise all the algorithms that calculate the new pixel value on the basis of local operations, that is, by considering the intensity values of the pixels near the pixel being processed in the input image. Given an input image $I_I$ and a square window $F(i, j)$ centered on the pixel to be processed, the output image $I_O$ is obtained by applying the local operator $O_{local}$ (see Fig. 9.1b) as follows:

\[
I_O(i, j) = O_{local}\{I_I(i_k, j_l);\ (i_k, j_l) \in F(i, j)\} \tag{9.2}
\]

For example, algorithms that calculate the local average, or perform contour extraction, belong to this category of operators.

© Springer Nature Switzerland AG 2020 A. Distante and C. Distante, Handbook of Image Processing and Computer Vision, https://doi.org/10.1007/978-3-030-38148-6_9
Fig. 9.1 Image processing operators. a Point operator scheme and b local operator scheme with 3 × 3 window
Fig. 9.2 Application to the original image (a) of the point operator (b): a threshold is applied to each pixel of the original image, which is replaced with white (gray level 255) if greater than 150 and with black (gray level 0) otherwise. c Original image with added salt-and-pepper noise, which is filtered with a local operator (a median filter with a 3 × 3 window) whose result is the image in (d). e Histogram of the original image in (a), which is an example of global information (abscissa: gray levels from 0 to 250; ordinate: frequency from 0 to 3000)
(c) Global operators. These comprise all the algorithms that extract global information by analyzing all the pixels of the image. Given an input image $I_I$ and a global operator $O_{global}$ (such as the histogram, a co-occurrence matrix, etc.), the output is a result $R$ (which can be an image, a list, etc.):

\[
R = O_{global}[I_I(i, j)] \tag{9.3}
\]
Figure 9.2 shows an example of image processing for each type of operator: point, local, and global. Note that the global operator, which in this case calculates the histogram (Fig. 9.2e) of a gray-level image, outputs a vector, normally of 256 elements, in which each element contains the number of times that the gray value $k$ (for $k = 0, 1, \ldots, 255$) occurs in the input image. The details of these operators will be described in the following paragraphs.
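The three categories of operators can be illustrated with the same operations used in Fig. 9.2: a threshold at 150, a 3 × 3 median filter, and the gray-level histogram. The following is a minimal sketch, assuming NumPy and 8-bit gray-level images; the function names and the small test image are hypothetical, and border pixels of the median filter are handled by replicating the edge values, which is one of several possible choices.

```python
import numpy as np

def point_threshold(img, thr=150):
    """Point operator: each output pixel depends only on the same input pixel."""
    return np.where(img > thr, 255, 0).astype(np.uint8)

def local_median3x3(img):
    """Local operator: median over the 3 x 3 window centered on each pixel."""
    padded = np.pad(img, 1, mode="edge")   # replicate borders (assumed policy)
    h, w = img.shape
    # Stack the 9 shifted copies of the image, one per window position.
    windows = np.stack([padded[di:di + h, dj:dj + w]
                        for di in range(3) for dj in range(3)], axis=0)
    return np.median(windows, axis=0).astype(img.dtype)

def global_histogram(img, levels=256):
    """Global operator: number of pixels for each gray level k = 0..levels-1."""
    return np.bincount(img.ravel(), minlength=levels)

# Hypothetical test image: uniform gray 100 with one isolated "salt" pixel.
img = np.full((8, 8), 100, dtype=np.uint8)
img[4, 4] = 255

binary = point_threshold(img)        # only the salt pixel exceeds the threshold
filtered = local_median3x3(img)      # the isolated salt pixel is removed
hist = global_histogram(img)         # 63 pixels at level 100, 1 pixel at level 255
```

The output of the point and local operators is again an image of the same size, while the global operator returns a vector of 256 counts whose sum equals the number of pixels.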
9.2 Improvement of Image Quality

The images acquired by any device (scanner, camera, etc.) often have defects caused by the instability of the sensors, the variation of the lighting conditions, and the lack of contrast. Such defects can be mitigated with point operators that appropriately modify the gray levels of the image to improve its visual quality and increase its contrast. Among the techniques of image improvement, two major approaches are used: the statistical approach, based on the values of the gray levels of the image, and the spectral approach, based on the spatial frequencies present in the image. The transformation of the gray levels can be made with operators that depend only on the gray-level value of the input image (point operators, Eq. 9.1) or with operators that also consider the position $(i, j)$ of the pixel (local operators, Eq. 9.2). The algorithms for contrast and histogram manipulation belong to the former.
9.2.1 Image Histogram
In Sect. 6.9.1 of Chap. 6 the concept of a histogram was described by considering the pixels of a gray-level image as deriving from a stochastic process. Figure 9.3a and b show, respectively, an image and the distribution of the gray levels present in it. The graph of Fig. 9.3b is the histogram of gray levels: the gray levels $x$ are shown on the abscissa and the frequency $H(x)$, i.e., the number of times the gray level $x$ appears in the image, on the ordinate. The histogram $H_I(x)$ can be seen as the result of the global operator (9.3) applied to an input image $I$, which determines the population of pixels for each gray level $x$. In $H_I(x)$ the spatial information about how the different gray values are organized in the image is lost. The histogram in Fig. 9.3b gives the distribution of the pixel population of the image in Fig. 9.3a, but no information on how those pixels are spatially distributed. An image has a unique histogram, but the correspondence is not a bijection: the same histogram can correspond to different images. For example, moving the objects in the scene produces a new image whose histogram does not change significantly (see Fig. 9.4).
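The loss of spatial information can be illustrated with a minimal NumPy sketch. The function name `gray_histogram` and the random test image are assumptions made only for this illustration: the pixels of an image are rearranged at random, and the histogram is compared before and after.

```python
import numpy as np

def gray_histogram(image, levels=256):
    """Global operator: count the pixels at each gray level 0..levels-1."""
    return np.bincount(image.ravel(), minlength=levels)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# Rearranging the pixels gives a different image with the same histogram.
shuffled = rng.permutation(image.ravel()).reshape(image.shape)

h_original = gray_histogram(image)
h_shuffled = gray_histogram(shuffled)
print(np.array_equal(h_original, h_shuffled))
```

The two histograms coincide although the spatial arrangement of the pixels is completely different, which is why the histogram alone cannot identify an image.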
9.2.2 Probability Density Function and Cumulative Distribution Function of Image
Despite the loss of spatial information, the histogram $H_I$ of an image $I$ has useful properties (see Sect. 6.9.1) that help to understand some characteristics of the image (Fig. 9.5), such as its brightness and contrast, and to isolate homogeneous regions. Recalling the statistical significance of the frequencies relative to each level of gray
Fig. 9.3 (a) Original image with a dominance of dark gray levels (x < 100), as shown by its histogram in (b). (c) Result of histogram equalization applied to the image in (a), with gray levels uniformly distributed as shown by the corresponding histogram in (d)
Fig. 9.4 Example of spatial invariance of a histogram. Moving the objects from their positions in scene (a) to those in scene (b) does not change the corresponding histograms, shown in (c) and (d), respectively
Fig. 9.5 Qualitative analysis of the histogram: (a) image with normal brightness and contrast, whose gray levels cover the whole dynamic range; (b) histogram truncated at low gray values because the image is underexposed; (c) histogram truncated at high gray values because the image is overexposed
we can consider the area of the image as the sum of the areas represented by each frequency $H_I(x)$. For an image $I$ of $N$ rows and $M$ columns with 256 gray levels, the image area $A_I$ is obtained as follows:
$$A_I = \sum_{x=0}^{255} H_I(x) = N \times M \ \text{pixels} \qquad (9.4)$$
If each value of the histogram $H_I(x)$ is normalized with respect to the image area (9.4), considering the pixel population as a stochastic process, the probability density function (pdf) of the image is obtained:
$$p_x(k) = p(x = k) = \frac{H_I(k)}{\sum_{j=0}^{255} H_I(j)} \qquad (9.5)$$
where the probability density function of the image $p_x(k)$ takes values between 0 and 1. The histograms of the images in Fig. 9.2a and b can be used, with Eq. (9.4), to estimate the area of the regions corresponding to the coins, isolated from the background. If the area of a single coin is known, the number of coins in the image can also be estimated. The cumulative (probability) distribution function (cdf) of an image with pdf $p_x$ is given by
$$cdf_x(k) = \sum_{j=0}^{k} p_x(j) \qquad (9.6)$$
which represents the normalized histogram accumulated up to the $k$th gray level of the image. Statistical theory does not establish an optimal criterion for defining the accumulation intervals of a histogram, i.e., into how many intervals the data population should be divided to calculate the relative frequencies. For images, it may be useful to subdivide the entire range of gray levels $(L_{max} - L_{min})$, represented by the independent variable $x$, into intervals of width $\Delta x$, so that the histogram $H_I(k)$ distributes the pixel population over $N_{Max}$ intervals
$$N_{Max} = \frac{L_{max} - L_{min}}{\Delta x}$$
where $\Delta x$ represents the degree of discretization of the gray level range of the original image; it has a minimum value of 1 if the number of levels of the original image is to be left unchanged.
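Equations (9.4) to (9.6) can be sketched in a few lines of NumPy. The function name `pdf_and_cdf` and the tiny 4-level test image are assumptions chosen so that the values are easy to follow by hand.

```python
import numpy as np

def pdf_and_cdf(image, levels=256):
    """Return the histogram, its normalization (Eq. 9.5) and the cdf (Eq. 9.6)."""
    hist = np.bincount(image.ravel(), minlength=levels)
    pdf = hist / hist.sum()   # hist.sum() is the image area N x M of Eq. (9.4)
    cdf = np.cumsum(pdf)      # running sum of the pdf up to level k
    return hist, pdf, cdf

# Toy image with 4 gray levels (0..3) and 6 pixels.
image = np.array([[0, 0, 1],
                  [1, 1, 3]], dtype=np.uint8)
hist, pdf, cdf = pdf_and_cdf(image, levels=4)
print(hist)  # [2 3 0 1]
print(cdf)
```

The last element of the cdf is always 1, because the whole pixel population has been accumulated at the highest gray level.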
9.2.3 Contrast Manipulation
Images in a photograph or on a monitor often appear rather dark and give a poor representation of the objects present. This defect is generally caused by an uneven distribution of gray levels over the definition range (almost always between 0, black, and 255, white). Such images are said to lack contrast, i.e., their gray levels are concentrated in a narrow range. In the histogram of Fig. 9.3b, the gray levels of the image accumulate only between 0 and 80 of the available range 0 to 255. A technique known as contrast manipulation transforms the gray levels of the input image with a gray level transformation function $T$ and produces clearer images. If $x$ denotes the gray levels of the input image $I_I(i, j)$ and $y$ the gray levels of the output image $I_O(i, j)$, we are interested in finding a transformation (manipulation) $T(x)$ of the gray levels that satisfies the relationship
$$y = T(x) \qquad (9.7)$$
The effect of this transformation (which does not depend on the position $(i, j)$ of the pixel $I(i, j)$) is to modify the contrast of the image according to the type of function $T$, which can have the following characteristics:
(a) Piecewise linear contrast with a single piece;
(b) Piecewise linear contrast with several pieces;
(c) Nonlinear contrast.
These point transformations tend to expand the histogram toward its extremes (contrast stretching) so as to use the full dynamic range of gray levels or color components, making the image more contrasted. In some cases, if the contrast is too high, the opposite transformation is carried out with the aim of compressing the levels into the central area of the histogram.
Fig. 9.6 Example of contrast manipulation by applying a piecewise linear transformation (with a single piece) y = T(x) to an input histogram defined in the range 50 ÷ 100. The linear transformation expands the gray levels to cover the entire range 0 ÷ 255
9.2.3.1 Piecewise Linear Contrast with a Single Piece
Let $[x_a, x_b]$ be the range of gray levels in an image, and let $\Delta y$ be the range of gray levels desired for the output image. The linear transformation with a single piece is defined as follows:
$$y = \beta (x - x_a) \qquad (9.8)$$
with $x_a \le x \le x_b$, where $\beta = \Delta y / \Delta x$ is the expansion (stretching) coefficient, obtained from the ratio between the interval of output levels $\Delta y$ to be obtained and the interval of input levels $\Delta x = x_b - x_a$. For example, in Fig. 9.6 the input histogram of the image lies in the range of gray levels between 50 and 100. This input interval, with $\Delta x = 50$ levels, can be expanded up to 256 levels, i.e., the maximum dynamic range of gray levels, so that $\Delta y = 256$ and the expansion coefficient is $\beta = 256/50 \approx 5$. This gray level stretching improves the visibility of the image by using the whole dynamic range of gray levels, which is very useful, for example, to display the image on a high dynamic range monitor. More generally, to expand the gray levels into the output interval $\Delta y = y_b - y_a$, the transformation function can be written explicitly as follows:
$$y = (x - x_a) \frac{y_b - y_a}{x_b - x_a} + y_a \qquad (9.9)$$
Figure 9.7 shows the level expansion of Eq. (9.9) applied to the image of Fig. 9.3a with the aim of improving its visual quality by making it clearer.
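A minimal sketch of the stretching of Eq. (9.9) follows. The function name `stretch`, the clipping of input values outside $[x_a, x_b]$, the rounding to 8-bit integers, and the choice of 255 as the upper output level are assumptions of this illustration, not part of the equation.

```python
import numpy as np

def stretch(image, x_a, x_b, y_a=0.0, y_b=255.0):
    """Single-piece linear stretch of [x_a, x_b] onto [y_a, y_b], Eq. (9.9)."""
    # Assumption: inputs outside [x_a, x_b] are clipped to the interval.
    x = np.clip(image.astype(np.float64), x_a, x_b)
    y = (x - x_a) * (y_b - y_a) / (x_b - x_a) + y_a
    return np.round(y).astype(np.uint8)

# Gray levels confined to 50..100, as in the example of Fig. 9.6.
image = np.array([[50, 75],
                  [100, 60]], dtype=np.uint8)
stretched = stretch(image, x_a=50, x_b=100)
print(stretched)
```

With these values the expansion coefficient is $\beta = 255/50 = 5.1$, so the lowest input level 50 maps to 0 and the highest input level 100 maps to 255.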
Fig. 9.7 Example of level expansion with a single-piece piecewise linear transformation. The histogram stretching is applied to the image in Fig. 9.3a. The operation expands the input histogram of Fig. 9.3b into the output histogram shown in (b), giving the resulting image shown in (a)
Fig. 9.8 Piecewise linear transformation with multiple pieces
9.2.3.2 Piecewise Linear Contrast with Multiple Pieces
With reference to Fig. 9.8, the first piece of the gray level expansion curve is characterized by the coefficient $\alpha = y_a / x_a$, the second piece is defined by $\beta$ as above, i.e., $\beta = (y_b - y_a)/(x_b - x_a)$, and the last piece is characterized by $\delta = (L_{max_y} - y_b)/(L_{max_x} - x_b)$, where $y_a$ and $y_b$ are the constants that increase or attenuate the overall brightness of the image, and $L_{max_y}$ and $L_{max_x}$ are, respectively, the maximum gray levels of output and input. If the coefficients $\alpha$ and $\delta$ are equal to zero, we have the particular case in which the expansion of the dynamic range concerns only the interval $[x_a, x_b]$, while the gray levels below $x_a$ and above $x_b$ are excluded (clipping, see Fig. 9.9a). This type of transformation is useful for images captured in particular lighting conditions, when it is known that only the gray levels in a certain interval $[x_a, x_b]$ correspond to pixels belonging to the object of interest.
Fig. 9.9 (a) Transformation used when it is known that, in particular lighting conditions, the gray levels of the object of interest fall in the interval $[x_a, x_b]$ ($\alpha = \delta = 0$). (b) The background can be isolated from the object by a threshold $x_T$
Furthermore, if the following condition occurs:
$$x_a = x_b \equiv x_T \qquad (9.11)$$
the transformation reduces to a thresholding operation, where $x_T$ is a particular gray level called the threshold value. It separates the pixels belonging to the background of the image (gray level $x < x_T$) from those belonging to the object (foreground pixels with gray level $x \ge x_T$, see Fig. 9.9b). In this special case the output image becomes a binary image (or bitmap image), where normally the background pixels have value 0 and the object pixels have value 1.
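The three-piece transformation and its thresholding limit can be sketched as follows. The sketch assumes the three-piece form implied by the coefficient definitions, namely $y = \alpha x$ below $x_a$, $y = \beta(x - x_a) + y_a$ between $x_a$ and $x_b$, and $y = \delta(x - x_b) + y_b$ above $x_b$. It also assumes $0 < x_a < x_b < L_{max}$ and the same maximum level for input and output. The function names are hypothetical.

```python
import numpy as np

def piecewise_linear(image, x_a, x_b, y_a, y_b, l_max=255):
    """Three-piece linear contrast; assumes 0 < x_a < x_b < l_max."""
    x = image.astype(np.float64)
    alpha = y_a / x_a                      # slope of the first piece
    beta = (y_b - y_a) / (x_b - x_a)       # slope of the central piece
    delta = (l_max - y_b) / (l_max - x_b)  # slope of the last piece
    y = np.where(x < x_a, alpha * x,
                 np.where(x < x_b, beta * (x - x_a) + y_a,
                          delta * (x - x_b) + y_b))
    return np.round(y).astype(np.uint8)

def threshold(image, x_t):
    """Limit case x_a = x_b = x_T: background -> 0, object -> 1."""
    return (image >= x_t).astype(np.uint8)

image = np.array([0, 50, 100, 150, 255], dtype=np.uint8)
stretched = piecewise_linear(image, x_a=50, x_b=150, y_a=20, y_b=230)
binary = threshold(image, x_t=100)
print(stretched, binary)
```

Here the central interval 50 to 150 is expanded onto 20 to 230 (slope 2.1), while the two outer intervals are compressed.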
9.2.3.3 Nonlinear Contrast
The previous linear transformations expand the dynamic range of the gray or color levels in the image uniformly. For some images it may be useful to apply a nonuniform transformation instead, modifying the histogram locally at the low or high gray level values. These transformations can improve the visual quality of the images by producing a better contrast. We begin by examining some simple nonlinear contrast transformations. The quadratic transformation is given by
$$y = x^2 \qquad (9.12)$$
which tends to expand the dynamic range of the higher gray levels and compress that of the lower ones. The square root transformation is given by
$$y = \sqrt{x} \qquad (9.13)$$
It produces the opposite effect: it tends to expand the dynamic range of the lower gray levels (darker image pixels) and compress that of the higher ones (lighter image pixels).
Fig. 9.10 Example of logarithmic transformation. (a) Fourier power spectrum of an image. (b) Result of the logarithmic transformation, obtained by applying Eq. (9.14) to the original spectrum in (a)
The logarithmic transformation is given by
$$y = \frac{\log_e(1 + x)}{\log_e[1 + \max(x)]} \qquad (9.14)$$
assuming $x \ge 0$. This transformation is applied when most pixels occupy a narrow range of low (dark) values while the values as a whole span an extremely large range: it stretches the narrow range of low input values into a wider and more appropriate range of output values. This often occurs when fundamental transformations are applied to the image (e.g., the Fourier transform, described in Sect. 9.11.1). In this case the logarithmic transform rescales the pixel values representing the power spectrum in order to reveal more detail (see Fig. 9.10). In general, the log transformation is used to expand the values of the dark pixels of an image while compressing (or suppressing) the higher-level values. The inverse log transformation performs the opposite transformation.
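The three nonlinear transformations (9.12) to (9.14) can be compared with a short sketch. It assumes that the quadratic and square root transformations are applied to gray levels already normalized to $[0, 1]$, so that the output stays in the same interval. The function names and sample values are hypothetical.

```python
import numpy as np

def quadratic(x):
    """Eq. (9.12): expands high levels, compresses low ones."""
    return x ** 2

def square_root(x):
    """Eq. (9.13): expands low levels, compresses high ones."""
    return np.sqrt(x)

def log_transform(x):
    """Eq. (9.14): normalized logarithm, output in [0, 1] for x >= 0."""
    return np.log1p(x) / np.log1p(x.max())

levels = np.array([0.0, 0.25, 1.0])          # normalized gray levels
q = quadratic(levels)
s = square_root(levels)

# Values spanning three orders of magnitude, as in a power spectrum.
spectrum = np.array([0.0, 9.0, 99.0, 999.0])
l = log_transform(spectrum)
print(q, s, l)
```

The level 0.25 moves down to 0.0625 under the quadratic transformation and up to 0.5 under the square root, while the logarithm turns the widely spaced spectrum values into equally spaced outputs.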
9.2.3.4 Negative and Inverse Image
The negative transform is obtained by complementing the gray values of the input image with respect to the maximum value $L_{max}$, producing an output image called the “Negative”:
$$y = L_{max} - x \qquad (9.15)$$
The “Negative” of a color image is obtained by applying the previous equation to each color component. The inverse transform is given by
$$y = \frac{1}{x} \quad \text{with } x > 0 \qquad (9.16)$$
with $y$ normalized to the interval [0, 255]. These transformations are useful for displaying some very dark details of an image (Fig. 9.11).
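Both transformations can be sketched as follows. Equation (9.16) requires $x > 0$, so the sketch assumes that pixels with value 0 are raised to 1 before inversion. It also assumes that the result is normalized by its maximum and rounded to 8 bits. These choices and the function names are assumptions of the illustration.

```python
import numpy as np

def negative(image, l_max=255):
    """Eq. (9.15): complement with respect to the maximum level."""
    return (l_max - image.astype(np.int32)).astype(np.uint8)

def inverse(image, l_max=255):
    """Eq. (9.16), with the output rescaled to [0, l_max]."""
    # Assumption: zero-valued pixels are raised to 1 so that x > 0 holds.
    x = np.maximum(image.astype(np.float64), 1.0)
    y = 1.0 / x
    return np.round(l_max * y / y.max()).astype(np.uint8)

image = np.array([1, 3, 5, 255], dtype=np.uint8)
neg = negative(image)
inv = inverse(image)
print(neg, inv)
```

In both outputs the darkest input pixels become the brightest ones, but the inverse spreads the low input levels over a much wider output range than the negative does.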
Fig. 9.11 Original image, negative and inverse
Fig. 9.12 Example of nonmonotonic transformation
9.2.3.5 Nonmonotonic Transformation
The nonmonotonic transformation is used to display acceptably an image with a wide range of gray levels on a medium (monitor or printer) with a limited range of available gray levels (Fig. 9.12).
9.2.3.6 Bit-Plane Manipulation
An 8-bit image can be thought of as eight 1-bit planes, with plane 0 containing the lowest order bit of every pixel in the image and plane 7 the highest order bit. The manipulation consists of considering only the most significant bits of each pixel of the input image. Normally, for an 8-bit quantized image, only the six most significant bits are useful, because they contain the most relevant information of the image. The output image is produced by keeping only the significant bit of interest and zeroing all the others. Figure 9.13 shows that the digitization process has introduced a non-negligible background noise in the lower planes of the image. Note also that each bit-plane is a binary image.
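Bit-plane extraction reduces to shifts and masks on the pixel values. In the following sketch the function names `bit_plane` and `keep_msb` are hypothetical, and an 8-bit unsigned image is assumed.

```python
import numpy as np

def bit_plane(image, k):
    """Binary image holding bit k (0 = least significant) of every pixel."""
    return (image >> k) & 1

def keep_msb(image, n_bits):
    """Zero all but the n_bits most significant bits of an 8-bit image."""
    mask = (0xFF << (8 - n_bits)) & 0xFF
    return image & np.uint8(mask)

image = np.array([[181, 7],
                  [128, 255]], dtype=np.uint8)   # 181 = 0b10110101
plane7 = bit_plane(image, 7)
plane0 = bit_plane(image, 0)
top3 = keep_msb(image, 3)
print(plane7, plane0, top3, sep="\n")
```

Summing the eight planes, each shifted back to its position, reconstructs the original image exactly, so the planes together carry all the information of the image.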
Fig. 9.13 Significant bit-planes. Panels: original image and bit-planes 0 to 7. For this 8-bit image the most significant bit-planes are the last three, from plane 5 to plane 7, which carry most of the information content. The less significant planes highlight the image noise due to the acquisition process and the instability of the sensor
9.2.3.7 Intensity Level Slicing
This transformation divides the intensity range (gray levels) into small intervals that identify certain homogeneous regions of interest in the image plane. The possible transformations are
$$y = \begin{cases} L_{max} & x_a \le x \le x_b \\ 0 & \text{otherwise} \end{cases} \qquad (9.17)$$
$$y = \begin{cases} L & x_a \le x \le x_b \\ x & \text{otherwise} \end{cases} \qquad (9.18)$$
In the latter relation the image pixels outside the interval $[x_a, x_b]$ are left intact. This technique is useful when an image contains several homogeneous regions with different intensity level ranges. For example, in a satellite image we may want to distinguish the clouds from the sea and the land. The technique is effective in this case because the three objects have different intensity values, owing to their different reflectance properties.
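The two slicing rules (9.17) and (9.18) can be sketched with `np.where`. The function names are hypothetical. The highlight value written into the selected interval in the second rule is passed as the parameter `level`, whose default of 255 is an assumption of this illustration.

```python
import numpy as np

def slice_binary(image, x_a, x_b, l_max=255):
    """Eq. (9.17): l_max inside [x_a, x_b], 0 elsewhere."""
    inside = (image >= x_a) & (image <= x_b)
    return np.where(inside, l_max, 0).astype(np.uint8)

def slice_preserve(image, x_a, x_b, level=255):
    """Eq. (9.18): highlight [x_a, x_b], leave the other pixels intact."""
    inside = (image >= x_a) & (image <= x_b)
    return np.where(inside, level, image).astype(np.uint8)

image = np.array([10, 100, 150, 200], dtype=np.uint8)
binary = slice_binary(image, 90, 160)
kept = slice_preserve(image, 90, 160)
print(binary, kept)
```

The first rule discards the background entirely, while the second keeps it as context around the highlighted region.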
9.2.4 Gamma Transformation
This nonlinear transformation is the general power-law transformation of an image and is given by
$$y = c x^{\gamma} \qquad (9.19)$$
where $c$ and $\gamma$ are positive constants. The constant $c$ is used to normalize the pixel values to the range of interest. Normally the pixel value is encoded in one byte of memory and takes values between 0 and 255. If $L_{max}$ denotes the maximum value of the levels present in the image, transformation (9.19) becomes
$$\frac{y}{L_{max}} = \left(\frac{x}{L_{max}}\right)^{\gamma} \implies y = L_{max} \left(\frac{x}{L_{max}}\right)^{\gamma} \qquad (9.20)$$
The exponent $\gamma$ (the Greek letter from which the transformation takes its name) can take values greater than 1, in which case it darkens the image (Fig. 9.14), in particular the pixels with medium-low values. For $\gamma < 1$ the transformation has the opposite effect and behaves similarly to the logarithmic transformation, i.e., it tends to expand the dynamic range of the pixels with low values while compressing the dynamic range of the high values of $x$ (Fig. 9.14). For example, the transform with $\gamma = 1/2$ produces the opposite effect on the image to the transform with $\gamma = 2$: it is as if we had applied the inverse of Eq. (9.19) to the pixels, that is, raised them to the power $1/\gamma$
$$y = c x^{1/\gamma} \qquad (9.21)$$
Fig. 9.14 Gamma transformation: (a) with γ < 1 the image is brightened, while (b) with γ > 1 the image is darkened (c = 1 in both gamma transformations)
In other words, the inverse transform (9.21) compensates for the effects produced by (9.19); this process is called gamma correction or gamma encoding. The gamma transformation is normally used to encode and decode the luminance or color components (tristimulus) of images for various electronic display devices (CRT and LCD monitors) and acquisition devices (TV cameras, cameras, scanners). These electronic devices have nonlinear input/output characteristics (for both analog and digital signals) and modify the signal in a way that is approximated by the gamma transformation with $\gamma > 1$. In this context, applying the transformation with $\gamma < 1$ is called gamma encoding of the image, while applying it with $\gamma > 1$ is called gamma decoding. It follows that if, for example, an image acquired or generated with linearly scaled gray levels is displayed on a CRT monitor, its values are altered by the gamma expansion ($\gamma > 1$) introduced by the CRT, and the image appears dark on the monitor. To avoid this drawback, since the gamma expansion law of these devices is known, the pixel values of the image are normally encoded beforehand with (9.19) using a value of $\gamma < 1$ (gamma correction). The image is then displayed correctly, because the encoding compensates for the gamma expansion of the monitor, which alters the pixels in the inverse way according to (9.21). Each device has its own gamma correction value. For a CRT monitor, the standard value is $\gamma = 2.2$ and the gamma correction applied beforehand is $1/\gamma = 1/2.2 \approx 0.45$. A color CRT can have three different gamma values, one for each RGB color component, or a single gamma value for the three components. For an LCD monitor, the approximate value is $\gamma = 2.5$, with the consequent gamma correction of $1/\gamma = 1/2.5 = 0.4$.
In addition to the nonlinearity of reproduction and capture devices, there is another reason to apply gamma correction to images. Human vision is more sensitive to variations in light intensity at low intensity values than at high ones. It is therefore useful to encode the pixel values according to this perceptual characteristic as well, making the whole range of intensity levels (0 ÷ 255) perceptually uniform. In fact, the gamma correction needed to compensate for the effect of the monitor (gamma expansion) on the image is very similar to the uniform perceptual scale of the human visual system. This explains why image capture devices (TV cameras, cameras, scanners) save the pixel values in a file (for example TIFF, JPEG, or MPEG for video data) already gamma corrected, i.e., with nonlinear values. Figure 9.15 shows the sequence of transfer functions applied to the image in the display context with the aim of obtaining a uniform perception (equal intervals of perceived brightness) from the nonlinear values of the gamma-corrected image. The figure shows how the preventive gamma correction eliminates the darkening of the image intrinsically introduced by the monitor. The gamma corrections should be adjusted with respect to the standard values above when the acquisition and display devices have special characteristics. In these cases the special gamma corrections are normally programmed and stored in dedicated memory available in these devices (in particular in the video cards, in the case of monitors).
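The compensation between gamma encoding and the gamma expansion of a display can be checked numerically. The sketch below applies Eq. (9.20) in floating point; the function name `gamma_transform` is hypothetical, and the value 2.2 is the CRT gamma quoted above.

```python
import numpy as np

def gamma_transform(image, gamma, l_max=255.0):
    """Eq. (9.20): y = l_max * (x / l_max) ** gamma, kept in floating point."""
    x = np.asarray(image, dtype=np.float64) / l_max
    return l_max * x ** gamma

linear = np.arange(256, dtype=np.float64)     # linearly scaled gray levels

darkened = gamma_transform(linear, 2.2)       # display without correction
encoded = gamma_transform(linear, 1 / 2.2)    # gamma correction (encoding)
displayed = gamma_transform(encoded, 2.2)     # display of the encoded image

print(darkened[64], encoded[64], displayed[64])
```

Without correction the mid-low level 64 is displayed much darker. Encoding first raises it, and the gamma expansion of the display then brings it back to its linear value. Quantizing the encoded values to 8 bits, as a real file format would, introduces small rounding errors that this floating-point sketch ignores.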
Fig. 9.15 Gamma correction to compensate for the darkening of the image introduced by a CRT monitor: (a) the darkening effect observed when the image has not been gamma corrected; (b) the scanned image is saved in a JPG, TIFF, or other file format, corrected with γ = 0.45 to compensate for the transfer function of the CRT, modeled by the gamma transform with γ = 2.2
9.3 Histogram Modification
The contrast manipulation techniques examined above improve the visual quality of the image by expanding the range over which the gray levels are distributed, without altering the shape of the distribution (histogram). Several real images, even after the previous linear and nonlinear transformations have been applied, show a dominance of gray levels toward high values (very bright image) or toward low values (dark image). One way to improve the visual quality of these images is to modify their histogram, i.e., the distribution function of the gray levels present.
9.3.1 Histogram Equalization
An automatic, nonadaptive method of modifying the histogram is histogram equalization, which consists in producing an output image with a uniform distribution of gray levels, i.e., a constant probability density function. In other words, the histogram equalization transformation modifies the histogram $H_I(x)$ of the input image $I_I(i, j)$ so that the histogram $H_O(y)$ of the equalized output image $I_O(i, j)$ has a constant pixel frequency at each gray level (i.e., a flat histogram, see Fig. 9.16). With the notation of Sect. 9.2.3, if $x$ denotes the gray levels of the input image $I_I(i, j)$ and $y$ the gray levels of the output image $I_O(i, j)$, we are interested in finding an (automatic) transformation $T(x)$ of the gray levels that satisfies the expression
$$y = T(x) \qquad (9.22)$$
such that the histogram $H_I(x)$ of the input image is transformed into the uniform output histogram, i.e., $H_O(y) = \text{constant}$, as shown in Fig. 9.16.
Fig. 9.16 Effects of equalization on gray levels: on the left, a generic input histogram; on the right, a uniform (flat) output histogram
Let us now see what conditions must be met to find the equalization function $T$. The transformation function (9.22) that equalizes the histogram is assumed to be monotonically increasing in the interval $(0, L_{max})$, that is,
$$x_i \le x \le x_{i+1} \implies T(x_i) \le T(x) \le T(x_{i+1}) \quad \forall i \in [0, L_{max}] \qquad (9.23)$$
The inverse of the monotonic function is also assumed to exist:
$$x = T^{-1}(y) \quad \text{with } 0 \le y \le L_{max} \qquad (9.24)$$
One way to compute the transformation function $T$ is to treat the gray levels as variables of a stochastic process characterized by the probability density functions $p_x(x)$ and $p_y(y)$ of the input and output images, respectively. If the probability density function $p_x(x)$ and the transformation $T(x)$ are known, $p_y(y)$ can be calculated. Consider the relationship between $p_x(x)$ and $p_y(y)$ shown in Fig. 9.17 for an infinitesimal range of gray levels $(x_1, x_1 + dx)$. Since $T(x)$ is monotonic, the pixels in the input range are all transformed into the corresponding output range $(y_1, y_1 + dy)$. It follows that the areas under the probability curves $p_x(x)$ and $p_y(y)$ over these ranges must be equal. For an infinitesimal interval the areas are approximated by rectangles, giving the following equality:
$$p_y(y)\,dy = p_x(x)\,dx \qquad (9.25)$$
from which
$$p_y(y) = p_x(x) \frac{dx}{dy} = \frac{p_x(x)}{dy/dx} \qquad (9.26)$$
Fig. 9.17 The effect of the point transformation function T(x) on the histogram of the image
Recalling the probability density function $p_x(x)$ of the image defined by Eq. (9.5), considering that $y = T(x)$, and substituting in (9.26), we obtain
$$H_O(y) = \frac{H_I(x)}{\frac{d}{dx} T(x)} \qquad (9.27)$$
Equation (9.27) contains two variables, $y$ on the left-hand side and $x$ on the right-hand side. Using the inverse function $x = T^{-1}(y)$, (9.27) can be put into the more general form
$$H_O(y) = \frac{H_I(T^{-1}(y))}{T'[T^{-1}(y)]} \qquad (9.28)$$
where
$$T' = \frac{dT}{dx} \qquad (9.29)$$
If we want a point transformation that makes the histogram of the output image uniform, i.e., $H_O(y) = \text{constant}$, the right-hand side of (9.28) shows that it is given by the ratio of two functions of the same variable. It follows that, in order to have $H_O(y)$ constant for every gray level, the numerator and denominator of (9.28) must be identical up to a constant factor. Therefore, it is possible to impose the following condition:
$$\frac{H_I(x)}{T'(x)} = \frac{A}{L_{max}} = \text{constant} \qquad (9.30)$$
where the constant is the ratio between the area $A$ of the image (i.e., the total number of pixels in the image) and the maximum gray level value $L_{max}$. Solving (9.30) for the derivative of $T$, we obtain
$$T'(x) = H_I(x) \frac{L_{max}}{A} \qquad (9.31)$$
Integrating both sides of (9.31), the equality is satisfied if the following relationship holds:
$$T(x) = \frac{L_{max}}{A} \int_0^x H_I(z)\,dz \qquad (9.32)$$
Recalling the definitions of the probability density function and the cumulative distribution function of the image, given by Eqs. (9.5) and (9.6), respectively, in continuous form we have
$$p_x(x) = \frac{H_I(x)}{A}; \qquad cdf(x) = \int_0^x p(z)\,dz = \frac{1}{A} \int_0^x H_I(z)\,dz \qquad (9.33)$$
From the last equation and (9.32) it emerges that, in order to satisfy the equalization condition, the transformation $T(x)$ must coincide, up to the factor $L_{max}$, with the normalized cumulative distribution function $cdf(x)$, and Eq. (9.32) for the equalization of the image with histogram $H_I$ can be rewritten as follows:
$$y = T(x) = \frac{L_{max}}{A} \int_0^x H_I(z)\,dz = L_{max} \cdot cdf(x) \qquad (9.34)$$
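As a check, the transformation $T(x) = L_{max} \cdot cdf(x)$ can be substituted back into Eq. (9.26). The short derivation below is a sketch that assumes $p_x(x) > 0$ over the range considered, so that $T$ is strictly increasing and invertible.

```latex
\frac{dy}{dx} = L_{max}\,\frac{d}{dx}\,cdf(x) = L_{max}\,p_x(x)
\quad\Longrightarrow\quad
p_y(y) = \frac{p_x(x)}{dy/dx} = \frac{p_x(x)}{L_{max}\,p_x(x)} = \frac{1}{L_{max}},
\qquad 0 \le y \le L_{max}
```

The output density is therefore constant, and its integral over $[0, L_{max}]$ equals 1, as required for a uniform distribution.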
It should be noted that the function cdf is nonnegative, well-defined, and nondecreasing. In discrete form, the equalization transformation (9.34) becomes
$$y(k) = T(x_k) = L_{max} \sum_{i=0}^{k} p(x_i) = \frac{L_{max}}{A} \sum_{i=0}^{k} H_I(x_i) \quad \text{with } k = 0, \ldots, L_{max} \qquad (9.35)$$
If the equalization interval is $[y_{min}, y_{max}]$ this function becomes
$$y(k) = (y_{max} - y_{min}) \frac{\sum_{i=0}^{k} H_I(x_i)}{A} + y_{min}.$$
Figures 9.3d and 9.18b show that the equalized output histogram $H_O(y)$ is not completely flat; this is due to the finite number of discretization levels. It can also happen that some significant levels are not present, while others are present but insignificant. For some images it may be useful to equalize with a smaller number of levels (see Fig. 9.18c and d), obtaining a more uniform equalized image histogram.
Fig. 9.18 Image equalization: (a) original image and its histogram; (b) image equalized with 256 levels; (c) with 127 levels; and (d) with 64 levels
The histogram $H_O(y_k)$ in the discrete context must be considered an approximation of the probability density function of a continuous function. The advantage of equalization over other transformations that improve the appearance of the image is that the calculation procedure is automatic. If an image has already been equalized, a second equalization has no effect and yields an identical equalized histogram. The pseudocode of the equalization algorithm is reported in Algorithm 1.
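The steps of Algorithm 1 can be sketched compactly with NumPy, replacing the explicit loops with vectorized operations. The function name `equalize` and the small test image are assumptions of this illustration; `levels` plays the role of the number of gray levels (256 for an 8-bit image).

```python
import numpy as np

def equalize(image, levels=256):
    """Histogram equalization of an unsigned integer image (cf. Algorithm 1)."""
    hist = np.bincount(image.ravel(), minlength=levels)   # histogram H_I
    cumulative = np.cumsum(hist)                          # cumulative H_c
    # Mapping T(x) = round((levels - 1) * H_c(x) / (N * M))
    mapping = np.round((levels - 1) * cumulative / image.size).astype(np.uint8)
    return mapping[image]                                 # lookup per pixel

# Dark image: all gray levels lie between 10 and 50.
image = np.array([[10, 10, 10, 20],
                  [20, 30, 40, 50]], dtype=np.uint8)
equalized = equalize(image)
print(equalized)
```

The five input levels, crowded into the range 10 to 50, are spread over the range 96 to 255. Applying `equalize` to the result again leaves it unchanged, in line with the remark that a second equalization has no effect.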
9.3.1.1 Example 1: Linear Point Operator
As a first example of histogram modification, we consider the point operator described in Eq. (9.10), which we rewrite as
$$y = \beta (x - x_a) + y_a \qquad (9.36)$$
valid for $x_a \le x < x_b$. The derivative of Eq. (9.36) is $\beta$ and its inverse function is given by
$$x = T^{-1}(y) = \frac{y + \beta x_a - y_a}{\beta} \qquad (9.37)$$
Substituting in Eq. (9.28), we obtain
$$H_O(y) = \frac{1}{\beta} H_I\left(\frac{y + \beta x_a - y_a}{\beta}\right) \qquad (9.38)$$
It can be observed from Eq. (9.38) that if $y_a > 0$ the histogram of the output image is shifted to the right, while it is shifted to the left if $y_a < 0$. The slope $\beta$ of the point operator widens the histogram $H_O(y)$ if $\beta > 1$, with a reduction of the amplitude that keeps the area of the histogram constant. The opposite effect occurs with $\beta < 1$.
Fig. 9.19 Example of linear transformation for a Gaussian histogram (input histogram $H_I(x)$ and output histogram $H_O(y)$)
Let us now look concretely at the effect of a linear operator under the hypothesis that the histogram of the input image $H_I(x)$ has the Gaussian form
$$H_I(x) = e^{-(x - \mu)^2} \qquad (9.39)$$
as shown in Fig. 9.19. Substituting Eq. (9.39) in (9.38), with $x_a = 0$, we have
$$H_O(y) = \frac{1}{\beta} e^{-\left[\frac{y}{\beta} - \left(\mu + \frac{y_a}{\beta}\right)\right]^2} \qquad (9.40)$$
with the peak translated to the point where $y/\beta = \mu + y_a/\beta$ and its height reduced to $1/\beta$. Observe in Fig. 9.19b that the output histogram is still Gaussian.
9.3.1.2 Example 2: Nonlinear Point Operator
Let us consider the nonlinear quadratic point operator by rewriting Eq. (9.12)
$$y = T(x) = x^2 \qquad (9.41)$$
to be applied to an image that has the following histogram:
$$H_I(x) = e^{-x^2} \qquad (9.42)$$
corresponding to the right half of a Gaussian distribution, as shown in Fig. 9.20a. The nonlinear quadratic operator is represented in Fig. 9.20b. Applying Eq. (9.28), the histogram of the output image $H_O(y)$ is modified as follows:
$$H_O(y) = \frac{e^{-y}}{2\sqrt{y}} \qquad (9.43)$$
as highlighted in Fig. 9.20c.
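Equation (9.43) can be checked numerically through the area conservation of Eq. (9.25): the area of $H_I(x)$ over an interval $[a, b]$ must equal the area of $H_O(y)$ over the transformed interval $[a^2, b^2]$. The sketch below uses a hand-written trapezoidal rule; the interval $[0.5, 2]$, the number of samples, and the function names are arbitrary choices made for this illustration.

```python
import numpy as np

def trapezoid(f, t):
    """Trapezoidal rule for samples f taken at the points t."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2.0)

def h_in(x):
    return np.exp(-x ** 2)                    # Eq. (9.42)

def h_out(y):
    return np.exp(-y) / (2.0 * np.sqrt(y))    # Eq. (9.43)

a, b = 0.5, 2.0
x = np.linspace(a, b, 20001)
y = np.linspace(a ** 2, b ** 2, 20001)        # interval mapped by y = x^2

area_in = trapezoid(h_in(x), x)
area_out = trapezoid(h_out(y), y)
print(area_in, area_out)
```

The two areas agree to within the accuracy of the numerical integration, which confirms that the transformed histogram redistributes the pixel population without changing it.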
Fig. 9.20 Example of nonlinear point transformation: (a) the right half of the Gaussian $H_I(x)$; (b) $T(x) = x^2$; (c) output histogram $H_O(y) = e^{-y}/(2\sqrt{y})$
9.3.1.3 Example 3: Singular Cases
For Eq. (9.28) to be valid, the point transformation function $T(x)$ is assumed to have an inverse and to take finite values without singular points (zero slope or infinite slope at some point of the input range). From Fig. 9.17 we can easily see how the histogram $H_O(y)$ would change at such singular points: where the slope is zero, a finite interval of $H_I(x)$ would be mapped into an infinitesimal strip of $H_O(y)$, producing an infinite pulse (see Eq. 9.28); conversely, where the slope is infinite, an infinitesimal strip of $H_I(x)$ would be spread over a finite interval of $H_O(y)$, producing a very small value of the output histogram. From Fig. 9.16 it can be verified that Eq. (9.14) is valid only between these extreme cases of singularity. If the transformation function $T(x)$ has no inverse, which generally happens when $T(x)$ is nonmonotonic, Eq. (9.28) loses its validity and cannot be used directly. In these cases, the point operator can be applied by dividing the input range into sub-intervals in which the singularity conditions do not occur, so that Eq. (9.28) can be applied piecewise, producing an output histogram for each piece.
9.3.2 Adaptive Histogram Equalization (AHE)
Since our eyes adapt to the local context of an image rather than to the whole image when evaluating its content, it is useful to optimize the quality improvement locally, using an approach known in the literature as Adaptive Histogram Equalization (AHE). To do this, the image is subdivided into a grid of rectangular context regions, in each of which the optimal contrast must be calculated. The optimal number of regions depends on the type of input image. Dividing the image into 8 × 8 context regions usually provides good results, which means having 64 context regions of size 64 × 64 pixels when the AHE algorithm is executed on a 512 × 512 pixel image. For each of these regions, the histogram is equalized. Local equalization can make the border zones of the locally processed context regions visible in the whole image. To overcome this problem, a bilinear interpolation is carried out, which attempts to restore a uniform contrast condition. Unfortunately, this technique does not suppress background noise in the image.

Algorithm 1 Histogram equalization algorithm
Let II be the input image, N × M the image size and Lmax the maximum gray level
Zero-initialize the array HI(xk) for k = 0, 1, ..., Lmax − 1
{Create the histogram HI(xk)}
for i = 0, ..., N − 1 do
    for j = 0, ..., M − 1 do
        x ← II(i, j)
        HI(x) ← HI(x) + 1
    end for
end for
{Create the cumulative histogram Hc}
Hc(0) ← HI(0)
for k = 1, ..., Lmax − 1 do
    Hc(k) ← Hc(k − 1) + HI(k)
end for
Calculate the mapping function: y = T(x) = round( (Lmax − 1) · Hc(x) / (N × M) ), 0 ≤ x < Lmax
for i = 0, ..., N − 1 do
    for j = 0, ..., M − 1 do
        IO(i, j) ← T(II(i, j))
    end for
end for
IO is the equalized image obtained
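The steps of Algorithm 1 can be sketched in plain Python as follows. This is only an illustrative sketch: the function name `equalize`, the list-of-lists image representation, and the round-half-up rule are assumptions made here, not part of the original algorithm.

```python
import math

def equalize(image, levels=256):
    """Histogram equalization of a 2D list of integer gray levels in [0, levels)."""
    n_pixels = sum(len(row) for row in image)

    # Histogram H_I(x)
    hist = [0] * levels
    for row in image:
        for x in row:
            hist[x] += 1

    # Cumulative histogram H_c(x)
    cumulative = []
    running = 0
    for count in hist:
        running += count
        cumulative.append(running)

    # Mapping T(x) = round((Lmax - 1) * H_c(x) / (N * M)); rounding half up (assumption)
    lut = [int(math.floor((levels - 1) * c / n_pixels + 0.5)) for c in cumulative]

    # Apply the mapping to every pixel
    return [[lut[x] for x in row] for row in image]

# Tiny hypothetical image with 4 gray levels
demo = [[0, 0, 0, 1]]
print(equalize(demo, levels=4))
```

Because the mapping depends only on the gray level, it is computed once per level and then applied as a table lookup to each pixel.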
9.3.3 Contrast Limited Adaptive Histogram Equalization (CLAHE)
The noise problem associated with the AHE algorithm can be reduced by limiting the improvement of the contrast in homogeneous areas. These areas can be characterized by a high peak in the histogram associated with the contextual regions, since many pixels fall within the same range of gray levels.
With the Contrast Limited Adaptive Histogram Equalization (CLAHE) [8] approach, the slope associated with the gray level assignment scheme is limited. This can be accomplished by allowing only a maximum number of pixels in each gray level (that is, in each single bar of the histogram indicating the gray level frequency) of the histogram associated with the local region under analysis. After the local histogram has been clipped, the excess pixels are equally redistributed over the entire histogram, so that the total number of pixels is preserved. The clip limit (or contrast factor) is defined as a multiple of the average content of the histogram. With a low factor, the maximum slope of the local mapping is low, which results in a limited improvement of the contrast. A factor of 1 prohibits any improvement of the contrast (in essence the original image is obtained); the redistribution of the histogram frequencies can be avoided by using a very high clip limit (1000 or higher), which is equivalent to using the AHE technique described previously.

The starting image II(i, j) is divided into adjacent sub-regions, and the histogram of each of them is calculated. Considering the region (k, l), and indicating with Hkl(x) the corresponding local histogram with x = 0, 1, ..., Lmax − 1, we rewrite the mapping function of the histogram equalization as follows:
$$T(x) = \frac{L_{max} - 1}{N \times M} \sum_{i=0}^{x} H_{kl}(i), \qquad x = 0, 1, \ldots, L_{max} - 1 \tag{9.44}$$
The problem with the equalization of the histogram is that the contrast of the regions is increased to the maximum. To limit the contrast to a desired level, the maximum slope of Eq. (9.44) is limited by a clipping procedure as follows. We set a threshold β, called the clip limit, at which the histogram will be cut. This value is linked to a clip factor α, expressed as a percentage, as follows:
$$\beta = \frac{NM}{L_{max}} \left( 1 + \frac{\alpha}{100} (s_{max} - 1) \right) \tag{9.45}$$
In this way, for a clip factor α = 0, the clip limit β = NM/Lmax results in a uniform redistribution of the gray levels of the pixels. The maximum clip limit is obtained for α = 100, where β = smax · NM/Lmax. This means that the maximum possible slope is smax. According to [6] a value of smax = 4 is set for X-ray images, leaving other applications free to experiment with other values. As α varies between 0 and 100, the maximum slope varies between 1 and smax. Therefore, subdividing the whole image into sub-regions, for example, dividing the rows and columns of a 512 × 512 pixel image into 8 intervals, we obtain 64 regions. In each of them, the histogram is evaluated and clipped at the value β, and the excess values are redistributed as shown in Algorithm 2. Figure 9.21 shows the result of the CLAHE algorithm. In [6] an example of CLAHE implementation on dedicated hardware is shown.
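As a small worked example of Eq. (9.45), the clip limit can be evaluated for a 64 × 64 context region with 256 gray levels. The function name `clip_limit` and its argument names are hypothetical, chosen here only for illustration; the default smax = 4 follows the value quoted from [6].

```python
def clip_limit(n_pixels, levels, alpha, s_max=4.0):
    """Clip limit beta of Eq. (9.45) for a region with n_pixels = N * M pixels."""
    return (n_pixels / levels) * (1.0 + (alpha / 100.0) * (s_max - 1.0))

region_pixels = 64 * 64          # one context region of a 512 x 512 image split 8 x 8
for alpha in (0, 50, 100):
    print(alpha, clip_limit(region_pixels, 256, alpha))
```

With these numbers the average bin content is NM/Lmax = 16, so β ranges from 16 (α = 0) to 64 (α = 100).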
Algorithm 2 Pixel redistribution algorithm for CLAHE
ExcessPixels ← 0
for x = 0, 1, ..., Lmax − 1 do
    if h(x) > β then
        ExcessPixels ← ExcessPixels + h(x) − β
        h(x) ← β
    end if
end for
m ← ExcessPixels / Lmax
for x = 0, 1, ..., Lmax − 1 do
    if h(x) < β − m then
        h(x) ← h(x) + m
        ExcessPixels ← ExcessPixels − m
    else if h(x) < β then
        ExcessPixels ← ExcessPixels − β + h(x)
        h(x) ← β
    end if
end for
while ExcessPixels > 0 do
    for x = 0, 1, ..., Lmax − 1 do
        if ExcessPixels > 0 then
            h(x) ← h(x) + 1
            ExcessPixels ← ExcessPixels − 1
        end if
    end for
end while
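Algorithm 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `redistribute` is hypothetical, the histogram and β are taken to be integers, and m is computed with integer division so that the final loop distributes the remainder one pixel at a time.

```python
def redistribute(hist, beta):
    """Clip a local histogram at beta and redistribute the excess (Algorithm 2)."""
    h = list(hist)               # work on a copy
    levels = len(h)

    # Step 1: clip every bin at beta and count the excess pixels
    excess = 0
    for x in range(levels):
        if h[x] > beta:
            excess += h[x] - beta
            h[x] = beta

    # Step 2: give each bin an equal share m without exceeding beta
    m = excess // levels         # integer share (assumption)
    for x in range(levels):
        if h[x] < beta - m:
            h[x] += m
            excess -= m
        elif h[x] < beta:
            excess -= beta - h[x]
            h[x] = beta

    # Step 3: spread what is left one pixel at a time
    while excess > 0:
        for x in range(levels):
            if excess > 0:
                h[x] += 1
                excess -= 1
    return h

print(redistribute([10, 0, 0, 2], beta=4))
```

The total number of pixels is preserved. Note that, as written in Algorithm 2, the last loop may push some bins slightly above β.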
9.4 Histogram Specification
The equalization of the histogram produces a single result: an image with uniformly distributed gray levels. For some applications, it is useful to change the shape of the histogram so as to enhance some gray levels of the image. This can be done with the adaptive and prespecified adjustment of the histogram, which improves the visual quality of the image while also attenuating the discretization limits that result from the equalized histogram. This can be achieved by trial and error, interactively specifying the histogram model to be obtained from the process of transforming the gray levels (Fig. 9.22).

Fig. 9.21 Application of the CLAHE algorithm. a Image of a non-uniformly illuminated seabed; b The image of the seabed (Roman columns) after the equalization of the histogram; c Final result obtained with the CLAHE adaptive equalization

Fig. 9.22 Point transformation by changing the histogram to a prespecified shape: a Histogram of the input image; b Intermediate equalized histogram; and c Modified output histogram

Assume that we have an input image with the probability density function p(x) and that we want to transform the gray levels x so as to obtain a prespecified probability density function pz(z), with z the output levels. An approach is considered based on an intermediate step that first equalizes the input image. As is known, the equalization function is
$$y = T(x) = (L_{max} - 1) \int_0^x p(t)\,dt \tag{9.46}$$
If we assume that we have already built the final image, we can write the transformation that equalizes its histogram in a similar way
$$q = G(z) = (L_{max} - 1) \int_0^z p_z(t)\,dt \tag{9.47}$$
with t a simple integration variable and pz(t) the histogram of the desired output image. The inverse of the desired transformation is given by
$$z = G^{-1}(q) \tag{9.48}$$
It is observed that the images corresponding to the transformations with gray levels y and q have the same equalized histogram and are therefore identical images, that is, y = q. It follows that, since the variables y and q of the two transformation Eqs. (9.46) and (9.47) are equivalent, they can be eliminated between the two expressions, thus obtaining the compound transformation
$$z = G^{-1}(q) = G^{-1}(y) = G^{-1}[T(x)] \tag{9.49}$$
where the transformations T and G are defined, respectively, by the histogram of the input image with gray levels x and by the desired histogram. If we assume that G and its inverse are monotonic, from the previous equation the desired transformation function z = F(x) is given by
$$F = G^{-1} \circ T, \quad \text{i.e.,} \quad F(x) = G^{-1}[T(x)] \tag{9.50}$$
where "∘" is the symbol of composition of functions. If we consider the probability density functions expressed with discrete values for the estimation of the histograms, the following procedure can be applied:
1. Calculate the histogram px(j) of the input image;
2. Calculate the equalized (cumulative) histogram $\mathrm{cdf}_x(k) = \sum_{j=0}^{k} p_x(j)$;
3. Specify the desired histogram or probability density function pz(j);
4. Calculate the equalized (cumulative) histogram $\mathrm{cdf}_z(k) = \sum_{j=0}^{k} p_z(j)$, which corresponds to the transformation function q = G(z);
5. Construct the desired function F(k) by comparing T with G (expressed respectively by cdfx(k) and cdfz(k)): for each gray level k, find cdfx(k) and then the gray level j for which cdfz(j) is the closest, i.e., $|\mathrm{cdf}_x(k) - \mathrm{cdf}_z(j)| = \min_i |\mathrm{cdf}_x(k) - \mathrm{cdf}_z(i)|$, and save it as F(k) = j;
6. Apply the function z = F(x) to the original image.

In real applications, working with images with discrete intensity values, the inverse transformation from y to z is not single-valued. These drawbacks can be overcome by specifying the histogram interactively by means of an electronic graphic device. A very often used transformation is the so-called hyperbolization of the histogram, based on Weber's law, which hypothesizes a model of perception of light intensity. This technique can be integrated with a previous histogram equalization process. It is also often used to standardize brightness between two separate cameras in processes such as the Brightness Transfer Function (BTF). BTF effectively compensates for changes in lighting in the environment and for radiometric distortions caused by the difference in camera sensors [7].
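The discrete histogram specification procedure (steps 1 to 6 above) can be sketched as follows. The function names `cumulative` and `specification_lut` are hypothetical, histograms are passed as plain lists of counts, and ties in the minimum search are resolved in favor of the lowest gray level, which is an assumption of this sketch.

```python
def cumulative(hist):
    """Normalized cumulative histogram (cdf) of a list of counts."""
    total = float(sum(hist))
    running, cdf = 0, []
    for count in hist:
        running += count
        cdf.append(running / total)
    return cdf

def specification_lut(input_hist, target_hist):
    """Build F(k): for each level k pick j minimizing |cdf_x(k) - cdf_z(j)|."""
    cdf_x = cumulative(input_hist)    # plays the role of T
    cdf_z = cumulative(target_hist)   # plays the role of G
    lut = []
    for k in range(len(cdf_x)):
        j = min(range(len(cdf_z)), key=lambda i: abs(cdf_x[k] - cdf_z[i]))
        lut.append(j)
    return lut

# Dark 4-level image mapped towards a uniform target histogram
lut = specification_lut([2, 2, 0, 0], [1, 1, 1, 1])
image = [[0, 1], [1, 0]]
print([[lut[x] for x in row] for row in image])
```

The resulting table F is a look-up table, so step 6 reduces to one table access per pixel.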
9.5 Homogeneous Point Operations
Contrast manipulation and histogram modification techniques are point operations that calculate the new intensity value y by considering only the input intensity value x, without depending on the position (i, j) of the pixel in the image plane. Point operations that do not depend on the position of the pixel are also called homogeneous and are of the type
$$y = T(x) \qquad \text{with } 0 \le x \le 255,\; 0 \le y \le 255$$
From a computational point of view, the homogeneous point operation called the square root, given by
$$y = \sqrt{255 \cdot x} \qquad \text{with } 0 \le x \le 255,\; 0 \le y \le 255 \tag{9.51}$$
applied to an input image with 512 × 512 = 262,144 pixels, would require a total of 262,144 multiplications and as many square roots. If the intensity values $y_i = \sqrt{255 \cdot x_i}$ are precalculated for all input values xi, which in real applications are often 256 levels of gray, and saved in as many locations of a table, the point operation reduces to using the input value xi as the address of one of the 256 locations of the table containing the precalculated gray level values yi (see Fig. 9.23). This table is normally called a Look-up Table (LUT). With this stratagem, homogeneous point operations are performed in less calculation time, replacing intensive calculations (roots, logarithms, cubes, etc.) with simple addressing instructions. In image processing systems LUTs are implemented in hardware, and therefore many homogeneous point operations are instantaneous and efficient for interactive activity and for real-time processes. In these systems (Fig. 9.24), input LUTs are also available to perform homogeneous point operations in the image acquisition phase (for example from a TV camera), and output LUTs to perform homogeneous point operations in the display phase to improve the visual quality of the image.

Fig. 9.23 Homogeneous point operation for $y_i = \sqrt{255 \cdot x_i}$

Fig. 9.24 Diagram of the components of a vision and image acquisition system

The following aspects are to be considered:
(a) The homogeneous point operations performed electronically are used only for qualitative analysis during the interpretation and consultation of the image.
(b) For quantitative analysis there are limits in the use of LUTs, as they normally introduce numerical truncation errors and, above all, they are limited in resolution and are almost always 8-bit.
(c) The modern video cards of personal computers have different hardware LUTs (one for each RGB component) that can also be used by users to modify, for example, the native curves of the monitor for the gamma correction described in Sect. 9.2.4.
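The look-up table idea for the square root operation of Eq. (9.51) can be sketched in software as follows. The function names `build_sqrt_lut` and `apply_lut` are hypothetical, and rounding to the nearest integer is an assumption (the text does not specify how the values are quantized).

```python
import math

def build_sqrt_lut(levels=256):
    """Precompute y = sqrt((levels - 1) * x) once for every gray level x."""
    top = levels - 1
    return [int(math.sqrt(top * x) + 0.5) for x in range(levels)]

def apply_lut(image, lut):
    """Homogeneous point operation: each pixel value is used as a table address."""
    return [[lut[x] for x in row] for row in image]

lut = build_sqrt_lut()                 # 256 square roots, computed only once
image = [[0, 255], [1, 0]]
print(apply_lut(image, lut))
```

For a 512 × 512 image this replaces 262,144 square roots with 256 precomputed values and one table access per pixel.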
9.6 Nonhomogeneous Point Operations
This category of point operations includes all those transformations that calculate the new intensity value in relation to the position (i, j) of the pixel x(i, j). In this case a LUT cannot be used, because the point operation depends on the position of the pixel and the new intensity value must be evaluated for each pixel of the image. All nonhomogeneous point operations therefore require considerable calculation time. We consider two cases of nonhomogeneous point operations.
9.6.1 Point Operator to Correct the Radiometric Error
The process of digitizing images does not always include, in the model of image formation, the dependence between the intensity and the position of the sensitive elements. In reality, the acquired image shows uneven intensity values even when the acquisition takes place under ideal light conditions. These defects are due to the instability of the sensors (for example CCD cameras), to the irregular lighting conditions of the scene, to the uneven sensitivity of the sensors themselves and, finally, to the degradations introduced by the optical components. Even if a digitizing system electronically reduces this irregularity by means of an input LUT, it is necessary to further reduce such defects with nonhomogeneous point operations. These defects are not easily observable in images with high contrast and many details, but they are evident in images where the background is dominant. The image irregularity can easily be verified when trying to isolate the main object of the image from the background: the gray levels of the background are not easily distinguishable from those of the object.

A method to attenuate the noise of the image caused by the uneven illumination and instability of the sensors consists in calculating a reference image (sample image) as the average of different acquisitions of the same scene made under identical lighting conditions. The reference image IR(i, j) can be used as a correction image with respect to the input image I(i, j) of the same scene, by applying the nonhomogeneous division point operator to obtain a corrected output image Ic
$$I_c(i, j) = c \cdot \frac{I(i, j)}{I_R(i, j)} \tag{9.52}$$
where c is an appropriate constant that serves to adapt the intensity values to the desired range.
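The correction of Eq. (9.52) can be sketched as follows. The function names `reference_image` and `radiometric_correction` are hypothetical, images are plain lists of lists, and the small constant `eps` guarding against division by zero is an assumption added here, not part of the original operator.

```python
def reference_image(acquisitions):
    """Average several acquisitions of the same scene, pixel by pixel."""
    n = len(acquisitions)
    rows, cols = len(acquisitions[0]), len(acquisitions[0][0])
    return [[sum(a[i][j] for a in acquisitions) / n for j in range(cols)]
            for i in range(rows)]

def radiometric_correction(image, reference, c, eps=1e-6):
    """Nonhomogeneous division operator: Ic(i, j) = c * I(i, j) / IR(i, j)."""
    return [[c * image[i][j] / max(reference[i][j], eps)
             for j in range(len(image[0]))]
            for i in range(len(image))]

# Two hypothetical acquisitions of a uniform background under uneven lighting
shots = [[[100, 200]], [[110, 190]]]
ref = reference_image(shots)
print(radiometric_correction([[105, 195]], ref, c=128))
```

In this example the uneven background (105 and 195) is mapped to the same value, since each pixel is divided by its own reference value.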
9.6.2 Local Statistical Operator
Unlike the homogeneous and nonhomogeneous point operators, this local operator calculates the new intensity value for each pixel I(i, j) of the input image by considering the statistical parameters (mean and standard deviation) evaluated for a predefined window centered in (i, j) and for the whole image. The statistical parameters of the windows are evaluated for each pixel of the image, which requires considerable calculation time. Let II(i, j) be the input image and IO(i, j) the output image; the statistical operator is applied to each pixel as follows:
$$I_O(i, j) = \mu(i, j) + k \, \frac{M}{\sigma(i, j)} \cdot [I_I(i, j) - \mu(i, j)] \tag{9.53}$$
where μ(i, j) and σ(i, j) indicate, respectively, the mean and the standard deviation of the gray levels of a window centered on the pixel (i, j) being processed, M represents the average of the gray levels of the entire image, and k is an appropriate constant (with 0 ≤ k ≤ 1) necessary to adapt the new intensity values to the desired output interval. The window dimensions (3 × 3, 5 × 5, 7 × 7, 9 × 9, ...) vary according to the local structures in the image.

The brightness of the transformed image can be changed by modifying the first addend of the previous relation from μ(i, j) to α · μ(i, j), that is, by considering only a fraction of the local average (with 0 ≤ α ≤ 1). This has the effect of attenuating the strong gray level variations that would occur in isolated areas of the image.
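A direct, unoptimized sketch of Eq. (9.53), including the optional factor α on the local mean, is given below. The function name, the default parameter values, the truncation of the window at the image borders, and the constant `eps` that avoids division by zero in flat windows are all assumptions of this sketch.

```python
import math

def local_statistical_operator(image, k=0.5, alpha=1.0, half=1, eps=1e-6):
    """Apply IO = alpha*mu + k*(M/sigma)*(II - mu) with a (2*half+1)^2 window."""
    rows, cols = len(image), len(image[0])
    global_mean = sum(sum(row) for row in image) / (rows * cols)   # M
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Window centered on (i, j), truncated at the borders (assumption)
            window = [image[r][c]
                      for r in range(max(0, i - half), min(rows, i + half + 1))
                      for c in range(max(0, j - half), min(cols, j + half + 1))]
            mu = sum(window) / len(window)
            sigma = math.sqrt(sum((v - mu) ** 2 for v in window) / len(window))
            gain = k * global_mean / max(sigma, eps)
            out[i][j] = alpha * mu + gain * (image[i][j] - mu)
    return out

flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
print(local_statistical_operator(flat))   # a flat image is left unchanged
```

The double loop recomputes the window statistics for every pixel, which illustrates why the operator is computationally expensive.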
9.7 Color Image Enhancement
So far, monochromatic images have been considered. We now consider how to improve the quality of color images, introducing the methods of false color and pseudo-color.
9.7.1 Natural Color Images
From color theory, it is known that natural color images can be obtained by superimposing the three primary components Red, Green, and Blue. The techniques for improving the visual quality of gray level images (known in the literature as Enhancement techniques), analyzed in the previous paragraphs, can be applied separately to the three color components. Various color spaces are available: RGB, HSI (hue, saturation, and intensity), and XYZ (described in Chap. 3). The various Enhancement techniques can be applied by considering the individual components of a color space. Depending on the type of visual model, the enhancement operations are applied to each component, and then the appropriate transformations are applied to move from one color space to another. For color images it is also possible to use Look-Up Tables, one for each color component. A possible improvement of the visual quality of an image can be achieved with the following transformations of the individual color components:

(RGB)I [input color image] ⇒ (HSI)I ⇒ Enhancement ⇒ (HSI)O ⇒ (RGB)O [output image]

The first transformation of the previous sequence converts the color components (R, G, B) of the input image into the HSI color space, normally used to perform object recognition operations in the color image. To improve the visual quality, the point and local operators described above can be applied to the three HSI components. Subsequently, to see the effects of the improvement, a reconversion from the HSI to the RGB space is performed; the latter is used to display the transformed image on a color monitor. In the example of Fig. 9.25 each LUT contains 256 4-bit cells, and therefore 2^4 × 2^4 × 2^4 = 4096 different colors can be generated. Modern pictorial graphic terminals specialized for the display of color images have at least 3 LUTs with 8-bit cells (2^8 × 2^8 × 2^8 different colors) and at least one 8-bit or 12-bit LUT for graphics.
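The conversion chain described above can be sketched for a single pixel as follows. Python's standard `colorsys` module does not provide HSI, so this sketch uses HSV as a stand-in, which is an assumption; the gamma-like operator applied to the value component is likewise only an illustrative choice of enhancement, and the function name is hypothetical.

```python
import colorsys

def enhance_rgb_pixel(r, g, b, gamma=0.5):
    """RGB -> HSV -> enhance V -> RGB for one 8-bit pixel (HSV stands in for HSI)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    v = v ** gamma                        # point operator on the value component only
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
    return tuple(int(255 * c + 0.5) for c in (r2, g2, b2))

print(enhance_rgb_pixel(64, 64, 64))      # a dark gray becomes a mid gray
print(enhance_rgb_pixel(10, 200, 30, gamma=1.0))
```

Only the brightness-like component is modified here, so hue and saturation of the pixel are left as they were.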
Fig. 9.25 Example of look-up table for color images (figure labels: Video Frame Memory, LUT, Display; example LUT cell 1001 1010 0001 for R, G, B)
9.7.2 Pseudo-color Images
The pseudo-color display method is used to represent an inherently monochrome image in color. The monochrome image can come from a black/white camera or scanner, or can be created synthetically to simulate a particular physical phenomenon. The pseudo-color technique is used to improve the visual quality of a monochromatic image by exploiting the greater sensitivity of the human visual system to color. This technique consists of applying three different homogeneous point operators to the black/white image, obtaining in output the three color components IR, IG, and IB
$$\begin{aligned} I_R(i, j) &= T_R\{I(i, j)\} \\ I_G(i, j) &= T_G\{I(i, j)\} \\ I_B(i, j) &= T_B\{I(i, j)\} \end{aligned} \tag{9.54}$$
where TR, TG, and TB are the transformations that obtain, from the monochromatic image I, the three color components Red, Green, and Blue. We now describe some methods of conversion from gray level (or intensity) to pseudo-color, always based on homogeneous point transformations.
Fig. 9.26 a Geometric representation of the intensity slicing color conversion method; b Monochrome images; c and d Results of the density slicing method for the intensity to pseudo-color conversion realized with 16 color ranges
The simplest operator is to manually match the value of a pixel intensity I(1) with the color triad (R1, G1, B1) provided by a specific color sample table (LUT). Instead of displaying a gray level, the image displays a pixel with a defined amount of each color. This is laborious and inefficient.

A simple approach to assign a color to a gray level is based on the intensity slicing criterion. The monochromatic image is considered as a 3D function I(i, j, l), where the spatial coordinates (i, j) represent the 2D image plane while the vertical axis l indicates the intensity levels from 0 to Lmax (see Fig. 9.26). We can think of K planes Pk, k = 0, 1, 2, ..., K, parallel to the coordinate plane (i, j) and perpendicular to the l axis of the intensities, positioned at different heights lk, k = 0, 1, 2, ..., K, corresponding to intensity values (where l0 corresponds to the black color, i.e., to the coordinate plane (i, j), and lK ≤ Lmax). With the K planes, the intensity slicing partition is thus realized, producing K + 1 intervals Sm, m = 1, 2, ..., K + 1. The intensity to pseudo-color conversion is performed by assigning the color Cm to the pseudo-colored image IC(i, j) as follows:
$$I_C(i, j) = C_m \quad \text{if} \quad I(i, j) \in S_m, \qquad m = 1, 2, \ldots, K + 1$$
Basically, the color Cm is assigned to the pixels whose intensity I(i, j) belongs to the intensity interval Sm = [lm−1, lm].

Another simple gray level (or intensity) to pseudo-color conversion method is to generate RGB color components whose values are calculated as different fractions of the intensity of each pixel of the monochrome input image. The RGB color component images are obtained with the transformations (9.54), as follows:
$$\begin{aligned} I_R(i, j) &= T_R\{I(i, j)\} = a \cdot I(i, j) \\ I_G(i, j) &= T_G\{I(i, j)\} = b \cdot I(i, j) \\ I_B(i, j) &= T_B\{I(i, j)\} = c \cdot I(i, j) \end{aligned} \tag{9.55}$$
where I(i, j) is the input gray level image, while 0 < a, b, c ≤ 1 are the constants that indicate the fraction of the gray value (or intensity) to be assigned to the related component.
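The intensity slicing criterion can be sketched with a sorted list of slicing levels and one color per interval. The function name, the example levels and colors, and the choice of half-open intervals [l(m−1), l(m)) are assumptions of this sketch; the text leaves the treatment of interval boundaries open.

```python
import bisect

def intensity_slicing(image, levels, colors):
    """Assign colors[m] to pixels falling in the m-th intensity interval.

    levels: sorted interior slicing levels l_1 < ... < l_K
    colors: K + 1 RGB triples, one per interval
    """
    if len(colors) != len(levels) + 1:
        raise ValueError("need exactly one color per interval")
    return [[colors[bisect.bisect_right(levels, v)] for v in row] for row in image]

levels = [64, 128, 192]                               # K = 3 slicing planes
colors = [(0, 0, 255), (0, 255, 0), (255, 255, 0), (255, 0, 0)]
image = [[0, 63, 64], [128, 200, 255]]
print(intensity_slicing(image, levels, colors))
```

Since the color depends only on the gray level, the same mapping could equally be stored in three LUTs, one per color component.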
Fig. 9.27 Gray to pseudo-color conversion methods. a Pseudo-colors generated through a path CDEFHMC in the RGB space. b Gray to pseudo-color conversion using nonlinear transformations based on sinusoidal functions
An alternative method consists in defining a path (see Fig. 9.27a) in the RGB color space, parametrically defined by the values of the monochromatic image I(i, j). In essence, a piecewise linear function of the gray levels is used to generate colors. A smooth nonlinear function can also be used for the three independent transformation functions (9.54) applied to the intensity of each pixel. Figure 9.27b shows an example of nonlinear transformations based on sinusoidal functions, whose phase and frequency characterize the values of the RGB color components in the intensity to pseudo-color conversion. In this case, the horizontal axis represents the intensity values from 0 to Lmax of the monochromatic input image. The three RGB color values are given by the intersections of the vertical line (corresponding to each intensity level present in the image) with the sinusoidal curves that represent the R, G, and B transformation functions. Pixels whose intensity falls in the steep areas of the sinusoids are associated with a very strong color, while in the area of the peaks the color values are constant. A small change in the phase between the three sinusoids produces few changes in pixels whose gray level corresponds to the peaks of the sinusoidal curves.

More complex pseudo-color conversion methods have been proposed, some based on the value of the gradient and others based on the conversion of color space models [2,5]. The commonly used RGB color space is ineffective for the processing of uniformly colored images because of the high cross-correlation between the color channels. Other color spaces like HSV (hue, saturation, value) and HSI (hue, saturation, intensity) are generally used for perceptually uniform colors. The HSI color space is often used in pseudo-color coding. It is a color model based on psychological motivation, in which the hue is used to visualize the values or the taxonomic space.

A pseudo-color encoding based on the HSI space is reported in [2]: it transforms a gray level image into HSI components, and the pseudo-color encoding is then accomplished by the conversion HSI ⇒ RGB. Pseudo-color codings based on the CIE Lab color space are also reported in the literature. These latter methods of pseudo-color coding applied to monochrome images may work better than traditional grayscale analysis, as long as the colors in the images are reasonably well saturated. In such situations, the hue will tend to remain relatively constant in the presence of shadows and other variations in illumination. As a result, images based on the HSI space can work properly.

We conclude the paragraph by presenting a method of pseudo-color coding that operates in the frequency domain to improve the visual quality of the images. The approach consists of applying one of the fundamental transformations (described in Chap. 2 vol. II, such as the Fourier transform, Walsh-Hadamard, …) to the monochromatic image. Subsequently, a high pass filter (see Sect. 1.19 vol. II), a bandpass filter (see Sect. 1.21 vol. II), and a low pass filter (see Sect. 9.13) are applied to the transformed image. Thus, three components (three filtered images) of the original image are obtained. The inverse transform (Fourier or Walsh-Hadamard, ...) is then applied to these filtered images, finally obtaining three components in the spatial domain, which are used as RGB components (or as components of another color space) for the pseudo-color coding of the monochrome image.
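The sinusoidal transformations of Fig. 9.27b can be sketched as three sinusoids of equal frequency and different phase, one per color component. The function name, the frequency, the three phase values, and the scaling of the sinusoids to the range 0 to 255 are all assumptions chosen for illustration; the figure does not specify them.

```python
import math

DEFAULT_PHASES = (0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0)

def sinusoidal_pseudocolor(x, l_max=255, freq=1.0, phases=DEFAULT_PHASES):
    """Map a gray level x in [0, l_max] to an (R, G, B) triple via three sinusoids."""
    t = x / l_max
    return tuple(
        int(127.5 * (1.0 + math.sin(2.0 * math.pi * freq * t + p)) + 0.5)
        for p in phases
    )

print(sinusoidal_pseudocolor(0))
print(sinusoidal_pseudocolor(128))
# With identical phases the three components coincide and the output stays gray
print(sinusoidal_pseudocolor(128, phases=(0.0, 0.0, 0.0)))
```

The last call illustrates that the color content comes entirely from the phase differences between the three transformations.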
9.7.3 False Color Images
The false color technique is a point operator that transforms, with linear functions, the values of the color (or spectral) components of the input image into new values of the color space. The final aim is to visualize the image by representing the objects of the scene with colors that are completely different from the usual ones (in false color), highlighting some particular aspects of the scene. For example, in a scene that includes a river or a beach, the river or the sea is represented in red, which appears strange to an observer accustomed to seeing the same objects in blue or in tones tending to black. Another use of the false color technique is the visualization of multispectral images acquired by satellite. In this case the various bands are combined appropriately to produce the three color components R, G, B, which offer an approximate representation in natural colors of the observed scene. The relationship between the color components of the display (R, G, B)D and those of the input sensors (R, G, B)S is given by
$$\begin{aligned} R_D &= T_R\{F_1, F_2, \ldots\} \\ G_D &= T_G\{F_1, F_2, \ldots\} \\ B_D &= T_B\{F_1, F_2, \ldots\} \end{aligned} \tag{9.56}$$
where TR, TG, TB are the transformations to be applied to the spectral bands Fi or to the input color components.
9.8 Improved Quality of Multispectral Images
Point operators that apply to multispectral images are essential for improving the visual quality of the image and for enhancing some characteristic structures (roads, rivers, etc.) that are fundamental for the classification phase and the interpretation of the scene. Multispectral images are normally acquired by aircraft or satellite (Landsat, Envisat, GeoEye-1, Worldview-1, Ikonos and QuickBird, ...) and consist of several component images of the same size, with spatial resolutions that, depending on the field of application, can be finer than one meter. The pixels of each component (also called spectral band) represent the values of radiance (from ultraviolet to infrared) for a particular range of the spectrum. Each band is chosen appropriately to discriminate particular objects of the territory; for example, the blue-green band (450–520 nm) is used for the study of seas, soil, and vegetation, the infrared band (2080–2350 nm) for the study of rocks, etc. To accentuate the variations in reflectivity between the bands m and n, the pixel-by-pixel subtraction operation may suffice
$$D_{m,n}(i, j) = I_m(i, j) - I_n(i, j)$$
The pixel-by-pixel ratio between bands can also produce effective results
$$R_{m,n}(i, j) = \frac{I_m(i, j)}{I_n(i, j)} \qquad \text{with } I_n(i, j) \neq 0$$
or
$$L_{m,n}(i, j) = \log[R_{m,n}(i, j)] = \log[I_m(i, j)] - \log[I_n(i, j)]$$
to be used as a remedy to avoid producing high values when the band In has very small radiance values. With N multispectral images, N(N − 1) possible differences or ratios between bands can be formed. The number of such combinations can be reduced by computing the differences or ratios with respect to the following image, obtained as the arithmetic mean A(i, j) of the N components
$$A(i, j) = \frac{1}{N} \sum_{n=1}^{N} I_n(i, j)$$
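The band operations above can be sketched for bands stored as plain lists of lists. The function names are hypothetical; the log-ratio function assumes strictly positive radiance values, in line with the condition that In(i, j) is nonzero.

```python
import math

def band_difference(band_m, band_n):
    """D(i, j) = Im(i, j) - In(i, j), pixel by pixel."""
    return [[m - n for m, n in zip(rm, rn)] for rm, rn in zip(band_m, band_n)]

def band_log_ratio(band_m, band_n):
    """L(i, j) = log Im(i, j) - log In(i, j); assumes strictly positive values."""
    return [[math.log(m) - math.log(n) for m, n in zip(rm, rn)]
            for rm, rn in zip(band_m, band_n)]

def band_mean(bands):
    """A(i, j): arithmetic mean of the N bands, pixel by pixel."""
    count = len(bands)
    return [[sum(values) / count for values in zip(*rows)] for rows in zip(*bands)]

b1 = [[5, 7]]
b2 = [[2, 7]]
print(band_difference(b1, b2))
print(band_mean([b1, b2]))
```

Comparing each band with the mean image A requires N differences or ratios instead of N(N − 1).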
9.9 Towards Local and Global Operators The homogeneous and nonhomogeneous point operators described above, essentially transform the gray levels of an image to improve the visual quality and to attenuate in some cases the noise introduced by the irregularities of the sensors. The informative content of an image is distributed over the various pixels that compose it. The gray level values of each pixel vary from black to white and vice versa based on the structures in the image. The intrinsic information present in the image can be associated with the basic structures present that can produce spatially, a low or high variability in the gray levels.
422
9 Image Enhancement Techniques
A structure-free image will present pixels with gray levels more or less with constant values. When there are several transitions with abrupt spatial variability in gray levels, it is said that the image has high dominant spatial frequencies (see Sect. 5.5). Conversely, when there is a dominance of transitions with low variability in gray level values, it is said that the image has low dominant spatial frequencies. The presence in the image of high spatial frequencies presuppose the existence of small structures with the size of one or more pixels in which high variations of gray levels occur, normally caused by the presence of edges or corners. The search for the presence of particular structures in the image, cannot be realized with point operators, but they need operators that for each pixel under consideration, calculate the new value of gray level, in relation to the value of the pixels in its nearby. For this purpose, local or global operators are required to process the input image to accentuate or remove a band of spatial frequencies, as well as for a particular low or high spatial frequency. For the analysis of the spatial frequencies (horizontal, vertical, oblique) present in the image, a very effective method is to perform a spatial frequency transform. This transform converts the image information from the spatial domain of the gray levels to the frequency domain (expressed in terms of module and phase). The most widespread is the Fourier transform along with others (Hadamard, Haar, sine, cosine) of lesser use. Previously, the transformation to the principal components (also known as Karhunen Loeve transform) was used to select the significant components of a multispectral image. The advantage of these transforms is to generate a new image that decomposes and significantly displays all the spatial frequencies in the input image. 
Analysis in the frequency domain can be an effective tool for understanding the most significant geometric structures present in the source image. Local or global image processing operations that alter the information content of an image is called (a) spatial filtering operators, if the basic frequencies in the spatial domain are manipulated (local operators); (b) frequency filtering operators, if the basic frequencies in the frequency domain are manipulated (global operators). Filtering techniques can be used 1. to accentuate the features present (extraction of edges and corners, enhance sharpness and textures) in the image; 2. to improve the quality of the image appropriately, for example, by leveling (smoothing) strong local variations of the gray levels or color values. In order to deepen the modeling of the filtering operators, in the following paragraphs, we will again refer to the theory of linear systems that have already been introduced in Sect. 5.7 together with the more general formalism represented by the convolution integral to model the physical process of image formation. Basically, some theoretical
foundations of the image formation process will be recalled to motivate the analogy with the image filtering operators.
9.9.1 Numerical Spatial Filtering
Spatial filters are implemented through spatial convolution, which computes the value of each pixel from the values of the pixels in its neighborhood. This processing model has its mathematical foundations in the theory of linear systems (linear operators), which is of great importance for the analysis of signals and images and has been widely used in the telecommunications field. In the following, filter and operator are used interchangeably to indicate a generic image transformation.
9.9.1.1 Linearity
This transformation can be modeled through a linear 2D system of the type

$$I_O(i, j) = O\{I(i, j)\}$$

$$\underbrace{I(i, j)}_{\text{input}} \;\longrightarrow\; \underbrace{O\{\cdot\}}_{\text{linear operator}} \;\longrightarrow\; \underbrace{I_O(i, j)}_{\text{output}}$$

where the operator $O\{\cdot\}$ transforms the input image $I$ into the output image $I_O$; more formally, $I_O$ is the response of the operator $O$ to the input image $I$. Linear operators are defined by the following principle of superimposition.

Definition 1 If $I_1(i, j)$ and $I_2(i, j)$ are two images with dimensions $N \times M$ pixels, $a$ and $b$ are arbitrary constants, and $O$ represents an operator that transforms an image into another with the same dimensions, the operator $O$ is said to be linear if and only if

$$O\{a \cdot I_1(i, j) + b \cdot I_2(i, j)\} = a \cdot O\{I_1(i, j)\} + b \cdot O\{I_2(i, j)\} \tag{9.57}$$

This property tells us that if the input to the system consists of a weighted sum of different images, the system's response is the superimposition (that is, the weighted sum) of the responses to the individual input images (Fig. 9.28b).

Definition 2 If $I_2(i, j) = 0$, it follows that

$$O\{a \cdot I_1(i, j)\} = a \cdot O\{I_1(i, j)\} \tag{9.58}$$
Fig. 9.28 Graphical representation of the properties of linear systems. a Homogeneity: when the input signal of the system is multiplied by a factor of two, the output signal is amplified by the same factor; b Superimposition: several input signals to the system generate an output that is the weighted sum of the system responses to the single input signals
called the homogeneity property of the linear operator O, which has the following meaning: when the input image I1 is multiplied by a constant a, the linear system responds with the output for I1 multiplied by the same constant a (Fig. 9.28a). The superimposition property can be extended to n images and makes linear operators very useful in real problems. In fact, it is possible to decompose a complex image into several components, apply the operator to each component, and then recompose the global result from the results of the individual components. Any system that does not satisfy these properties is called nonlinear and is more complex to analyze.
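The superimposition test of Eq. (9.57) can be tried numerically. The following is a minimal sketch on 1D signals rather than images; the two operators (`moving_average`, `square`), the helper `is_linear_on`, and the sample values are illustrative assumptions. A passing check on one pair of signals is only evidence of linearity, not a proof.

```python
def moving_average(signal):
    """Hypothetical linear operator: 3-tap moving average with zero padding."""
    padded = [0.0] + list(signal) + [0.0]
    return [(padded[n] + padded[n + 1] + padded[n + 2]) / 3.0
            for n in range(len(signal))]


def square(signal):
    """Hypothetical nonlinear operator: pointwise squaring."""
    return [v * v for v in signal]


def is_linear_on(op, s1, s2, a, b, tol=1e-9):
    """Check Eq. (9.57) for one pair of signals and one pair of constants."""
    combined = [a * x + b * y for x, y in zip(s1, s2)]
    lhs = op(combined)                                      # O{a*s1 + b*s2}
    rhs = [a * x + b * y for x, y in zip(op(s1), op(s2))]   # a*O{s1} + b*O{s2}
    return all(abs(p - q) <= tol for p, q in zip(lhs, rhs))


s1 = [1.0, 4.0, 2.0, 8.0]
s2 = [3.0, 0.0, 5.0, 1.0]
print(is_linear_on(moving_average, s1, s2, 2.0, -3.0))  # True
print(is_linear_on(square, s1, s2, 2.0, -3.0))          # False
```

Setting `b = 0` in the same check reduces it to the homogeneity property of Eq. (9.58).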
9.9.1.2 Impulse Response and Point Spread Function
With the superimposition property of linear operators, we can show that it is possible to obtain information about the nature of an operator applied to an image by observing only the output image. This is achieved as follows:
1. Decompose the input image $I(i, j)$ into elementary components;
2. Evaluate the system's response to each elementary component;
3. Calculate the overall system response for the desired input by simply adding the individual outputs.

The appropriate mathematical tool for image decomposition is the Dirac delta function. In fact, by the sifting property of the Dirac impulse, the image $I(i, j)$ can be written as a linear combination of translated delta functions

$$I(i, j) = \sum_{l=0}^{N-1} \sum_{k=0}^{M-1} I(l, k) \cdot \delta(i - l, j - k) = \sum_{l=0}^{N-1} \sum_{k=0}^{M-1} I(l, k) \cdot \delta(i, j; l, k) \tag{9.59}$$
where

$$\delta(l, k) = \begin{cases} 1 & \text{if } l = k = 0 \\ 0 & \text{otherwise} \end{cases}$$

and $I(l, k)$ indicates the weight factor of the impulse function $\delta$ at the pixel $(l, k)$ of the image. The last member of (9.59) expresses the image (Fig. 9.29) as a sum of elementary functions, weighted and shifted according to the sifting property, with $\delta(i, j; l, k) = \delta(i - l, j - k)$. If the output of a linear system is defined as

$$I_O(i, j) = O\{I(i, j)\} = O\left\{ \sum_{l=0}^{N-1} \sum_{k=0}^{M-1} I(l, k) \cdot \delta(i - l, j - k) \right\} \tag{9.60}$$

then, since the operator $O$ is linear, by the superimposition of the output components we can write

$$I_O(i, j) = \sum_{l} \sum_{k} O\{I(l, k) \cdot \delta(i - l, j - k)\}. \tag{9.61}$$
Moreover, $I(l, k)$ is independent of $i$ and $j$; therefore, from the homogeneity property, it follows that

$$I_O(i, j) = \sum_{l=0}^{N-1} \sum_{k=0}^{M-1} I(l, k) \cdot O\{\delta(i - l, j - k)\} = \sum_{l=0}^{N-1} \sum_{k=0}^{M-1} I(l, k) \cdot h(i, j; l, k) \tag{9.62}$$
where the function h(i, j; l, k) is the response of the system, O{δ(i, j; l, k)}, to the impulse δ located at (l, k). In other words, h(i, j; l, k), called the impulse response, is the response of the operator O, observed at position (i, j) of the output, to an input impulse (a single input pixel) located at (l, k) in the input image. In optical systems the impulse response is called the Point Spread Function (PSF) of the system. In Sect. 5.7 (image formation with a real optical system), the input image I(i, j) is normally a real image or the scene (objects) to be captured, while the output image IO(i, j) corresponds to the image
Fig. 9.29 Decomposition of the image by the sifting property of the Dirac delta function δ(i, j; l, k), where (l, k) is the position of the impulse and (i, j) are considered the independent variables in the image plane
captured through the physical process of image formation, with the aim of obtaining IO(i, j) very similar to the input image I(i, j). This result of the theory of linear systems is fundamental: if the response of the operator O to an impulse is known, the response to any input image I(i, j) can be calculated with Eq. (9.62). The operator O is completely characterized by its impulse response. Because operators are defined in terms of the PSF, an operator's PSF represents what we get if we apply the operator to a point source, O{point source} = PSF, that is, O{δ(i − l, j − k)} = h(i, j; l, k), where δ(i − l, j − k) is a point source of brightness 1 centered at the point (l, k). An image is a collection of point sources (pixels), each with its own brightness value; we can say that an image is the sum of these point sources.
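As a sketch of how Eq. (9.62) can be used, the fragment below probes a linear operator with unit impulses and rebuilds its response to an arbitrary image from the impulse responses alone. The operator `row_gain_blur` (linear but spatially variant), the helper names, and the list-of-lists image layout are illustrative assumptions, not taken from the text.

```python
def row_gain_blur(img):
    """Hypothetical linear operator: pixel plus half of its predecessor along
    the second index, scaled by a gain that depends on the first index."""
    n1, n2 = len(img), len(img[0])
    out = [[0.0] * n2 for _ in range(n1)]
    for i in range(n1):
        for j in range(n2):
            prev = img[i][j - 1] if j > 0 else 0.0
            out[i][j] = (i + 1) * (img[i][j] + 0.5 * prev)
    return out


def impulse(n1, n2, l, k):
    """Unit impulse delta(i - l, j - k) on an n1 x n2 grid."""
    img = [[0.0] * n2 for _ in range(n1)]
    img[l][k] = 1.0
    return img


def response_from_impulses(op, img):
    """Rebuild op(img) as sum_l sum_k I(l,k) * h(i,j;l,k), as in Eq. (9.62)."""
    n1, n2 = len(img), len(img[0])
    out = [[0.0] * n2 for _ in range(n1)]
    for l in range(n1):
        for k in range(n2):
            h = op(impulse(n1, n2, l, k))      # h(i, j; l, k) for this (l, k)
            for i in range(n1):
                for j in range(n2):
                    out[i][j] += img[l][k] * h[i][j]
    return out


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
direct = row_gain_blur(image)
rebuilt = response_from_impulses(row_gain_blur, image)
print(direct)             # [[1.0, 2.5, 4.0], [8.0, 14.0, 17.0]]
print(direct == rebuilt)  # True
```

Because this operator is not spatially invariant, a separate impulse response is needed for every impulse position (l, k); this is the general four-variable case h(i, j; l, k).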
9.9.1.3 Spatial Invariance
The linear operator $O\{\cdot\}$ with the input–output relation $I_O(i, j) = O\{I(i, j)\}$ is called spatially invariant, or invariant to translation (shift invariant), if the response of the operator does not depend explicitly on the position $(i, j)$ in the image. In other words, the operator $O$ is spatially invariant if a translation of the input causes the same translation of the output. Figure 9.30 gives a 1D graphic representation of a spatially invariant linear system. From the previous results, in particular Eq. (9.62), taking as reference the input impulse at the origin $(l, k) = (0, 0)$, it follows that

$$h(i, j; l, k) = O\{\delta(i - l, j - k)\} = h(i - l, j - k; 0, 0) = h(i - l, j - k) \tag{9.63}$$
Fig. 9.30 Linear spatially invariant operator: the impulse response of the system is the same for all input locations and consequently, a translation of the input signal produces an identical translation in the output signal
from which it turns out that the operator O is spatially invariant if the operation performed on the input pixel depends only on the two translations (i − l) and (j − k), and not on the position (i, j). Consequently, the shape of the impulse response does not change as the process moves over the various pixels (i, j) (input impulses) of the input image. If Eq. (9.63) is not satisfied, the linear operator is called spatially variant. It is possible to verify experimentally whether a linear system is spatially invariant. The system output is first determined by placing the impulse at the origin, and then the impulse is translated. It is then immediate to verify whether the system's response is translated by the same amount and keeps the same shape, as required for a spatially invariant linear system.
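The experimental test just described can be sketched in 1D as follows. The two operators (`smooth`, `ramp_gain`), the helper names, the signal length, and the shifts tried are all assumptions chosen for illustration; the signal is kept long enough that the shifted responses are not truncated.

```python
def smooth(signal):
    """Hypothetical LSI operator: y[n] = x[n] + 0.5 * x[n-1] (zero before start)."""
    return [signal[n] + 0.5 * (signal[n - 1] if n > 0 else 0.0)
            for n in range(len(signal))]


def ramp_gain(signal):
    """Hypothetical linear but spatially variant operator: y[n] = (n + 1) * x[n]."""
    return [(n + 1) * v for n, v in enumerate(signal)]


def shift_right(signal, s):
    """Translate a signal by s samples to the right, filling with zeros."""
    return [0.0] * s + list(signal[:len(signal) - s])


def impulse_test(op, length=8, shifts=(1, 2, 3)):
    """Compare the response to a shifted impulse with the shifted response."""
    delta = [1.0] + [0.0] * (length - 1)
    h0 = op(delta)                      # response to the impulse at the origin
    return all(op(shift_right(delta, s)) == shift_right(h0, s) for s in shifts)


print(impulse_test(smooth))     # True
print(impulse_test(ramp_gain))  # False
```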
9.9.1.4 Convolution
Assuming that the properties of superimposition, homogeneity, and spatial invariance hold for a linear system, we have the following linear operator:

$$I_O(i, j) = \sum_{l=0}^{N-1} \sum_{k=0}^{M-1} I(l, k) \cdot h(i - l, j - k) \tag{9.64}$$

which is the convolution operator in the spatial domain between the input image and the impulse response function, generally expressed in the following notation:

$$I_O(i, j) = I(i, j) * h(i, j) \tag{9.65}$$
and it can be said that the output image IO is obtained by convolving the input image I with the function h. It can be shown that every linear and spatially invariant operator can be expressed as a convolution. This operator, with the linearity and spatial invariance properties described in the preceding paragraphs, is known more generally as a Linear Shift-Invariant (LSI) system, understood as spatially invariant (or time-invariant, depending on the domain of application): if the input signal is shifted in space (or in time), the output is shifted by the same amount.
Fig. 9.31 Convolution properties: a Commutative; b Associative and c Distributive
Compared with spatially variant systems, LSI systems are much simpler, since they are described by the impulse response h(i, j) with only two variables instead of the more general function h(i, j; l, k) with four variables (described previously).
9.9.1.5 Properties of the Convolution
The convolution operator has interesting properties that are useful in image processing, especially when implementing filtering algorithms. We list below the most significant ones.

Commutative Property. The commutative property for convolution is given in mathematical form by

$$I(i, j) * h(i, j) = h(i, j) * I(i, j) = \sum_{l=0}^{N-1} \sum_{k=0}^{M-1} h(l, k) \cdot I(i - l, j - k) \tag{9.66}$$
The commutative property allows the input image and the impulse response to be interchanged without changing the output (see Fig. 9.31a). Only linear shift-invariant systems have the commutative property. Although this interchange between input image and impulse response is mathematically possible, from a physical point of view it is usually not meaningful.

Associative Property. The associative property states that the convolution of multiple functions is independent of the order in which the convolutions are performed. It follows that

$$[I(i, j) * h_1(i, j)] * h_2(i, j) = I(i, j) * [h_1(i, j) * h_2(i, j)] \tag{9.67}$$
Any LSI system is associative. Figure 9.31b shows schematically that the impulse response of two linear operators applied in sequence is given by the convolution of the single impulse responses, $h(i, j) = h_1(i, j) * h_2(i, j)$. This property is useful when implementing complex operators with convolution (for example, to apply to an image, in cascade, the noise removal operator and then the contour sharpening one).

Distributive Property. The distributive property (see Fig. 9.31c) follows directly from the definition of convolution and is expressed by the following relationships:

$$I * [h_1 + h_2] = I * h_1 + I * h_2 \qquad [I_1 + I_2] * h = I_1 * h + I_2 * h \tag{9.68}$$
Fig. 9.32 Diagram of a linear operator O: the input image I(i, j) is transformed by the linear operator O(·), characterized by its PSF, into the output image IO(i, j)
Any LSI system has the distributive property. It follows that two operators applied in parallel are equivalent to a single operator whose impulse response is the sum of the two, i.e., $h = h_1 + h_2$. As shown in Fig. 9.31c, two or more LSI systems can share the same input image $I$ and produce the output image from the sum of the individual results.

Property of Separability. If the input image and the impulse response are separable, that is, $I(i, j) = I_1(i) \cdot I_2(j)$ and $h(i, j) = h_1(i) \cdot h_2(j)$, it follows that

$$[I_1(i) \cdot I_2(j)] * [h_1(i) \cdot h_2(j)] = [I_1(i) * h_1(i)] \cdot [I_2(j) * h_2(j)] \tag{9.69}$$

where $I_1$, $I_2$, $h_1$, $h_2$ are 1D functions. The convolution of two separable images therefore produces a separable image.
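The four properties can be checked numerically on small integer examples. The sketch below uses plain Python lists; the helper names (`conv1d`, `conv2d`, `outer`) and the sample values are assumptions for illustration, and the checks confirm the identities only for these inputs.

```python
def conv1d(f, h):
    """Full discrete 1D convolution: g(i) = sum_l f(l) * h(i - l)."""
    g = [0] * (len(f) + len(h) - 1)
    for l, fv in enumerate(f):
        for m, hv in enumerate(h):
            g[l + m] += fv * hv
    return g


def conv2d(f, h):
    """Full discrete 2D convolution of two lists of lists."""
    g = [[0] * (len(f[0]) + len(h[0]) - 1) for _ in range(len(f) + len(h) - 1)]
    for l, f_row in enumerate(f):
        for k, fv in enumerate(f_row):
            for m, h_row in enumerate(h):
                for n, hv in enumerate(h_row):
                    g[l + m][k + n] += fv * hv
    return g


def outer(a, b):
    """Separable 2D function a(i) * b(j) as a list of lists."""
    return [[x * y for y in b] for x in a]


sig, h1, h2 = [2, 3, 4], [2, 1], [1, 0, -1]

# Commutative property, Eq. (9.66)
commutative = conv1d(sig, h1) == conv1d(h1, sig)
# Associative property, Eq. (9.67)
associative = conv1d(conv1d(sig, h1), h2) == conv1d(sig, conv1d(h1, h2))
# Distributive property, Eq. (9.68); h1 is zero-padded to the length of h2
h1p = h1 + [0]
h_sum = [x + y for x, y in zip(h1p, h2)]
parallel = [x + y for x, y in zip(conv1d(sig, h1p), conv1d(sig, h2))]
distributive = conv1d(sig, h_sum) == parallel
# Separability, Eq. (9.69)
i1, i2 = [1, 2], [3, 4, 5]
separable = conv2d(outer(i1, i2), outer(h1, h2)) == outer(conv1d(i1, h1),
                                                          conv1d(i2, h2))
print(commutative, associative, distributive, separable)  # True True True True
```

The separability check also hints at why separable masks are attractive in practice: the 2D result is obtained from two 1D convolutions.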
9.9.1.6 Summary
Linear operators, and therefore filters, with the spatial invariance and superimposition properties can be used to study a class of physical and biological phenomena. Unfortunately, several physical phenomena associated with the formation and processing of the image cannot be described by spatially invariant linear systems. In these cases, however, one attempts to approximate the PSF that describes the system with that of a spatially quasi-invariant system, for example, by operating on limited regions of the image. The LSI theory was already applied when the process of image formation was studied with the convolution operation, and the optical system was completely described by its PSF. The process of digital image processing (filtering operator) and the process of image formation (physical process) are both described as convolution operations, i.e., they are realized by a process that can be modeled with a linear spatially invariant system. The fundamental characteristic of linear systems is that the effects of an operator applied to an input image can be studied starting only from the impulse response. In the case of image processing, it is possible to study the characteristics of a linear operator by applying it to a sample image whose structure is known a priori, and then analyzing whether or not the output image shows the results desired from that particular operator. The impulse response is the output of the system produced in response to an input impulse. The typical diagram of a linear operator O is shown in Fig. 9.32.
Real applications of linear systems in the field of image processing concern the following operations: (a) High-pass filtering; (b) Low-pass filtering; (c) Band-pass filtering. These operations can be applied both in the spatial domain and in the frequency domain (Fourier analysis). In the latter domain, we will see that the convolution is simplified.
9.10 Spatial Convolution
Let us now see how the convolution process can be realized, recalling the initial goal of building local operators that compute the value of the output pixel from the values of the pixels in its neighborhood in the input image. Two aspects must be addressed.

The first aspect is how to combine the gray-level values of the pixels in the vicinity of the pixel being processed. The second aspect is how large the area of neighboring pixels involved in the operator should be. This area is called by one of the following names: window, kernel, filter mask, or convolution mask.

The discrete convolution operator works as a linear process in that it computes a weighted sum: the values of the elements (pixels) are multiplied by constant values and added. The elements are the intensity values of the pixels that overlap the mask, and the constant values are the weights, that is, the convolution coefficients stored in the mask itself.
9.10.1 1D Spatial Convolution
In the previous section we introduced the linear spatially invariant (LSI) system, expressed in terms of convolution, highlighting that the response of an LSI system to an input signal (or image) is given by the convolution of that signal with the impulse response of the system, modeled by the PSF. More formally: the action of a function h(x) that at each point alters (or weighs) another function f(x) is an operator known as the convolution integral (that is, the integral of the product of f and a mirrored and shifted copy of h), and we say that f(x) is convolved with h(x), or vice versa. The 1D expression of the convolution of two spatially
continuous functions $f(x)$ and $h(x)$ is given by

$$g(x) = f(x) * h(x) = \int_{-\infty}^{\infty} f(\tau) \cdot h(x - \tau)\, d\tau \tag{9.70}$$
In the discrete context, denoting by $h(i)$ the impulse response and by $f(i)$ a 1D input signal, the output signal $g(i)$ is given by the following convolution expression:

$$g(i) = f(i) * h(i) = \sum_{l} f(l) \cdot h(i - l) \tag{9.71}$$
To understand concretely how the convolution operator works, we consider as a simple example a discrete input signal $f(i) = \{2, 3, 4\}$ and the impulse response $h(i) = \{2, 1\}$. From Fig. 9.33a we note that the input signal has nonzero values for $i = 0, 1, 2$, while the impulse response has nonzero values for $i = 0, 1$. By applying (9.71), the convolution values of the output signal $g(i)$ are calculated at the points $i = 0, 1, 2, \ldots$ as the weighted sum over all points where the two functions $h(i)$ and $f(i)$ overlap. For the example in question, according to Fig. 9.33, the convolution values are calculated as follows:

$$\begin{aligned}
g(0) &= \sum_{l=0}^{\infty} f(l) \cdot h(0 - l) = f(0) \cdot h(0 - 0) + f(1) \cdot h(0 - 1) = 2 \cdot 2 = 4 \\
g(1) &= \sum_{l=0}^{\infty} f(l) \cdot h(1 - l) = f(0) \cdot h(1 - 0) + f(1) \cdot h(1 - 1) = f(0) \cdot h(1) + f(1) \cdot h(0) = 2 + 6 = 8 \\
g(2) &= \sum_{l=0}^{\infty} f(l) \cdot h(2 - l) = f(0) \cdot h(2 - 0) + f(1) \cdot h(2 - 1) + f(2) \cdot h(2 - 2) \\
     &= f(0) \cdot h(2) + f(1) \cdot h(1) + f(2) \cdot h(0) = 2 \cdot 0 + 3 \cdot 1 + 4 \cdot 2 = 0 + 3 + 8 = 11 \\
g(3) &= \sum_{l=0}^{\infty} f(l) \cdot h(3 - l) = f(0) \cdot h(3 - 0) + f(1) \cdot h(3 - 1) + f(2) \cdot h(3 - 2) + f(3) \cdot h(3 - 3) \\
     &= f(0) \cdot h(3) + f(1) \cdot h(2) + f(2) \cdot h(1) + f(3) \cdot h(0) = 2 \cdot 0 + 3 \cdot 0 + 4 \cdot 1 + 0 \cdot 2 = 4 \\
g(4) &= \sum_{l=0}^{\infty} f(l) \cdot h(4 - l) = 0.
\end{aligned}$$
It is observed that the first convolution value $g(0)$ was calculated with only one term, $f(0) \cdot h(0)$; all the others are zero and do not contribute to the convolution value (see Fig. 9.33b). It is also noted that, for $i \geq 4$, the output signal is null (the domains of definition of the two functions no longer overlap), and therefore the output signal values are $g(i) = \{4, 8, 11, 4\}$. We now consider in detail the terms of each output value $g(i)$, which for better readability are rewritten as

$$\begin{aligned}
g(0) &= f(0) \cdot h(0) \\
g(1) &= f(0) \cdot h(1) + f(1) \cdot h(0) \\
g(2) &= f(0) \cdot h(2) + f(1) \cdot h(1) + f(2) \cdot h(0) \\
g(3) &= f(0) \cdot h(3) + f(1) \cdot h(2) + f(2) \cdot h(1) + f(3) \cdot h(0)
\end{aligned}$$
Fig. 9.33 Sample graphical representation of a 1D convolution between two discrete signals f (i) = 2, 3, 4 and h(i). In a the two convoluted signals and the output signal g(i) are represented; In b the specular function of h is represented at point 0, i.e. h(0 − i), and the convolution value g(0); In c, d and e are the values translated by the specular function of h and the convolution values respectively for the points of i = 1, 2, 3 where the domains of the two functions overlap. For i ≥ 4, the convolution values are zero
from which a characteristic of the sequences of products between the two functions emerges. In fact, in the calculation of each point of the output signal g(i) as a sum of products between elements of f and elements of h, the index of f increases while the index of h decreases down to the value 0. In other words, the elements of the function h are reversed in the calculation of the inner product between the two functions. Figure 9.33b–e shows graphically the entire convolution process between the two functions for calculating g(i) at the only points of the output signal where the convolution value is nonzero. From the analysis of this example, we can deduce in general that the convolution process, seemingly simple, requires the following steps to calculate the output signal at each point x:
1. Produce the mirror image of h, i.e., h(−τ);
Fig. 9.34 1D convolution operator
2. Shift h(−τ) to the output point x, thus obtaining h(x − τ), where the convolution value g(x) is to be computed;
3. Multiply point by point with f(τ), i.e., compute f(τ) · h(x − τ);
4. Calculate the integral (the summation in the discrete case) over (−∞, +∞) with Eq. (9.70) or (9.71), in the continuous or discrete case respectively;
5. Repeat the previous steps for each point x in the output range.
A graphical representation of the 1D convolution operator between more complex functions in the continuous case is shown in Fig. 9.34. For the function h(τ), its mirror image h(0 − τ) with respect to the origin is first derived, and then it is translated to the point x, obtaining h(x − τ). At the point x, the convolution is calculated by multiplying point by point the nonzero input signal f(τ) and the mirrored function h(x − τ), and integrating the result. The operation is repeated analogously for the other points x to obtain the complete output signal of the linear system. Each value g(x) of the output signal depends on the degree of overlap between the input signal and the characteristic function of the linear system, which is translated for each value of x (see Fig. 9.34). As shown in the graph, the result of the convolution g(x) represents the area under the product of f(τ) and the mirrored, shifted function h(x − τ) in the region where they overlap. Let us now return to consider the function f as the input signal and the function h as the impulse response of an LSI system that determines the characteristics of the output signal g. The first aspect highlighted at the beginning of the section, namely how to combine the pixel values (input signal) in the vicinity of the pixel being processed, is resolved by the convolution process with the impulse response. The second aspect, the domain of influence of the impulse response, has so far been only partially considered.
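The steps above can be sketched for the discrete case with the example signals f(i) = {2, 3, 4} and h(i) = {2, 1}. The helper names `value_at` and `convolve_at` are illustrative assumptions; both sequences are treated as zero outside their support.

```python
def value_at(seq, n):
    """Return seq[n], treating the sequence as zero outside its support."""
    return seq[n] if 0 <= n < len(seq) else 0


def convolve_at(f, h, i):
    """Convolution value g(i) = sum_l f(l) * h(i - l), Eq. (9.71)."""
    # h(i - l), read as a function of l, is h mirrored and shifted to i
    # (steps 1 and 2); the products are then summed (steps 3 and 4).
    return sum(f[l] * value_at(h, i - l) for l in range(len(f)))


f = [2, 3, 4]
h = [2, 1]
g = [convolve_at(f, h, i) for i in range(6)]   # step 5: repeat for each i
print(g)  # [4, 8, 11, 4, 0, 0]
```

The trailing zeros show that for i ≥ 4 the mirrored, shifted h no longer overlaps f.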
In the convolution examples considered, the definition intervals of the two convoluted functions were limited. In real applications, they are normally limited and the extension of the output signal definition range is given by the sum of the definition intervals of the two functions. The limits of the integration intervals (or summation) must be determined from time to
Fig. 9.35 Example of a 1D discrete convolution with management of the edges of the input signal
time at each output point x for calculating the convolution value g(x). The lower limit is the maximum of the lower limits of f(τ) and h(x − τ), while the upper limit is the minimum of the upper limits of f(τ) and h(x − τ). In the discrete case, if the two functions have N and M elements, respectively, the output signal has N + M − 1 elements. Also in the discrete case, since the intervals are limited, the calculation at the extreme edges of the output signal must be handled. Figure 9.35 shows the simple convolution example of Fig. 9.33 with a graphic that highlights the calculation of the output signal at the edges. For better readability of the convolution process, the values of the functions to be convolved and of the output signal are shown as vectors. The convolution process starts by placing the mask vector h(0 − l) at the first element f(0) of the input signal vector to compute the first element g(0) of the output signal vector. In this edge situation, the input signal was extended with the element f(−1) = 0, for convenience of calculation, so that the inner product between the mask h and the subvector f(l), l = −1, 0, produces the first output element g(0). The convolution process continues by translating the mask one element at a time to the right to calculate the other output elements, until the last output element g(3) is reached (recall that the output signal has four elements, as stated above), for which it was also necessary to extend the input signal to the right by adding the element f(3) = 0.
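As an alternative to extending the input with zeros, the summation limits can be computed explicitly for each output element, as described above. The following is a minimal sketch; the function name `conv1d_limits` is an assumption, and the indices of both sequences are assumed to start at 0.

```python
def conv1d_limits(f, h):
    """Full 1D convolution using explicit summation limits for each output."""
    n, m = len(f), len(h)
    g = []
    for i in range(n + m - 1):        # the output has N + M - 1 elements
        lo = max(0, i - (m - 1))      # max of the lower limits of f(l), h(i - l)
        hi = min(n - 1, i)            # min of the upper limits of f(l), h(i - l)
        g.append(sum(f[l] * h[i - l] for l in range(lo, hi + 1)))
    return g


print(conv1d_limits([2, 3, 4], [2, 1]))  # [4, 8, 11, 4]
```

With these limits no virtual elements such as f(−1) = 0 or f(3) = 0 are needed, because terms outside the overlap are never visited.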
9.10.1.1 Edge Management in 1D Convolution
Basically, since the convolution is a local operator (it acts on a neighborhood of the input signal), the problem of the edges naturally emerges in the calculation of the output elements at the ends of the vector. As shown in Fig. 9.35, when calculating the element g(0) there are no elements to the left of f(0), and therefore there are not enough elements to calculate g(0) according to the convolution definition. Therefore,
Fig. 9.36 1D convolution operator, three ways to manage edges: Zero Padding, Periodic Replication, and Specular of the input signal
the need arises to find a heuristic solution for the edges that suits the type of application. In the example considered, it was assumed that f(i) was, for example, an audio signal with zero volume before the start and after the end of the acquisition. In the literature there are several heuristics for the management of the edges, based on adding virtual elements with different values at both ends of the input signal. They are summarized as follows:
(a) Fill the additional elements with zeros.
(b) Periodic replication of the input signal.
(c) Specular replication of the input signal.
Figure 9.36 shows an example of edge management for the three heuristics indicated above. Note that the mask has an odd number of elements, a choice normally made so that the convolution is calculated symmetrically with respect to the element of the input signal being processed. In the example, with the three-element mask, each element of the output signal is the weighted sum of the corresponding input elements with the mask coefficients, the center element of the mask being aligned with the element being processed. The additional elements added at the two ends of the input signal for edge management are shown in blue. The elements of the output signal modified by the edge-management heuristic are shown in yellow. Other heuristics can be adopted, for example, properly truncating the output signal.
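The three heuristics can be compared with a short sketch. The signal and mask values below are chosen only for illustration and are not those of Fig. 9.36; the function names are assumptions, and the specular variant shown here repeats the edge element (other conventions exist).

```python
def pad(signal, r, mode):
    """Extend a 1D signal by r virtual elements on each side (r <= len(signal))."""
    n = len(signal)
    if mode == "zero":
        left, right = [0] * r, [0] * r
    elif mode == "periodic":          # wrap around
        left = [signal[(i - r) % n] for i in range(r)]
        right = [signal[i % n] for i in range(r)]
    elif mode == "specular":          # mirror, repeating the edge element
        left = [signal[r - 1 - i] for i in range(r)]
        right = [signal[n - 1 - i] for i in range(r)]
    else:
        raise ValueError(mode)
    return left + list(signal) + right


def conv_same(signal, mask, mode):
    """Convolution with an odd-length mask centered on each input element."""
    r = len(mask) // 2
    ext = pad(signal, r, mode)
    flipped = mask[::-1]              # convolution mirrors the mask
    return [sum(ext[i + t] * flipped[t] for t in range(len(mask)))
            for i in range(len(signal))]


f = [2, 3, 4, 1]
h = [1, 2, 3]                         # hypothetical three-element mask
print(conv_same(f, h, "zero"))        # [7, 16, 18, 14]
print(conv_same(f, h, "periodic"))    # [10, 16, 18, 16]
print(conv_same(f, h, "specular"))    # [13, 16, 18, 15]
```

Only the first and last output elements differ between the three results: with a three-element mask, the choice of heuristic affects one element at each end.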
9.10.2 2D Spatial Convolution
In Sects. 5.7 (Image Formation) and 9.9.1 (Numerical Spatial Filtering), the theory of Linear Spatially Invariant (LSI) systems applied to 2D signals (images) was
introduced, from which the convolution integral was derived. In the continuous context we rewrite it as

$$g(x, y) = f(x, y) * h(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\tau, \beta) \cdot h(x - \tau, y - \beta)\, d\tau\, d\beta \tag{9.72}$$
where the function $h(x, y)$ characterizes the convolution operator by altering the input image (the 2D signal) $f(x, y)$ and giving as output the image $g(x, y)$. In the discrete context, denoting by $h(i, j)$ the impulse response and by $f(i, j)$ a 2D input image, the output image $g(i, j)$ is given by the following convolution expression with a double summation

$$g(i, j) = \sum_{l=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} f(l, k) \cdot h(i - l, j - k) \tag{9.73}$$
In analogy to the 1D case, the process of 2D convolution between 2D functions involves the following steps:
1. Produce the mirror image of h, i.e., h(−τ, −β), which is equivalent to rotating h(τ, β) by 180° about its origin;
2. Translate h(−τ, −β) to the output point (x, y), thus obtaining h(x − τ, y − β), where we want to calculate the convolution value g(x, y);
3. Multiply by f(τ, β), i.e., compute f(τ, β) · h(x − τ, y − β) (the two 2D functions are multiplied element by element);
4. Calculate the double integral (the double sum in the discrete case) over (−∞, +∞) to obtain the output value g(x, y) with Eq. (9.72) or (9.73), in the continuous or discrete case respectively;
5. Repeat the previous steps for each point (x, y) of the output image g.
Figure 9.37 shows the entire 2D convolution process for a simple image f(i, j) of 4 × 4 elements and a mask h(i, j) of 3 × 2 elements (Fig. 9.37a). We observe (see Fig. 9.37b) that, before starting the convolution process, we obtain h(−l, −k) by rotating h by 180°, and we translate it to the point (0, 0) of the output image to calculate the corresponding element g(0, 0) (see Fig. 9.37c). Then h(−l, −k) is translated to each point (i, j) to multiply, element by element, the portion of the image superimposed on the mask; the products are summed to obtain each output element g(i, j). It can be seen graphically that the elements g(i, j) are calculated only where there is an overlap between the two matrices representing f and h. Nonoverlapping zones, at the edges of the input image, are assumed to contain null elements added virtually.
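The steps above can be sketched for the discrete case as follows. The function names, the list-of-lists layout, and the small test values are assumptions for illustration (the values are not those of Fig. 9.37); the virtual null elements at the edges are realized by explicit zero padding.

```python
def rotate180(h):
    """Mirror h about its origin: h(-l, -k), i.e. a rotation by 180 degrees."""
    return [row[::-1] for row in h[::-1]]


def conv2d_full(f, h):
    """Full 2D convolution with virtual zeros outside the input image."""
    f1, f2 = len(f), len(f[0])
    h1, h2 = len(h), len(h[0])
    hrot = rotate180(h)                                   # step 1
    # Zero padding so the rotated mask can overlap every image element.
    padded = [[0] * (f2 + 2 * (h2 - 1)) for _ in range(f1 + 2 * (h1 - 1))]
    for i in range(f1):
        for j in range(f2):
            padded[i + h1 - 1][j + h2 - 1] = f[i][j]
    g = []
    for i in range(f1 + h1 - 1):                          # steps 2 and 5
        row = []
        for j in range(f2 + h2 - 1):
            # Steps 3 and 4: element-by-element products, then their sum.
            row.append(sum(hrot[m][n] * padded[i + m][j + n]
                           for m in range(h1) for n in range(h2)))
        g.append(row)
    return g


small = conv2d_full([[1, 2], [3, 4]], [[1, 0], [0, -1]])
print(small)  # [[1, 2, 0], [3, 3, -2], [0, -3, -4]]

big = conv2d_full([[1] * 4 for _ in range(4)], [[1, 1], [1, 1], [1, 1]])
print(len(big), len(big[0]))  # 6 5
```

The second call uses a 4 × 4 input and a 3 × 2 mask and returns 6 × 5 elements, that is, the sum of the sizes minus one along each dimension.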
Fig. 9.37 Example of the graphical representation of 2D convolution between 2D image f (i, j) and 2D mask h(i, j) of 4 × 4 and 3 × 2 elements, respectively
9.10.2.1 Extension of the Impulse Response
In the discrete case, as the 2D convolution example shows, the convolution works on finite dimensions of both the input image and the impulse response. If the input image f(i, j) has dimensions M × N and the impulse response h(i, j) has dimensions L1 × L2, the output image g(i, j) has dimensions (M + L1 − 1) × (N + L2 − 1). In the example of Fig. 9.37, with an input image of 4 × 4 elements and a mask of 3 × 2 elements, the output image has 6 × 5 elements. As we will see in the following, a square mask is normally chosen, represented in matrix form with variable dimensions, for example 3 × 3, 5 × 5, etc.; with an odd size, the pixel being processed, which receives the result of the convolution, is located at the center of the window.
Fig. 9.38 Graphical representation of the calculation of the 2D convolution in the edges of the input image in reference to the previous example by applying heuristics: Zero Padding, Replication Periodic, and Specular
9.10.2.2 Edge Management in 2D Convolution
As in the 1D case, in the 2D context it is also necessary to manage the edges. Figure 9.38 shows the convolution calculation at the edges of the output image for the 2D convolution example of Fig. 9.37. In that example, zero padding of the missing elements of the input image was used for the calculation of the convolution at the edges. For 2D convolution we can also adopt the solutions proposed for the 1D context, which we summarize together with other heuristics:
(a) Fill the additional elements with zeros (Zero Padding).
(b) Periodic replication of the input signal (Wrap Around).
(c) Specular replication of the input signal (Reflection).
(d) Spatially vary the mask at the edges, violating the conditions of spatial invariance.
(e) Reduce the size of the output image by deleting the rows and columns for which the convolution cannot be calculated.
The figure highlights the elements of the output image that change when the edges are managed with periodic and specular replication rather than with zero padding, always considering the same impulse response.
9.10.2.3 Computational Complexity
The convolution process requires considerable computation, in particular in the 2D context. For an input image of $M \times N$ elements and a mask of dimensions $L_1 \times L_2$, the 2D convolution requires $L_1 \times L_2$ multiplications and sums for each element of the image. It follows that the resulting computational complexity is of the order of $O(M \cdot N \cdot L_1 \cdot L_2)$. For some applications it may be useful to apply the convolution with two masks $h_1$ and $h_2$ in sequence. In this case, by the associative property of the convolution (see Sect. 9.9.1), one would have

$$f(i, j) * h_1(i, j) * h_2(i, j) = [f(i, j) * h_1(i, j)] * h_2(i, j) = f(i, j) * [h_1(i, j) * h_2(i, j)] \tag{9.74}$$
From the analysis of the computational cost, it is observed that [f(i, j) ∗ h1(i, j)] ∗ h2(i, j) involves O(2 · M · N · L1 · L2) operations, whereas f(i, j) ∗ [h1(i, j) ∗ h2(i, j)] requires O(L1² · L2² + M · N · L1 · L2) operations. The computational complexity is significantly reduced when the mask is much smaller than the input image (L1 ≪ M and L2 ≪ N).
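The associative property behind Eq. (9.74) can be checked numerically. The sketch below assumes a full convolution with zeros outside the supports; the function name `conv2d_full` and the sample data are illustrative only. Note that combining two 3 × 3 masks yields a 5 × 5 mask, which should be kept in mind when comparing the two costs.

```python
def conv2d_full(a, b):
    """Full 2D discrete convolution of two lists of rows (zeros outside)."""
    ra, ca = len(a), len(a[0])
    rb, cb = len(b), len(b[0])
    out = [[0] * (ca + cb - 1) for _ in range(ra + rb - 1)]
    for i in range(ra):
        for j in range(ca):
            for l in range(rb):
                for k in range(cb):
                    out[i + l][j + k] += a[i][j] * b[l][k]
    return out


f = [[1, 2, 0, 1],
     [3, 1, 2, 0],
     [0, 1, 1, 2],
     [2, 0, 1, 1]]
h1 = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
h2 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]

left = conv2d_full(conv2d_full(f, h1), h2)    # [f * h1] * h2
right = conv2d_full(f, conv2d_full(h1, h2))   # f * [h1 * h2]
print(left == right)
```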
9.10.2.4 Peculiarities of the Convolution Mask
The convolution mask h(i, j), a rectangular matrix of coefficients, has a central role in the calculation of the output image. Its purpose is to:
(a) Determine which elements of the input image in the vicinity of the pixel being processed are involved;
(b) Establish how these neighboring elements influence the output image;
(c) Weigh the input elements with the coefficients, which are then added together to produce the output pixel value;
(d) Overlap the mask reference element with the corresponding element of the input image to be processed;
(e) Influence the computational complexity.
In real applications, as highlighted above, images and masks are represented by matrices with finite rectangular dimensions. If we consider the input image f(i, j) of size M × N and the mask of size L1 × L2, Eq. 9.73 of the discrete 2D convolution becomes
$$g(i,j) = \sum_{l=0}^{M-1}\sum_{k=0}^{N-1} f(l,k)\,h(i-l,\,j-k) \tag{9.75}$$
The convolution value at the pixel (i, j) in the output image is obtained as the sum of the pixel-by-pixel products between the input function f(l, k) and the impulse response h(i − l, j − k). The latter is derived from h(l, k) by first rotating it by 180° about the origin, thus obtaining h(0 − l, 0 − k), and then translating it from the origin to the position (i, j) of the element to be processed (Fig. 9.39). In this case the reference element of the mask is taken to be (0, 0). The first index of the arrays represents the columns, and the second indicates the rows. The terms of the convolution summation are null (the product f · h is zero) in the areas where the f and h matrices do not overlap.
Fig. 9.39 Graphical (a) and analytic (b) representation of the discrete 2D convolution process
By the commutative property of the convolution operator, the previous equation can conveniently be expressed as follows:
$$g(i,j) = \sum_{l=0}^{M-1}\sum_{k=0}^{N-1} f(i-l,\,j-k)\cdot h(l,k) \tag{9.76}$$
These last two equations of the convolution operator suggest that, before the pixel-by-pixel product, either the portion of the input image involved or the impulse response can be rotated by 180° and translated, with the same result. In fact, as we will see in the example (Fig. 9.39b), with both Eqs. (9.75) and (9.76) the indices generated by the double summation produce the equivalent effect of rotating the convolution mask h(l, k), and the correspondence with the elements of the input image, f(i − l, j − k), is established relative to the element (i, j) being processed. Given the computational complexity of the 2D convolution, and for implementation reasons, it is convenient to use the convolution operator in the form
$$g(i,j) = \sum_{l=-r}^{r}\sum_{k=-r}^{r} f(i-l,\,j-k)\cdot h(l,k) \tag{9.77}$$
with the indices l and k of the impulse response window h(l, k) referenced to the central pixel h(0, 0). The window is square, of dimensions L × L with L odd, and r = (L − 1)/2. The value of the convolution g(i, j) at the point (i, j) is calculated only from the pixels of the input image that overlap the window h, whose center pixel h(0, 0) is positioned on the pixel f(i, j) being processed. Figure 9.39a shows the graphic scheme of the convolution process for calculating the generic element g(i, j), assuming a 3 × 3 mask with central pixel h(0, 0). The elements of the mask are indicated with lowercase letters, and the input image elements involved in the convolution with uppercase letters. Note the rotation of the mask h(−l, −k) before processing starts, its translation to the generic element (i, j) to be processed in the input image f(i, j), the intermediate element-by-element multiplication between the mask and the portion of the input image, and the final summation of the products to calculate the output element g(i, j). Figure 9.39b implements the convolution analytically by developing the calculations for the element g(i, j) based on Eq. (9.75), which computes the indices of h and f so that the element-by-element product occurs with the mask rotated by 180°, as shown graphically in Fig. 9.39a. The same result is obtained by expanding Eq. (9.77) as follows:
$$\begin{aligned}
g(i,j) = {} & f(i+1,j+1)\cdot h(-1,-1) + f(i+1,j)\cdot h(-1,0) + f(i+1,j-1)\cdot h(-1,1) \\
& + f(i,j+1)\cdot h(0,-1) + f(i,j)\cdot h(0,0) + f(i,j-1)\cdot h(0,1) \\
& + f(i-1,j+1)\cdot h(1,-1) + f(i-1,j)\cdot h(1,0) + f(i-1,j-1)\cdot h(1,1) \\
= {} & I\cdot a + F\cdot d + C\cdot g \\
& + H\cdot b + E\cdot e + B\cdot h \\
& + G\cdot c + D\cdot f + A\cdot i
\end{aligned} \tag{9.78}$$
where the constants A, B, . . . indicate the local elements of the input image f(i, j) involved in the processing, a, b, . . . are the coefficients of the convolution mask, i.e., the discrete values resulting from the sampling of the impulse response h(i, j) of the operator itself, and g(i, j) is the resulting generic element of the output image, identical to the value obtained with Eq. (9.75), as shown in Fig. 9.39b. This convolution operation is repeated for all pixels in the input image f(i, j) of dimension M × N.
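The centered form of Eq. (9.77) translates almost directly into nested loops. The following is a minimal sketch, not a reference implementation: it assumes zero padding at the edges, images stored as lists of lists indexed f[i][j], and a mask stored so that h[l + r][k + r] holds the coefficient h(l, k). The function names are hypothetical. It also checks that convolving with h gives the same result as correlating with the mask rotated by 180°.

```python
def convolve_centered(f, h):
    """Convolution in the form of Eq. (9.77), odd L x L mask, zero padding."""
    r = (len(h) - 1) // 2
    rows, cols = len(f), len(f[0])
    g = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for l in range(-r, r + 1):
                for k in range(-r, r + 1):
                    ii, jj = i - l, j - k        # element f(i - l, j - k)
                    if 0 <= ii < rows and 0 <= jj < cols:
                        acc += f[ii][jj] * h[l + r][k + r]
            g[i][j] = acc
    return g


def correlate_centered(f, h):
    """Same loops without index reversal: sum of f(i + l, j + k) * h(l, k)."""
    r = (len(h) - 1) // 2
    rows, cols = len(f), len(f[0])
    g = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for l in range(-r, r + 1):
                for k in range(-r, r + 1):
                    ii, jj = i + l, j + k
                    if 0 <= ii < rows and 0 <= jj < cols:
                        g[i][j] += f[ii][jj] * h[l + r][k + r]
    return g


def rotate180(h):
    """Rotate the mask by 180 degrees about its center."""
    return [row[::-1] for row in h[::-1]]


f = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]
h = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
g = convolve_centered(f, h)
print(g[1][1])                                     # 192
print(g == correlate_centered(f, rotate180(h)))    # True
```

Feeding a unit impulse as input returns the mask itself, which is a quick way to confirm that the index reversal is applied correctly.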
9.10.2.5 Symmetric Mask
We have seen that the convolution mask h(i, j) is rotated by 180° around the reference element (0, 0) before being translated, obtaining h(−i, −j). If h is intrinsically symmetric, i.e., h(i, j) = h(−i, −j), the rotation required by the convolution is unnecessary. It can be shown that if the mask is symmetric the convolution is equal to the cross-correlation.
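Whether the rotation can be skipped is easy to test on the mask alone. The short sketch below uses hypothetical helper names and two illustrative 3 × 3 masks: a smoothing mask that is unchanged by the 180° rotation, and a difference mask that is not.

```python
def rotate180(h):
    """Rotate a mask (list of rows) by 180 degrees about its center."""
    return [row[::-1] for row in h[::-1]]


def is_symmetric(h):
    """True when h(i, j) = h(-i, -j), i.e., rotation leaves the mask unchanged."""
    return rotate180(h) == h


smoothing = [[1, 2, 1],
             [2, 4, 2],
             [1, 2, 1]]
difference = [[-1, 0, 1],
              [-2, 0, 2],
              [-1, 0, 1]]
print(is_symmetric(smoothing), is_symmetric(difference))
```

For the second mask, convolution and cross-correlation give results of opposite sign, so the distinction matters in practice.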
9.10.2.6 Circular Symmetric Mask
In image processing applications it is useful to have filters that alter the image uniformly in all directions: isotropic filters. An isotropic filter can be made with a mask that has circular symmetry. A function h(x, y) is said to have circular symmetry if and only if it depends only on the square root of the sum of the squares of its variables, i.e., there is a function hR of a nonnegative argument such that
$$h(x,y) = h_R\!\left(\sqrt{x^2+y^2}\right) \tag{9.79}$$
or equivalently, for every angle θ,
$$h(x,y) = h(x\cos\theta + y\sin\theta,\; -x\sin\theta + y\cos\theta) \tag{9.80}$$
and in the latter definition it is said to be invariant to rotation. In the discrete case, given the rectangular form of the images, only an approximate form of circular symmetry can be obtained. A convolution mask with circular symmetry can be obtained by adequately sampling the 2D Gaussian function. In that case we would have the following convolution mask
$$h(l,k) = h_G\!\left(\sqrt{l^2+k^2}\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{l^2+k^2}{2\sigma^2}} \tag{9.81}$$
In the following we will see how the standard deviation σ characterizes the output image. Figure 9.40 shows the convolution mask with circular symmetry hG, based on the sampled values of the discrete Gaussian impulse response, together with the 1D projections characterized by the same value of σ. Designing a filter means calculating the coefficients h(i, j) that are a good approximation of the impulse response and, therefore, of the operator to be applied to the input image. Remember that convolution is a spatially invariant operation and, consequently, the convolution coefficients (filter weights) do not change from pixel to pixel during the convolution process.
Fig. 9.40 Function h: sampled Gaussian impulse response
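As an illustration of sampling the Gaussian impulse response, the sketch below builds an L × L mask from the expression in Eq. (9.81). The function name `gaussian_mask` is hypothetical, and the optional rescaling so that the coefficients sum to 1 is an added assumption (a common practice to preserve the mean intensity), not something prescribed by the text.

```python
import math


def gaussian_mask(L, sigma, normalize=True):
    """Sample the Gaussian of Eq. (9.81) on an odd L x L grid centered at (0, 0)."""
    r = (L - 1) // 2
    coef = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    h = [[coef * math.exp(-(l * l + k * k) / (2.0 * sigma ** 2))
          for k in range(-r, r + 1)]
         for l in range(-r, r + 1)]
    if normalize:
        # Assumption: rescale so that the weights sum to 1.
        total = sum(map(sum, h))
        h = [[v / total for v in row] for row in h]
    return h


h = gaussian_mask(5, 1.0)
for row in h:
    print(" ".join("%.4f" % v for v in row))
```

Coefficients at the same distance from the center are equal, which is the discrete approximation of circular symmetry mentioned above.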
9.10.2.7 Separability of Convolution
This property of the convolution makes it possible to realize a 2D convolution through two sequential 1D convolutions. This is possible if the convolution mask is separable, that is,
$$h(i,j) = h_c(i)\cdot h_r(j) \tag{9.82}$$
where hc(i) and hr(j) indicate, respectively, a column vector of L × 1 elements and a row vector of 1 × L elements. These vectors correspond to the vertical and horizontal projections into which the 2D mask is decomposed. The convolution equation for a separable mask is rewritten as
$$g(i,j) = \sum_{l=0}^{M-1}\sum_{k=0}^{N-1} f(l,k)\cdot h_c(i-l)\cdot h_r(j-k) = \sum_{l=0}^{M-1} h_c(i-l)\left[\sum_{k=0}^{N-1} f(l,k)\cdot h_r(j-k)\right] \tag{9.83}$$
where the internal summation represents the 1D convolution of each line of the image f(i, j) with the line-mask hr. The external summation is instead the convolution of each column of the intermediate result with the column-mask hc. By the associative property of the convolution, that is,
$$g(i,j) = [h_c(i)\cdot h_r(j)] * f(i,j) = h_c(i) * [f(i,j) * h_r(j)] = h_r(j) * [h_c(i) * f(i,j)] \tag{9.84}$$
the separable 2D convolution can be performed with the 1D convolutions in either order, first by row and then by column, or vice versa, between the input image and the related 1D masks hr and hc. Of course, not all 2D masks are separable. Several separable masks can be constructed by starting from the vertical and horizontal projections and then calculating the corresponding 2D mask with (9.82). In the literature, there are several approaches to verify the separability of the matrix h. One method is based on the singular value decomposition (SVD) (described in Sect. 2.11 Vol. II). Simple examples of separable masks, built from the 1D profiles of the rectangular and Gaussian impulse responses, are shown in Fig. 9.41. For the latter, the variance is the same for both 1D projections and for the 2D Gaussian mask.
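The two-pass scheme of Eqs. (9.83) and (9.84) can be sketched as follows. Instead of the SVD mentioned in the text, this example tests separability with a simpler rank-1 check (all 2 × 2 minors vanish), which is a deliberate substitution to stay within the standard library. Zero padding, list-of-rows storage, and all function names are assumptions of the sketch.

```python
def is_separable(h, tol=1e-12):
    """Rank-1 test: every 2 x 2 minor of h must be (numerically) zero."""
    rows, cols = len(h), len(h[0])
    return all(abs(h[i][j] * h[p][q] - h[i][q] * h[p][j]) <= tol
               for i in range(rows) for p in range(i + 1, rows)
               for j in range(cols) for q in range(j + 1, cols))


def conv1d_same(x, h):
    """Centered 1D convolution with zero padding, output as long as x."""
    r = (len(h) - 1) // 2
    n = len(x)
    return [sum(x[i - k] * h[k + r] for k in range(-r, r + 1) if 0 <= i - k < n)
            for i in range(n)]


def separable_conv(f, hc, hr):
    """Rows with hr first, then columns of the intermediate result with hc."""
    tmp = [conv1d_same(row, hr) for row in f]
    cols = [conv1d_same(list(col), hc) for col in zip(*tmp)]
    return [list(row) for row in zip(*cols)]


def conv2d_same(f, h):
    """Direct centered 2D convolution with zero padding, for comparison."""
    r = (len(h) - 1) // 2
    rows, cols = len(f), len(f[0])
    return [[sum(f[i - l][j - k] * h[l + r][k + r]
                 for l in range(-r, r + 1) for k in range(-r, r + 1)
                 if 0 <= i - l < rows and 0 <= j - k < cols)
             for j in range(cols)] for i in range(rows)]


hc, hr = [1, 2, 1], [-1, 0, 1]
h = [[c * r for r in hr] for c in hc]      # h(i, j) = hc(i) * hr(j), Eq. (9.82)
f = [[3, 1, 4, 1, 5],
     [9, 2, 6, 5, 3],
     [5, 8, 9, 7, 9],
     [3, 2, 3, 8, 4]]
print(is_separable(h))
print(separable_conv(f, hc, hr) == conv2d_same(f, h))
```

The intermediate array `tmp` is the extra memory that the separable approach trades for the reduced operation count.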
9.10.2.8 Computational Complexity of the 2D Separable Convolution
In this context, the computational complexity, for an image of M × N elements and a mask of dimensions L × L, is reduced from O(M · N · L²) to O(M · N · (L + L)) = O(2L · M · N), that is, the number of operations is reduced by a factor of L/2. To limit calculation times, the size of the mask should be limited to 3 × 3 or 5 × 5. For a 3 × 3 mask and an image of 512 × 512 pixels, the nonseparable convolution operator would require 262144 × 9 = 2359296 multiplications and as many additions. With a separable mask, the operations would instead be reduced to 262144 × 6 = 1572864 multiplications and as many additions.
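The operation counts can be reproduced with a few lines of arithmetic. This sketch counts one multiplication per mask coefficient for each of the M × N pixels, ignoring edge effects; the function names are illustrative.

```python
def mults_direct(M, N, L):
    """Multiplications for a direct L x L convolution on an M x N image."""
    return M * N * L * L


def mults_separable(M, N, L):
    """Multiplications for two 1D passes of length L each."""
    return M * N * 2 * L


M = N = 512
for L in (3, 5, 7):
    d, s = mults_direct(M, N, L), mults_separable(M, N, L)
    print(L, d, s, d / s)      # the ratio equals L / 2
```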
Fig. 9.41 Examples of 2D separable convolution masks: rectangular and Gaussian functions
With modern multiprocessors and dedicated processing systems (for example, the pipeline architectures available on PC video cards), convolutions can be computed in reasonable times. When implementing the convolution, one must also take into account the time needed to access mass storage devices to read the input image (which can be large, even several gigabytes) and to save the output image being processed. In the convolution equation, the already processed pixels g(i, j) are not involved in the convolution process. This implies that the convolution produces a new image g(i, j) that must be saved in a memory area separate from the input image f. Assuming a sequential processor, the necessary memory can still be optimized by saving the processed pixels of the r lines above the i-th line being processed. An alternative approach is to temporarily save in a buffer (r + 1) lines of the input image being processed, while the result for the pixel under consideration (i, j) is saved in the same position of the input image. In the case of a separable mask, while the calculation time is optimized, additional memory is required to hold the intermediate result of the first 1D convolution, which is reused by the second 1D convolution.
9.11 Filtering in the Frequency Domain
Previously we introduced the usefulness of studying the spatial structures of an image in the frequency domain, which more effectively describes the periodic spatial structures present in the image itself. To switch from the spatial domain to the frequency domain, there are several operators that are normally called transformation operators or simply transforms. Such transforms, e.g., the Fourier transform, which is the best known, when applied to an image decompose it from the gray-level structures of the spatial domain into fundamental frequency components in the frequency domain. Each frequency component is expressed through a phase value and a modulus value. The inverse transform converts an image expressed in frequencies back into the spatial domain, reconstructing the original spatial structures of the image. The complete treatment of the Fourier transform and of the other transforms is given in the following chapters.
The next paragraph briefly describes the discrete Fourier transform (DFT) for digital filtering aspects in the frequency domain.
9.11.1 Discrete Fourier Transform DFT
The DFT is applied to an image f(k, l) with a finite number of elements N × M, obtained from a sampling process at regular intervals, and associates to f(k, l) the matrix F(u, v) of dimensions N × M given by
$$F(u,v) = \sum_{k=0}^{N-1}\sum_{l=0}^{M-1} f(k,l)\,B(k,l;u,v) \tag{9.85}$$
where F(u, v) represents the coefficients of the transform (or Fourier image), and B(k, l; u, v) indicates the images forming the basis of the frequency space identified by the u-v system (variables in the frequency domain), each of dimension N × M. In essence, the coefficients F(u, v) of the transform represent the projections of the image f(k, l) on the basis images. These coefficients indicate quantitatively the degree of similarity of the image with respect to the basis images B. The transformation process quantifies the decomposition of the input image f(k, l) into the weighted sum of the basis images, where the coefficients F(u, v) are precisely the weights. Equation (9.85) can also be interpreted by considering that the value of each point F(u, v) of the Fourier image is obtained by multiplying the spatial image f(k, l) by the corresponding basis image B(k, l; u, v) and adding up the result. The frequencies near the origin of the system (u, v) are called low frequencies, while those farthest from the origin are called high frequencies. F(u, v) is a complex function. The input image f(k, l) can be reconstructed (retransformed) in the spatial domain from the coefficients of the transform F(u, v) with the equation of the inverse Fourier transform, that is,
$$f(k,l) = \mathcal{F}^{-1}(F(u,v)) = \sum_{u=0}^{N-1}\sum_{v=0}^{M-1} F(u,v)\,B^{-1}(k,l;u,v) \tag{9.86}$$
The Fourier transform applied to an image produces a matrix of Fourier coefficients of the same size as the image, which fully represents the original image. The latter can be reconstructed with the inverse Fourier transform (9.86), which does not cause any loss of information. As we will see later, it is possible to manipulate the pixels of the image f(k, l) in the spatial domain and see how it changes in the frequency (or spectral) domain or, vice versa, to modify the Fourier coefficients F(u, v) in the spectral domain and see how the original image is modified after reconstruction. The basis images of the transformation are represented by sine and cosine functions, and the transform of the image f(k, l) is given by
$$F(u,v) = \frac{1}{\sqrt{NM}}\sum_{k=0}^{N-1}\sum_{l=0}^{M-1} f(k,l)\cdot\left[\cos 2\pi\left(\frac{uk}{N}+\frac{vl}{M}\right) - j\,\sin 2\pi\left(\frac{uk}{N}+\frac{vl}{M}\right)\right] \tag{9.87}$$
in which the variables (u, v) represent the spatial frequencies. The function F(u, v) represents the frequency content of the image f(k, l); it is complex and periodic in both u and v with period 2π. The cosine term gives the real part and the sine term the imaginary part, thus obtaining the general expression
$$F(u,v) = R_e(u,v) + j\,I_m(u,v) \tag{9.88}$$
9.11.1.1 Magnitude, Phase Angle, and Power Spectrum
The real component Re(u, v) and the imaginary component Im(u, v) of the complex coefficients F(u, v) constituting the Fourier image are not, taken individually, very informative. A more effective representation is obtained by expressing each complex coefficient F(u, v) through its magnitude |F(u, v)| and phase Φ(u, v). The spectral magnitude (or amplitude) is defined by
$$|F(u,v)| = \sqrt{R_e^2(u,v) + I_m^2(u,v)} \tag{9.89}$$
which specifies how much of each basis image is present in the input image, while the information about the orientation and shifts of the objects is encoded by the phase angle given by
$$\Phi(u,v) = \tan^{-1}\left[\frac{I_m(u,v)}{R_e(u,v)}\right] \tag{9.90}$$
The Fourier transform can be written in terms of its magnitude and phase
$$F(u,v) = R_e(u,v) + j\,I_m(u,v) = |F(u,v)|\,e^{j\Phi(u,v)}$$
The power spectrum or spectral density P(u, v) of an image is defined as follows:
$$P(u,v) = |F(u,v)|^2 = R_e^2(u,v) + I_m^2(u,v) \tag{9.91}$$
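The three quantities of Eqs. (9.89) to (9.91) can be computed coefficient by coefficient. The sketch below assumes the Fourier image is given as a list of rows of Python complex numbers; the function names are hypothetical. It uses `math.atan2` rather than a plain arctangent of the ratio, so that the quadrant is resolved and a zero real part causes no division error.

```python
import cmath
import math


def magnitude(F):
    """Spectral magnitude |F(u, v)|, Eq. (9.89)."""
    return [[abs(c) for c in row] for row in F]


def phase(F):
    """Phase angle in radians, Eq. (9.90), quadrant-aware via atan2."""
    return [[math.atan2(c.imag, c.real) for c in row] for row in F]


def power(F):
    """Power spectrum |F(u, v)|^2, Eq. (9.91)."""
    return [[c.real ** 2 + c.imag ** 2 for c in row] for row in F]


def from_polar(mag, ph):
    """Rebuild F(u, v) = |F| * exp(j * phase) from magnitude and phase."""
    return [[cmath.rect(m, p) for m, p in zip(mrow, prow)]
            for mrow, prow in zip(mag, ph)]


F = [[3 + 4j, -1j],
     [-2 + 0j, 1 - 1j]]
print(magnitude(F)[0][0], power(F)[0][0], phase(F)[0][1])
```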
It is also useful to express the Fourier transform (9.87) in complex exponential form by applying the Euler relation e^{jx} = cos x + j sin x to the trigonometric form, which then becomes
$$F(u,v) = \frac{1}{\sqrt{NM}}\sum_{k=0}^{N-1}\sum_{l=0}^{M-1} f(k,l)\cdot e^{-2\pi j\left(u\frac{k}{N}+v\frac{l}{M}\right)} \tag{9.92}$$
and the inverse DFT is
$$f(k,l) = \frac{1}{\sqrt{NM}}\sum_{u=0}^{N-1}\sum_{v=0}^{M-1} F(u,v)\,e^{2\pi j\left(k\frac{u}{N}+l\frac{v}{M}\right)} \tag{9.93}$$
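A direct, unoptimized evaluation of the forward and inverse DFT makes the definitions concrete. This sketch assumes the symmetric normalization 1/√(NM) in both directions, so that applying the inverse after the forward transform returns the input; with a different normalization convention the scale factors would have to be adjusted. The function names and the small test image are illustrative.

```python
import cmath
import math


def dft2(f):
    """Forward 2D DFT of an N x M list of rows, scaled by 1/sqrt(N*M)."""
    N, M = len(f), len(f[0])
    scale = 1.0 / math.sqrt(N * M)
    return [[scale * sum(f[k][l] * cmath.exp(-2j * math.pi * (u * k / N + v * l / M))
                         for k in range(N) for l in range(M))
             for v in range(M)] for u in range(N)]


def idft2(F):
    """Inverse 2D DFT with the same 1/sqrt(N*M) scaling (assumed convention)."""
    N, M = len(F), len(F[0])
    scale = 1.0 / math.sqrt(N * M)
    return [[scale * sum(F[u][v] * cmath.exp(2j * math.pi * (u * k / N + v * l / M))
                         for u in range(N) for v in range(M))
             for l in range(M)] for k in range(N)]


f = [[1, 2, 3],
     [4, 5, 6]]
F = dft2(f)
back = idft2(F)
error = max(abs(back[k][l] - f[k][l]) for k in range(2) for l in range(3))
print(error < 1e-9)
```

The cost of this direct evaluation grows with the square of the number of pixels, which is why it is only practical for very small images.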
In physical reality a 2D signal (image) is obtained by the finite superposition of sinusoidal components e^{j2π(uk+vl)} for real values of the frequencies (u, v). The 2D DFT, being applied to a function with limited support (also said to be band-limited) or to a sampled image with a finite number of elements, constitutes a particular form of the continuous Fourier transform. With Eqs. (9.92) and (9.93) the DFT establishes a bijective and linear correspondence between the spatial domain, represented by the image f(k, l), and the spectral domain F(u, v). In other words, once the Fourier bases are fixed, from the input image f(k, l) we can obtain with (9.92) the spectral information given by F(u, v) (also called the spectrum of f) and, vice versa, starting from the frequency domain F(u, v) we can reconstruct the original image with the inverse transform given by Eq. (9.93).
9.11.1.2 DFT in Image Processing
In general, the DFT is a complex function. Therefore, the result of the transform F(u, v), a matrix of complex coefficients, can be visualized by decomposing it into three matrices: the magnitude, the phase angle, and the power spectrum. For an image with dimensions M × N, these quantities are matrices of the same dimensions as the image. In the Fourier domain, the sinusoidal components that make up the image are characterized by these quantities, which encode the information in terms of spatial frequency, amplitude, and phase. The magnitude represents the information associated with the contrast, or rather the variation (modulation) of intensity in the spatial domain at the various frequencies. The phase represents how the sinusoidal components are translated with respect to the origin. All the sinusoidal components of which the image is composed are encoded in terms of magnitude and phase in the Fourier domain for all spatial frequencies in the discrete range u = 0, 1, 2, . . . , M − 1 and v = 0, 1, 2, . . . , N − 1. In other words, the DFT encodes all sinusoidal components from the zero frequency up to the maximum possible frequency (the Nyquist frequency), which depends on the spatial resolution of the image. At (u, v) = (0, 0) there is no modulation; in fact, from Eq. (9.92) we have that F(0, 0) is proportional to the average value of the image intensity
$$F(0,0) = \sum_{k=0}^{N-1}\sum_{l=0}^{M-1} f(k,l) \tag{9.94}$$
which is referred to as the DC component (also called the continuous or constant component) of the DFT. The DC component represents the sum of the input values f; to give it the meaning of an average, the term 1/(M · N) is often inserted in the Fourier transform. A zero DC term would mean an image with an average brightness of zero, that is, sinusoids alternating between positive and negative values in the input image. In the context of images represented by real functions, the DC component has positive values.
Fig. 9.42 The discrete Fourier transform applied to a gray level image: a the gray level image; b the spectrum not translated; c the spectrum translated with the DC component at the center of the matrix; d the original image a reconstructed with the inverse DFT using only the phase matrix; e the original image reconstructed using only the spectrum; and f the phase angle of the image
It is also highlighted that the DFT of a real function has an even real part, Re(u, v) = Re(−u, −v), and an odd imaginary part. It follows that the DFT of a real function is conjugate symmetric, F*(u, v) = F(−u, −v) (also called a Hermitian function), with the resulting spectrum an even function with respect to the origin, i.e., |F(u, v)| = |F(−u, −v)|, while the phase has odd symmetry, Φ(u, v) = −Φ(−u, −v). For better visualization of the magnitude and the power spectrum, considering that the values of the spectrum are very high near the DC component and strongly compressed at the high frequencies, it is convenient to modify the dynamic range of the amplitudes with a logarithmic law, that is,
$$F_L(u,v) = c\cdot\log\left[1 + |F(u,v)|\right] \tag{9.95}$$
where the constant c scales the range of variability of F(u, v) to the dynamic range of the monitor (normally between 0 and 255). Figure 9.42 shows the results of the DFT applied to an image. In Fig. 9.42b the spectrum is displayed after applying the logarithmic transformation of Eq. (9.95). It is observed that the highest values of the spectrum are at the four corners, near the origin of the transform (the DC component, i.e., F(0, 0), lies in a corner).
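The logarithmic law of Eq. (9.95) can be sketched in a few lines. The choice of c shown here, which maps the largest magnitude exactly to the top of the display range, is an assumption of this example; the text only says that c adapts the range to that of the monitor. The function name and sample magnitudes are illustrative.

```python
import math


def log_scale(mag, out_max=255):
    """Apply F_L = c * log(1 + |F|) with c chosen so that max(F_L) = out_max."""
    peak = max(max(row) for row in mag)
    c = out_max / math.log(1 + peak) if peak > 0 else 0.0
    return [[c * math.log(1 + m) for m in row] for row in mag]


mag = [[0, 9],
       [99, 999]]
scaled = log_scale(mag)
print([[round(v, 1) for v in row] for row in scaled])
```

With these sample values the magnitudes 9, 99, and 999 spread evenly over the output range, whereas a linear scaling would push the first two close to zero.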
A peculiarity of the DFT is that it is periodic.1 The sinusoidal components of which an image is composed repeat indefinitely and, therefore, the image of the DFT repeats indefinitely with period M and N, respectively, along the axes u and v
$$F(u,v) = F(u+M,\,v) = F(u,\,v+N) = F(u+M,\,v+N) \tag{9.96}$$
Fourier analysis shows that the DFT of an M × N matrix of real data (the case of images) generates a transform matrix of the same dimensions, indexed by u = 0, 1, 2, . . . , M − 1 and v = 0, 1, 2, . . . , N − 1, but with the following meaning of the frequencies. The one at (u, v) = (0, 0) represents the DC term given by (9.94). The frequencies indexed by (u, v) = (1, 1), (2, 2), . . . , (M/2 − 1, N/2 − 1) are considered positive, while the negative ones are in the interval from (u, v) = (M/2 + 1, N/2 + 1) to (M − 1, N − 1). At the frequency (u, v) = (M/2, N/2), applying Eq. (9.92), the value of the transform is
$$F\!\left(\frac{M}{2},\frac{N}{2}\right) = \sum_{k=0}^{N-1}\sum_{l=0}^{M-1} f(k,l)\cdot e^{-2\pi j\left(\frac{Mk}{2M}+\frac{Nl}{2N}\right)} = \sum_{k=0}^{N-1}\sum_{l=0}^{M-1} f(k,l)\cdot e^{-\pi j[k+l]} = \sum_{k=0}^{N-1}\sum_{l=0}^{M-1} (-1)^{k+l}\cdot f(k,l) \tag{9.97}$$
which can be considered as an alternative to the DC term, the sum of the input values. This term is real and constitutes the intersection point of the four parts into which the spectrum is divided at the frequency (u, v) = (M/2, N/2). Similarly, the symmetry property and the complex-conjugate relationship between the components symmetric with respect to the frequency (u, v) = (M/2, N/2) can be verified
$$\begin{aligned}
F\!\left(\tfrac{M}{2}+1,\tfrac{N}{2}+1\right) &= F^{*}\!\left(\tfrac{M}{2}-1,\tfrac{N}{2}-1\right) \\
F\!\left(\tfrac{M}{2}+2,\tfrac{N}{2}+2\right) &= F^{*}\!\left(\tfrac{M}{2}-2,\tfrac{N}{2}-2\right) \\
&\;\;\vdots \\
F(1,1) &= F^{*}(M-1,\,N-1)
\end{aligned} \tag{9.98}$$
It can be concluded that for real input data, all the information of the spectrum resides in (u, v) = (0, 0), corresponding to the DC component, in the components indexed by u = 1, 2, . . . , M/2 − 2, M/2 − 1 and v = 1, 2, . . . , N/2 − 2, N/2 − 1, and in the component indexed by (u, v) = (M/2, N/2). All the other components, complex conjugates of the previous ones, indexed by u = M/2 + 1, M/2 + 2, . . . , M − 2, M − 1 and v = N/2 + 1, N/2 + 2, . . . , N − 2, N − 1, are not used because they add no information.
1 The periodicity property, given by Eq. (9.96), is demonstrated by evaluating the Fourier transform (9.92) at the point (u + M, v + N) and simplifying, which gives F(u + M, v + N) = F(u, v).
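The conjugate symmetry of the DFT of real data, and the alternating-sign sum at the mid frequency, can be verified numerically on a small image. The sketch below is illustrative only: it uses a direct DFT with an assumed 1/√(NM) scale factor, reads negative frequency indices through the periodicity of the transform, and uses a hypothetical 4 × 4 test image.

```python
import cmath
import math


def dft2(f):
    """Direct 2D DFT of an N x M list of rows, scaled by 1/sqrt(N*M)."""
    N, M = len(f), len(f[0])
    scale = 1.0 / math.sqrt(N * M)
    return [[scale * sum(f[k][l] * cmath.exp(-2j * math.pi * (u * k / N + v * l / M))
                         for k in range(N) for l in range(M))
             for v in range(M)] for u in range(N)]


f = [[5, 1, 0, 2],
     [3, 7, 4, 1],
     [0, 2, 6, 3],
     [1, 0, 2, 8]]
N, M = len(f), len(f[0])
F = dft2(f)

# F*(u, v) = F(-u, -v), with -u and -v taken modulo N and M (periodicity).
hermitian = all(abs(F[u][v].conjugate() - F[(-u) % N][(-v) % M]) < 1e-9
                for u in range(N) for v in range(M))

# Mid-frequency term: alternating-sign sum, times the assumed scale factor.
alternating = sum((-1) ** (k + l) * f[k][l]
                  for k in range(N) for l in range(M)) / math.sqrt(N * M)

print(hermitian, abs(F[N // 2][M // 2] - alternating) < 1e-9)
```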
Fig. 9.43 Graphical representation of the periodicity and symmetry properties of the DFT with an infinite number of replicas of the spectrum: a The DFT of the 1D rectangle function with the nontranslated spectrum; b with the spectrum translated with the term DC centered; c the DFT of the 2D rectangle function with the nontranslated spectrum indicated by the central rectangle; d the DFT translated into the spatial domain with the exponential factor (−1)^(k+l) with the aim of reallocating the spectrum centered with respect to the DC term
The properties of periodicity, Eq. (9.96), and of conjugate symmetry, Eq. (9.98), of the DFT tell us that in one period (interval [0 : M − 1, 0 : N − 1]) of the spectral domain there are four quadrants, containing the samples indexed, respectively, by u = 1, 2, . . . , M/2 − 1 and u = M/2 + 1, M/2 + 2, . . . , M − 1, and by v = 1, 2, . . . , N/2 − 1 and v = N/2 + 1, N/2 + 2, . . . , N − 1, together with the replicas of the related complex conjugate samples. Figure 9.43 illustrates the situation in the 1D and 2D contexts. As highlighted above, all the information of the transform of real data is available in an entire period, which can be better used by translating the current origin of the spectrum, coinciding with the DC component, to the location (u, v) = (M/2, N/2), as suggested by Eq. (9.97). It should be noted that the inverse DFT regenerates the input function f(k, l) as a periodic function, with the period inherited from the DFT. From the translation property of the DFT it is also known that a displacement of (u0, v0) in the frequency domain produces
$$F(u,v) \Longleftrightarrow f(k,l) \;\Longrightarrow\; F(u-u_0,\,v-v_0) \Longleftrightarrow e^{2\pi j\left(u_0\frac{k}{M}+v_0\frac{l}{N}\right)} f(k,l), \tag{9.99}$$
that is, translation introduces an exponential multiplicative factor in the spatial domain. The relocation of the spectrum, with the origin displaced to the location (M/2, N/2), the central point of the spectral matrix, is obtained as follows:
$$F(u-M/2,\,v-N/2) \Longleftrightarrow f(k,l)\cdot e^{\pi j[k+l]} = f(k,l)\cdot(-1)^{k+l} \tag{9.100}$$
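The centering relation of Eq. (9.100) can be checked on a small example: transforming the image multiplied by (−1)^(k+l) should give the same matrix as shifting the spectrum by half a period along each axis. The sketch below assumes even dimensions, a direct DFT with 1/√(NM) scaling, and illustrative names and data.

```python
import cmath
import math


def dft2(f):
    """Direct 2D DFT of an N x M list of rows, scaled by 1/sqrt(N*M)."""
    N, M = len(f), len(f[0])
    scale = 1.0 / math.sqrt(N * M)
    return [[scale * sum(f[k][l] * cmath.exp(-2j * math.pi * (u * k / N + v * l / M))
                         for k in range(N) for l in range(M))
             for v in range(M)] for u in range(N)]


f = [[1, 4, 2, 0, 3, 5],
     [2, 0, 1, 6, 2, 1],
     [7, 3, 0, 2, 4, 0],
     [1, 1, 5, 3, 0, 2]]
N, M = len(f), len(f[0])            # both even

F = dft2(f)
# Spatial-domain route: multiply each sample by (-1)^(k + l), then transform.
modulated = [[f[k][l] * (-1) ** (k + l) for l in range(M)] for k in range(N)]
G = dft2(modulated)
# Frequency-domain route: circular shift of the spectrum by half a period.
shifted = [[F[(u - N // 2) % N][(v - M // 2) % M] for v in range(M)]
           for u in range(N)]

same = all(abs(G[u][v] - shifted[u][v]) < 1e-9 for u in range(N) for v in range(M))
print(same, abs(G[N // 2][M // 2] - F[0][0]) < 1e-9)
```

After either route the DC term sits at the center of the matrix, which is the layout normally used for display.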
From these relationships we can, therefore, relocate the spectrum by deriving the following:
$$F(u-M/2,\,v-N/2) = \mathcal{F}\left[f(k,l)\cdot(-1)^{k+l}\right] \tag{9.101}$$
which suggests carrying out the translation in the spatial domain, by multiplying each input datum by the factor (−1)^(k+l) before the transform, or operating after the transformation with a direct translation in the frequency domain. Figure 9.42c shows the spectrum of Fig. 9.42b translated with the DC term at the center of the spectral matrix. It is observed that the translation does not modify the content of the spectrum (invariance of the magnitude under translation of the input data in the spatial domain). This is also highlighted by (9.99), where the modulus of the exponential term is 1. Also remembering that
$$e^{2\pi j\left(u_0\frac{k}{M}+v_0\frac{l}{N}\right)} = \cos 2\pi\left(u_0\frac{k}{M}+v_0\frac{l}{N}\right) + j\,\sin 2\pi\left(u_0\frac{k}{M}+v_0\frac{l}{N}\right) \tag{9.102}$$
it is instead noted that the phase is altered in proportion to the degree of translation (u0, v0). The effect of the rotation of an image on the frequency domain is better explained by operating in polar coordinates, given by the following expressions:
$$k = r\cos\theta \qquad l = r\sin\theta \qquad u = \omega\cos\phi \qquad v = \omega\sin\phi \tag{9.103}$$
Expressing the Fourier pair as f(r, θ) and F(ω, φ), and considering a rotation of the image f(k, l) by θ0, the Fourier pair becomes
$$f(r,\,\theta+\theta_0) \Longleftrightarrow F(\omega,\,\phi+\theta_0) \tag{9.104}$$
from which it emerges that a rotation of f(k, l) by an angle θ0 in the spatial domain produces an identical rotation of the power spectrum F(u, v) and, vice versa, a rotation in the spectral domain corresponds to the same rotation in the spatial domain (see Fig. 9.44). From the previous considerations we can highlight that, in the case of images, the information content of the power spectrum, through the magnitude of the sinusoidal components, captures the intensity levels (levels of gray, color, . . . ), while the phase spectrum, through the shift values of the sinusoids relative to the origin, captures the information associated with the image morphology, orientation, and movement of the objects. Figure 9.42d shows the reconstruction of the original image (a) obtained with the inverse transform using only the phase angle matrix displayed in Fig. 9.42f. Despite the lack of intensity information, the entire image is reconstructed with all the original morphological information corresponding to the person's face. Conversely, without the phase information, the spatial structure of the image is so disrupted that it becomes impossible to recognize the objects present. Indeed, the reconstruction of the original image using only the magnitude matrix, shown in Fig. 9.42e, highlights the total absence of the morphological structures of the image and, as expected, presents the spatial distribution of the intensity of the source image.
Fig. 9.44 The discrete Fourier transform applied to a gray level image: a the gray level image; b the phase spectrum of the image (a); c the power spectrum of the image (a); d the original image (a) rotated by 45°; e the phase spectrum of the rotated image (d), which is modified; and f the power spectrum of the rotated image which is identical to that of the original image
With the DFT and the inverse DFT (IDFT) it is possible to reconstruct complex images by adequately combining the phase angle matrices obtained from different images, which is useful to create images with particular effects. From the analysis of the Fourier transform, it follows that in image processing applications operating in the frequency domain, filtering algorithms modify only the amplitude spectrum, while the phase spectrum is left unchanged.
9.11.1.3 Separability of the DFT
With reference to Eq. (9.92), the exponential e^{−2πj(uk/N + vl/M)} can be factored, rewriting the transform in the following form:
$$\begin{aligned}
F(u,v) &= \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1}\left[\frac{1}{\sqrt{M}}\sum_{l=0}^{M-1} f(k,l)\cdot e^{-2\pi j v\frac{l}{M}}\right] e^{-2\pi j u\frac{k}{N}} \\
&= \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} F(k,v)\,e^{-2\pi j u\frac{k}{N}}
\end{aligned} \tag{9.105}$$
with the expression in large brackets indicating the 1D transform F(k, v). Therefore, the 2D DFT is realized through two 1D transforms: first a 1D transform is performed on each column of the input image, obtaining the intermediate result F(k, v), and then the 1D transform is performed on each row of the intermediate result F(k, v).
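The factorization of Eq. (9.105) can be sketched as two passes of a 1D transform and compared with the direct double sum. The example assumes the image is stored as a list indexed f[k][l], so the first pass runs over l for each fixed k and the second over k for each fixed v; the names `dft1`, `dft2_two_pass`, and `dft2_direct` are hypothetical.

```python
import cmath
import math


def dft1(x):
    """1D DFT scaled by 1/sqrt(n)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n))
            / math.sqrt(n) for u in range(n)]


def dft2_two_pass(f):
    """2D DFT as two sets of 1D DFTs, following Eq. (9.105)."""
    inner = [dft1(line) for line in f]                     # F(k, v) for each k
    outer = [dft1(list(col)) for col in zip(*inner)]       # transform over k
    return [list(row) for row in zip(*outer)]              # back to [u][v]


def dft2_direct(f):
    """Direct double sum of Eq. (9.92), for comparison."""
    N, M = len(f), len(f[0])
    scale = 1.0 / math.sqrt(N * M)
    return [[scale * sum(f[k][l] * cmath.exp(-2j * math.pi * (u * k / N + v * l / M))
                         for k in range(N) for l in range(M))
             for v in range(M)] for u in range(N)]


f = [[2, 0, 1, 3],
     [1, 4, 0, 2],
     [3, 1, 5, 0]]
A, B = dft2_two_pass(f), dft2_direct(f)
diff = max(abs(A[u][v] - B[u][v]) for u in range(3) for v in range(4))
print(diff < 1e-9)
```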
9.11.1.4 Fast Fourier Transform (FFT)
To reduce the computational load of the DFT, several algorithms have been developed that minimize redundant operations by exploiting the separability and symmetry properties of the DFT and an intelligent reorganization of the data. The most common algorithm is that of Cooley-Tukey [1,3], which coined the term FFT (Fast Fourier Transform). An image of size N × N would require O(N⁴) operations to obtain the DFT. The FFT algorithm reduces the computational load to O(N² log N). Thanks to the separability of the DFT, the FFT is performed directly as two 1D transforms. Given the many applications of the DFT in image processing, in the analysis of audio signals, and in the analysis of data in physics and chemistry, firmware versions of the FFT algorithm have also been implemented.
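To give an idea of how the redundancy is removed, the following is a minimal recursive radix-2 sketch in the style of the Cooley-Tukey decomposition. It is illustrative only: it handles 1D inputs whose length is a power of two, applies no normalization factor, and is compared against a direct O(n²) DFT. Production code would normally rely on an optimized library.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])                 # DFT of the even-indexed samples
    odd = fft(x[1::2])                  # DFT of the odd-indexed samples
    out = [0j] * n
    for u in range(n // 2):
        t = cmath.exp(-2j * math.pi * u / n) * odd[u]   # twiddle factor
        out[u] = even[u] + t
        out[u + n // 2] = even[u] - t
    return out


def dft_direct(x):
    """Direct unnormalized DFT, O(n^2), used only as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n))
            for u in range(n)]


x = [1, 3, 0, 2, 5, 1, 4, 2]
X_fast, X_ref = fft(x), dft_direct(x)
print(max(abs(a - b) for a, b in zip(X_fast, X_ref)) < 1e-9)
```

Applying such a 1D routine along one axis and then along the other gives the 2D transform.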
9.11.2 Frequency Response of Linear System
Previously we described the linear spatially invariant (LSI) systems, for which the principle of superposition is valid: the response to a linear combination of signals is the linear combination of the responses to the individual signals (such as step or ramp signals). We know that the behavior of such systems is completely defined by the response of the system to the pulse function δ(x). In fact, if O is the linear operator of an LSI system and h(x) = O{δ(x)} is the response of the system to the pulse δ(x), then the response g(x) = O{f(x)} to any input f(x) is given by the convolution g = f ∗ h. What happens to an LSI system if it is stimulated by a frequency signal represented as a complex exponential of the type f(x) = e^{jux}?
$$e^{jux} \;\Longrightarrow\; H(u)\cdot e^{jux}$$
It can be observed that the LSI system responds with an analogous complex exponential signal of the same frequency u, changed only by the factor H(u), which is in general complex and therefore scales the amplitude (and shifts the phase) of the signal. This property can be demonstrated with the theory of linear systems. In fact, if we consider an LSI system with impulse response h(x), we know that for every input $f(x) = e^{jux}$ the response of the system g(x) is determined by the convolution integral (9.70), as follows:

$$\begin{aligned}
g(x) = h(x) * f(x) &= \int_{-\infty}^{\infty} h(\tau)\cdot f(x-\tau)\,d\tau \\
&= \int_{-\infty}^{\infty} h(\tau)\cdot e^{ju(x-\tau)}\,d\tau \\
&= e^{jux}\int_{-\infty}^{\infty} h(\tau)\cdot e^{-ju\tau}\,d\tau
\end{aligned}$$
If the integral of the last member is defined, the signal at the output of the system has the form $g(x) = H(u)\cdot e^{jux}$ with

$$H(u) = \int_{-\infty}^{\infty} h(\tau)\cdot e^{-ju\tau}\,d\tau \tag{9.106}$$
where the function H(u), the frequency response, is called the Transfer Function of the system. Note that the function H(u) given by (9.106) is the Fourier transform of the impulse response h(x), i.e., it represents its spectrum. The extension to 2D signals f(x, y) with a 2D transfer function H(u, v) is immediate by virtue of the DFT Eq. (9.92). This remarkable result justifies the interest in the Fourier transform for the analysis of LSI systems. At this point we want to know the frequency response G(u, v) of the system, given the transfer function H(u, v) and the Fourier transform F(u, v) of the 2D input signal f(x, y). The answer to this question is given by the Convolution Theorem.
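The property that a complex exponential passes through an LSI system changed only by the factor H(u) can be illustrated in the discrete, periodic setting. This is a sketch under stated assumptions: the impulse response `h` is an arbitrary random sequence, the convolution is circular (period N), and the discrete frequency is u = 2πk/N, so that the DFT sample `H[k]` plays the role of H(u).

```python
import numpy as np

N = 64
rng = np.random.default_rng(0)
h = rng.normal(size=N)               # arbitrary impulse response (assumption)
H = np.fft.fft(h)                    # samples of the transfer function

k = 5                                # frequency index of the test signal
x = np.arange(N)
tau = np.arange(N)
f = np.exp(2j * np.pi * k * x / N)   # complex exponential input

# Circular convolution g[n] = sum_tau h[tau] * f[(n - tau) mod N]
g = np.array([np.sum(h * f[(n - tau) % N]) for n in range(N)])

# The output is the same exponential multiplied by the complex factor H[k]
print(np.allclose(g, H[k] * f))
```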
9.11.3 Convolution Theorem
Given two functions f(x) and h(x) with known Fourier transforms F(u) and H(u), we know that the spatial convolution between the two functions is given by

$$h(x) * f(x) = \int_{-\infty}^{\infty} h(\tau)\cdot f(x-\tau)\,d\tau$$

If we indicate with $\mathcal{F}$ the Fourier transform operator and apply it to the convolution h(x) ∗ f(x), we obtain

$$\begin{aligned}
\mathcal{F}[h(x) * f(x)] &= \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} h(\tau)\cdot f(x-\tau)\,d\tau\right] e^{-jux}\,dx \\
&= \int_{-\infty}^{\infty} h(\tau)\left[\int_{-\infty}^{\infty} f(x-\tau)\,e^{-jux}\,dx\right] d\tau \\
&= \int_{-\infty}^{\infty} h(\tau)\,e^{-ju\tau}\cdot F(u)\,d\tau \\
&= F(u)\int_{-\infty}^{\infty} h(\tau)\,e^{-ju\tau}\,d\tau \\
&= F(u)\cdot H(u)
\end{aligned} \tag{9.107}$$

where the first expression is the Fourier transform of h(x) ∗ f(x) (the convolution is shown in square brackets), in the second expression the order of integration is exchanged, in the third the spatial translation property of the transform is applied, and in the fourth it is observed that the remaining integral is the Fourier transform of the function h(x), that is, Eq. (9.106). The convolution theorem thus shows that

$$\mathcal{F}[h(x) * f(x)] = H(u)\cdot F(u) \tag{9.108}$$
and it can be affirmed that the convolution between two functions in the spatial domain corresponds to the simple multiplication of their transforms in the frequency domain. By applying the inverse Fourier transform to the product H(u) · F(u) we get back the convolution of the functions in the spatial domain

$$\mathcal{F}^{-1}[H(u)\cdot F(u)] = h(x) * f(x) \tag{9.109}$$
The convolution theorem allows the input/output relationship of an LSI system to be studied in the spectral domain, thanks to the transform and inverse-transform pair (Eqs. 9.108 and 9.109), rather than in the spatial domain through the convolution, which we know is computationally complex and less intuitive than a simple multiplication in the frequency domain. The convolution theorem also answers the question posed at the end of the previous paragraph: how to predict the response G(u) of an LSI system stimulated by a sinusoidal input signal f(x) when the transfer function H(u) of the system is known.
Fig. 9.45 Operative domains of an LSI system: spatial and frequency domain
Once the Fourier transform F(u) of the input signal f(x) is available, the output of the LSI system is

$$F(u) \;\xrightarrow{\;H(u)\;}\; G(u) = H(u)\cdot F(u)$$
The convolution theorem asserts, in other words, that an LSI system can combine in the spatial domain the signals f (x) and h(x) in such a way that the frequency components of F(u) are scaled by the frequency components of H (u) or vice versa. Figure 9.45 graphically schematizes the behavior and response of an LSI system when it is stimulated by signals in the spatial domain and in the frequency domain.
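A numerical check of the convolution theorem is sketched below. It assumes the discrete, periodic case: with the DFT, the product of spectra corresponds to a circular convolution, so the spatial-domain reference is computed with indices taken modulo N. The signals are random test data, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 32
f = rng.normal(size=N)               # input signal (synthetic)
h = rng.normal(size=N)               # impulse response (synthetic)

# Spatial domain: circular convolution g[n] = sum_t h[t] * f[(n - t) mod N]
g_spatial = np.array([sum(h[t] * f[(n - t) % N] for t in range(N))
                      for n in range(N)])

# Frequency domain: G(u) = H(u) * F(u), then inverse transform
G = np.fft.fft(h) * np.fft.fft(f)
g_frequency = np.fft.ifft(G).real

print(np.allclose(g_spatial, g_frequency))
```

For a linear (non-periodic) convolution the two signals would first be zero-padded to at least the sum of their lengths minus one.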
9.11.3.1 Frequency Convolution
The frequency convolution property states that convolution in the frequency domain corresponds to multiplication in the spatial domain

$$F(u) * H(u) = \int_{-\infty}^{\infty} F(\tau)\cdot H(u-\tau)\,d\tau = \mathcal{F}[f(x)\cdot h(x)] \tag{9.110}$$

derived by virtue of the duality (symmetry) property of the Fourier transform, i.e., $\mathcal{F}[F(x)] = f(-u)$. We can say that a function f(x) is modulated by another function h(x) if they are multiplied in the spatial domain.
9.11.3.2 Extension to 2D Signals
The extension of the previous formulas to 2D signals is immediate. Using the same notation, we rewrite the equations of the convolution theorem in the 2D spatial domain

$$g(x, y) = f(x, y) * h(x, y) \tag{9.111}$$

and

$$G(u, v) = F(u, v)\cdot H(u, v) \tag{9.112}$$

and the convolution in the frequency domain

$$F(u, v) * H(u, v) = \mathcal{F}[f(x, y)\cdot h(x, y)] \tag{9.113}$$

where G and F are the Fourier transforms of the images g and f, respectively, and H (the transfer function) is the Fourier transform of the impulse response function h.

Fig. 9.46 Filtering process in the frequency domain using the FFT/IFFT Fourier transform pair
9.11.3.3 Selection of the Filtering Domain
What is the criterion for deciding whether to apply a filtering operator in the spatial domain or in the frequency domain? Filtering in the frequency domain is very selective: by designing the mask H(u, v) adequately, specific frequency components or frequency bands of the input signal can be removed, attenuated, or amplified. The term filtering is therefore most appropriate when operating in the frequency domain, as is the term convolution mask, which recalls the idea of masking (filtering) some frequencies of the input signal.

Figure 9.46 shows the whole filtering process in the frequency domain. The input image f(i, j) (normally integer gray levels) is transformed by the FFT into the frequency domain (complex numbers). A frequency mask is then generated to remove some frequencies, for example by setting the mask to zero at the frequencies to be eliminated and to 1 at the frequencies to be preserved. In this way a specific frequency component (u1, v1) of the input image is easily eliminated. In fact, by (9.112) the response spectrum G(u, v) of the system is obtained by multiplying the spectrum of the input image F(u, v) by the spectrum of the mask H(u, v), which is zero at (u1, v1), and therefore this output frequency is canceled:

$$G(u_1, v_1) = F(u_1, v_1)\cdot H(u_1, v_1) = F(u_1, v_1)\cdot 0 = 0$$

The output spectrum G(u, v) obtained in the frequency domain is then transformed back into the spatial domain through the IFFT (inverse FFT).

Filtering in the frequency domain can be more advantageous than spatial filtering, especially for images with additive noise of a periodic (not random) nature, which is easily described in the frequency domain. In Fig. 9.46, the high-intensity point areas in the frequency domain represent the spatial frequencies in the noise band. With an appropriate mask these frequencies can be eliminated, and applying the IFFT reconstructs the source image with attenuated noise. It should be remembered that this filter also eliminates the image structures that fall in the same frequency band, which often explains the imperfect reconstruction of the source image. Figure 9.47 illustrates an application of the functional diagram of Fig. 9.46, with the aim of eliminating or attenuating the periodic noise present in an image by operating in the frequency domain. From the analysis of the spectrum, it is possible to design an ad hoc filtering mask that selectively cuts the frequencies associated with the noise. We will return to this topic when we discuss more generally the restoration of images using the Fourier approach, and in particular notch filters.

Spatial convolution is computationally expensive, especially if the convolution mask is large. In the frequency domain, the use of the FFT and IFFT considerably reduces the computational load and is also suitable for real-time applications using specialized hardware, in particular for the processing of signals and images.
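The FFT, mask, IFFT chain described above can be sketched as follows. Everything specific here is an assumption made for illustration: the synthetic scene (a smooth Gaussian blob), the noise frequency `(u1, v1)`, and the use of NumPy's `fft2`/`ifft2`. A real cosine of frequency (u1, v1) produces two symmetric spectral peaks, so the mask is zeroed at both.

```python
import numpy as np

N = 64
y, x = np.mgrid[0:N, 0:N]
# Smooth synthetic scene: its energy is concentrated at low frequencies
scene = np.exp(-((x - N / 2) ** 2 + (y - N / 2) ** 2) / (2 * 8.0 ** 2))

u1, v1 = 12, 5                       # hypothetical noise frequency (cycles per N pixels)
noise = 0.5 * np.cos(2 * np.pi * (u1 * x + v1 * y) / N)
noisy = scene + noise

F = np.fft.fft2(noisy)               # spectrum F(u, v), indexed as [v, u]

H = np.ones((N, N))                  # mask: 1 = keep, 0 = remove
H[v1, u1] = 0.0                      # peak at (u1, v1)
H[-v1, -u1] = 0.0                    # symmetric peak at (-u1, -v1)

G = F * H                            # G(u, v) = F(u, v) * H(u, v)
restored = np.fft.ifft2(G).real      # back to the spatial domain

print(np.abs(noisy - scene).max(), np.abs(restored - scene).max())
```

In this idealized case the noise occupies exactly two frequency samples and the scene has almost no energy there, so the reconstruction is nearly perfect. With real images the removed band also carries image structure, which is the imperfect reconstruction mentioned in the text.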
In artificial vision, several processes cannot be modeled as linear spatially invariant (LSI) operators, which greatly limits the use of the FFT and IFFT. When an image processing operation can be modeled or approximated as an LSI system, the filtering masks are normally small (3 × 3 or 5 × 5) and, consequently, spatial convolution is a convenient alternative to the FFT. This is the reason for the widespread use of linear filters. Finally, we recall the limitations of the FFT due to rounding errors in the numerical computation, the stability of the inverse filter, the multiplicity of solutions obtained, and the limited range of gray values. Having defined the theory and the implementation aspects of a linear operator, based essentially on the convolution process in the spatial domain (9.111) and on its counterpart in the frequency domain (9.112), we can now illustrate in the following paragraphs two categories of local operators: smoothing (leveling of gray levels or color) and edging (extraction or enhancement of contours).
Fig. 9.47 Application of the filtering process in the frequency domain to remove periodic noise in an image: a Original image; b Image with periodic noise; c Rebuilt image with attenuated noise using the process schematized in Fig. 9.46; d the spectrum of the original image; e the spectrum of the image with periodic noise; and f the ad hoc mask used to filter the high frequency zones responsible for periodic noise
9.12 Local Operators: Smoothing
These operators are intended to eliminate or attenuate the additive noise present in the gray or color values of the image. This is achieved through local linear and nonlinear operators that essentially try to smooth out (level) the irregularities in the image without altering its significant structures. Linear smoothing filters can be defined in the spatial domain or in the frequency domain. For spatial filters, the weights of the convolution mask, which characterize the filter, are defined appropriately. For filters in the frequency domain, similar effects on the image are obtained by removing the high frequencies. In the description of the various filters, a frequency analysis of the spatial filter (and vice versa) will often be used to obtain a quantitative evaluation of the filter effects.
9.12.1 Arithmetic Average
If n images of the same scene are available, it is possible to assume a stochastic noise model with a value V in each pixel, where V is an independent random variable with zero mean and standard deviation σ. The smoothing operator in this case is the pixel-by-pixel arithmetic mean of the n images $I_1, I_2, \ldots, I_n$ with corresponding noise $V_1, V_2, \ldots, V_n$. In the expression

$$\frac{I_1 + I_2 + \cdots + I_n}{n} + \frac{V_1 + V_2 + \cdots + V_n}{n}$$

the first term is the average image, given by

$$I_m(i, j) = \frac{1}{n}\sum_{k=1}^{n} I_k(i, j)$$

The second term is the additive noise of the image after the arithmetic mean operation, which still has zero mean and has standard deviation $\sigma/\sqrt{n}$; the noise is therefore reduced by a factor $\sqrt{n}$.
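The $\sqrt{n}$ reduction can be observed with a small simulation. This is a sketch under stated assumptions: a constant synthetic scene, independent Gaussian noise with σ = 10, and n = 16 frames; none of these values come from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 10.0                         # noise standard deviation (assumption)
n = 16                               # number of available images (assumption)

scene = np.full((256, 256), 100.0)   # noise-free scene (synthetic)
frames = scene + rng.normal(0.0, sigma, size=(n, 256, 256))

mean_image = frames.mean(axis=0)     # I_m(i, j): pixel-by-pixel arithmetic mean
residual_std = (mean_image - scene).std()

print(residual_std, sigma / np.sqrt(n))
```

The measured residual standard deviation is close to σ/√n = 2.5, i.e., the noise is reduced by a factor of about 4.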
9.12.2 Average Filter
When only one image is available, the local arithmetic mean operator can be used, with which each pixel of the image is replaced by the average value of the neighboring pixels

$$g(i, j) = \frac{1}{M}\sum_{(l,k)\in W_{ij}} f(l, k) \qquad i, j = 0, 1, \ldots, N-1 \tag{9.114}$$

where $W_{ij}$ indicates the set of pixels in the neighborhood of the pixel (i, j), including the pixel (i, j) itself, involved in the calculation of the local average, and M is the total number of pixels included in the window $W_{ij}$ centered on the pixel being processed (i, j). If the window considered is 3 × 3 pixels we have

$$g(i, j) = \frac{1}{9}\sum_{l=i-1}^{i+1}\;\sum_{k=j-1}^{j+1} f(l, k) \tag{9.115}$$

In this case, the local average operator turns out to be a special case of the spatial convolution operator, with the convolution mask

$$h = \frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \end{bmatrix} \tag{9.116}$$
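Equation (9.115) can be written directly as a sum of shifted copies of the image. The sketch below makes one assumption not stated in the text: the border is handled by replicating the edge pixels (`mode="edge"`), so that the window is always complete. The function name `average_filter` is illustrative.

```python
import numpy as np

def average_filter(f, size=3):
    """Local arithmetic mean over a size x size window (Eq. 9.114)."""
    r = size // 2
    padded = np.pad(f.astype(float), r, mode="edge")   # border handling (assumption)
    g = np.zeros(f.shape, dtype=float)
    # Accumulate the size*size shifted copies of the image, then divide by M
    for dl in range(size):
        for dk in range(size):
            g += padded[dl:dl + f.shape[0], dk:dk + f.shape[1]]
    return g / size ** 2

flat = np.full((5, 5), 7.0)          # region without structures
impulse = np.zeros((5, 5))
impulse[2, 2] = 9.0                  # single bright pixel

print(average_filter(flat))          # unchanged: every value stays 7
print(average_filter(impulse))       # the pixel is spread as 1 over its 3 x 3 neighborhood
```

The impulse example reproduces the mask (9.116) scaled by 9, which is the impulse response of the filter.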
Fig. 9.48 Region without structures (flat)
Fig. 9.49 Region with structures (discontinuous gray levels)
The size of the convolution mask W controls the effect of the filter. With a small mask, for example 3 × 3, the average filter attenuates the noise present in the image and introduces an acceptable blurring. With larger masks, for example 5 × 5, 7 × 7, etc., the blurring effect and the loss of some details become more and more evident. A compromise is required between noise attenuation and loss of detail.

The effects of the filter are shown in Figs. 9.48 and 9.49. In the first case (Fig. 9.48), with a constant-value image, the filter has no effect and the image is left intact. This is because in that region the image has no structure (no change in gray levels), that is, the spatial frequency is zero. This is typical of a low pass spatial filter, which leaves low spatial frequency components intact. In the second case (Fig. 9.49), there are variations in gray level from white to black and vice versa, and the average filter attenuates these variations. In other words, the black/white transitions that represent the high frequency components of the input image are reduced to transitions with minimal gray level changes. The attenuation of the high spatial frequencies is the expected behavior of a low pass filter.

When designing a smoothing filter, as a general criterion it is better to assign high weights to the coefficients of the convolution mask closest to the pixel under examination, and progressively lower weights to the more distant ones. This leads to a mask with a single peak and a certain spatial symmetry. For example, a typical convolution mask of a smoothing filter is as follows:

$$h = \begin{bmatrix} 1/16 & 1/8 & 1/16 \\ 1/8 & 1/4 & 1/8 \\ 1/16 & 1/8 & 1/16 \end{bmatrix} = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \tag{9.117}$$
Fig. 9.50 Smoothing average filter applied to the sample image with concentric rings with sinusoidal profile. In sequence from right to left are: original image and the images filtered respectively with 3 × 3, 5 × 5, 7 × 7, 9 × 9 masks
With this mask the blurring problems highlighted above are attenuated. The effects introduced by the filtering operation on an image can be evaluated qualitatively by observing the filtered image in a subjective way. A quantitative estimate can be obtained by using a sample image (see Fig. 9.50) in which some geometrical structures are known (for example, images with concentric rings with sinusoidal profile and wavelength increasing from the center of the image). The 3 × 3, 5 × 5, 7 × 7 and 9 × 9 average filters have been applied, and it can be seen that the filter introduces strong variations in some regions of the image as the mask size increases. This highlights the limits of the average filter used as a low pass smoothing filter. These limits are due to the spurious oscillations introduced in the image, caused by the trend of the transfer function in the frequency domain. The average filter with mask (9.116) has an impulse response given by the rectangular 2D function:

$$h(l, k) = \mathrm{rect}(l, k) = \begin{cases} 1 & \text{if } |l| \le W/2,\; |k| \le W/2 \\ 0 & \text{otherwise} \end{cases}$$

We show that the Fourier transform of this function corresponds to the function sinc(u, v) as follows:

$$\begin{aligned}
\mathcal{F}[h(l, k)] = H(u, v) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(l, k)\cdot e^{-2\pi j(ul + vk)}\,dl\,dk \\
&= \int_{-W/2}^{W/2}\int_{-W/2}^{W/2} 1\cdot e^{-2\pi j(ul + vk)}\,dl\,dk \\
&= \int_{-W/2}^{W/2} e^{-2\pi jul}\,dl \int_{-W/2}^{W/2} e^{-2\pi jvk}\,dk \\
&= \left[\frac{e^{-2\pi jul}}{-2\pi ju}\right]_{-W/2}^{+W/2} \cdot \left[\frac{e^{-2\pi jvk}}{-2\pi jv}\right]_{-W/2}^{+W/2} \\
&= \frac{1}{-2\pi ju}\left(e^{-2\pi juW/2} - e^{+2\pi juW/2}\right)\cdot\frac{1}{-2\pi jv}\left(e^{-2\pi jvW/2} - e^{+2\pi jvW/2}\right) \\
&= \frac{1}{\pi u}\left\{\frac{e^{j2\pi uW/2} - e^{-j2\pi uW/2}}{2j}\right\}\cdot\frac{1}{\pi v}\left\{\frac{e^{j2\pi vW/2} - e^{-j2\pi vW/2}}{2j}\right\} \\
&= \frac{1}{\pi u}\sin\!\left(2\pi u\frac{W}{2}\right)\cdot\frac{1}{\pi v}\sin\!\left(2\pi v\frac{W}{2}\right) \\
&= W\,\frac{\sin(\pi uW)}{\pi uW}\cdot W\,\frac{\sin(\pi vW)}{\pi vW} \\
&= W\cdot\mathrm{sinc}(uW)\cdot W\cdot\mathrm{sinc}(vW)
\end{aligned} \tag{9.118}$$

where the expressions in braces {•} are replaced using Euler's formula $\sin\theta = (e^{j\theta} - e^{-j\theta})/2j$, which relates the sine function to the exponential functions. It is also useful to express the 2D transfer function H(u, v) = sinc(u) · sinc(v) = sinc(u, v) through the well-known sinc function, defined in 1D as $\mathrm{sinc}(\theta) = \frac{\sin(\pi\theta)}{\pi\theta}$. From the graphical representation of H(u, v) (see Fig. 9.51) we note that the sinc function has its maximum value at (u, v) = (0, 0) and decreases non-monotonically, passing through zero wherever the argument of the sinc takes a nonzero integer value, i.e., for uW, vW = ±1, ±2, ±3, .... This introduces oscillations around these frequencies even with convolution masks as small as 3 × 3. A quantitative evaluation of the effects of the average filter will be examined in more depth in the section on filtering in the frequency domain, in the description of the ideal low pass filter.

Fig. 9.51 a Impulse response represented by the 2D function h(l, k) with square base of L × L sides and unit height; b Transfer function H(u, v) represented by the 2D sinc(u, v) function; and in (c) a 1D projection of H(u)

Example 1: Attenuation of structures with wavelength of 2 pixels: they should have been removed by the filter.

$$\begin{bmatrix}
 & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & 1 & -1 & 1 & -1 & 1 & \cdots \\
\cdots & 1 & -1 & 1 & -1 & 1 & \cdots \\
\cdots & 1 & -1 & 1 & -1 & 1 & \cdots \\
 & \vdots & \vdots & \vdots & \vdots & \vdots &
\end{bmatrix} * \frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} =
\begin{bmatrix}
 & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3} & \cdots \\
\cdots & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3} & \cdots \\
\cdots & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3} & \cdots \\
 & \vdots & \vdots & \vdots & \vdots & \vdots &
\end{bmatrix}$$
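Example 2: Removal of structures with λ = 3 pixels: they should not have been removed.

$$\begin{bmatrix}
 & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & 1 & -2 & 1 & 1 & -2 & 1 & \cdots \\
\cdots & 1 & -2 & 1 & 1 & -2 & 1 & \cdots \\
\cdots & 1 & -2 & 1 & 1 & -2 & 1 & \cdots \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots &
\end{bmatrix} * \frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} =
\begin{bmatrix}
 & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots &
\end{bmatrix}$$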
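Both examples can be reproduced in 1D, since the patterns are constant along the columns and the 3 × 3 average then reduces to a 3-point average along each row. The sketch also evaluates the transfer function of the discrete 3-point average, $H(u) = (1 + 2\cos 2\pi u)/3$ with u in cycles per pixel; this closed form is a standard result added here for illustration, not taken from the text.

```python
import numpy as np

box = np.ones(3) / 3.0                       # one row of the 3 x 3 average mask

row_l2 = np.tile([1.0, -1.0], 6)             # structures with wavelength 2 pixels
row_l3 = np.tile([1.0, -2.0, 1.0], 4)        # structures with wavelength 3 pixels

out_l2 = np.convolve(row_l2, box, mode="valid")
out_l3 = np.convolve(row_l3, box, mode="valid")

def H(u):
    """Transfer function of the discrete 3-point average (u in cycles/pixel)."""
    return (1.0 + 2.0 * np.cos(2.0 * np.pi * u)) / 3.0

print(out_l2[:4])       # alternating values of magnitude 1/3: attenuated, not removed
print(out_l3[:4])       # zeros: completely removed
print(H(1 / 2), H(1 / 3))
```

The highest frequency (λ = 2) survives with a gain of −1/3, while the lower frequency (λ = 3) falls on a zero of the transfer function, which is exactly the non-monotonic behavior of the sinc-like response.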
Better results are obtained with a spatially variant filter, in which the weights are dynamically adjusted in order to smooth more in relatively uniform image areas and less in areas with sharp variations in intensity.
9.12.3 Nonlinear Filters
These filters perform the smoothing operation, essentially to reduce the noise, by leveling the image only in regions with homogeneous gray levels and leaving unaltered the areas where there are strong variations in gray levels. It follows that the coefficients of the convolution mask must vary appropriately from region to region. In particular, in the areas where the transitions are accentuated, it is assumed that the pixels belong to different regions, so the convolution coefficients must be chosen small. Nonlinear filters based on the absolute value are given below.

(a)
$$h(l, k) = \begin{cases} 1 & \text{if } |f(i, j) - f(l, k)| < T \\ 0 & \text{otherwise} \end{cases}$$
where T is an experimentally defined threshold value.

(b)
$$h(l, k) = c - |f(i, j) - f(l, k)|$$
with c a normalization constant defined as
$$c = \left[\sum_{l}\sum_{k} h(l, k)\right]^{-1} \quad\text{and}\quad h(l, k) \ge 0$$
for each value of l and k.

(c)
$$f(i, j) = \begin{cases} \dfrac{1}{L\times L}\displaystyle\sum_{l}\sum_{k} f(l, k) & \text{if } \left| f(i, j) - \dfrac{1}{L\times L}\displaystyle\sum_{l}\sum_{k} f(l, k) \right| > T \\[2ex] f(i, j) & \text{otherwise} \end{cases}$$

(d)
$$f(i, j) = \min_{(m,n)\in L\times L}\left| f(m, n) - \frac{1}{L\times L}\sum_{l}\sum_{k} f(l, k) \right|$$
in this case the strong transitions are not always leveled.
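Filter (a) can be sketched as an average restricted to the neighbors whose gray level differs from the central pixel by less than T. Two choices below are assumptions, since the text does not specify them: the result is normalized by the number of selected pixels, and the border is handled by edge replication. The name `threshold_average` is illustrative.

```python
import numpy as np

def threshold_average(f, T, size=3):
    """Average only the neighbors with |f(i,j) - f(l,k)| < T (filter (a))."""
    f = f.astype(float)
    r = size // 2
    padded = np.pad(f, r, mode="edge")           # border handling (assumption)
    num = np.zeros_like(f)
    den = np.zeros_like(f)
    for dl in range(size):
        for dk in range(size):
            nb = padded[dl:dl + f.shape[0], dk:dk + f.shape[1]]
            w = np.abs(f - nb) < T               # h(l, k) = 1 or 0
            num += w * nb
            den += w
    return num / den                             # den >= 1: the center is always selected

edge = np.zeros((6, 6))
edge[:, 3:] = 100.0                              # sharp vertical transition
noisy = edge.copy()
noisy[2, 1] = 4.0                                # small fluctuation in a flat region

print(threshold_average(edge, T=10))             # the transition is left unaltered
print(threshold_average(noisy, T=10)[2, 1])      # the fluctuation is leveled
```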
9.12.4 Median Filter
Unlike average-based filters, the median filter limits the loss of image sharpness and the amount of blurring. The filter replaces every pixel with the value of the median pixel, obtained after the pixels of the neighborhood have been sorted in increasing order. The median pixel has the highest value of the first half of the pixels in the neighborhood and the lowest value of the remaining half. For example, using 3 × 3 windows the filter operates as shown in Fig. 9.52. The value of the pixel under examination, 200, is replaced with the median value, which is 87. In general, for windows of dimensions L × L, the position of the median pixel in the sorted list is the (⌊L × L/2⌋ + 1)-th. The median filter is very effective for the reduction of impulsive noise, the so-called salt-and-pepper noise (scattered white and black points). The mode of operation of the median filter is similar to that of linear filters in the sense that, in analogy to the convolution mask, a window is positioned on each pixel of the image to identify the pixels to be sorted in increasing order. Figure 9.53 shows the effects of the median filter in removing impulsive noise added to the original image. It highlights the excellent performance of the median filter, which removes the impulsive noise almost completely, unlike the poor results obtained with the average filter. In particular, it is observed that the 3 × 3 median filter is sufficient to eliminate almost all the noise without altering the sharpness of the image (it preserves the contours), unlike the average filter, which also introduces a significant blurring.
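A minimal sketch of the median filter follows, assuming edge replication at the border and a small synthetic image with one isolated "salt" pixel (the values of Fig. 9.52 are not reproduced here). The window pixels are gathered as shifted copies of the image and `np.median` selects the central element of the sorted list.

```python
import numpy as np

def median_filter(f, size=3):
    """Replace each pixel with the median of its size x size neighborhood."""
    r = size // 2
    padded = np.pad(f, r, mode="edge")               # border handling (assumption)
    windows = [padded[dl:dl + f.shape[0], dk:dk + f.shape[1]]
               for dl in range(size) for dk in range(size)]
    return np.median(np.stack(windows), axis=0)      # median over the window pixels

image = np.full((5, 5), 80.0)
image[2, 2] = 255.0                                  # isolated impulsive ("salt") pixel

filtered = median_filter(image)
print(filtered[2, 2])                                # the impulse is replaced by 80
```

An average filter on the same image would instead spread the impulse over the whole 3 × 3 neighborhood.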
Fig. 9.52 Functional diagram of the median filter
Fig. 9.53 Application of the median filter and comparison with the average filter. In sequence: original image; original image with 2% impulsive noise; filtered image with a 3 × 3 median mask; filtered image with 3 × 3 average mask; original image with 10% noise; filtered image with 3 × 3 and 5 × 5 median masks
9.12.5 Minimum and Maximum Filter
In addition to the median filter, there are other statistical nonlinear filters (also known as rank filters), based on the statistical analysis of the local elements in the neighborhood of the pixel being processed. The basic approach is to produce a list of the neighborhood elements and sort them in increasing order, in analogy to the median filter. The filter is characterized by the criterion with which the output element is chosen from the sorted list, which can be the minimum or the maximum. The minimum filter darkens the filtered image (it moves the histogram of the levels towards the low values), while the maximum filter tends to lighten the image (it moves the histogram of the levels towards the high values). While the median filter, as seen above, is excellent for removing impulsive noise (pixels scattered in the image), the minimum filter tends to remove isolated white dots and the maximum filter tends to remove isolated black dots.
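The behavior of the two filters can be sketched with a generic rank filter in which the selection rule is passed as a function. The helper name `rank_filter`, the edge-replicating border, and the test image (one white and one black isolated dot on a gray background) are assumptions made for illustration.

```python
import numpy as np

def rank_filter(f, reducer, size=3):
    """Apply reducer (e.g. np.min, np.max) to each size x size neighborhood."""
    r = size // 2
    padded = np.pad(f, r, mode="edge")               # border handling (assumption)
    windows = [padded[dl:dl + f.shape[0], dk:dk + f.shape[1]]
               for dl in range(size) for dk in range(size)]
    return reducer(np.stack(windows), axis=0)

image = np.full((7, 7), 80.0)
image[1, 1] = 255.0                                  # isolated white dot
image[5, 5] = 0.0                                    # isolated black dot

darker = rank_filter(image, np.min)                  # minimum filter
lighter = rank_filter(image, np.max)                 # maximum filter

print(darker.max(), lighter.min())                   # white dot gone / black dot gone
print(darker.mean(), image.mean(), lighter.mean())   # darkening / lightening
```

Note the side effect: the minimum filter removes the white dot but enlarges the black one to a 3 × 3 patch, and the maximum filter does the opposite.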
Fig. 9.54 Gaussian function: a 1D, $e^{-l^2/2\sigma^2}$; b 2D with σ = 3
9.12.6 Gaussian Smoothing Filter
This category comprises the linear smoothing filters whose convolution mask coefficients are defined according to the Gaussian function. These filters are very effective for attenuating Gaussian noise in the image. The impulse response function h(l, k), modeled by the discrete Gaussian function with zero mean, is given by

$$h(l, k) = c\cdot e^{-\frac{l^2 + k^2}{2\sigma^2}} \tag{9.119}$$
where σ is the standard deviation of the associated probability distribution and c is the normalization factor, which is assumed to be equal to 1. The standard deviation σ is the only parameter that models the impulse response and defines the area of influence of the Gaussian filter (see Fig. 9.54). The weight of the mask coefficients decreases with the distance of the pixel from the central one (pixels at a distance greater than about 3σ have practically no influence on the filtering). Among the properties of the Gaussian filter are
(a) Circular symmetry;
(b) Monotonically decreasing (in the spatial and frequency domain);
(c) Separable.
9.12.6.1 Circular Symmetry
In Sect. 9.10.2.6 we have already described a convolution mask with circular symmetry and we have shown that the Gaussian function is the only one that has the property of circular symmetry. The Gaussian filter performs the smoothing operation in the same way in all directions. It follows that the filter operates independently of the orientation of the structures present in the image. This property is demonstrated by converting the Cartesian coordinates (l, k) into polar coordinates (r, θ) in the Gaussian function

$$h(r, \theta) = c\cdot e^{-\frac{r^2}{2\sigma^2}} \tag{9.120}$$

where the polar radius r is defined by $r^2 = l^2 + k^2$. The circular symmetry property is demonstrated in (9.120) by the independence of h(r, θ) from the azimuth θ.
9.12.6.2 Separability Properties of the Gaussian Filter
Let h(i, j) and f(i, j) be, respectively, the impulse response (given by Eq. 9.119) and the image to be filtered. It follows that

$$\begin{aligned}
g(i, j) = h(i, j) * f(i, j) &= \sum_{l}\sum_{k} h(l, k)\, f(i-l, j-k) \\
&= \sum_{l}\sum_{k} e^{-\frac{l^2 + k^2}{2\sigma^2}}\, f(i-l, j-k) \\
&= \sum_{l} e^{-\frac{l^2}{2\sigma^2}}\left[\sum_{k} e^{-\frac{k^2}{2\sigma^2}}\, f(i-l, j-k)\right]
\end{aligned} \tag{9.121}$$

where the expression in square brackets [•] indicates the convolution of the image f(i, j) with the 1D Gaussian function h(k) representing the vertical component. The result of the vertical filtering is the intermediate output image $g_v(i, j)$, which is given as input to the horizontal convolution operator using the 1D Gaussian function h(l). Exchanging the order of the convolutions, first the horizontal and then the vertical, does not change the result, thanks to the associative and commutative properties of convolution. The essential steps are summarized as follows:
1. Perform the convolution with a horizontal mask and save the result transposed (horizontal convolution).
2. Perform the convolution with the same horizontal mask (vertical convolution).
3. Transpose the image to bring it back to its original orientation (filtered image).
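The separability in (9.121) can be verified by comparing a direct 2D convolution with two successive 1D passes. This is a sketch under stated assumptions: the 1D mask is normalized to unit sum (a choice of c different from the c = 1 used in the text), the image is zero-padded at the border in both versions, and the helper names are illustrative.

```python
import numpy as np

def gaussian_1d(sigma, radius):
    """Sampled 1D Gaussian, normalized to unit sum (normalization is an assumption)."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()

def convolve_separable(f, g1d):
    """One 1D pass along the rows, then one along the columns (zero padding)."""
    rows = np.apply_along_axis(np.convolve, 1, f, g1d, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, g1d, mode="same")

def convolve_2d_direct(f, mask):
    """Direct 2D convolution with a symmetric mask (zero padding)."""
    r = mask.shape[0] // 2
    padded = np.pad(f, r, mode="constant")
    out = np.zeros(f.shape, dtype=float)
    for dl in range(mask.shape[0]):
        for dk in range(mask.shape[1]):
            out += mask[dl, dk] * padded[dl:dl + f.shape[0], dk:dk + f.shape[1]]
    return out

rng = np.random.default_rng(0)
f = rng.random((32, 32))
g1d = gaussian_1d(sigma=1.5, radius=3)          # 7-tap mask
mask_2d = np.outer(g1d, g1d)                    # h(l, k) = h(l) * h(k)

print(np.allclose(convolve_separable(f, g1d), convolve_2d_direct(f, mask_2d)))
```

For an L × L mask the separable version needs 2L multiplications per pixel instead of L², which is the practical reason for exploiting this property.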
9.12.6.3 Design of Discrete Gaussian Filter
The limits of the average filter have been highlighted previously. Smoothing filters must be designed so that the convolution mask coefficients model a gradually and monotonically decreasing function that tends to zero. To obtain smoothing filters with these characteristics we consider two categories of filters: discrete Gaussian filters and binomial filters. The coefficients of the convolution mask are calculated analytically by considering the Gaussian distribution function

$$h(i, j) = k\cdot e^{-\frac{i^2 + j^2}{2\sigma^2}} \tag{9.122}$$

where k is a weight normalization constant. The effects of the filter are controlled by the value of the variance $\sigma^2$ and by the dimensions L × L of the mask. For example, a discrete Gaussian filter with a mask of size 7 × 7 and variance $\sigma^2 = 2$, normalized so that the maximum value of the peak is equal to 1 in the central position (0, 0) of the mask, would have the following discretization values according to (9.122):
[i, j] −3 −2 −1 0 1 2 3
−3 0.0111 0.0388 0.0821 0.1054 0.0821 0.0388 0.0111
−2 0.0388 0.1353 0.2865 0.3679 0.2865 0.1353 0.0388
−1 0.0821 0.2865 0.6065 0.7788 0.6065 0.2865 0.0821
0 0.1054 0.3679 0.7788 1.0000 0.7788 0.3679 0.1054
1 0.0821 0.2865 0.6065 0.7788 0.6065 0.2865 0.0821
2 0.0388 0.1353 0.2865 0.3679 0.2865 0.1353 0.0388
3 0.0111 0.0388 0.0821 0.1054 0.0821 0.0388 0.0111
In many image processing systems it is convenient to use filter coefficients with integer values. For this purpose an appropriate value of the constant k is calculated. If we want a minimum value (for example 1) at the corner elements of the mask and the maximum value k at the central coefficient, we have

$$k = \frac{\text{NewValue\_min\_}h_N(-3, 3)}{\text{ActualValue\_}h(3, 3)} = \frac{1}{0.0111} = 91 \tag{9.123}$$

The other coefficients are adapted by multiplying each of them by the value of k previously calculated. The result obtained is the following:
[i, j] −3 −2 −1 0 1 2 3
−3 1 4 7 10 7 4 1
−2 4 12 26 33 26 12 4
−1 7 26 55 71 55 26 7
0 10 33 71 91 71 33 10
1 7 26 55 71 55 26 7
2 4 12 26 33 26 12 4
3 1 4 7 10 7 4 1
With these new values of the coefficients, it is necessary to normalize the result of the convolution with a constant λ, that is,

$$g(i, j) = \lambda\,\bigl[h(i, j) * f(i, j)\bigr] \tag{9.124}$$

and with the data of the previous example we would have

$$\lambda = 1\Big/\left[\sum_{i=-3}^{3}\sum_{j=-3}^{3} h(i, j)\right] = \frac{1}{1115} \tag{9.125}$$
The normalization of the convolution result ensures that homogeneous regions with constant intensity are left unaltered by the filter. Other discrete filters can be calculated by varying $\sigma^2$ and L, respectively the variance and the size of the mask. The Gaussian filter was applied to the same sample image (the image with concentric rings having a sinusoidal profile with wavelength increasing from the center of the image) to which the average filter was applied. Comparing the filtered images, we can observe the absence of artifacts in the image filtered with the Gaussian filter (see Fig. 9.55) compared to that filtered with the average filter (see Fig. 9.50). The Gaussian filter introduces only a slight leveling of gray levels (slight blurring), useful in some cases to attenuate the Gaussian noise present in the image (see Fig. 9.56).
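The design procedure of this section (sampling (9.122), scaling to integers, computing λ) can be sketched as follows. The constant k = 91 is taken directly from (9.123); rounding each scaled coefficient to the nearest integer is an assumption that reproduces the integer table of the text.

```python
import numpy as np

R = 3                                    # 7 x 7 mask: indices -3 .. 3
variance = 2.0                           # sigma^2 = 2, as in the example

i, j = np.mgrid[-R:R + 1, -R:R + 1]
h = np.exp(-(i ** 2 + j ** 2) / (2.0 * variance))   # Eq. (9.122) with peak value 1

k = 91                                   # scale factor from Eq. (9.123)
h_int = np.rint(k * h).astype(int)       # integer coefficients (rounding is an assumption)

lam = 1.0 / h_int.sum()                  # normalization constant, Eq. (9.125)

print(np.round(h[0], 4))                 # first row of the real-valued table
print(h_int)
print(h_int.sum())                       # 1115
```

Applying `lam * h_int` to a constant image returns the same constant, which is the property that the normalization is meant to guarantee.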
Fig. 9.55 Gaussian smoothing filter is applied to the sample image with concentric rings having a sinusoidal profile. Sequentially from left to right are shown: original image and images filtered, respectively, with 3 × 3, 5 × 5, 7 × 7, 9 × 9 masks
Fig. 9.56 a Image with Gaussian noise and b image filtered, using the 7 × 7 Gaussian filter of Table 9.1

Table 9.1 Gaussian mask 7 × 7
1  1  2  2  2  1  1
1  2  2  4  2  2  1
2  2  4  8  4  2  2
2  4  8 16  8  4  2
2  2  4  8  4  2  2
1  2  2  4  2  2  1
1  1  2  2  2  1  1
In applying Gaussian filtering, the right compromise must be found between the blurring introduced by the filter, with the possible removal of details, and the benefit of noise attenuation. This compromise is regulated by the variance (and hence by the size of the filter mask) and depends on the amount of noise present in the image.
9.12.7 Binomial Filters
A class of smoothing filters called binomial filters can be obtained in a simple way by considering the following properties.

(a) The Gaussian distribution can be approximated by the binomial distribution (especially for $n \gg 0$)

$$h(x) = \binom{n}{x} p^x (1-p)^{n-x} \qquad x = 0, \ldots, n \tag{9.126}$$

where p represents the probability of the event x and n is the order of the binomial distribution. This distribution has mean μ = np and variance $\sigma^2 = np(1-p)$. In the discrete case with n = 8 and p = 1/2, and therefore μ = 8 · (1/2) = 4 and $\sigma^2 = 2$, the Gaussian distribution and the binomial one differ very little:

x                 0     1     2     3     4     5     6     7     8
Distrib. bin.     1     8     28    56    70    56    28    8     1
Distrib. Gauss.   1.3   7.6   26.6  56.2  72.2  56.2  26.6  7.6   1.3

(b) The separability property of the 2D Gaussian filter. This filter can be implemented as a convolution of two one-dimensional Gaussian filters (horizontal and vertical). This is feasible using a single 1D Gaussian filter and transposing the image after each convolution (with the same 1D mask).

(c) The associative property of the convolution. If we consider the convolution mask of size (n × n) of a Gaussian or binomial filter, its convolution with another mask of size (m × m) produces a new convolution mask of dimensions (n + m − 1) × (n + m − 1).
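Let us analyze the last property in detail and consider the 1D average filter

$$h(i) = \frac{1}{2}\begin{bmatrix} 1 & 1 \end{bmatrix} \tag{9.127}$$

If we apply n cascaded convolutions to this filter, by virtue of the associative property we would have the filter [1 1] ∗ [1 1] ∗ ··· ∗ [1 1], that is,

$$h^n(i) = \underbrace{h(i) * h(i) * \cdots * h(i)}_{n\ \text{times}}$$

Example

$$\begin{aligned}
h^2(i) &= h(i) * h(i) = \tfrac{1}{2}\begin{bmatrix} 1 & 1 \end{bmatrix} * \tfrac{1}{2}\begin{bmatrix} 1 & 1 \end{bmatrix} = \tfrac{1}{4}\begin{bmatrix} 1 & 2 & 1 \end{bmatrix} \\
h^3(i) &= \tfrac{1}{8}\begin{bmatrix} 1 & 3 & 3 & 1 \end{bmatrix} \\
h^4(i) &= \tfrac{1}{16}\begin{bmatrix} 1 & 4 & 6 & 4 & 1 \end{bmatrix} \\
h^8(i) &= \tfrac{1}{256}\begin{bmatrix} 1 & 8 & 28 & 56 & 70 & 56 & 28 & 8 & 1 \end{bmatrix}
\end{aligned} \tag{9.128}$$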
For symmetry reasons, we are interested in odd-sized filters. The values of the 1D convolution masks shown above correspond to the values of the discrete binomial distribution. These values, obtained by repeatedly convolving with the base mask 1/2 [1 1], coincide with the $n$-th row of Pascal's triangle, which can be considered a good discrete approximation of the Gaussian filter. It follows that 1D binomial filters can be read directly from the rows of Pascal's triangle, developed here up to $n = 8$:

n    k        Row of Pascal's triangle       σ²
0    1        1                              0
1    1/2      1 1                            1/4
2    1/4      1 2 1                          1/2
3    1/8      1 3 3 1                        3/4
4    1/16     1 4 6 4 1                      1
5    1/32     1 5 10 10 5 1                  5/4
6    1/64     1 6 15 20 15 6 1               3/2
7    1/128    1 7 21 35 35 21 7 1            7/4
8    1/256    1 8 28 56 70 56 28 8 1         2

where $k = 2^{-n}$ is the scale factor of the filter, and the variance $\sigma^2 = np(1-p) = n/4$, with $p = 1/2$, is the parameter that controls the effectiveness of the smoothing filter.
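The construction of the binomial masks by cascaded convolution can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation; the helper names `convolve` and `binomial_kernel` are introduced here only for illustration. It builds the order-8 mask from the base mask 1/2 [1 1], compares it with the row of Pascal's triangle, and evaluates the Gaussian with the same mean and variance.

```python
from math import comb, exp, pi, sqrt

def convolve(a, b):
    """Full discrete convolution; the result has len(a) + len(b) - 1 taps."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def binomial_kernel(n):
    """Order-n binomial mask from n cascaded convolutions with 1/2 [1 1]."""
    kernel = [1.0]                       # order 0: the unit impulse
    for _ in range(n):
        kernel = convolve(kernel, [0.5, 0.5])
    return kernel

n = 8
h8 = binomial_kernel(n)
coefficients = [round(w * 2 ** n) for w in h8]     # integer mask values
pascal_row = [comb(n, x) for x in range(n + 1)]    # row n of Pascal's triangle

mean = sum(x * w for x, w in enumerate(h8))                     # np
variance = sum((x - mean) ** 2 * w for x, w in enumerate(h8))   # np(1 - p)

# Gaussian with the same mean and variance, scaled by 2**n for comparison
gauss = [2 ** n * exp(-(x - mean) ** 2 / (2 * variance)) / sqrt(2 * pi * variance)
         for x in range(n + 1)]

print(coefficients)
print([round(g, 1) for g in gauss])
```

The length of the result of `convolve` also illustrates property (c): convolving masks of n and m taps gives a mask of n + m − 1 taps.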
Fig. 9.57 Binomial smoothing filter applied on the sample image with concentric rings having a sinusoidal profile. In sequence from left to right are shown: original image and the images filtered, respectively, with 3 × 3 and 5 × 5 masks
A binomial filter $h_L(i)$ of odd size $L = 2R + 1$, $R = 1, 2, 3, \ldots$, is defined, using property (a) with binomial order $n = L - 1 = 2R$ and $p = 1/2$, by the following relationship:
$$ h_L(i) = \binom{2R}{R+i}\, p^{R+i} (1-p)^{R-i} = \frac{(2R)!}{(R-i)!\,(R+i)!} \cdot \frac{1}{2^{2R}} \tag{9.129} $$
with $i = -R, -(R-1), \ldots, 0, \ldots, R$. For $R = 1$ this gives the mask 1/4 [1 2 1]. The transfer function of the binomial filter is obtained by raising that of the average filter $h(i)$ to the power $L - 1$. Compared with the average filter, the binomial smoothing filter has a transfer function that decreases monotonically and tends to zero toward the high frequencies. Summarizing, the binomial filters can be implemented as follows.

1D filter. With the separability property, a 1D binomial filter can be used, with values given by a row of Pascal's triangle. If the mask size exceeds 10, the coefficients of the binomial expansion assume very large values and create implementation problems (they can no longer be stored in one byte). Thanks to the associative property of convolution, it is possible to apply the same small filter several times in cascade, thus obtaining the equivalent effect of a large filter.

2D filter. These filters can be generated by using two 1D binomial filters in a separable way, as follows:
$$ h_L(i, j) = h_L(i) \ast h_L(j) \tag{9.130} $$
For example, a 3 × 3 binomial convolution mask is obtained as follows:
$$ h_3(i, j) = h_3(i) \ast h_3(j) = \frac{1}{4}\begin{bmatrix} 1 & 2 & 1 \end{bmatrix} \ast \frac{1}{4}\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \tag{9.131} $$
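As an illustration of the separable construction, the sketch below (plain Python; the function names are hypothetical and chosen only for this example) builds the 3 × 3 mask of Eq. (9.131) as an outer product and verifies that filtering rows, transposing, and filtering rows again with the same 1D mask reproduces it as the response to a unit impulse. Zero padding at the borders is an assumption of the sketch.

```python
def binomial_row(order):
    """Row `order` of Pascal's triangle scaled by 2**-order (sums to 1)."""
    row = [1]
    for _ in range(order):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return [v / 2 ** order for v in row]

def outer(column, row):
    """Outer product: the 2D mask h(i, j) built from two 1D masks."""
    return [[c * r for r in row] for c in column]

def smooth_rows(image, h):
    """Filter each row with the symmetric odd-length mask h (zero padding)."""
    r = len(h) // 2
    result = []
    for row in image:
        padded = [0.0] * r + list(row) + [0.0] * r
        result.append([sum(h[k] * padded[x + k] for k in range(len(h)))
                       for x in range(len(row))])
    return result

def transpose(image):
    return [list(column) for column in zip(*image)]

def smooth_separable(image, h):
    """Rows, transpose, rows again, transpose back (same 1D mask both times)."""
    return transpose(smooth_rows(transpose(smooth_rows(image, h)), h))

h3 = binomial_row(2)                 # 1/4 [1 2 1]
mask_3x3 = outer(h3, h3)             # 1/16 [[1 2 1], [2 4 2], [1 2 1]]

impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0
response = smooth_separable(impulse, h3)   # central 3 x 3 block equals mask_3x3
```

Because the mask is symmetric, the sliding sum in `smooth_rows` coincides with a convolution; for a non-symmetric mask the mask would have to be flipped first.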
Figure 9.57 displays the impulse responses of the binomial filters at different sizes 3×3, 5×5 and the effects of smoothing on the sample image. It is noted that the binomial filter, an approximation of the Gaussian filter, presents components with some imperfect circular symmetry.
9.12.8 Computational Analysis of Smoothing Filters
The computational load of a Gaussian filter of size $L \times L$ is $L^2$ multiplications and $L^2 - 1$ additions per pixel. If the filter is instead built from 1D masks of the type 1/2 [1 1], applied $L - 1$ times in the horizontal direction and $L - 1$ times in the vertical direction, only $2(L - 1)$ additions are required, and the multiplications are performed efficiently with shift operations. For a 17 × 17 mask, the filter requires only 32 additions and some shift operations, against the 289 multiplications and 288 additions needed with the direct approach.
Rank, median, minimum, and maximum filters require significant computation time. Assuming a mask of size $L \times L$ and an image of size $M \times N$, the required operations are as follows:
(a) Sorting the $L \times L$ elements for each pixel being processed, which requires a time proportional to $L^2 \cdot \log(L^2)$;
(b) Filtering the entire image, whose computational load is proportional to $M \cdot N \cdot L^2 \cdot \log(L^2)$.
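The operation counts above can be reproduced with a few lines of arithmetic. This is only a sketch; the function names are hypothetical, and the base-2 logarithm in the rank-filter estimate is an assumption, since the text gives the cost only up to a proportionality constant.

```python
from math import log2

def direct_cost(L):
    """Per-pixel cost of a direct L x L convolution."""
    return {"multiplications": L * L, "additions": L * L - 1}

def cascaded_binomial_cost(L):
    """Per-pixel additions when 1/2 [1 1] is applied L - 1 times per direction."""
    return {"additions": 2 * (L - 1)}

def rank_filter_cost(M, N, L):
    """Quantity proportional to the sorting work of a rank filter on an M x N image."""
    return M * N * L * L * log2(L * L)

print(direct_cost(17))             # direct approach for a 17 x 17 mask
print(cascaded_binomial_cost(17))  # cascaded 1/2 [1 1] masks
```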
9.13 Low Pass Filtering in the Fourier Domain
The transfer from the spatial domain to the frequency domain, realized with the Fourier transform and its inverse, produces effects similar to those of a global operator, in the sense that in the Fourier domain the spatial information of the image is lost (as with the histogram) and is expressed in terms of periodic sinusoidal functions that characterize the spatial frequencies present in the image. Indeed, from Eq. (9.92) it is noted that each element of F(u, v) contains information relative to all the elements f(k, l) of the image, encoded by the exponential terms. It follows that no direct association can be made between the elements of the spatial domain and those of the frequency domain. Nonetheless, useful relationships between the two domains can be found, in some cases by trial and error: some frequencies are canceled or modified in the frequency domain, and the effects are then verified in the spatial domain by applying the inverse Fourier transform (9.93). An intuitive approach is to analyze how rapidly the local intensity (color, gray levels) varies in the image, since this spatial variability is correlated with the frequency value. We already know that the value at the origin of the Fourier domain, i.e., at (u, v) = (0, 0), is proportional to the average intensity of the image. The high frequencies correspond instead to strong variations of the image intensity, that is, to contours or zones with accentuated discontinuities, which are normally also due to the presence of noise. In Sect. 9.11.3 we introduced a functional scheme of the convolution process for filtering in the frequency domain, together with an example aimed at cutting the frequencies due to noise that introduce artifacts in the image. We have seen how the high frequencies due to noise alter the spectrum of the original image and how a specially designed mask can filter them.
With the same functional scheme, smoothing filters based on low pass filtering can be realized through a suitable design of the transfer function H(u, v), using Eqs. (9.111) and (9.112) of the convolution theorem. The filter must pass the low frequencies, which correspond to slight changes in intensity (homogeneous zones in the image), and attenuate the high frequencies, which in the spatial domain correspond to strong variations of intensity (contours and noise). Through (9.112) (the convolution operator in the Fourier domain), the transfer function H(u, v) alters or cuts the frequencies of the transformed input image F(u, v), and the effects of filtering in the spatial domain are observed in the filtered image $g(k, l) = \mathcal{F}^{-1}[G(u, v)]$, obtained by applying the inverse Fourier transform, Eq. (9.93). These effects can be observed both by displaying the filtered image and by analyzing the spectrum, where the energy is normally concentrated in the vicinity of the origin for a noise-free image, while spectral components due to noise may be present in the high-frequency zones. It is often complicated to design a filter directly in the spatial domain. In these cases, one starts by defining it in the frequency domain; the inverse Fourier transform (IDFT) is then applied to obtain the impulse response h(k, l), which can be further adjusted in the spatial domain to the dimensions required by the application and then implemented as a spatial convolution.
Several low pass filters have been devised in the frequency domain. The best known are the ideal low pass filter, the Butterworth filter, and the Gaussian filter. These filters operate on both the real and imaginary parts of the Fourier transform and do not alter the value of the phase.
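The functional scheme just described (transform, multiply by H(u, v), inverse transform) and the derivation of a small spatial mask from a transfer function can be sketched as follows. This is an illustrative sketch under stated assumptions, not the book's implementation: it uses a naive DFT written in plain Python so that it runs without dependencies (a library FFT would normally be used), a Gaussian-shaped H(u, v) chosen only as an example, and hypothetical helper names (`dft2`, `signed`, `spatial_mask`). The renormalization of the cropped mask is also an assumption of the sketch.

```python
import cmath
import math

def dft2(f, inverse=False):
    """Naive separable 2D DFT of a small square array (illustration only)."""
    n = len(f)
    sign = 1 if inverse else -1
    w = [[cmath.exp(sign * 2j * math.pi * a * b / n) for b in range(n)] for a in range(n)]
    rows = [[sum(f[k][l] * w[l][v] for l in range(n)) for v in range(n)] for k in range(n)]
    out = [[sum(rows[k][v] * w[k][u] for k in range(n)) for v in range(n)] for u in range(n)]
    scale = n * n if inverse else 1
    return [[value / scale for value in row] for row in out]

def signed(index, n):
    """Map a DFT index to its signed frequency (origin at index 0)."""
    return index if index <= n // 2 else index - n

def gaussian_lowpass(n, l0):
    """Example transfer function H(u, v) = exp(-0.5 (l / l0)^2), uncentered grid."""
    return [[math.exp(-0.5 * (signed(u, n) ** 2 + signed(v, n) ** 2) / l0 ** 2)
             for v in range(n)] for u in range(n)]

def filter_in_frequency_domain(f, H):
    """g = IDFT(H . DFT(f)): the multiplication form of the convolution theorem."""
    F = dft2(f)
    G = [[F[u][v] * H[u][v] for v in range(len(f))] for u in range(len(f))]
    return [[value.real for value in row] for row in dft2(G, inverse=True)]

def spatial_mask(H, size=3):
    """Crop the central size x size part of h = IDFT(H) and renormalize it."""
    n, r = len(H), size // 2
    h = dft2(H, inverse=True)
    mask = [[h[i % n][j % n].real for j in range(-r, r + 1)] for i in range(-r, r + 1)]
    total = sum(map(sum, mask))
    return [[value / total for value in row] for row in mask]

n = 16
H = gaussian_lowpass(n, l0=3.0)
flat = [[5.0] * n for _ in range(n)]             # constant image: only the (0, 0) term
smoothed = filter_in_frequency_domain(flat, H)   # unchanged, since H(0, 0) = 1
mask = spatial_mask(H)                           # small mask for spatial convolution
```

Cropping the impulse response to 3 × 3 discards its tails, so the spatial mask only approximates the frequency-domain filter; the approximation improves with larger masks.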
9.13.1 Ideal Low Pass Filter
A filter is called ideal when it cancels the harmonic components in a given frequency range and introduces no distortion on the remaining frequencies. Note that it is called ideal not because it is optimal, but only because it cuts the frequencies to be removed without altering the others. The transfer function that characterizes the ideal low pass filter is given by
$$ H(u, v) = \begin{cases} 1 & \text{if } l(u, v) \le l_0 \\ 0 & \text{if } l(u, v) > l_0 \end{cases} \tag{9.132} $$
where $l_0$ represents the cutoff frequency, that is, the locus of the transition points at distance $l_0$ from the origin, where the transfer function H(u, v) passes from 1 to 0. For images, assuming a centered spectrum, the set of frequencies equidistant from the origin is the circle of generic radius $l$ given by
$$ l(u, v) = \sqrt{u^2 + v^2} $$
In this way, by changing $l_0$, the filter includes or excludes different frequencies, letting different amounts of spectral energy pass. All frequencies inside the circle of radius $l_0$ pass intact, while those outside the circle are completely eliminated. The filter has circular symmetry, under the hypothesis that the transform is centered in the square domain of the spectrum. The results of the filter depend on the chosen threshold: while it can attenuate the noise, it also accentuates the blurring of the image. An effective way to check the performance of the various filters is to consider the power spectrum, given by
$$ P(u, v) = |F(u, v)|^2 = R^2(u, v) + I^2(u, v) \tag{9.133} $$
where R and I indicate, respectively, the real and imaginary parts of the complex values of the transform F(u, v) of the image. For a cutoff frequency $l_0$ we define the quantity $P_{l_0}$, the spectral energy of all the values (u, v) enclosed by the circle of radius $l_0$ in the frequency domain:
$$ P_{l_0} = \sum_{u} \sum_{v} P(u, v), \qquad l(u, v) \le l_0 \tag{9.134} $$
It is useful to consider the percentage of the power spectrum passed at each cutoff frequency, which decreases from a maximum of 100% as the cutoff frequency is lowered:
$$ P_I = 100 \cdot \frac{P_{l_0}}{\displaystyle\sum_{u=0}^{N-1} \sum_{v=0}^{N-1} P(u, v)} \tag{9.135} $$
where $P_I$ is the percentage of the power spectrum that the ideal low pass filter passes for the chosen radius $l_0$. A typical ideal low pass filter is shown in Fig. 9.58, which gives a 2D and a 1D graphical representation of the transfer function H(u, v). Recall that its inverse transform, i.e., the impulse response h in the spatial domain, corresponds to the function sinc(i, j). This filter has been applied to the image of Fig. 9.59, and the effects of the filter are highlighted as the cutoff frequency changes. In essence, the image is increasingly blurred as the filter removes an ever-increasing percentage of the power spectrum, that is, as the radius $l_0$ decreases.

Fig. 9.58 Ideal low pass filter: a 3D representation of H(u, v); b 1D representation of the H(u) profile. Only the low frequencies included in the cylinder of radius l0 pass

Fig. 9.59 Application of the ideal low pass filter. a Original image, b representation of the spectrum F(u, v) of the image in a with highlighted circles of radius l0 at the cutting frequencies; c–h follow the different results of the original image reconstructed at the various percentages of the residual power spectrum after filtering

The blurring effect had already been highlighted with the spatial convolution operator. Let us now see how to explain the blurring together with the ring effect in the filtered image. In the frequency domain, the transfer function is essentially a rectangular function, even if it is defined with a 2D circular profile. The inverse transform of H(u, v) gives the function h(i, j) = sinc(i, j) in the spatial domain. The convolution in the spatial domain between the impulse function δ(i, j) and the sinc function yields a replica of the sinc function at the position of the impulse. The convolution (i.e., the filtering) thus acts through the sinc function (the impulse response of the operator) as follows: the central lobe blurs the impulse, while the external lobes generate the ring effect (ringing). The same effect is obtained on an image with a single pixel, seen as an approximation of the impulse. A normal image can be considered as a set of impulses of intensity proportional to the pixel value, and the overall effect of the filter on all pixels is given by the combination of blur and ringing. We also observe the dual behavior between the cutoff frequency and the spatial domain: when the cutoff frequency increases, the impulse response contracts in the spatial domain and, vice versa, a change of scale that widens the support of h produces a contraction in the frequency domain. This better explains the effects of the filter: when the transfer function H is very wide, that is, when the cutoff frequency is very high, the blurring and ringing effects are greatly reduced; conversely, they are amplified for lower values of the cutoff frequency, which produce a strong expansion of the spatial sinc (see Fig. 9.60). In the limiting case, the rectangular transfer function H(u, v) is constantly 1 and its inverse transform, the spatial sinc, tends to an impulse which, convolved with the input image, reproduces the image unchanged, without blurring. The filters described next avoid abrupt discontinuities in the elimination of the high frequencies.

Fig. 9.60 Effect of scale change between spatial domain and Fourier domain: a Rectangular transfer function with cutoff frequency l0 and corresponding impulse response h, b transfer function with cutoff frequency 3l0 and corresponding impulse response contracted by a factor 1/3 (panel titles: "Rectangular function spectrum", plotted against frequency u, and "Impulse response: sinc(i)", plotted against axis (i))
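The behavior described in this section can be reproduced on a 1D signal. The sketch below is a simplified illustration, not the experiment of Fig. 9.59: it works in one dimension, uses a naive DFT in plain Python, and assumes that the power spectrum of Eq. (9.133) is evaluated on the transform F of the signal. All function names are hypothetical. It filters a unit impulse with two cutoff frequencies, computes the percentage of power passed as in Eq. (9.135), and exposes the negative side lobes (ringing) and the contraction of the main lobe as the cutoff grows.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive 1D DFT; O(n^2), adequate for a short illustrative signal."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * u * k / n) for k in range(n))
           for u in range(n)]
    return [value / n for value in out] if inverse else out

def ideal_lowpass(n, l0):
    """H(u) = 1 where the distance from the origin is <= l0, else 0 (1D form of Eq. 9.132)."""
    return [1.0 if min(u, n - u) <= l0 else 0.0 for u in range(n)]

def passed_power_percent(F, H):
    """Percentage of the power spectrum |F|^2 inside the pass band (Eq. 9.135)."""
    power = [abs(value) ** 2 for value in F]
    passed = sum(p for p, h in zip(power, H) if h == 1.0)
    return 100.0 * passed / sum(power)

def main_lobe_width(g):
    """Number of contiguous positive samples around the peak of g."""
    peak = max(range(len(g)), key=lambda k: g[k])
    left = right = peak
    while left > 0 and g[left - 1] > 0:
        left -= 1
    while right < len(g) - 1 and g[right + 1] > 0:
        right += 1
    return right - left + 1

n = 64
impulse = [0.0] * n
impulse[n // 2] = 1.0
F = dft(impulse)

def apply_ideal(l0):
    H = ideal_lowpass(n, l0)
    g = [value.real for value in dft([f * h for f, h in zip(F, H)], inverse=True)]
    return g, passed_power_percent(F, H)

g_low, percent_low = apply_ideal(4)      # strong cut: wide sinc-like response
g_high, percent_high = apply_ideal(16)   # mild cut: narrow response
```

In both cases the filtered impulse takes negative values away from the peak, which is the 1D counterpart of the rings visible in the filtered images.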
9.13.2 Butterworth Low Pass Filter
The transfer function of this filter of order n and with cutoff frequency $l_0$ is given by
$$ H(u, v) = \frac{1}{1 + [\,l(u, v)/l_0\,]^{2n}} \tag{9.136} $$
where l(u, v) represents the previously defined distance from the origin of the centered spectrum. The effect of the filter is still substantially controlled by the cutoff frequency $l_0$, which determines the amount of energy retained and, therefore, the level of blurring introduced in the image. Unlike the ideal filter, however, there is no abrupt discontinuity between the filtered and the passed frequencies (see Fig. 9.61). For a smoothing filter it is important to define the cutoff frequency $l_0$ as the point at which H falls to a given fraction of its maximum value. From the previous equation, for $l(u, v) = l_0$ it follows that H(u, v) = 0.5 (50% of the maximum value). Another commonly used fraction is $1/\sqrt{2}$ of the maximum value of H. In this case, so that H takes this value at $l(u, v) = l_0$, the transfer function becomes
$$ H(u, v) = \frac{1}{1 + [\sqrt{2} - 1]\,[\,l(u, v)/l_0\,]^{2n}} = \frac{1}{1 + 0.414\,[\,l(u, v)/l_0\,]^{2n}} \tag{9.137} $$

Fig. 9.61 Butterworth low pass filter. a 3D representation of the transfer function H(u, v) of order n = 1; b 1D profiles of the filter H(u) for different values of order n = 1, 2, 3, 4

Fig. 9.62 Application of Butterworth low pass filter of order n = 2 in the same conditions of operation of the ideal filter. a original image, b Representation of the F(u, v) image spectrum in (a) with highlighted radius circles l0 at cutoff frequencies; c–h follow the different results of the original image reconstructed at the various percentages of the residual power spectrum after filtering
This filter is applied to the same image as the ideal low pass filter (see Fig. 9.62), with $l_0$ equal to the radii of the first five circles and with n = 2. With orders higher than 1, ringing effects begin to appear and grow as the order increases. For n > 20 the effects are practically the same as those of an ideal low pass filter. A good compromise is obtained for orders not exceeding 2.
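The two forms of the Butterworth transfer function, Eqs. (9.136) and (9.137), differ only in the value taken at the cutoff frequency. The following minimal sketch evaluates both at $l(u, v) = l_0$ and shows how a high order approaches the step of the ideal filter. The function name, the `half_power` flag, and the sample values of $l_0$, u, and v are assumptions made for this illustration.

```python
import math

def butterworth_lowpass(l, l0, n, half_power=False):
    """Butterworth transfer function at radial frequency l.

    half_power=False: Eq. (9.136), H = 0.5 at l = l0.
    half_power=True:  Eq. (9.137), H = 1/sqrt(2) at l = l0.
    """
    c = math.sqrt(2.0) - 1.0 if half_power else 1.0
    return 1.0 / (1.0 + c * (l / l0) ** (2 * n))

l0 = 50.0
u, v = 30.0, 40.0                        # sample frequency with l(u, v) = 50
l = math.hypot(u, v)

at_cutoff = butterworth_lowpass(l, l0, n=2)
at_cutoff_half_power = butterworth_lowpass(l, l0, n=2, half_power=True)

# With a high order the transition approaches the step of the ideal filter
steep_inside = butterworth_lowpass(0.9 * l0, l0, n=20)
steep_outside = butterworth_lowpass(1.1 * l0, l0, n=20)
```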
[Fig. 9.63 plot residue: panel titles "Gaussian Low-pass Filter Cutting Frequency l0=100px" and "Gaussian low-pass filter"; axes u, v, and H(u, v)]
Fig. 9.63 Gaussian low pass filter: a 3D representation of the transfer function H (u, v) at the cutoff frequency l0 = 100px, b 1D profiles of the filter H (u) for different values of cutoff frequencies l0 = 10, 15, 35, 70, 100, 200, 250px
9.13.3 Gaussian Low Pass Filter
The transfer function of this filter is as follows:
$$ H(u, v) = e^{-\frac{1}{2}\,[\,l(u, v)/l_0\,]^{2}} \tag{9.138} $$
The interesting property of the Gaussian low pass filter is that it has the same shape in both the spatial and frequency domains: the Fourier transform (FT) of a Gaussian function is itself Gaussian. As a consequence, in whichever domain the Gaussian filter is applied, its action reduces to blurring alone and produces no ringing. The Gaussian filter is shown in Fig. 9.63, with the 1D profiles corresponding to different cutoff frequencies. In comparison with the Butterworth filter of order 2, the blurring action is weaker and, above all, the ringing effect is absent (see Fig. 9.64). The transfer function (9.138) is motivated by the property that the Fourier transform of a Gaussian function is still a Gaussian with only real values. To prove it, let us assume that the Gaussian impulse response is given in the form
$$ h(x) = e^{-\pi x^{2}} \tag{9.139} $$
and apply the Fourier transform
$$ \begin{aligned}
\mathcal{F}[h(x)] = H(u) &= \int_{-\infty}^{\infty} h(x)\, e^{-j 2\pi u x}\, dx \\
&= \int_{-\infty}^{\infty} e^{-\pi x^{2}}\, e^{-j 2\pi u x}\, dx \\
&= \int_{-\infty}^{\infty} e^{-\pi (x^{2} + j 2 u x)}\, dx
\end{aligned} \tag{9.140} $$
The right-hand side of the last expression is multiplied by the following expression of unit value:
$$ e^{-\pi u^{2}}\, e^{+\pi u^{2}} = 1 $$
Fig. 9.64 Application of the Gaussian low pass filter in the same operating conditions as the ideal filters and Butterworth. a Original image, b Representation of the F(u, v) image spectrum in a with highlighted radius circles l0 at cutoff frequencies; c–h follow the different results of the original image reconstructed at the various percentages of the residual power spectrum after filtering at the various cutting frequencies
and, continuing from Eq. (9.140), we have
$$ \begin{aligned}
H(u) &= e^{-\pi u^{2}} \int_{-\infty}^{\infty} e^{-\pi (x^{2} + j 2 u x) + \pi u^{2}}\, dx \\
&= e^{-\pi u^{2}} \int_{-\infty}^{\infty} e^{-\pi (x^{2} + j 2 u x - u^{2})}\, dx \\
&= e^{-\pi u^{2}} \int_{-\infty}^{\infty} e^{-\pi (x + j u)(x + j u)}\, dx \\
&= e^{-\pi u^{2}} \int_{-\infty}^{\infty} e^{-\pi (x + j u)^{2}}\, dx
\end{aligned} \tag{9.141} $$
At this point we perform the change of variable $z = x + ju$, with $dz = dx$. The last expression of (9.141) becomes
$$ H(u) = e^{-\pi u^{2}} \int_{-\infty}^{\infty} e^{-\pi z^{2}}\, dz \tag{9.142} $$
The integral in this last equation is the area under the normalized Gaussian, which is known to be equal to unity. It follows that (9.142) reduces to
$$ H(u) = e^{-\pi u^{2}} \tag{9.143} $$
confirming that the FT of a Gaussian function (Eq. 9.139) is still a Gaussian (Eq. 9.143). As is known from statistics, other normalized forms of the Gaussian function are used. For instance, instead of the form (9.139) we can consider the following:
$$ h_{\mu,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \tag{9.144} $$
where μ is the mean and σ the standard deviation (σ² being the variance). From the geometrical point of view, the parameter σ indicates how wide the curve, centered on the mean μ, is along the x axis; the inflection points of the curve occur at μ ± σ. The area under the curve always remains normalized to 1. The question to ask is the following: what is the Fourier transform when the Gaussian impulse response is given by Eq. (9.144)? For simplicity we consider the case with zero mean (μ = 0), which is the useful form of the Gaussian function applied in the convolution process in both the spatial and the frequency domains. In this case (9.144) becomes
$$ h_{\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{x^{2}}{2\sigma^{2}}} \tag{9.145} $$
The Fourier transform of this Gaussian form $h_\sigma(x)$ is obtained by referring to the similarity theorem (i.e., the scaling property of the FT), which relates a scale factor a (expansion or contraction) in the spatial domain to the opposite effect in the frequency domain:
$$ \mathcal{F}\{f(ax)\} = \frac{1}{|a|}\, F\!\left(\frac{u}{a}\right) \tag{9.146} $$
If the scale factor a is greater than unity, f(x) contracts in the spatial domain, which corresponds to an expansion of F(u) in the frequency domain. Looking closely at (9.146), one can better understand what happens in the frequency domain.
1. With a large, F(u/a) is expanded horizontally by the factor a with respect to F(u).
2. Multiplying by the factor 1/a, the transform becomes $\frac{1}{a} F\!\left(\frac{u}{a}\right)$, which reduces the magnitude of the transform itself; graphically, the curve is compressed along the vertical axis.

The opposite happens if a < 1. In this case the graph of f(ax) is more expanded horizontally than that of f(x), while in the frequency domain the transform F(u/a) is more contracted horizontally and stretched vertically. In short, if a function is expanded in the spatial domain, it is contracted in the frequency domain, and vice versa.
For the Gaussian form $h_\sigma(x)$ (Eq. 9.145), if we consider the scale factor $a = 1/(\sigma\sqrt{2\pi})$, recalling the transform (9.143) of the Gaussian $h(x) = e^{-\pi x^{2}}$ and applying the similarity theorem (9.146), the Fourier transform results
$$ \mathcal{F}\{h_\sigma(x)\} = H_\sigma(u) = \frac{1}{|a|}\, H\!\left(\frac{u}{a}\right) = \frac{1}{|a|}\, e^{-\pi (u/a)^{2}} = \sigma\sqrt{2\pi} \cdot e^{-2\pi^{2}\sigma^{2}u^{2}} \tag{9.147} $$
which is again a Gaussian, but with an amplitude and a width very different from those of the original one. (The amplitude $\sigma\sqrt{2\pi}$ is that of the transform of $e^{-x^{2}/(2\sigma^{2})}$; including the normalization factor $1/(\sigma\sqrt{2\pi})$ of (9.145) scales the amplitude to 1 without changing the Gaussian shape.) For better readability of the Gaussian form of $H_\sigma(u)$, we denote by $\sigma_u$ its standard deviation, to distinguish it from the spatial one σ, and we set $2\pi\sigma = \frac{1}{\sigma_u}$. Substituting in the previous equation, we have the following relation:
$$ H_\sigma(u) = \sqrt{\frac{1}{2\pi\sigma_u^{2}}} \cdot e^{-\frac{u^{2}}{2\sigma_u^{2}}} \tag{9.148} $$
[Fig. 9.65 plot residue: panel titles "Gaussian Impulse" and "Power spectrum of a Gaussian pulse"; axes: f(x) versus axis x, and spectrum H(u) versus spatial frequencies u]
Fig. 9.65 Gaussian low pass filter: a Gaussian impulse response parameterized by standard deviation σ ; b Transfer function obtained from the Fourier transform of (a). We observe the dual-scale change between spatial and frequency domains: expansion in one domain corresponds to a contraction in the other and vice versa, governed by the value of σ
which is an expression identical in form to the Gaussian function in the spatial domain, $h(x) = e^{-\frac{x^{2}}{2\sigma^{2}}}$, i.e., a Gaussian, but with a standard deviation $\sigma_u = \frac{1}{2\pi\sigma}$ that is the reciprocal of the spatial one up to a factor 2π. This explains why the curves in the two domains have opposite shapes: the larger σ is in the spatial domain, the narrower the curve becomes in the frequency domain, where the Gaussian shrinks according to the smaller value of $\sigma_u$, and vice versa (see Fig. 9.65). Finally, observe that the Gaussian given by (9.145) (with μ = 0) is symmetrical, and its FT is real and symmetrical. With μ ≠ 0, as in (9.144), the corresponding transform is obtained by considering, in addition to the similarity theorem, the spatial translation property of the FT
$$ \mathcal{F}\{f(x - \mu)\} = F(u)\, e^{-j 2\pi \mu u} \tag{9.149} $$
The transform of the Gaussian $h_{\mu,\sigma}(x)$, expressed by (9.144), obtained by combining the two properties given by (9.146) and (9.149), is
$$ \mathcal{F}\{h_{\mu,\sigma}(x)\} = H_{\mu,\sigma}(u) = \sigma\sqrt{2\pi} \cdot e^{-j 2\pi \mu u}\, e^{-2\pi^{2}\sigma^{2}u^{2}} \tag{9.150} $$
In this case the transform pair is no longer symmetrical: the translation by μ of the Gaussian curve only alters the phase of the spectrum in the frequency domain and does not change the energy at any point of the spectrum. This situation is not very useful for spatial image filtering applications. Note also that for μ = 0 in Eq. (9.150) the transfer function $H_{\mu,\sigma}(u)$ reduces to $H_\sigma(u)$ (Eq. 9.148). We summarize the results achieved:

(a) Small values of σ correspond to large values of $\sigma_u$, i.e., the Gaussian filter produces a slight smoothing if the standard deviation σ is small, because the corresponding large extent in the frequency domain cuts only a limited part of the high frequencies.
(b) On the contrary, if σ is large, the smoothing is noticeably accentuated, because the corresponding small value of $\sigma_u$ eliminates a larger number of high frequencies. In the latter case, the elimination of the high frequencies involves a greater attenuation both of the noise and of the structures present in the spatial domain.
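The reciprocal relation between σ and $\sigma_u$ can be verified numerically. The sketch below is an illustration under stated assumptions, not part of the original derivation: it approximates the continuous Fourier transform by a Riemann sum (the integration range and step are arbitrary choices), uses the unnormalized Gaussian $e^{-x^{2}/(2\sigma^{2})}$, and compares the result with the right-hand side of Eq. (9.147) rewritten through $\sigma_u = 1/(2\pi\sigma)$. The function names are hypothetical.

```python
import cmath
import math

def fourier_transform(h, u, half_width=40.0, dx=0.01):
    """Riemann-sum approximation of the continuous FT of h at frequency u."""
    steps = int(round(2 * half_width / dx))
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * dx
        total += h(x) * cmath.exp(-2j * math.pi * u * x)
    return total * dx

sigma = 2.0
sigma_u = 1.0 / (2.0 * math.pi * sigma)           # relation 2*pi*sigma = 1/sigma_u

def gaussian(x):
    return math.exp(-x * x / (2.0 * sigma ** 2))  # unnormalized spatial Gaussian

def predicted(u):
    """sigma*sqrt(2*pi) * exp(-2 pi^2 sigma^2 u^2), written with sigma_u."""
    return sigma * math.sqrt(2.0 * math.pi) * math.exp(-u * u / (2.0 * sigma_u ** 2))

frequencies = [0.0, 0.5 * sigma_u, sigma_u, 2.0 * sigma_u]
numeric = [fourier_transform(gaussian, u) for u in frequencies]

for u, value in zip(frequencies, numeric):
    print(f"u = {u:.4f}  numeric = {value.real:.6f}  predicted = {predicted(u):.6f}")
```

At $u = \sigma_u$ the transform has dropped to $e^{-1/2}$ of its peak, exactly as a Gaussian of standard deviation $\sigma_u$ should; doubling σ halves $\sigma_u$.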
[Fig. 9.66 plot residue: panel titles "Trapezoidal low-pass filter" and "Trapezoidal Low-pass Filter Cutting Frequency"; axes u, v, and H(u, v)]
Fig. 9.66 Trapezoidal low pass filter relative to frequencies l0 = 70 and l1 = 200px: a 1D representation, b 2D gray level and, c 3D graphics
9.13.4 Trapezoidal Low Pass Filter
This filter produces an intermediate effect between the ideal filter and the previously considered filters (Gaussian and Butterworth). The transfer function is as follows:
$$ H(u, v) = \begin{cases} 1 & \text{for } l(u, v) < l_0 \\[4pt] \dfrac{l(u, v) - l_1}{l_0 - l_1} & \text{for } l_0 \le l(u, v) \le l_1 \\[4pt] 0 & \text{for } l(u, v) > l_1 \end{cases} \tag{9.151} $$
In this filter $l_0$ represents the cutoff frequency and $(l_0, l_1)$ represents the frequency interval over which H(u, v) varies linearly, which avoids the abrupt variations typical of the ideal filter (see Fig. 9.66). The results of the trapezoidal filter are shown in Fig. 9.67.
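A direct transcription of Eq. (9.151) makes the three regions of the trapezoidal filter explicit. This is a minimal sketch; the function names and the small centered grid are assumptions for illustration, and the cutoff values 70 and 200 are taken from the caption of Fig. 9.66.

```python
import math

def trapezoidal_lowpass(l, l0, l1):
    """Eq. (9.151): 1 below l0, 0 above l1, linear ramp in between."""
    if l < l0:
        return 1.0
    if l > l1:
        return 0.0
    return (l1 - l) / (l1 - l0)      # same ratio as (l - l1) / (l0 - l1)

def transfer_function(n, l0, l1):
    """H(u, v) on an n x n grid with the origin of the spectrum at the center."""
    c = n // 2
    return [[trapezoidal_lowpass(math.hypot(u - c, v - c), l0, l1)
             for v in range(n)] for u in range(n)]

# Profile values for the cutoff pair of Fig. 9.66
profile = [trapezoidal_lowpass(l, 70.0, 200.0) for l in (0.0, 70.0, 135.0, 200.0, 250.0)]

# Small 2D example with l0 = 2 and l1 = 4
H = transfer_function(9, 2.0, 4.0)
```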
Fig. 9.67 Application of trapezoidal low pass filter. a Original image; b Representation of the spectrum F(u, v) of the image in a with highlighted circles of radius l0 and l1 at the corresponding cutting frequencies; c–f Follow the different results of the original reconstructed image at the various percentages of the residual power spectrum after filtering to the different pairs cutting frequencies (l0, l1)

9.13.5 Summary of the Results of the Smoothing Filters
The smoothing filters described are used to reduce noise and improve visual quality, especially in the case of images quantized with few levels, which show false contours when displayed. We have presented two ways to realize smoothing filters: in the spatial domain and in the frequency domain. In the spatial domain, we used the convolution operator between the image and a convolution mask designed to achieve the effects required of a smoothing filter. The simplest was the average filter, whose mask weights are all equal. This filter reduces the contrast and attenuates the details in the image. Better results were obtained with a nonlinear filter, the median filter, which was very effective in reducing impulsive noise while leaving the details of the image almost intact. A good compromise between noise reduction and moderate blurring is achieved by the Gaussian filter (or its approximation, the binomial filter), properly parameterized in terms of variance and mask size. All smoothing filters cut (filter) the high frequencies, which in fact carry the information content of the details present in the image (the strong variations of intensity in the spatial domain). A quantitative measure of the blurring effects of spatial filtering was obtained by evaluating the DFT of the convolution mask, i.e., its ripple effects in the frequency domain. This made it possible to evaluate the undesired effects (ringing) introduced by the convolution process, due to the presence of external lobes, very pronounced in the average filter and strongly attenuated in the Gaussian and binomial filters.
In the frequency domain, the convolution operator has again been used: thanks to the convolution theorem, the convolution process reduces to the multiplication of the Fourier transform of the image by the transfer function. In theory, filtering in the frequency domain thus reduces to the design of a transfer function that attenuates the high frequencies through low pass filtering. As we have seen, the high frequencies contained in the image are associated with strong variations in intensity (usually the details) but are also due to the presence of noise. In the frequency domain the filter can produce a very selective cut of the frequencies to be filtered or passed. The ideal low pass filter (of little practical use) and the Butterworth, Gaussian, and trapezoidal filters have been introduced, all characterized by a transfer function with circular symmetry, by the cutoff frequency parameter, which controls the amount of high frequencies to be cut, and by the level of discontinuity of the transfer function at the cutoff frequency. In the ideal low pass filter the discontinuity is sharp, while in the others the transition is modeled so as to minimize it. Operating in the frequency domain, it is also possible to evaluate the amount of the power spectrum filtered and to estimate the robustness of the filter. The undesirable ringing effect can be better quantified through the control parameters of the transfer function. In theory, it should be simpler in the frequency domain to design a transfer function that achieves the right balance between desired smoothing and noise reduction, but given the complexity of images this is not always possible. A good compromise is to operate in both domains: for example, start in the frequency domain, then verify the effects in the spatial domain, appropriately modifying the convolution mask, and then re-evaluate the effects of the changes in the frequency domain.
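The different behavior of a sharp and a smooth transfer function at the cutoff can be made concrete on a 1D step edge. The following sketch is a simplified illustration, not a reproduction of the experiments in the figures: the signal, its length, and the cutoff values are arbitrary assumptions, and the naive DFT in plain Python stands in for a library FFT. It filters a rectangular pulse with an ideal and with a Gaussian low pass filter and measures the overshoot above the original maximum, which is the 1D signature of ringing.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive 1D DFT; O(n^2), adequate for a short illustrative signal."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * u * k / n) for k in range(n))
           for u in range(n)]
    return [value / n for value in out] if inverse else out

def signed(u, n):
    """Signed frequency of DFT index u (origin at index 0)."""
    return u if u <= n // 2 else u - n

def lowpass_filter(signal, H):
    """Multiply the spectrum by the transfer function and transform back."""
    F = dft(signal)
    return [value.real for value in dft([f * h for f, h in zip(F, H)], inverse=True)]

n = 64
step = [1.0 if 16 <= k < 48 else 0.0 for k in range(n)]   # pulse with two edges

ideal = [1.0 if abs(signed(u, n)) <= 6 else 0.0 for u in range(n)]
gaussian = [math.exp(-0.5 * (signed(u, n) / 3.0) ** 2) for u in range(n)]

out_ideal = lowpass_filter(step, ideal)
out_gaussian = lowpass_filter(step, gaussian)

overshoot_ideal = max(out_ideal) - 1.0        # positive: ripples exceed the plateau
overshoot_gaussian = max(out_gaussian) - 1.0  # not positive: blurring only
```

The ideal filter produces values above 1 and below 0 near the edges, whereas the output of the Gaussian filter stays within the range of the input.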
Fig. 9.68 Smoothing filters compared: a original image; b Original image with added Gaussian noise σ = 0.01; c Rebuilt image with Gaussian low pass filter with 220 px cutoff frequency and 98.89 % pass-through spectrum; d Rebuilt image with Butterworth low pass filter with 220 px cutoff frequency and 99% pass-through spectrum; e Image filtered in the spatial domain with a Gaussian mask of size 3 × 3; f as in e but with mask 5 × 5 and σ = 1; g Filtered image in the spatial domain with 3 × 3 mask derived from the FFT of the Gaussian low pass filter applied in c; and h as in g but with mask 5 × 5
Figure 9.68 summarizes some results of the smoothing filters, in the spatial and frequency domains, applied to an image of size 512 × 512. Gaussian noise with σ = 0.01 (Fig. 9.68b) has been added to the original image to provide a basis for comparing their effectiveness in smoothing and noise reduction. The results shown are obtained in the frequency domain with a Gaussian filter (220 px cutoff frequency and 98.89% of the spectrum passed) and a Butterworth filter (220 px cutoff frequency, order n = 2, and 99% of the spectrum passed); in the spatial domain with Gaussian masks of size 3 × 3 and 5 × 5 and σ = 1; and finally with masks of size 3 × 3 and 5 × 5 derived from the FFT of the Gaussian low pass filter previously applied in the frequency domain. For this type of image (rich in vertical and horizontal discontinuities) with Gaussian noise, the best results are those of the Gaussian and Butterworth low pass filters, which give a good balance between noise attenuation and smoothing level. Finally, the best response in the spatial domain is obtained by working with a mask derived from the Gaussian filter designed in the frequency domain. The median filter has not been considered because it is not suitable for removing Gaussian noise; it is instead very effective in removing impulsive noise while leaving the details of the image unchanged. Several other filters exist besides those mentioned, for example, filters with a triangular transfer function (Bartlett), the parabolic Welch filter, the Chebyshev filter, and filters with shapes derived from the cosine function (Hamming, Hanning, Blackman) [4].
References
1. J.W. Cooley, J.W. Tukey, An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19, 297–301 (1965)
2. Y.Y.-Q.W. Guang-Bin, A new pseudocolor coding method for medical image processing. J. Med. Imaging 13(11), 878–879 (2003)
3. S.G. Johnson, M. Frigo, Implementing FFTs in practice, in Fast Fourier Transforms, ed. by C.S. Burrus, chapter 11 (Connexions, 2008)
4. S.L. Marple, Digital Spectral Analysis with Applications (Prentice-Hall, Englewood Cliffs, NJ, 1987)
5. A. Visvanathan, S.E. Reichenbach, Q.P. Tao, Gradient-based value mapping for pseudocolor images. J. Electron. Imaging 16(3), 1–8 (2007)
6. M.R.E.Z.A. Ali, Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement. J. VLSI Signal Process. 38, 35–44 (2004)
7. P. Spagnolo, T. D’Orazio, P.L. Mazzeo, Color brightness transfer function evaluation for non overlapping multi camera tracking, in Third ACM/IEEE International Conference on Distributed Smart Cameras (2009), pp. 1–6
8. K. Zuiderveld, Contrast limited adaptive histogram equalization, in Graphics Gems IV, ed. by P.S. Heckbert, vol. IV, chapter VIII.5 (Academic Press, Cambridge, MA, 1994), pp. 474–485