Image Processing Masterclass with Python
50+ Solutions and Techniques Solving Complex Digital Image Processing Challenges Using Numpy, Scipy, Pytorch and Keras
Sandipan Dey
www.bpbonline.com
FIRST EDITION 2021 Copyright © BPB Publications, India ISBN: 978-93-89898-64-4
All Rights Reserved. No part of this publication may be reproduced, distributed or transmitted in any form or by any means or stored in a database or retrieval system, without the prior written permission of the publisher with the exception to the program listings which may be entered, stored and executed in a computer system, but they can not be reproduced by the means of publication, photocopy, recording, or by any electronic and mechanical means.
LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY
The information contained in this book is true and correct to the best of the author's and publisher's knowledge. The author has made every effort to ensure the accuracy of this publication, but the publisher cannot be held responsible for any loss or damage arising from any information in this book.
All trademarks referred to in the book are acknowledged as properties of their respective owners but BPB Publications cannot guarantee the accuracy of this information.
Distributors:
BPB PUBLICATIONS
20, Ansari Road, Darya Ganj
New Delhi-110002
Ph: 23254990/23254991
MICRO MEDIA Shop No. 5, Mahendra Chambers,
150 DN Rd. Next to Capital Cinema, V.T. (C.S.T.) Station, MUMBAI-400 001
Ph: 22078296/22078297 DECCAN AGENCIES 4-3-329, Bank Street,
Hyderabad-500195 Ph: 24756967/24756400
BPB BOOK CENTRE
376 Old Lajpat Rai Market, Delhi-110006
Ph: 23861747
Published by Manish Jain for BPB Publications, 20 Ansari Road, Darya Ganj, New Delhi-110002 and Printed by him at Repro India Ltd, Mumbai
www.bpbonline.com
Dedicated to
My beloved Parents, Shri Sanjit Kumar Dey Shrimati Sila Dey
&
My Uncles and Aunts
About the Author
Sandipan Dey is a data scientist with a wide range of interests, covering topics such as machine learning, deep learning, image processing, and computer vision. He has worked in numerous data science fields, such as recommender systems, predictive models for the events industry, sensor localization models, sentiment analysis, and device prognostics. He earned his master’s degree in computer science from the University of Maryland, Baltimore County and has published in a few IEEE data mining conferences and journals. He has also authored a couple of image processing books, published by an international publishing house. He has earned certifications from 100+ MOOCs on data science, machine learning, deep learning, image processing, natural language processing, artificial intelligence, algorithms, statistics, mathematics, and related courses. He is a regular blogger (at Data Science Central and medium.com) and is a machine learning education enthusiast.
About the Reviewer
Jyoti Dabass is a Full-Time Ph.D. Scholar at The Northcap University, Gurugram. Her research interest is focused on computer vision, deep learning, fuzzy logic, machine learning, artificial intelligence, image processing, computer-aided diagnosis, and natural language processing. She has earned her master’s degree from J.C. Bose University of Science and Technology, YMCA, Faridabad and has published more than 20 papers in International Journals and conferences. She has earned certifications from 400+ MOOCs on data science, machine learning, deep learning, image processing, and related courses. She believes this book will give you enough hands-on experience to jump-start your career in computer vision.
Acknowledgements
There are a few people who I want to thank for the continued and ongoing support they have given me during the writing of this book. First and foremost, I would like to thank my parents for continuously encouraging me to write the book—I could have never completed this book without their support.
I am grateful for the excellent online courses provided by top schools across the globe over the past few years. A few of them are image processing (@Coursera by Duke, Northwestern), computer vision and image analysis (@edX by Microsoft), computational photography (@Coursera by Georgia Tech), machine learning (@Coursera by Stanford, University of Toronto; @edX by UCSD), and deep learning (@Coursera by deeplearning.ai; @Udacity by Google).
My gratitude also goes to the team at BPB Publications (including Surbhi and Anugraha) for being supportive enough to give me quite a long time to finish the first part of the book and also to allow me to publish the book in multiple parts. As image processing is a vast and active area of research, it was impossible to deep-dive into the different classes of problems in a single book without making it too voluminous.
Preface
This book covers many different aspects of image processing through different types of problems —starting from standard image processing to the recent advancements in image processing with machine learning and deep learning models. The key focus is on the implementation of different algorithms using python libraries to solve an exhaustive set of important and relevant image processing problems (with different levels of complexity) in more than one way. The book describes the theoretical concepts required to solve a problem, discussed from both a mathematical and an intuitive perspective, to make it easy for the readers to understand. It demonstrates the impact of running an algorithm on a set of colored images with input/output images. It also provides some cutting-edge solutions to a few advanced problems from the leading image processing research conferences/journals. Every chapter is followed by quite a few challenging exercise problems to create an opportunity for the readers to test, hands-on, the skills they have learned.
The book takes a problem-oriented approach—it focuses on solving many different types of problems (of different levels of difficulty) from the vast majority of areas in digital image processing, by explaining the underlying theory, as well as providing hands-on implementations. Considering the trade-off between covering a vast number of topics and keeping the volume manageable, this book will be released in two parts.
Part-1 will start from the basics and will gradually proceed towards solving advanced problems. It will start with problems based on basic image processing and manipulation techniques. Then, it will focus on classical image processing algorithms that come from signal processing, for example, sampling, convolution, Fourier transform, frequency domain filtering, and more. Then, it will proceed further to solve problems on image enhancement, for example, using spatial domain filtering. It will also discuss a few machine learning and deep learning model-based approaches to solve quite a few popular image processing problems. Last but not the least, it will also have some sections on solving facial image processing problems, such as face detection, recognition, and more; these problems are gaining more and more interest from image processing communities nowadays.
Part-2 will start with more advanced concepts and assume that the readers already have exposure/some expertise in basic image processing techniques. It will focus on solving problems in image restoration, feature extraction, and variational methods in image processing, and more machine learning/deep learning model-based approaches to solve big image processing problems.
This book is for anyone looking to develop the fundamental concepts in image processing and who wants to explore more advanced problems. Typical targeted readers will be image processing/computer vision researchers who want to learn how to
write code in python to solve some typical image processing/computer vision problems. The readers should have some knowledge of python as a programming language and should have some programming experience. Also, it will be good for the readers to have some basic knowledge about image storage/representation in a computer and some basic math background to get the most out of the book.
This is the first part of the book, and it will consist of the following seven chapters, in which you will learn the following: Chapter 1 will aim at solving a few introductory image and video processing problems that will help in understanding the basic concepts of image processing. It will focus on problems demonstrating how image and video I/O works, how to apply a few basic image transformation/manipulation techniques, and a couple of slightly more advanced problems (e.g., object removal with context-aware image resizing, creating fake miniature), using popular python libraries such as scikit-image, PIL, opencv-python, scipy, numpy and matplotlib.
Chapter 2 will continue with more advanced image transformations, for example, geometric transformations such as linear (Euclidean/affine/projective transformation/homography) and non-linear transformations on an image (e.g., with inverse-warping). Another set of important image processing problems that we shall look into in this chapter is image hashing (e.g., using a cryptographic hash function to find duplicate images and a perceptual hash function to find similar images), with the hashlib and imagehash libraries.
Chapter 3 will focus on signal processing techniques such as sampling, convolution and discrete Fourier transform and apply them to solve a few popular image processing problems (e.g., implement frequency domain filters such as Gaussian/ Butterworth LPF/ HPF and notch filters, reconstruct/ denoise images), using functions from popular python libraries such as scipy and numpy.
Chapter 4 is a continuation of the previous chapter; we shall build on the concepts and solve problems like template matching in the frequency domain. We shall also focus on solving classical image processing problems such as image denoising and image compression using the discrete Cosine and Wavelet transforms, with popular python libraries such as scipy, numpy and pywt. Chapter 5 will discuss image enhancement problems (e.g., with spatial filters to denoise/sharpen/enhance the contrast of images, detect edges in images, etc.), along with some applications (e.g., fingerprint cleaning using morphological operations). Popular python libraries to be used in addition are simpleITK and pytorch.
Chapter 6 is a continuation of the previous chapter, and it will focus on solving more image enhancement problems such as using the Hough transform to detect shapes, computing a depth map from stereo images, performing tone mapping for HDR images and doing distributed image processing, using functions from popular python libraries such as scikit-image, PIL, opencv-python. A few advanced problems will also be solved using pre-trained deep neural net models (e.g., enhance a low-light image, dehaze a hazy image and create high-quality super-resolution images with SRGAN), using deep learning libraries such as tensorflow, keras and pytorch.
Chapter 7 will focus on solving popular facial image processing problems, starting with a couple of fundamental problems such as face detection and facial features (landmarks) detection, and then building on the concepts to solve popular problems such as face morphing, face-swapping and face parsing. Also, we shall discuss problems such as age/gender detection using pre-trained deep learning models and start working on face recognition problems using classical machine learning models, using libraries such as dlib, MTCNN, opencv-python, scikit-learn, as well as pre-trained deep learning models with keras and Microsoft Cognitive Vision APIs for face detection/recognition.
Downloading the code bundle and coloured images:
Please follow the link to download the Code Bundle and the Coloured Images of the book:
https://rebrand.ly/c718b
Errata
We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content and to provide our subscribers with an engaging reading experience. Our readers are our mirrors, and we use their inputs to reflect and improve upon human errors, if any, that may have occurred during the publishing processes involved. To let us maintain the quality and help us reach out to any readers who might be having difficulties due to any unforeseen errors, please write to us at:
[email protected]
Your support, suggestions, and feedback are highly appreciated by the BPB Publications’ Family.
Did you know that BPB offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.bpbonline.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.bpbonline.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on BPB books and eBooks.
BPB is searching for authors like you
If you're interested in becoming an author for BPB, please visit www.bpbonline.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
The code bundle for the book is also hosted on GitHub. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available. Check them out!
PIRACY
If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author
If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit www.bpbonline.com.
REVIEWS
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at BPB can understand what you think about our products, and our authors can see your feedback on their book. Thank you! For more information about BPB, please visit www.bpbonline.com.
Table of Contents
1. Basic Image and Video Processing
    Introduction
    Structure
    Objectives
    Problems
    Display RGB image color channels in 3D
    Video I/O
    Read/write video files with scikit-video
    Capture video from camera and extract frames with OpenCV-Python
    Implement Instagram-like Gotham filter
    The Gotham filter
    Interpolation with NumPy interp() function
    Explore image manipulations with different python libraries
    Plot image montage with scikit-image
    Crop/resize images with the SciPy ndimage module
    Draw contours with OpenCV-Python
    Counting objects in an image
    Convert a PNG image with palette to grayscale with PIL
    Different ways to convert an RGB image to grayscale
    Rotating an image with scipy.ndimage
    Image differences with PIL
    Converting RGB to HSV and YUV color spaces with scikit-image
    Resizing an image with OpenCV-Python
    Add a logo to an image with scikit-image
    Change brightness/contrast of an image with linear transform and gamma correction with OpenCV-Python
    Detecting colors and changing colors of objects with OpenCV-Python
    Object removal with seam carving
    Creating fake miniature effect
    Summary
    Questions
    Key terms
    References
2. More Image Transformation and Manipulation
    Introduction
    Structure
    Objectives
    Problems
    Applying Euclidean and Affine transformation on an image
    Basics of linear geometric transformations in 2D
    Rotating an image with scipy.ndimage
    Flipping and flopping an image with NumPy
    Apply affine transformation with scipy.ndimage
    Implement image transformation with warping/inverse warping using scikit-image and scipy.ndimage
    Applying translation on an image using scikit-image warp
    Implementing the swirl transformation using scikit-image warp
    Implementing swirl transform using scipy.ndimage
    Implementing elastic deformation
    Image projection with homography using scikit-image
    Detecting colors and changing colors of objects with OpenCV-Python
    Detecting Covid-19 virus objects with colors in the HSV colorspace
    Finding duplicate and similar images with hashing
    Using Perceptual Hash function (pHash) to find similar images using imagehash
    Summary
    Questions
    Key terms
    References
3. Sampling, Convolution, Discrete Fourier, Cosine and Wavelet Transform
    Introduction
    Structure
    Objectives
    Problems
    Fourier Transform Basics
    Sampling to increase/decrease the resolution of an image
    Up-sampling an image by using the DFT and a low pass filter (LPF)
    Down-sampling with anti-aliasing using the Gaussian filter
    Denoising an image with LPF/Notch filter in the Frequency domain
    Removing periodic noise with Notch filter
    Removing salt and pepper noise using the Gaussian LPF with scipy fftpack
    Blurring an image with an LPF in the frequency domain
    Different blur kernels and convolution in the frequency domain
    Blurring with scipy.ndimage frequency-domain filters
    With fourier_gaussian
    With fourier_uniform
    With fourier_ellipsoid
    Gaussian blur LPF with scipy.fftpack
    Convolution in the frequency domain with a colored image using fftconvolve from scipy signal
    Edge detection with high pass filters (HPF) in the frequency domain
    Implementation of homomorphic filters
    Summary
    Questions
    Key terms
    References
4. Discrete Cosine/Wavelet Transform and Deconvolution
    Introduction
    Structure
    Objectives
    Template matching with phase-correlation in the frequency domain
    Image compression with the Discrete Cosine Transform (DCT)
    JPEG compression
    Image denoising with Discrete Cosine Transform (DCT)
    Deconvolution for image deblurring
    Blur detection
    Non-blind deblurring with SimpleITK deconvolution filters
    Non-blind deblurring with scikit-image restoration module functions
    Image denoising with wavelets
    Wavelet basics
    Image denoising using wavelets with pywt
    Image denoising with wavelets using scikit-image restoration
    Image fusion with wavelets
    Fusion algorithm
    Secure spread spectrum digital watermarking with the DCT
    Summary
    Questions
    Key terms
    References
5. Image Enhancement
    Introduction
    Structure
    Problems
    Image Enhancement Filters with PIL for noise removal and smoothing
    BLUR filter to remove salt and pepper noise
    Gaussian BLUR filter to remove salt and pepper noise
    Median filter to remove salt and pepper noise
    Max, min, and mode filters to remove outliers from an image
    Min filter
    Max filter
    Mode filter
    Progressive application of Gaussian blur, median, mode, and max filters on an image
    Unsharp masking to sharpen an image
    With the scikit-image filters module
    With the PIL ImageFilter module
    Laplacian sharpening with SimpleITK
    Implementing an unsharp mask with opencv-python
    Averaging of images to remove random noise
    Image denoising with curvature-driven algorithms
    Anisotropic diffusion
    Contrast stretching/histogram equalization with opencv-python
    Fingerprint cleaning and minutiae extraction
    Fingerprint cleaning with morphological operations
    Feature (minutiae) extraction from an enhanced fingerprint
    Edge detection with LOG/zero-crossing, canny versus holistically-nested
    Computing the image derivatives
    With LoG/zero-crossing
    Marr-Hildreth (LOG) algorithm
    With canny and holistically-nested (deep learning model based)
    Canny edge detection
    Holistically-nested edge detection
    Summary
    Questions
    Key terms
    References
6. More Image Enhancement
    Introduction
    Structure
    Problems
    Object detection with Hough transform and colors
    Counting circular objects in an image with the circle Hough transform
    Detecting lines with progressive probabilistic Hough transform
    Detecting objects of arbitrary shapes using the generalized Hough transform
    Detecting objects with colors in HSV colorspace
    Object saliency map, depth map, and tone map (HDR) with OpenCV-python
    Creating object saliency map
    Creating depth-map from stereo images
    Tone mapping and High Dynamic Range (HDR) imaging
    Pyramid blending
    Constructing the Gaussian pyramid
    Constructing the Laplacian Pyramid
    Reconstructing an image only from its Laplacian pyramid
    Blending images with pyramids
    Image Super Resolution with Deep Learning Model (SRGAN)
    Low-light image enhancement using CNNs
    Realistic image dehazing using deep neural net
    Distributed image processing with Dask
    Summary
    Questions
    Key terms
    References
7. Face Image Processing
    Introduction
    Structure
    Objectives
    Problems
    Face morphing with dlib, scipy.spatial, and opencv-python
    Facial landmark detection with deep learning models
    Facial landmark detection with Keras
    Facial landmark detection with the MTCNN
    Implementation of face swapping
    Implementation of face parsing
    Face recognition with FisherFaces
    Face recognition with Local Binary Patterns Histogram (LBPH) with opencv-python
    Face detection and recognition with Microsoft Cognitive Vision APIs
    Summary
    Questions
    Key terms
    References
Index
CHAPTER 1 Basic Image and Video Processing
Introduction
Image processing refers to the automatic processing, manipulation, analysis, and interpretation of images using algorithms and codes on a computer. Video processing refers to a special case of image processing that often employs video filters and where the input and output signals are video files or video streams. Image and video processing have applications in many disciplines and fields in science and technology such as television, photography, robotics, remote sensing, medical diagnosis (CT scan/X-Ray/MRI), and industrial inspection. Social networking sites such as Facebook and Instagram, which we have got used to in our daily lives and where we upload tons of images/videos every day, are typical examples of the industries that need to use/innovate many image/video processing algorithms to process the images/videos we upload.
In this chapter, we shall solve a few initial image and video processing problems that will help us understand the basic concepts of image and video processing. Before we start processing/analysing an image/video, we need to be able to load the image into memory using a suitable data structure and also be able to save the processed image/video back to the disk. It is also important to be able to visualize (plot) the image on the computer screen (to see the impact of an image processing algorithm on an image immediately). Often an image/a video needs to be pre-processed before it can be used in some complex
image/video processing algorithms (such as classification or segmentation, which you will learn more about in later chapters); some transformation/manipulation techniques (such as resizing/cropping/changing brightness and contrast) are very useful. Similarly, as a post-processing step, we may need to apply some image/video manipulation/transformation techniques to get back the desired output. With image transformation and manipulation, we can also enhance the appearance of an image (for example, by applying a filter).
In this chapter, you are going to learn how to use different Python libraries for basic image/video processing, manipulation, and transformation. We shall start by displaying the three channels of an RGB image with 3D visualizations. Next, we shall demonstrate how to capture a video from a camera and extract frames. Then, we shall show how to implement an Instagram-like Gotham filter. Finally, we shall explore the following few problems on image manipulations and see how to solve them using python libraries:
Plot image montage, crop/resize images, and draw contours
Convert PNG image with a palette to grayscale
Rotate an image and convert RGB to YUV color space (using scipy.ndimage and scikit-image)
Structure
This chapter is organized as follows:

Objectives
Problems
Display RGB image color channels in 3D

Video I/O
Read/write video files
Capture video from camera and extract frames with OpenCV-Python
Implement Instagram-like Gotham filter
Explore image manipulations with different python libraries

Plot image montage with scikit-image
Crop/resize images with SciPy ndimage module
Draw contours with OpenCV-Python
Counting objects in an image
Convert a PNG image with a palette to grayscale with PIL
Different ways to convert an RGB image to grayscale
Rotating an image with scipy.ndimage
Image differences with PIL
Converting RGB to HSV and YUV color spaces with scikit-image

Resizing an image with OpenCV-Python
Add a logo to an image with scikit-image

Change brightness/contrast of an image with linear transformation and gamma correction with OpenCV-Python
Detecting colors and changing colors of objects with OpenCV-Python
Object removal with seam carving

Creating fake miniature effect
Summary
Questions

Key terms
References
Objectives
After studying this Chapter, you should be able to: Understand the image/video storage and data structures in python
Do image/video file I/O in python using different libraries
Write python code to do basic image/video manipulations
Problems
Display RGB image color channels in 3D
It is very useful to be able to conceptualize an image as a function and visualize it to understand what it is and then do further analysis/processing. A grayscale image can be thought of as a 2-D function f(x, y) of the pixel locations (x, y), a function that maps each pixel to its corresponding grey level (for example, an integer in [0, 255] or, equivalently, a floating-point number in [0, 1]), that is: f : (x, y) → R
For an RGB image, there are three such functions, which can be denoted as:

f_R(x, y), f_G(x, y) and f_B(x, y)

corresponding to each of the channels R, G, and B, respectively. The library matplotlib’s 3-D plot functions can be used to plot each of these functions. The following Python code shows how to plot the RGB channels separately in 3D.
The following are the steps you need to follow:
First, start by importing all the required packages by using the following code. For reading an image, we need the imread() function from the scikit-image library’s io module. For array operations, we need numpy (as an image is loaded as a numpy ndarray). For displaying an image, we shall use matplotlib.pylab module functions. For 3D plotting, we need the mpl_toolkits library’s mplot3d module. The rest of the modules from the library matplotlib are also required for plotting. To display an image with matplotlib inside a notebook, we need to use %matplotlib inline - this is used only for displaying purposes (not interactive/zoom-able).
# comment the next line only if you are not running this code from jupyter notebook
%matplotlib inline
from skimage.io import imread
import numpy as np
import matplotlib.pylab as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter

Next, let us implement a function named plot_3d() that plots the pixel values for a channel. It uses the plot_surface() function, which is the key function for 3D plotting. From Matplotlib’s documentation, we can find the following about this function:
Axes3D.plot_surface(X, Y, Z, *args, **kwargs) Create a surface plot.
As can be seen from the following code snippet, the Y- and Z-axes are used to show the horizontal and vertical axes, respectively, and the X-axis is used to show the depth of the image. Note that X, Y, and Z must be of the same dimensions. The cmap is the color map used to show the different values of pixels as follows:

def plot_3d(X, Y, Z, cmap='Reds', title=''):
    """ This function plots 3D visualization of a channel
    It displays (x, y, f(x,y)) for all x,y values """
    fig = plt.figure(figsize=(15,15))
    ax = fig.gca(projection='3d')
    surf = ax.plot_surface(X, Y, Z, cmap=cmap, linewidth=0, antialiased=False, rstride=2, cstride=2, alpha=0.5)
    ax.xaxis.set_major_locator(LinearLocator(10))
    ax.xaxis.set_major_formatter(FormatStrFormatter('%.02f'))
    ax.view_init(elev=10., azim=5)
    ax.set_title(title, size=20)
    plt.show()
Let us first read the Lena RGB image from the disk and load it in memory using the scikit-image library’s io module’s imread() function; the description of the function is shown as follows: skimage.io.imread(fname, as_gray=False, plugin=None, flatten=None, **plugin_args) Load an image from file.
im = imread('images/Img_01_01.jpg')

Then, use Numpy's arange() and meshgrid() functions to create a 2D-grid of pixel coordinates (X, Y) as follows:

Y = np.arange(im.shape[0])
X = np.arange(im.shape[1])
X, Y = np.meshgrid(X, Y)

Finally, assign the red, green, and blue channels of the image to the variables Z1, Z2, and Z3, respectively. These channels are displayed in 3D using the plot_3d() function as follows:

Z1 = im[...,0]
Z2 = im[...,1]
Z3 = im[...,2]

Now, let us visualize the image in 3D. The following code block shows how to visualize the color channels of the Lena RGB image with the preceding function. Use the Z-axis as the depth axis; the Y-axis values are subtracted from the height of the image, just to shift the coordinate origin from left-top to left-bottom (otherwise the image will appear upside-down). Use the function plot_3d() to visualize the red color channel first as follows:
# plot 3D visualizations of the R, G, B channels of the image respectively
plot_3d(Z1, X, im.shape[1]-Y, cmap='Reds', title='3D plot for the Red Channel')

The following image shows the 3D plot for the red channel:
Use the function plot_3d() again, this time to visualize the green color channel of the input Lena image as follows:
plot_3d(Z2, X, im.shape[1]-Y, cmap='Greens', title='3D plot for the Green Channel')
The following image shows the 3D plot for the green channel:
Finally, visualize the blue color channel as follows:
plot_3d(Z3, X, im.shape[1]-Y, cmap='Blues', title='3D plot for the Blue Channel')
The following image shows the 3D plot for blue channel:
As you can see from the preceding figures, with the depth of colors shown in each channel, the 3D plots look like the original 2D image. Now, it is left as an exercise to you to search in the scikit-image documentation for the function to save an image to disk.
Video I/O
It is very useful to understand what a video is, how to do the video I/O, and how to visualize specific frames to do further analysis/processing. A video is a series of images (also called frames) played in sequence at a specified frame rate. Hence, if you add another dimension (that is, a sequence of time instances when the images will be played), you get the videos.
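To make the "image sequence plus time" idea concrete, here is a minimal sketch (not one of the book's listings) showing how a video can be represented in memory as a 4-D NumPy array, with the extra leading dimension indexing the frames; the frame count, frame size, and frame rate used here are arbitrary illustrative values:

import numpy as np

num_frames, height, width = 120, 480, 640  # e.g., a 4-second clip at 30 frames per second
# a single RGB image is a (height, width, 3) array; a video simply stacks such images over time
video = np.zeros((num_frames, height, width, 3), dtype=np.uint8)
print(video.shape)     # (120, 480, 640, 3)
print(video[0].shape)  # the first frame is just an ordinary RGB image: (480, 640, 3)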
In this section, we shall demonstrate how to do the video I/O using Python library functions. First, to read/write a video, we shall use scikit-video library’s io module’s functions and also we shall display some frames extracted from the video. Next, to read an image from the camera, we are going to use opencv-python library’s VideoCapture() function.
Read/write video files with scikit-video
In this problem, we shall first learn how to load a video from the disk using scikit-video library functions. This library uses the FFmpeg software for video I/O under the hood, so the code block demonstrated in this section will only work if FFmpeg is installed first and then scikit-video is installed, so that scikit-video finds the FFmpeg installation. You need to follow the following steps for performing video I/O. Let us start by importing all the required packages by using the following code snippet:
import skvideo.io
import numpy as np
import matplotlib.pylab as plt
The following code snippet shows how to read a video file (part of a trailer of the movie Spider-Man 3) from the disk using the FFmpegReader() function and display a few frames (images) from the video randomly. The relevant part of the function FFmpegReader() from the documentation is shown as follows:

skvideo.io.FFmpegReader(*args, **kwargs)
Reads frames using FFmpeg
# set keys and values for parameters in ffmpeg
inputparameters = {}
outputparameters = {}
reader = skvideo.io.FFmpegReader('images/Vid_01_01.mp4', inputdict=inputparameters, outputdict=outputparameters)
Also, use the method getShape() (along with the object returned by the FFmpegReader() function) to get the number of frames, height, width, and number of channels of the video as follows:

## Read video file
num_frames, height, width, num_channels = reader.getShape()
print(num_frames, height, width, num_channels)
# 600 916 1920 3
Now, use the nextFrame() method (which yields frames using a python generator) to read the frames from the video by using the following code block.
Choose four frames randomly (with NumPy’s random.choice() function) and display those frames only as follows:

plt.figure(figsize=(20,10))
# iterate through the frames and display a few frames
frame_list = np.random.choice(num_frames, 4)
i, j = 0, 1
for frame in reader.nextFrame():
    if i in frame_list:
        plt.subplot(2,2,j)
        plt.imshow(frame)
        plt.title("Frame {}".format(i), size=20)
        plt.axis('off')
        j += 1
    i += 1
plt.show()
The video was taken from a YouTube trailer for the movie Spider-Man 3 (2007), similar to this one: https://www.youtube.com/watch?v=wPosLpgMtTY (the exact video used a couple of years back could no longer be found on YouTube).
Binary image processing is often one of the major tasks of an image-processing system (for example, morphological image processing algorithms generally need a binary input image to start with).
To compute a binary image (that is, an image with only two distinct grey-level values, for example, black and white), the simplest way is to use a threshold (above which all pixels will be white, and below which all pixels will be black). The following code block shows how the frames from the preceding video can be thresholded (using the threshold_otsu() function from the scikit-image filters module; we shall describe this function in detail in the segmentation chapter of the next part of the book; for the time being, let us assume it is a blackbox function that turns a grayscale image into a binary image). Apply thresholding on each color channel to obtain a binary frame from an image frame. Use the FFmpegWriter() function to save the binary video by accumulating the binary frames sequentially in the same order, as shown in the following code snippet.
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu

writer = skvideo.io.FFmpegWriter("images/spiderman_binary.mp4", outputdict={})
for frame in skvideo.io.vreader("images/Vid_01_01.mp4"):
    frame = rgb2gray(frame)
    thresh = threshold_otsu(frame)
    binary = np.zeros((frame.shape[0], frame.shape[1], 3), dtype=np.uint8)
    binary[...,0] = binary[...,1] = binary[...,2] = 255*(frame > thresh).astype(np.uint8)
    writer.writeFrame(binary)
writer.close()
Now, read the binary video you just saved using the following code snippet and then display a few random frames (as you did last time) as follows:

plt.figure(figsize=(20,10))
# iterate through the frames and display a few frames
reader = skvideo.io.FFmpegReader("images/spiderman_binary.mp4")
num_frames, height, width, num_channels = reader.getShape()
frame_list = np.random.choice(num_frames, 4)
i, j = 0, 1
for frame in reader.nextFrame():
    if i in frame_list:
        plt.subplot(2,2,j)
        plt.imshow(frame)
        plt.title("Frame {}".format(i), size=20)
        plt.axis('off')
        j += 1
    i += 1
plt.show()
Capture video from camera and extract frames with OpenCV-Python

In this problem, you will learn how to capture video and extract frames using the cv2 (opencv-python) library. This time we shall capture video (live stream) recorded with a camera (for example, the in-built webcam of a laptop).
Follow these steps: First, import the required libraries.
If you are using Jupyter notebook, use %matplotlib notebook this time, to get a zoom-able and resize-able notebook, the best one to work interactively as follows:
# comment the next line only if you are not running this code from jupyter notebook
# %matplotlib notebook
import cv2
import matplotlib.pyplot as plt
As explained in OpenCV documentation, to capture a video, we need to create a VideoCapture object. Its argument can be either the device index or the name of a video file.
Device index is just the number to specify which camera. Normally one camera is connected to the computer, so simply passing a 0 as a parameter works (We can select the second camera by passing 1 and so on).
You can check whether the VideoCapture object is initialized properly or not with the isOpened() method (check whether it returns true or not). If it returns true, then we can read the very first frame (and all subsequent frames) with the function read() as shown in the following code block. The read() function is the most convenient method for capturing data from the device, and it returns the just grabbed frame. If no frames have been grabbed (camera has been disconnected, or there are no more frames in video file), the method returns false, and the function returns an empty image.
In the following code snippet, the Boolean variable is_capturing holds whether or not a frame could be grabbed as follows:
vc = cv2.VideoCapture(0)
plt.ion()
if vc.isOpened(): # try to get the first frame
    is_capturing, frame = vc.read()
    webcam_preview = plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
else:
    is_capturing = False
Once the first frame is read properly, we can capture frame-byframe within a while loop, with the condition whether a frame can still be captured.
The following code block shows how to capture the first ten frames.
In the end, don’t forget to call the release() function on the VideoCapture object.
Also note that OpenCV uses BGR color format, and to display the frame with real RGB color, we must use the transformation function cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) as follows:
# capture 10 frames
frame_index = 1
while is_capturing:
    if frame_index > 10:
        break
    is_capturing, frame = vc.read()
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) # makes the blues image look real colored
    webcam_preview.set_data(image)
    plt.title('Frame {0:d}'.format(frame_index))
    plt.draw()
    frame_index += 1
    try: # Avoids a NotImplementedError caused by 'plt.pause'
        plt.pause(2)
    except Exception:
        pass
vc.release()

If the camera device attached to your computer is working, you should see the image of yourself captured in the frames when you run the preceding code snippet. The cv2.VideoCapture() function can also be used to read a video file from the disk, and cv2.VideoWriter() can be used to save a video file to the disk. Explore these functions on your own.
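As a starting point for that exploration, here is a minimal sketch (an illustration, not one of the book's listings) that reads a video file from disk with cv2.VideoCapture(), flips each frame horizontally, and writes the result back out with cv2.VideoWriter(); the input and output file names are placeholders:

import cv2

cap = cv2.VideoCapture('images/Vid_01_01.mp4')      # any video file on disk (placeholder path)
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*'mp4v')             # codec; depends on the codecs available on your system
out = cv2.VideoWriter('images/flipped.mp4', fourcc, fps, (w, h))
while True:
    ret, frame = cap.read()                          # ret is False when there are no more frames
    if not ret:
        break
    out.write(cv2.flip(frame, 1))                    # flip each frame horizontally and save it
cap.release()
out.release()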
Implement Instagram-like Gotham filter
In this section, you will learn to implement a filter like the one Instagram uses to enhance the images uploaded to the site. The following figure shows the input image that we want to enhance by implementing an Instagram-like filter:
The Gotham filter
The Gotham filter is computed by applying the following operations on an image; the corresponding python code and the input and output images are shown along with each of the operations (starting with the following input image).
Let us start by importing the required libraries. In this problem, we shall use the PIL library for image processing functions as follows:

from PIL import Image
import numpy as np
import matplotlib.pylab as plt
im = Image.open('images/Img_01_03.jpg') # assumed pixel values in [0,255]
print(np.max(im)) # 255
The Gotham filter has the following steps to be implemented: First, a mid-tone red contrast boost needs to be applied on the input image, which is done with the following python code using numpy's interp() function, which is used to implement channel interpolation. Let us first understand how the NumPy interpolation
works for the 1-D case. The following code snippet illustrates the concept.
Interpolation with NumPy interp() function
From the NumPy documentation, we get the following about the interp() function:
numpy.interp(x, xp, fp, left=None, right=None, period=None)

One-dimensional linear interpolation. Returns the one-dimensional piecewise linear interpolant to a function with given discrete data points (xp, fp), evaluated at x.
Let us say we want to (linearly) interpolate the values of the cosine function in the interval [0, 2π], starting with the actual values of the function provided to us only at ten reference points in the interval. We can use the interp() function to compute the value of the function at the remaining points, starting with the values of the function at the given points and then applying linear interpolation. The following code shows how to do it. The orange piecewise-linear curve shows the one estimated by the interp() function, and the green curve shows the original cosine curve. As can be seen, the interp() function computed decent estimates for the values of the function at the new points:
# reference points
x_p = np.linspace(0, 2*np.pi, 10) # generate sequence of 10 points (numbers) evenly spaced in the interval [0, 2π]
# true values at reference points
y_p = np.cos(x_p)
# test points
x = np.linspace(0, 2*np.pi, 50) # generate sequence of 50 test points (numbers) evenly spaced in the interval [0, 2π]
# true values at all points
y = np.cos(x)
# interpolated values at all test points
y_interp = np.interp(x, x_p, y_p)
# now plot
plt.figure(figsize=(20,10))
plt.plot(x_p, y_p, 'o', label='reference points')
plt.plot(x, y_interp, '-x', label='interpolated')
plt.plot(x, y, '--', label='true')
plt.legend(prop={'size': 16})
plt.show()

Consider the following graph:
The preceding concept can be used in a similar way to compute channel interpolation values for the R (red) channel using the interp() function as required to be done in the first step of the implementation. The red channel values of an image are essentially a 2D array (matrix), so before you can apply the function on the channel, you need to do the following:
First, flatten the 2D array into a 1D array (using NumPy’s ravel() function)
Then, apply the channel interpolation with the interp() function, and
Finally, reshape the 1D array back to the image matrix (using NumPy’s reshape function). The following code block shows how to use the np.interp() function to stretch the red channel histogram with 11 reference
points (by specifying the old and new red channel values at those points):

r, g, b = im.split() # split the channels into red, green and blue
r_old = np.linspace(0,255,11) # reference points
r_new = [0., 12.75, 25.5, 51., 76.5, 127.5, 178.5, 204., 229.5, 242.25, 255.] # new values at reference points
# stretch the red channel histogram with interpolation and obtain new red channel values for each pixel
r1 = Image.fromarray((np.reshape(np.interp(np.array(r).ravel(), r_old, r_new), (im.height, im.width))).astype(np.uint8), mode='L')

Now, plot the images along with the red channel histograms as follows:

# plot with 2x2 subplots
plt.figure(figsize=(20,15))
plt.subplot(221)
plt.imshow(im)
plt.title('original', size=20)
plt.axis('off')
plt.subplot(222)
im1 = Image.merge('RGB', (r1, g, b))
plt.imshow(im1)
plt.axis('off')
plt.title('with red channel interpolation', size=20)
plt.subplot(223)
plt.hist(np.array(r).ravel(), density=True) # use density=True (the older normed argument is removed in newer matplotlib)
plt.subplot(224)
plt.hist(np.array(r1).ravel(), density=True)
plt.show()

The following images show the original image and the image with red channel interpolation:
Make the blacks a little bluer by using the following python code. As can be seen, the blue values are increased by 7.65, and the function np.clip() is used to ensure that the new values remain between 0 and 255 as follows:

plt.figure(figsize=(20,10))
plt.subplot(121)
plt.imshow(im1)
plt.title('last image', size=20)
plt.axis('off')
b1 = Image.fromarray(np.clip(np.array(b) + 7.65, 0, 255).astype(np.uint8))
im1 = Image.merge('RGB', (r1, g, b1))
plt.subplot(122)
plt.imshow(im1)
plt.axis('off')
plt.title('with transformation', size=20)
plt.tight_layout()
plt.show()

The following images show the last image and the one with the transformation:
A small sharpening is performed by using the following python code. The enhance() method from the PIL library’s ImageEnhance class is used (with enhancement factor 3.0) to sharpen the image. From the PIL documentation, we get the following:
class PIL.ImageEnhance.Sharpness(image) This class can be used to adjust the sharpness of an image.
The matplotlib library’s pylab module is used for plotting. A part of the Matplotlib documentation of the subplot() function is as follows:
subplot(nrows, ncols, index, **kwargs) Add a subplot to the current figure.
For example, plt.subplot(121) creates a subplot with one row and two columns and uses the 1st (left) column index for plotting. The function imshow() shows the following image:
from PIL.ImageEnhance import Sharpness
plt.figure(figsize=(20,10))
plt.subplot(121)
plt.imshow(im1)
plt.title('last image', size=20)
plt.axis('off')
im2 = Sharpness(im1).enhance(3.0)
plt.subplot(122)
plt.imshow(im2)
plt.axis('off')
plt.title('with transformation', size=20)
plt.tight_layout()
plt.show()
A boost in a blue channel for lower mid-tones
A decrease in a blue channel for upper mid-tones, done with the following python code using channel interpolation again, but this time on the blue channel of the RGB image as follows:
blue_old = np.linspace(0,255,17) # pixel values at reference points
blue_new = [0., 11.985, 30.09, 64.005, 81.09, 99.96, 107.1, 111.945, 121.125, 143.055, 147.9, 159.885, 171.105, 186.915, 215.985, 235.875, 255.] # new pixel values at the reference points
# now perform a blue channel interpolation
b2 = Image.fromarray((np.reshape(np.interp(np.array(b1).ravel(), blue_old, blue_new), (im.height, im.width))).astype(np.uint8), mode='L')

Now, plot the images along with the blue channel histograms as follows:

plt.figure(figsize=(20,15))
plt.subplot(221)
plt.imshow(im2)
plt.title('last image', size=20)
plt.axis('off')
plt.subplot(222)
im3 = Image.merge('RGB', (r1, g, b2))
plt.imshow(im3)
plt.axis('off')
plt.title('with blue channel interpolation', size=20)
plt.subplot(223)
plt.hist(np.array(b1).ravel(), density=True)
plt.subplot(224)
plt.hist(np.array(b2).ravel(), density=True)
plt.show()
The following figure shows the final output image produced by applying the Gotham filter:

plt.figure(figsize=(20,15))
plt.imshow(im3)
plt.axis('off')
plt.show()
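As an optional recap, the following sketch (not one of the book's listings) wraps the operations above into a single reusable function: it chains the same red-channel interpolation, blue boost, and blue-channel interpolation on a PIL RGB image, with the small sharpening applied as the final step (a stylistic choice; the step-by-step walkthrough applied it before the blue-channel interpolation):

from PIL import Image
from PIL.ImageEnhance import Sharpness
import numpy as np

def gotham(im):
    """Apply the Gotham-style operations described above to a PIL RGB image."""
    r, g, b = im.split()
    # mid-tone red contrast boost via channel interpolation
    r_old = np.linspace(0., 255., 11)
    r_new = [0., 12.75, 25.5, 51., 76.5, 127.5, 178.5, 204., 229.5, 242.25, 255.]
    r1 = np.interp(np.array(r).ravel(), r_old, r_new).reshape(im.height, im.width)
    # make the blacks a little bluer
    b1 = np.clip(np.array(b) + 7.65, 0, 255)
    # boost/decrease the blue channel for lower/upper mid-tones
    b_old = np.linspace(0., 255., 17)
    b_new = [0., 11.985, 30.09, 64.005, 81.09, 99.96, 107.1, 111.945, 121.125, 143.055,
             147.9, 159.885, 171.105, 186.915, 215.985, 235.875, 255.]
    b2 = np.interp(b1.ravel(), b_old, b_new).reshape(im.height, im.width)
    out = Image.merge('RGB', (Image.fromarray(r1.astype(np.uint8), mode='L'),
                              g,
                              Image.fromarray(b2.astype(np.uint8), mode='L')))
    return Sharpness(out).enhance(3.0)  # small sharpening at the end

gotham_img = gotham(Image.open('images/Img_01_03.jpg'))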
Explore image manipulations with different python libraries
In this section, you shall learn to implement a few image manipulation/transformation/visualization techniques that are very useful on their own and are also often used as essential pre-/post-processing steps for more complex image processing tasks.
Plot image montage with scikit-image
In this problem, you will learn how to add random noise to an image (with different variance) to create noisy images and then create a montage of images.
You need to run the following steps:
Let us start by importing the required libraries/modules/function using the following code snippet. Also, read the input RGB image of a tiger using skimage.imread() as follows:
from skimage.io import imread
from skimage.util import random_noise, montage
import matplotlib.pylab as plt
import numpy as np

im = imread("images/Img_01_04.jpg")
Now, use scikit-image module’s function random_noise() to create a noisy image from the input image by adding Gaussian random noise with a given variance. From the scikit-image documentation, we get the following information about the function skimage.util.random_noise(image, mode='gaussian', seed=None, clip=True, **kwargs)
Use the preceding function to generate noisy images by adding random Gaussian noise with different variances (σ²) to the input image. Generate nine different σ values using the NumPy linspace() function, starting from 0 to 1 (in increasing order of values), as shown in the following code snippet:
sigmas = np.linspace(0, 1, 9) # create 9 standard deviation values in increasing order starting from 0 to 1
noisy_images = np.zeros((9, im.shape[0], im.shape[1], im.shape[2]))
for i in range(len(sigmas)):
    noisy_images[i,:,:,:] = random_noise(im, var=sigmas[i]**2) # add Gaussian random noise to image with different sigma values
Use the scikit-image util module’s function montage() to create a montage of the noisy images, with the following line of code. The montage() function takes the noisy images ndarray and shows the images in a grid. The following excerpt from the scikit-image documentation describes the input/output parameters of the function: skimage.util.montage(images, fill='mean', rescale_intensity=False, grid_shape=None, padding_width=0, multichannel=False) Create a montage of several single or multichannel images.
noisy_images_montage = montage(noisy_images, rescale_intensity=True, multichannel=True) # create montage

Finally, plot the montage of the noisy images as follows:

plt.figure(figsize=(15,15))
plt.imshow(noisy_images_montage)
plt.title('Noisy montage', size=30)
plt.axis('off')
plt.show()
Crop/resize images with the SciPy ndimage module
Resizing/cropping an image is an important pre-processing task (for example, for the deep learning models). In this problem, you will learn how to use the zoom() function from the scipy.ndimage module to zoom into an image and then how to crop a portion of the image using NumPy ndarray slicing, starting with the input image of the Goddess Durga. You need to follow the next steps:
Let us start by importing the required functions from the respective python library modules by using the following code snippet:
from scipy import ndimage
import matplotlib.pyplot as plt
from skimage.io import imread
Read the input image and use the scipy.ndimage.zoom() function to zoom into the image by using the following code block.
Let us first understand the zoom() function. The next section, taken from the SciPy documentation, describes the parameters of the zoom() function as follows: scipy.ndimage.zoom(input, zoom, output=None, order=3, mode='constant', cval=0.0, prefilter=True) Zoom an array, using spline interpolation of the requested order.
Use the “nearest” mode for the function, which means that the input will be extended by replicating the last pixel.
As can be seen from the following code block, we can specify the zoom factor for each axis separately: the zoom factors for the height and width dimensions are specified as 2, whereas the factor for the color channel is 1 (no zoom on the color channel); the order of the spline interpolation is set to 1, as follows:

im = imread('images/Img_01_05.jpg') / 255
zoomed_im = ndimage.zoom(im, (2,2,1), mode='nearest', order=1) # no zoom on color channel, order of the spline interpolation = 1
print(im.shape, zoomed_im.shape)
# (320, 475, 3) (640, 950, 3)

Finally, display the original image and the zoomed version of the image by cropping a part of it (use NumPy ndarray slicing), with the following code block:
plt.figure(figsize=(20,10))
plt.subplot(121)
plt.imshow(im)
plt.title('Original Image', size=25)
plt.subplot(122)
plt.imshow(zoomed_im[125:325,375:550,:]) # crop the enlarged face
plt.title('Zoomed and Cropped Image', size=25)
plt.show()
Draw contours with OpenCV-Python
Contours can be explained simply as a curve joining all the consecutive points (along the boundary), having the same color or intensity. The contours are a useful tool for shape analysis and object detection and recognition. For better accuracy, it is recommended to use binary images to find contours. Hence, before finding contours, thresholding or canny edge detection is needed to be applied on the image. The input parameters for these two functions from cv2 documentation are explained as follows:
cv2.threshold(img, thresh, maxval, type) Applies a fixed-level threshold to image to create a binary image
Canny(img, threshold1, threshold2, apertureSize = 3, L2gradient = false)
Finds edges in an image using the Canny algorithm
As you can see from the preceding code, the Canny() function takes the input grayscale image along with double (lower and upper) hysteresis thresholds as input. The idea is that the pixel values > max threshold value will be considered as strong edge pixels, and the pixel values in between the min threshold and the max threshold will be considered as weak edge pixels. Strong edges will be included in the edge map, and weak edges will be included in the edge map if and only if they are connected to strong edges as shown in the following figure:
Generally, the upper threshold value is chosen as 1.5–2 times the lower threshold value. The following code shows how to use the findContours() function to compute the contours of a grayscale Einstein image. First, the image is converted to a binary image using the preceding two functions, and then the contours are computed. Also, we need to remember that finding contours with cv2 is like finding a white object from a black background; the object to be found should be white, and the background should be black. The findContours() function accepts three parameters; the input parameters/return values taken from the cv2 documentation are explained as follows:

cv2.findContours(img, mode, method)
Finds contours in a binary image.

The following are the steps you need to follow:
First, import the required libraries. Read the input Einstein image from the disk using OpenCV-Python’s imread() function. OpenCV stores an RGB image in BGR format; to display it properly with the matplotlib.pylab module, we need to convert it to an RGB format by using the cv2.cvtColor() function, before plotting the image as follows:

import cv2
import numpy as np
import matplotlib.pylab as plt

image = cv2.imread("images/Img_01_06.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # convert from cv2 BGR to matplotlib RGB
Convert the image to grayscale and use a Canny edge detector with appropriate hysteresis thresholds to find the edges in the image. Find contours from the binary edges image as follows:

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edged = cv2.Canny(gray, 125, 250)
contours_edged, _ = cv2.findContours(edged, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
print("Number of Contours found with Canny edges = " + str(len(contours_edged)))
# Number of Contours found with Canny edges = 967
Use thresholding to compute the binary image from the grayscale image. Find the contours from the thresholded image as follows:

ret, thresh = cv2.threshold(gray, 127, 255, 0)
contours_thresh, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
print("Number of Contours found with threshold = " + str(len(contours_thresh)))
# Number of Contours found with threshold = 640

Display the original, thresholded, and contour images obtained using the matplotlib.pylab module’s imshow() function as follows:

plt.figure(figsize=(20,15))
plt.subplot(221), plt.imshow(image), plt.title('Original Image', size=20), plt.axis('off')
plt.subplot(222), plt.imshow(thresh, cmap='gray'), plt.title('Threshold Image', size=20), plt.axis('off')
plt.subplot(223), plt.imshow(edged, cmap='gray'), plt.title('Canny Edges Image', size=20), plt.axis('off')
plt.subplot(224), plt.imshow(cv2.drawContours(np.copy(image), contours_thresh, -1, (0,255,0), 3))
plt.title('Contour Lines with Threshold Image', size=20), plt.axis('off')
Finally, draw the first n=500 contours from the edges binary image as follows:
n = 500 plt.figure(figsize=(7,7)) colors = plt.cm.coolwarm(np.linspace(0, 1, n)) for i in range(n): image = cv2.drawContours(image, contours_edged, i, 255*colors[i], 3) plt.imshow(image)
plt.title('First ' + str(n) + ' Contour lines with Canny Edges', size=20), plt.axis('off') plt.tight_layout() plt.show()
Now, let us see an important application of computing the contours.
Counting objects in an image
Now, you will use OpenCV-Python’s findContours() function to count the number of objects in an image. The following example shows how to count the Bengali alphabets from the input image. The following are the steps that are to be followed:
Import the required libraries. Convert the image to grayscale and use Canny’s edge detector to find the edges in the image using the cv2 function Canny() as follows: import cv2 import numpy as np import matplotlib.pylab as plt image = cv2.imread('images/Img_01_12.jpg') gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) edged = cv2.Canny(gray, 75, 150)
Use morphological closing (the morphologyEx() function) to remove the small holes from the image as follows:
thresh = cv2.threshold(gray, 215, 255, cv2.THRESH_BINARY_INV)[1] kernel = np.ones((2,2),np.uint8) thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
Finally, use the findContours() function to find the contours around all the objects and draw them iteratively with the
drawContours() function as follows:
# find contours (i.e., outlines) of the foreground objects in the thresholded image
cnts, _ = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) # in OpenCV 3.x, findContours() returns three values; unpack as _, cnts, _ instead
output = image.copy()
# loop over the contours for c in cnts: # draw each contour on the output image with a 3px thick red # outline, then display the output contours one at a time cv2.drawContours(output, [c], -1, (0, 0, 255), 2) Plot the original, binary, and the output images using the following code block: text = "Found {} objects".format(len(cnts)) cv2.putText(output, text, (50, 220), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2) plt.figure(figsize=(20,7)) plt.subplot(131), plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)), plt.axis('off'), plt.title('Original image', size=20) plt.subplot(132), plt.imshow(thresh, cmap='gray'), plt.axis('off'), plt.title('Binary image', size=20) plt.subplot(133), plt.imshow(cv2.cvtColor(output, cv2.COLOR_BGR2RGB)), plt.axis('off'), plt.title('Counting objects', size=20) plt.show()
Convert a PNG image with palette to grayscale with PIL
In this section, you will learn about PIL image modes and how to convert a PNG colored image with a palette to a grayscale image. As you can see from the following code, the easiest way to do this is to convert the mode of the image from P (an image that uses a color palette) to L (a grayscale image) using the convert() function. The following are the steps:
First, import the required libraries and read the image from the disk into an image object using the PIL Image class's open() method. Plot the original PNG image with matplotlib.pylab as follows:
#The easy way import numpy as np from PIL import Image import matplotlib.pyplot as plt img = Image.open('images/Img_01_07.png') print(img.mode) plt.imshow(img) plt.axis('off') plt.title('Original Image') plt.show() #P
Next, on the image object, you need to invoke the method convert() as shown in the following code. Plot the output image as follows:
img = img.convert('RGB').convert('L') print(img.mode) plt.imshow(img, cmap='gray') plt.axis('off') plt.title('Grayscale Image') plt.show() #L
There is yet another, harder way to convert the image to a grayscale image. It is a little more complex and requires more lines of code; however, it will clarify how an image is stored in the P format in PIL. The following code snippet does it using the following steps:
Start by importing all the required libraries and define your function rgb2gray() to compute the grey-level intensity value from an RGB color image. Read the PNG image and get the color palette as follows:
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

def rgb2gray(R, G, B):
    # one standard set of luminance weights for combining the R, G, B values
    return 0.299*R + 0.587*G + 0.114*B

img = Image.open('images/Img_01_07.png')
pal = img.getpalette()  # flat list of palette entries: consecutive R, G, B values for each index
Iteratively, compute the color palette index for each pixel value in the image using the getpixel() method from the PIL Image class
Obtain the R, G, and B values for each pixel from the palette using the index computed (they are stored in consecutive locations) Finally, use the rgb2gray() function (you could use the same function from scikit-image.color module too) to compute the greylevel of the output pixel corresponding to the same location in the image as follows: arr = np.zeros((img.height, img.width)) # initialize the output image with zero values for i in range(arr.shape[0]): for j in range(arr.shape[1]): idx = img.getpixel((j,i)) # get the index of the pixel in the palette
R, G, B = pal[3*idx], pal[3*idx+1], pal[3*idx+2] # get the R,G,B values of the pixel arr[i,j] = rgb2gray(R, G, B) # convert to grayscale Plot the output image as follows: plt.imshow(arr, cmap='gray') plt.title('Grayscale Image') plt.axis('off') plt.show()
Different ways to convert an RGB image to grayscale
In this section, you will learn to implement six different algorithms to convert an RGB color image (3D image with three color channels) to a grayscale (2D) image. The algorithms are described as follows, followed by their straightforward implementations using the function rgb2gray() that you will implement:
Intensity: The simplest way to convert an RGB image to a grayscale image is to take the average of the three color channels: (R + G + B)/3.
Luminance: This algorithm is designed to match human brightness perception by using a weighted combination of the RGB channels: 0.3R + 0.59G + 0.11B.
Value: It is the V channel in the Hue, Saturation, and Value (HSV) color space. It is computed by taking the maximum of the RGB channels: max(R, G, B).
Luster: It is the L channel in the Hue, Lightness, and Saturation (HLS) color space. It is computed as the mean of the minimum and maximum RGB value:
(1/2)*(max(R,G,B) + min(R,G,B)) Lab L: Convert the image from an RGB to Lab colorspace and extract the L (intensity) channel.
RGB R: Extract the red channel from the RGB image (you can equivalently extract the G/B channel too).
The following are the steps you need to follow:
First, import the required libraries and define your rgb2gray() function that implements the preceding six algorithms and returns the output grayscale image for each of them as follows:
import numpy as np from skimage.color import rgb2lab from skimage.io import imread import matplotlib.pylab as plt
def rgb2gray(img):
    gray_images = {}
    gray_images['intensity'] = np.mean(img, axis=2)
    gray_images['luminance'] = np.average(img, axis=2, weights=[0.3, 0.59, 0.11])
    gray_images['value'] = np.max(img, axis=2)
    gray_images['luster'] = (np.max(img, axis=2) + np.min(img, axis=2)) / 2
    gray_images['Lab L'] = rgb2lab(img)[...,0]
    gray_images['RGB R'] = img[...,0]
    return gray_images
Read and plot the original Ishihara image (used for color-blindness test) as follows:
image = imread('images/Img_01_17.png') plt.figure(figsize=(5,5)) plt.imshow(image), plt.axis('off'), plt.title('RGB image', size=20) plt.show()
Call the function rgb2gray() and plot the grayscale images produced by different algorithms as follows:
gray_images = rgb2gray(image) i = 1 plt.figure(figsize=(15,10)) plt.gray() for gray_type in sorted(gray_images): plt.subplot(2,3,i), plt.imshow(gray_images[gray_type]), plt.axis('off'), plt.title(gray_type, size=20) i += 1
plt.suptitle('Converting RGB to GrayScale image with different methods', size=25) plt.show()
Rotating an image with scipy.ndimage
Sometimes, we may need geometric transformations as a preprocessing step in an image processing task. In this section, you will learn how to rotate an image using the rotate() function that has the following signature (from the documentation, a few of the important parameters are as follows):
scipy.ndimage.rotate(img, angle, mode='constant') Rotate an image counter-clockwise by angle degrees (using spline interpolation of the requested order).
The following are the steps you need to follow:
Import the required functions from the corresponding python library-modules. Read the colored input image of the tiger as follows:
from scipy.ndimage import rotate from skimage.io import imread im = imread('images/Img_01_04.jpg')
Apply the rotate() function from scipy.ndimage module to transform the input image along with the rotation value in degrees (for example, use -45 to rotate clockwise by 45 degrees). The anti-clockwise rotation is to be expressed in positive degrees as per the usual convention as follows:
im = rotate(im, -45) plt.figure(figsize=(5,5)) plt.imshow(im) plt.axis('off') # stop showing the axes plt.show()
Image differences with PIL
In practice, we often need to compute the difference between two images that differ only slightly (for example, one way to achieve video compression is to store one frame and the differences of the other frames from that frame, instead of storing all the frames; this technique can achieve a high compression ratio if the frames differ only slightly). In this problem, you will learn how to use PIL library functions to compute the difference image of two images. Follow these steps for the implementation: Import the required modules to get started by using the next code snippet as follows:
from PIL.ImageChops import difference from PIL import Image
Read the input images using the open() function from the Image module
To compute the difference, the input images must be of the same size. Resize the second image to the same size as the first image. You can use the Image.show() function to see the input images. They are shown in the following figure:
im1 = Image.open("images/Img_01_08.jpg")
im2 = Image.open("images/Img_01_09.jpg").resize((im1.width, im1.height))
We want to compute the difference between the second image from the first one.
Use the PIL ImageChops module's difference() function to compute the difference image of two images as shown in the following code block. Save the difference image by using the save() function from the Image module as follows: difference(im2, im1).show() difference(im2, im1).save('images/Img_01_16.jpg')
The following figure shows what the output difference image looks like. As the second input image is exactly the same as the first one, except for the additional flying bird, all pixels other than those belonging to the bird are black (zeros) in the difference output:
Converting RGB to HSV and YUV color spaces with scikit-image
Transforming an image from one color space to another is quite useful; it finds a lot of applications in different algorithms (for example, segmentation). In this problem, you will learn the following:
How to convert a colored image from the RGB to the HSV colorspace and back using the skimage.color module's rgb2hsv() and hsv2rgb() functions. You will also see the impact of changing the values in the h (hue), s (saturation), and v (value) channels independently for an image.
The following code demonstrates the conversion from an RGB to HSV color space and back:
Let us start by importing the required libraries/modules/functions using the following code snippet:
from skimage.io import imread from skimage.color import rgb2hsv, hsv2rgb import numpy as np import matplotlib.pyplot as plt
Read the input RGB color image of a parrot and convert it to an HSV colorspace using the skimage.rgb2hsv() function. The input image looks like the following:
Use NumPy’s clip() function to ensure that the output pixel values are in [0,1] as follows:
im = imread("images/Img_01_11.jpg")
im_hsv = np.clip(rgb2hsv(im), 0, 1)
Plot the original input image, along with the h, s, and v channels from the transformed image
Change the hue, saturation, and value channel values separately and transform back to the RGB colorspace to see the impact on the output image
Use the function subplots_adjust() from Matplotlib to adjust the margins and the white spaces between the subplots (for example, the horizontal and vertical spacing between the subplots is specified to be 0.05) as follows:
plt.figure(figsize=(20,12))
plt.subplots_adjust(0,0,1,0.925,0.05,0.05)
plt.gray()
plt.subplot(231), plt.imshow(im_hsv[...,0]), plt.title('h', size=20), plt.axis('off')
plt.subplot(232), plt.imshow(im_hsv[...,1]), plt.title('s', size=20), plt.axis('off')
plt.subplot(233), plt.imshow(im_hsv[...,2]), plt.title('v', size=20), plt.axis('off')
im_hsv_copy = np.copy(im_hsv)
im_hsv[...,0] /= 4
plt.subplot(234), plt.imshow(np.clip(hsv2rgb(im_hsv), 0, 1)), plt.title('original image with h=h/4', size=20), plt.axis('off')
im_hsv = np.copy(im_hsv_copy)  # reset to the unmodified copy before changing the next channel
im_hsv[...,1] /= 3
plt.subplot(235), plt.imshow(np.clip(hsv2rgb(im_hsv), 0, 1)), plt.title('original image with s=s/3', size=20), plt.axis('off')
im_hsv = np.copy(im_hsv_copy)
im_hsv[...,2] /= 5
plt.subplot(236), plt.imshow(np.clip(hsv2rgb(im_hsv), 0, 1)), plt.title('original image with v=v/5', size=20), plt.axis('off') plt.show()
From the following example, you will learn about another color model YUV. The channel Y stands for the luminance component (the brightness), and U and V are the chrominance (color) components.
Now, you will learn how to use the color module functions to convert from an RGB to YUV color model and back. Let us first import the additional required functions from the corresponding libraries with the following line of code:
from skimage.color import rgb2yuv, yuv2rgb
Read the RGB input image of the tiger and use the rgb2yuv() function from skimage.color to convert it to a YUV image. The following figure shows what the original input image looks like:
im = imread("images/Img_01_04.jpg") im_Yuv = rgb2yuv(im)
Plot the original image, the luminance (Y), and the chrominance channels (U, V) from the YUV image. Also, let us change the values of each of the channels in the YUV colorspace, transform the image back to the RGB colorspace (with the skimage.color.yuv2rgb() function), and observe the impact on the image's look and feel as follows:
plt.figure(figsize=(20,15))
plt.subplots_adjust(0,0,1,0.925,0.05,0.05)
plt.gray()
plt.subplot(231), plt.imshow(im_Yuv[...,0]), plt.title('Y', size=20), plt.axis('off')
plt.subplot(232), plt.imshow(im_Yuv[...,1]), plt.title('u', size=20), plt.axis('off')
plt.subplot(233), plt.imshow(im_Yuv[...,2]), plt.title('v', size=20), plt.axis('off')
im_Yuv_copy = np.copy(im_Yuv)
im_Yuv[...,0] /= 2
plt.subplot(234), plt.imshow(np.clip(yuv2rgb(im_Yuv),0,1)), plt.title('original image with Y=Y/2', size=20), plt.axis('off')
im_Yuv = np.copy(im_Yuv_copy)  # reset to the unmodified copy before changing the next channel
im_Yuv[...,1] /= 3
plt.subplot(235), plt.imshow(np.clip(yuv2rgb(im_Yuv),0,1)), plt.title('original image with u=u/3', size=20), plt.axis('off')
im_Yuv = np.copy(im_Yuv_copy)
im_Yuv[...,2] /= 4
plt.subplot(236), plt.imshow(np.clip(yuv2rgb(im_Yuv),0,1)), plt.title('original image with v=v/4', size=20), plt.axis('off')
plt.show()
As can be seen from the outputs, decreasing the Y channel values decreases the brightness of the image (does not change the color), whereas changing the U, V channels change the image color.
Resizing an image with OpenCV-Python
Let us revisit image resizing, but this time with OpenCV-Python and with different interpolation techniques. The relevant part from the function’s documentation is as follows:
cv2.resize(src, dsize, fx, fy, interpolation) Resize an image.
The interpolation flag can be one of the following methods: cv2.INTER_NEAREST (nearest-neighbor interpolation), cv2.INTER_LINEAR (bilinear interpolation, the default), cv2.INTER_AREA (resampling using the pixel-area relation, preferred for shrinking), cv2.INTER_CUBIC (bicubic interpolation over a 4×4 pixel neighborhood), and cv2.INTER_LANCZOS4 (Lanczos interpolation over an 8×8 pixel neighborhood).
Interpolation is particularly essential when we scale down an image, to avoid aliasing artifacts. Different interpolation techniques have different smoothing effects on the image. The following code block demonstrates the impact of interpolation on a small Lena image by increasing the size of the image four times (along the vertical and horizontal dimensions) with different interpolation techniques. To resize an image with OpenCV-Python, you need to follow these steps: Start by importing the required libraries. Read the input image. Let us specify the interpolation algorithms that we shall explore as follows:
import cv2 import matplotlib.pylab as plt im = cv2.imread("images/Img_01_10.jpg") interps = ['nearest', 'bilinear', 'area', 'lanczos', 'bicubic']
Resize the original image (enlarge the image by 4x) with each of the interpolation algorithms and plot the output images obtained in each case by using the following code block:
i = 1 plt.figure(figsize=(18,12)) for interp in [cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_CUBIC]:
im1 = cv2.resize(im, None, fx=4., fy=4., interpolation = interp) # 4 times plt.subplot(2,3,i) plt.imshow(cv2.cvtColor(im1, cv2.COLOR_BGR2RGB)) plt.axis('off') plt.title(interps[i-1], size=30) i += 1 print(im.shape, im1.shape) plt.show() (55, 55, 3) (220, 220, 3)
Add a logo to an image with scikit-image
A common way to add some copyright information to an image is by adding a small logo image to it (as a watermark). In this section, you will learn how to write codes to add a logo to an image. The following code block demonstrates the watermarking process; the following are the steps (here, we are assuming that the logo text/object is dark and logo background is light):
Load the required libraries. Read the original and the logo images from the disk with skimage.io's imread() function as follows:
from skimage.io import imread from skimage.color import rgb2gray, gray2rgb import numpy as np import matplotlib.pylab as plt # Load two images img1 = imread('images/Img_01_13.png').astype(np.uint8) img2 = imread('images/Img_01_14.jpg').astype(np.uint8) # logo
Convert the logo image to grayscale with skimage.color's rgb2gray() function and then to a binary (mask) image by thresholding as follows: # put logo on top-left corner, So create a ROI rows, cols, _ = img2.shape roi = img1[0:rows, 0:cols]
# Now create a mask of logo and create its inverse mask also img2gray = (255*rgb2gray(img2)).astype(np.uint8)
mask = 255*(img2gray < 150) #cv2.threshold(img2gray, 10, 255, cv2.THRESH_BINARY)
Use the binary logo image as mask image, invert the mask, and compute a bitwise and operation (using NumPy’s bitwise_and() function) with the region of interest (of the same shape as the mask and based on where you want to insert your logo) extracted from the original image to compute the background of the output image as follows:
mask_inv = np.invert(mask) #cv2.bitwise_not(mask) mask_inv = mask_inv.astype(np.uint8) # Now black-out the area of logo in ROI img1_bg = np.bitwise_and(roi, gray2rgb(mask_inv)) #cv2.bitwise_and(roi,roi,mask = mask_inv) Again, compute another bitwise operation in between the original image and the mask image to compute the foreground of the output image as follows: # Take the only region of a logo from logo image. img2_fg = np.bitwise_and(img2, gray2rgb(mask)) # cv2.bitwise_and(img2,img2,mask = mask)
Finally, add the foreground and background images computed to obtain the modified region of interest and assign to the
corresponding location in the original image to get the watermarked image. Plot the output image as follows: # Put logo in ROI and modify the main image dst = img1_bg + img2_fg img1[0:rows, 0:cols] = dst
plt.figure(figsize=(20,20)) plt.imshow(img1) plt.axis('off') plt.show()
Change brightness/contrast of an image with linear transform and gamma correction with OpenCV-Python Contrast/brightness enhancements are two techniques that are very regularly used before many image processing tasks (for example, image classification). In this section, you will learn how to improve the contrast and brightness of an image using OpenCV-Python library functions.
Two commonly used image manipulation techniques (point processes) are multiplication and addition with a constant (the basic linear transform) as follows: g(x)=αf(x)+β. The parameters α>0 and β are often called the contrast (gain) and brightness (bias) parameters, respectively. As earlier, you can think of f(x) as the source image pixels and g(x) as the output image pixels. Then, conveniently, we can write the expression as g(i,j)=α⋅f(i,j)+β, where i and j indicate that the pixel is located in the ith row and jth column. In this recipe, you will learn how to implement the basic linear transform using OpenCV-Python’s convertScaleAbs(), also see its impact on an image; the function is described as follows (from the OpenCV documentation):
cv2.convertScaleAbs(src, alpha, beta) On each channel of the input image, the function performs three operations sequentially: scaling, taking the absolute value, and conversion to an unsigned 8-bit type (for an input image I, the output image returned is |I*alpha + beta|, saturated to the 8-bit range).
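The same point operation can also be written directly with NumPy (a minimal sketch; here clipping to [0, 255] plays the role of the saturation that convertScaleAbs() performs, and img is assumed to be an 8-bit image):
import numpy as np

def linear_transform(img, alpha, beta):
    # g(i, j) = alpha * f(i, j) + beta, saturated back to the uint8 range
    return np.clip(alpha * img.astype(np.float32) + beta, 0, 255).astype(np.uint8)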
The power-law transform s = c*r^γ (with c a constant and r the normalized input intensity), which is also known as γ-correction, shifts the image towards the dark end of the spectrum when γ > 1 and makes the image appear lighter when γ < 1. The following code block shows how to use the LUT() function to map the pixel values of an image through a lookup table built with the power-law transform. The function is described as follows (from the OpenCV documentation): cv2.LUT(src, lut) Performs a look-up table transform of an array: every pixel value of the input is replaced by the corresponding entry of the 256-element table, dst(I) = lut(src(I)).
Let us start again by importing the required libraries and functions. Use the OpenCV-Python functions described earlier (convertScaleAbs() and LUT()) to define the functions basic_linear_transform() and gamma_correction(), which implement the change of contrast/brightness and the gamma correction, respectively, as shown in the following code block: import cv2 import numpy as np import matplotlib.pylab as plt alpha, beta, gamma = 1, 0, 1
def basic_linear_transform(img, alpha, beta): return cv2.convertScaleAbs(img, alpha=alpha, beta=beta)
def gamma_correction(img, gamma): lookup_table = np.empty((1,256), np.uint8) for i in range(256): lookup_table[0,i] = np.clip(pow(i / 255.0, gamma) * 255.0, 0, 255) return cv2.LUT(img, lookup_table) image = cv2.imread('images/Img_01_01.jpg') With different values of α and β, call the function basic_linear_transform() to change the brightness of the input Lena image as follows: plt.figure(figsize=(20,20)) i = 1
for alpha in [0.25, 0.5, 1, 1.5, 2.5]:
    for beta in [0, 0.5, 1, 1.5, 2]:
        image_corrected = basic_linear_transform(image, alpha, beta)
        plt.subplot(5,5,i), plt.imshow(cv2.cvtColor(image_corrected, cv2.COLOR_BGR2RGB)), plt.axis('off')
        plt.title(r'$\alpha$={:.2f}, $\beta$={:.2f}'.format(alpha, beta), size=20)
        i += 1
plt.suptitle('Basic linear transform to change brightness', size=30)
plt.show()
Call the function gamma_correction() with different values of the input parameter γ and plot the output images as follows: plt.figure(figsize=(20,20)) i = 1 for gamma in np.linspace(0, 2, 16): image_gamma_corrected = gamma_correction(image, gamma) plt.subplot(4,4,i), plt.imshow(cv2.cvtColor(image_gamma_corrected, cv2.COLOR_BGR2RGB)), plt.axis('off')
plt.title(r'$\gamma$={:.2f}'.format(gamma)) i += 1 plt.suptitle('Gamma correction', size=30) plt.show()
Detecting colors and changing colors of objects with OpenCV-Python A simple way to detect and change the color of an object in an image is to transform the image from the RGB to the HSV color space and then use a range of hues to detect the object; this can be done easily using OpenCV-Python. We need to specify a range of color values with which the object we are interested in will be identified and extracted. We can then change the color of the detected object, or even make it transparent. For this problem, the input image we shall use will be a brown horse in a field, and the object of interest will be the horse. We shall detect the brown horse and change its color to black, keeping everything else in the image as it is, by working in the HSV space; the following are the steps:
Load the required libraries and read the input image as follows:
import cv2
import numpy as np
import matplotlib.pylab as plt
img = cv2.imread("images/Img_01_18.png")
Convert the input image from a BGR to an HSV color space as follows:
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
Create a mask for the horse by selecting the possible range of HSV colors that the brown horse can have. Extract the brown horse from the input image as follows:
mask = cv2.inRange(hsv, (0, 70, 25), (15, 255, 255)) imask = mask>0
brown = np.zeros_like(img) brown[imask] = img[imask] Change the color of the brown horse to black by reducing all the HSV channel values and then converting the image back to the BGR space.
The function to be used for extracting the pixels corresponding to the colored object (by checking whether a pixel value is within a range of values specified by a lower bound and an upper bound) is described as follows (from the OpenCV documentation):
cv2.inRange(src, lowerb, upperb) Checks if array elements lie between the elements of two other arrays
black = img.copy()
hsv[...,0:3] = hsv[...,0:3] / 3
black[imask] = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)[imask]
black = np.clip(black, 0, 255)
Finally, plot the input image, the extracted brown horse, and the output image with the horse color changed to black by using the following code block: plt.figure(figsize=(20,10))
plt.subplots_adjust(0,0,1,0.9,0.01,0.075) plt.subplot(131), plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)), plt.axis('off'), plt.title('original', size=20) plt.subplot(132), plt.imshow(cv2.cvtColor(brown, cv2.COLOR_BGR2RGB)), plt.axis('off'), plt.title('only horse', size=20) plt.subplot(133), plt.imshow(cv2.cvtColor(black, cv2.COLOR_BGR2RGB)), plt.axis('off'), plt.title('horse color changed', size=20) plt.suptitle('Detecting and changing object colors with opencvpython', size=25) plt.show()
Object removal with seam carving
Seam carving is a content-aware image resizing technique where the image is reduced in size by one pixel in height (or width) at a time. A vertical seam in an image is a path of pixels connected from top to bottom with one pixel in each row. A horizontal seam is a path of pixels connected from left to right with one pixel in each column.
Finding and removing a seam (using dynamic programming) involves the following three parts:
Energy calculation: The higher the energy, the less likely the pixel will be included as a part of a seam. For example, the dual-gradient energy function can be used for energy computation, the energy of the (i, j)th pixel being e(i, j).
Seam identification: The next step is to find a vertical or horizontal seam of minimum total energy. This is similar to the classic shortest-path problem in an edge-weighted digraph, with the important difference that the weights are on the vertices instead of the edges. The goal is to find the shortest path from any of the W pixels in the top row to any of the W pixels in the bottom row. The digraph is acyclic, with a downward edge from pixel (x, y) to pixels (x − 1, y + 1), (x, y + 1), and (x + 1, y + 1), assuming that the coordinates are in the prescribed ranges. Also, seams cannot wrap around the image. The optimal seam can be found using dynamic programming. The first step is to traverse the image from the second row to the last row and compute the minimum cumulative energy M(i, j) = e(i, j) + min(M(i − 1, j − 1), M(i − 1, j), M(i − 1, j + 1)) over all possible connected seams for each pixel (i, j); a minimal sketch of this dynamic-programming step is shown after this list.
Seam removal: The final step is to remove from the image all of the pixels along the vertical or horizontal seam.
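The dynamic-programming step described above can be sketched as follows (a minimal NumPy illustration for vertical seams, not the scikit-image implementation; energy is assumed to be a 2D array such as a dual-gradient energy map):
import numpy as np

def min_vertical_seam(energy):
    h, w = energy.shape
    M = energy.astype(np.float64).copy()   # M[i, j] = minimum cumulative energy ending at (i, j)
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 1, w - 1)
            M[i, j] += M[i - 1, lo:hi + 1].min()
    # backtrack the minimum-energy seam from the bottom row upwards
    seam = np.zeros(h, dtype=int)
    seam[-1] = np.argmin(M[-1])
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 1, w - 1)
        seam[i] = lo + np.argmin(M[i, lo:hi + 1])
    return seam   # seam[i] is the column of the seam pixel in row i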
You can use seam carving to remove objects or artefacts from images too, which is the problem that you will learn to solve using scikit-image's implementation of seam carving. This requires weighting the object region with low values, as the low weights are preferentially removed in seam carving. We shall use an input image from NASA's publicly available images and a mask image of the same shape as the original input to mask a few tower objects in the image as follows: Start by importing the required libraries. Note that you must use scikit-image version < 0.15 to be able to use the seam carving implementation; it was removed in later versions due to patent concerns:
# pip install scikit-image==0.14.2
import skimage
print(skimage.__version__) # 0.14.2
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.transform import seam_carve
import matplotlib.pylab as plt
Read the input and the mask image. Note that the mask image is a binary image, that has black pixels corresponding to the objects that we want to be removed from the image. Plot the images using the following code block as follows: image = imread('images/Img_01_27.png') mask_image = rgb2gray(imread('images/Img_01_28.png')) print(image.shape) plt.figure(figsize=(20,20)) plt.subplot(121), plt.imshow(image), plt.title('Original Image', size=20) plt.subplot(122), plt.imshow(mask_image, cmap='gray'), plt.title ('Mask for the object to be removed (the towers)', size=20) plt.tight_layout() plt.show() # (1536, 1079, 3)
Use the function seam_carve() from the skimage.transform module, along with the input and the mask image; ask the function to remove 120 (min-energy) vertical seams; plot the output image by using the following code snippet. You will obtain a figure like the following one, with the desired objects removed (without causing visual distortions/artefacts):
plt.figure(figsize=(10,15)) plt.title('Objects Removed', size=15) out = seam_carve(image, mask_image, 'vertical', 120) plt.imshow(out) plt.show()
Creating fake miniature effect
The pattern of blur in an image can strongly influence the perceived scale of the captured scene. The blur plays a significant role in conveying the desired sense of size and distance. Miniature faking, also known as diorama effect/illusion, is a process in which a photograph of a life-size location or object is made to look like a photograph of a miniature scale model. Blurring parts of the photo simulate the shallow depth of field normally encountered in close-up photography, making the scene seem much smaller than it is.
Applying a blur gradient that approximates a shallow depth of field can induce the miniaturization effect. The effects are most convincing when the images are large and viewed from a short distance. In this problem, you will learn how to create fake miniature effect in an image using PIL library functions. You need to follow the following steps for implementation:
Use a (bells and whistles) mask to allow selection of arbitrary objects that should be in focus. The binary mask that we shall use here has black pixels for the objects to be in focus and white pixels everywhere else. Apply a Gaussian blur to the image, and use the mask to select the appropriate pixels from either the original image or the blurred image.
To simulate the depth of field effect, continue this blurring process by repeatedly applying the Gaussian blur to the already blurred image, and repeat this process with the mask scaled using morphological erosion. This will generate a linear gradient in the blurring mask.
Enhance the color and brightness of the image.
Let us start by importing all the required libraries by using the following code:
from PIL import Image, ImageEnhance, ImageFilter from scipy.ndimage import binary_erosion from PIL.ImageFilter import GaussianBlur import matplotlib.pyplot as plt, numpy as np
Let us implement the following function that simulates the depth-of-field effect by iteratively applying the Gaussian blur (using GaussianBlur() with a given radius) along with the mask scaled using morphological erosion (using scipy.ndimage's binary_erosion() function). Each time, the mask is eroded a little more, and the repeated application of the Gaussian blur ensures gradient blurring, increasing the blur amount away from the region in focus, as follows:
def apply_gradient_blur(image, mask, n=10, radius=1): mask = mask.convert("1") for i in range(n):
        mask = binary_erosion(np.array(mask), structure=np.ones((10, 10)), border_value=1)
        im_blur = image.filter(GaussianBlur(radius=radius))
        # convert the boolean mask to an 8-bit (0/255) image before using it as the paste mask
        image.paste(im_blur, mask=Image.fromarray((255*mask).astype(np.uint8)))
    return image
Next, define the following function to create the fake miniature effect using the preceding function
Enhance the color and contrast of the image using PIL.ImageEnhance module's Color() and Contrast() functions, respectively
Use the preceding function to apply gradient blur on the region outside the focus (that is, corresponding to the white pixels in the mask). Merge the enhanced image (the region in focus) with the blurred image (the region outside the focus) as follows:
def create_fake_miniature(im, mask, color=1.9, contrast=1.4, blur_radius=1.3):
    # Cranking up the contrast and color
    edited = ImageEnhance.Contrast(ImageEnhance.Color(im).enhance(color)).enhance(contrast)
    # Blurring the image and merging
    im_blur = apply_gradient_blur(edited.copy(), mask.copy(), n=50, radius=blur_radius)
edited = edited.convert("RGBA") edited.paste(im_blur, mask=mask) return edited Now, read the input image and the mask image (that defines the regions to be in focus) from disk using PIL’s Image.open() function Create a fake miniature effect with the image as input Plot the input image by using the following code snippet:
im = Image.open("images/Img_01_29.png") mask = Image.open("images/Img_01_30.png") out = create_fake_miniature(im, mask) plt.figure(figsize=(20,10)) plt.imshow(im), plt.axis('off'), plt.title('Original image', size= 20) plt.show()
Plot the binary mask image as follows: plt.figure(figsize=(10,10)) plt.imshow(mask), plt.axis('off'), plt.title('(Bell Whistles) Mask image', size=20) plt.show()
Finally, plot the output image with a fake miniature effect as follows: plt.figure(figsize=(20,10)) plt.imshow(out), plt.axis('off'), plt.title('Fake Miniature image', size=20) plt.show()
As you can see from the preceding figure, the buildings in focus appear to be closer, as expected.
Summary
In this chapter, we covered the basic image and video manipulation techniques. By now, you should be able to read/write/display images/videos, extract image frames from videos, crop/resize/rotate images, apply different geometric transformations, convert to different color spaces, and change brightness/contrast using different python libraries, such as opencv-python, scikit-image, PIL, and scipy.ndimage. You should also be able to solve more advanced problems such as object detection with colors, object removal with seam carving, and creating fake miniature effects. There are a few more python image processing libraries (for example, SimpleITK); try to implement the image I/O and manipulation operations using the functions from these libraries. In the next chapter, we shall discuss more advanced problems with image manipulation and introduce a few big problems in facial image processing.
Questions
Show visually that the constant α for the basic linear transform is associated with the contrast of the image and plot the color channel histograms for different values of α—you will notice from the following figure that the change in α value changes the range (spread) of values of the color channel. You should get a figure like the following one with Lena input image:
Superimpose two inverted images on top of each other. Use the screen() function from PIL Given the input images shown in the following figure, you should obtain an output like the one shown as follows:
Add a transparent lion PNG image to a background image of a living room using PIL's alpha_composite() and blend() functions, and compare the outputs obtained. You should get a figure like the following one:
Start with the following two input images, namely, the map of India and the flag of India as follows:
Find the extreme points in India’s map and plot the bounding rectangle. Remove/replace the background of the map image with opencv-python to obtain a figure like the following one:
Use context-aware-image-resizing (with seam carving) to resize the following input image (with original size 320 × 480) as follows:
Compare the resized images with the regular resize and resize with seam carve (Hint: remove horizontal seams). You should obtain an image like the following one:
Pointillism is a technique of painting in which small, distinct dots of color are applied in patterns to form an image. Use the python library pointillism (install the library) to create an artistic image from the following input image of elephants:
Your output should look like the one shown in the following figure:
Key terms
RGB, YUV, Gotham, Crop, Resize, Montage, Contour, Rotate, Affine, Transfer function, Seam Carve, Fake Miniature, Object Detection, Object Removal
References
https://perso.crans.org/frenoy/matlab2012/seamcarving.pdf http://graphics.berkeley.edu/papers/Held-UBA-2010-03/Held-UBA2010-03.pdf
https://docs.opencv.org/master/d6/d00/tutorial_py_root.html https://scikit-image.org/docs/stable/
https://pillow.readthedocs.io/en/stable/
https://sandipanweb.wordpress.com/2017/10/14/seam-carving-usingdynamic-programming-to-implement-context-aware-image-resizing-inpython/
https://www.youtube.com/watch?v=lF0aOM3WJ74
https://www.academia.edu/2793649/Color_to_grayscale_does_the_met hod_matter_in_image_recognition
CHAPTER 2 More Image Transformation and Manipulation
Introduction
Image transformation is an art of transforming an image. With image transformation and manipulation, we can enhance the appearance of an image. As mentioned in the previous chapter, the transformation and manipulation operations can also be used as pre-processing steps for more complex image processing tasks. In this chapter, we shall work on image processing problems based on more advanced image transformations, for example, geometric transformations such as linear (Euclidean/affine) transformations and non-linear transformations on an image (for example, with inverse warping). Then, we shall work on some problems related to another type of linear transformation known as the projective transformation/homography. These transformations will be implemented using functions from modules such as scipy.ndimage and scikit-image.
Finally, you will learn how to use a cryptographic hash function to find duplicate images and a perceptual hash function to find similar images, using hashlib and imagehash libraries, respectively.
Structure
This chapter is organized as follows: Objectives
Problems
Applying Euclidean and Affine transformations on an image Basics of linear geometric transformations in 2D
Rotating an image with scipy.ndimage
Flipping and flopping an image with NumPy
Applying affine transform with scipy.ndimage
Implement image transformation with warping/inverse warping using scikit-image and scipy.ndimage Applying translation on an image using scikit-image warp
Implementing the swirl transformation using scikit-image warp
Implementing swirl transform using scipy.ndimage
Implementing elastic deformation
Image projection with homography using scikit-image
Detecting colors and changing colors of objects with OpenCVPython
Finding duplicate and similar images with hashing
Summary Questions
Key terms References
Objectives
After studying this chapter, you should be able to: Apply 2D linear geometric transformations (for example, Euclidean and affine) to an image
Apply non-linear transformation with inverse warping Apply projective transformation to an image with homography
Detect duplicate images using a cryptographic hash function
Find images similar to a given image using a perceptual hash function
Problems
Applying Euclidean and Affine transformation on an image
In this section, you will learn about 2D linear transformations (from projective geometry) and how to apply them to an image. Let us first start with the basic math that you should know in order to understand what is going on under the hood, before we start the actual implementation. We shall discuss how to apply Euclidean (for example, rotation/reflection) and affine transformations on an image.
Basics of linear geometric transformations in 2D
These transformations are point transformations, as they are applied to each pixel (a point in 2D, for a grayscale image) in the image. They are linear, as the transformation can be carried out with matrix multiplications (and additions). There are four classes of linear transformations as follows:
Euclidean transformation/isometry
Similarity transformation
Affine transformation
Homography transformation
The simplest of all the transformations is called the Euclidean/isometric transformation, which preserves the Euclidean distance (the distance between any pair of points remains the same after the transformation). This is an orthogonal transformation and can be represented as follows: ψ(p) = Rp + t, where R is a 2 × 2 orthogonal matrix, t is a 2 × 1 translation vector, and p = [x, y]^T is the point to be transformed.
The invariants for Euclidean transform are distance, length, and area—these quantities remain unchanged after the isometric transformation. Isometry has three degrees of freedom (one for the angle θ corresponding to the rotation matrix R and the other two for the translations along the x-axis and y-axis direction, corresponding to the translation vector).
The set of isometries forms a group (in the abstract algebraic sense), where the identity element in the group corresponds to R = I and t = 0 (here, I is the 2 × 2 identity matrix).
The matrix R in the Euclidean transform is orthogonal (with R^T R = I). For rotation, the determinant of the matrix is 1, whereas for reflection, the determinant is −1. In homogeneous coordinates, a point [x, y] in 2D is represented as [x, y, 1]. The rotation and translation for a Euclidean transform can be captured by a single 3 × 3 matrix H in homogeneous coordinates, as shown in the following figure. An isometry composed with a uniform/isotropic scaling is known as a similarity transformation. It has four degrees of freedom; the additional parameter comes from the isotropic scaling.
An affine transformation can be thought of as a non-singular linear transformation followed by a translation. It has six degrees
of freedom, with non-isotropic (non-uniform) scaling contributing the additional two parameters. The projective transformation is a general non-singular linear transformation in homogeneous coordinates; it has eight degrees of freedom, as shown in the following figure. We shall discuss more on this transformation and how to implement it in the later sections of this chapter.
The hierarchy of the linear geometric transformations in 2D is shown in the following figure. As seen, the Euclidean transformation is a special case of the affine transformation, the latter being a special case of the perspective (projective) transformation:
Finally, let us summarize the properties of the transformations we learnt earlier in the following figure:
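To make the matrices concrete, here is a minimal NumPy sketch (with arbitrary example parameters) of the 3 × 3 homogeneous matrices for each class of transformation discussed above; note how the number of free parameters grows from 3 (Euclidean) to 4 (similarity), 6 (affine), and 8 (projective, defined up to scale):
import numpy as np

theta, tx, ty, s = np.pi/6, 2.0, 1.0, 1.5     # example angle, translation, isotropic scale
c, sn = np.cos(theta), np.sin(theta)

euclidean = np.array([[c, -sn, tx],
                      [sn,  c, ty],
                      [0,   0,  1]])          # 3 DOF: theta, tx, ty

similarity = np.array([[s*c, -s*sn, tx],
                       [s*sn,  s*c, ty],
                       [0,      0,   1]])     # 4 DOF: adds the isotropic scale s

affine = np.array([[1.2, 0.3, tx],
                   [0.1, 0.8, ty],
                   [0,   0,    1]])           # 6 DOF: arbitrary 2x2 block + translation

projective = np.array([[1.2, 0.3, tx],
                       [0.1, 0.8, ty],
                       [1e-3, 2e-3, 1]])      # 8 DOF: the last row is no longer [0, 0, 1]

# a point [x, y] is transformed in homogeneous coordinates as p' ~ H @ [x, y, 1]
p = np.array([10.0, 5.0, 1.0])
p_out = projective @ p
print(p_out[:2] / p_out[2])                   # divide by the last coordinate to get (x', y')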
Rotating an image with scipy.ndimage
Sometimes, we may need geometric transformations as a preprocessing step in an image processing task. In this section, you will learn how to rotate an image using rotate() function that has the following signature (obtained from the documentation and a few of the important parameters are shown as follows):
scipy.ndimage.rotate(img, angle, mode='constant') Rotate an image counter-clockwise by angle degrees (using spline interpolation of the requested order).
The following are the steps you need to follow:
Import the required functions from the corresponding python library modules. Read the following colored input image of the tiger:
# comment the next line only if you are not running this code from jupyter notebook %matplotlib inline
from scipy.ndimage import rotate from skimage.io import imread import matplotlib.pylab as plt
im = imread('images/Img_02_04.jpg')
Apply the rotate() function from scipy.ndimage module to transform the input image along with the rotation value in degrees (for example, use -45 to rotate clockwise by 45 degrees). The anti-clockwise rotation is to be expressed in positive degrees, as per the usual convention as follows:
im = rotate(im, -45) plt.figure(figsize=(5,5)) plt.imshow(im) plt.axis('off') # stop showing the axes plt.show()
Flipping and flopping an image with NumPy
In this problem, you will learn how an image can be reflected vertically (that is, flipped) and horizontally (that is, flop) using NumPy’s ndarray flipping. You need to follow the following steps:
Let us start by importing the following required libraries:
import matplotlib.pyplot as plt import numpy as np
Read the input image (a cat in front of a water tank) with the imread() function from the matplotlib.pylab module to obtain the image ndarray as
im = plt.imread('images/Img_02_42.jpg')
Flip the image ndarray (reflect vertically, up-down) using flipud() function from NumPy by using the following line of code.
im_filpped = np.flipud(im) Plot the original and flipped image by using the following code snippet:
plt.figure(figsize=(10, 12))
plt.subplot(211), plt.imshow(im), plt.axis('off'), plt.title ('original', size=20) plt.subplot(212), plt.imshow(im_filpped), plt.axis('off'), plt.title('flipped', size=20) #np.fliplr(im) plt.show()
Now, use the function numpy.fliplr() to flop (horizontally reflect, left-to-right) a different input image (a lady in front of a mirror) by using the following code block: im = plt.imread('images/Img_02_43.jpeg') im_filpped = np.fliplr(im) plt.figure(figsize=(15, 12)) plt.subplot(121), plt.imshow(im), plt.axis('off'), plt.title ('original', size=20) plt.subplot(122), plt.imshow(im_filpped), plt.axis('off'), plt.title('flopped', size=20) plt.show()
Apply affine transformation with scipy.ndimage
In this problem, we are going to demonstrate the affine transformation on an image using SciPy library’s ndimage module’s functions.
An affine transformation sends each pixel f(x, y) from the input image to its corresponding pixel at location (x′, y′) = T(x, y) in the output image. However, what if the transformed pixel coordinates lie in between two pixels in the output image? This problem is often tackled with inverse mapping (warping).
For each pixel at location (x′, y′) in the output image, the following value is obtained from the pixel value in the input image at its corresponding location:
(x, y) = T⁻¹(x′, y′)
If the pixel in the input image comes from between two pixels, the pixel value is computed using interpolated (for example, with bi-linear interpolation) pixel values from the neighbors.
The following figure shows the concept of forward warping and inverse warping:
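To make the idea concrete, here is a minimal sketch of inverse warping with nearest-neighbor sampling (a toy illustration, not how scipy implements it; T_inv is assumed to be a function that maps output coordinates back to input coordinates):
import numpy as np

def inverse_warp(img, T_inv):
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    for yp in range(h):                                  # (xp, yp): coordinates in the output image
        for xp in range(w):
            x, y = T_inv(xp, yp)                         # corresponding location in the input image
            xi, yi = int(round(x)), int(round(y))        # nearest-neighbor instead of bilinear interpolation
            if 0 <= xi < w and 0 <= yi < h:
                out[yp, xp] = img[yi, xi]
    return out

# example usage: shift an image 20 pixels right and 10 pixels down
# shifted = inverse_warp(image, lambda xp, yp: (xp - 20, yp - 10))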
The following screenshot shows the matrices (M) for each of the affine transformation operations:
From the SciPy documentation, we get the following about the affine_transform() function that we shall use to implement image transformation by passing an affine transformation matrix to the function (a few of the input arguments to the function are described): scipy.ndimage.affine_transform(input, matrix, offset=0.0, output_shape=None, output=None, order=3, mode='constant', cval=0.0, prefilter=True) Apply an affine transformation. The pixel value in the output image at location o is determined from the pixel value in the input image at position np.dot(matrix, o) + offset (where matrix and offset refer to a couple of additional parameters passed to the function).
Apply the following steps to implement the affine transformation with SciPy library’s ndimage module: First, import all the Python libraries required using the following code snippet: # comment the next line only if you are not running this code from jupyter notebook %matplotlib inline from skimage.io import imread from scipy.ndimage import affine_transform import numpy as np import matplotlib.pylab as plt
Read the image and use the function affine_transform() by passing the 3 × 3 transformation matrix (in homogeneous coordinates) and the offset to carry out the transformation as shown in the
following code block. Here, we will rotate the image in the positive (anti-clockwise) direction by 45 degrees along with shears along the x- and y-axis (the @ operator is used to multiply the corresponding transformation matrices) as follows:
im = imread("images/Img_02_01.jpg")
# note: a valid rotation matrix needs opposite signs on the two sine terms
rot_mat = np.array([[np.cos(np.pi/4), np.sin(np.pi/4), 0], [-np.sin(np.pi/4), np.cos(np.pi/4), 0], [0, 0, 1]])
shr_mat = np.array([[1, 0.45, 0], [0, 0.75, 0], [0, 0, 1]])
transformed = affine_transform(im, rot_mat@shr_mat, offset=[im.shape[0]/4+25, im.shape[1]/2-50, 0], output_shape=im.shape)
Plot the input and output image. The following figure shows the output image generated by running the preceding code snippet:
plt.figure(figsize=(20,10)) plt.subplot(121), plt.imshow(im), plt.axis('off'), plt.title ('Input image', size=20) plt.subplot(122), plt.imshow(transformed), plt.axis('off'), plt.title('Output image', size=20) plt.show()
Implement image transformation with warping/inverse warping using scikit-image and scipy.ndimage In this problem, we shall demonstrate how to transform an image using a more generic function called warp() that applies an inverse warping to an image.
Applying translation on an image using scikit-image warp
For every pixel (x, y) in the translated output image, the corresponding point (u, v) in the input image can be found with the following equations: u = x − tx, v = y − ty, where tx and ty denote the translations along the x- and y-axis, respectively. We can use inverse warping to implement the image transformation as well. The advantage of using the warp() function is that it is more generic and can be used to implement both linear transformations (for example, transformations that can be implemented using matrix multiplication) and non-linear image transformations. Instead of the transformation matrix, we need to provide the function that computes the (inverse) transform.
From the scikit-image documentation, we get the following about the warp() function that we shall use to implement image transformation in a more generic way (a few of the input arguments to the function are described as follows):
skimage.transform.warp(image, inverse_map, map_args={}, output_shape= None, order=1, mode='constant', cval=0.0, clip=True, preserve_range=False) Warp an image according to a given coordinate transformation
The following steps are needed to be run to implement the image translation using the warp() function: First, import all the libraries required Then, import the function warp from scikit-image library’s transform module as follows:
from skimage.io import imread from skimage.transform import warp import matplotlib.pylab as plt
Next, define the translate() function that implements the pixel-wise translation by using the following code snippet:
def translate(xy, t_x, t_y): xy[:, 0] -= t_y xy[:, 1] -= t_x return xy
Read the sea–beach input image and use the warp() function with the translate function as an argument to translate the image. The following figure shows the output of the following code snippet and the output image obtained when the preceding translation is applied to the input image:
im = imread('images/Img_02_01.jpg') im = warp(im, translate, map_args={'t_x':-250, 't_y':200}) #create a dictionary for translation parameters plt.imshow(im) plt.title('Translated image', size=20) plt.show() The following image shows the translated image:
Implementing the swirl transformation using scikit-image warp
In the previous sub-section, we have seen how the warp() function can be used to implement a linear transformation. In this sub-section, we shall demonstrate how it can be used to implement a non-linear transformation called swirl.
Consider the coordinate (x, y) in the output image. The reverse mapping for the swirl transformation first computes, relative to a center (x0, y0), its polar coordinates, ρ = sqrt((x − x0)² + (y − y0)²) and θ = arctan((y − y0)/(x − x0)), and then rotates the point about the center by an angle that depends on ρ. In scikit-image's built-in swirl, this angle is parameterized by ψ (the rotation) and s (the strength); in the implementation below, the angle is simply taken to be a = πρ/R.
Follow the following steps to implement a swirl transform:
Define the swirl() function that implements the preceding pixelwise transformation by using the following code snippet:
import numpy as np

def swirl(xy, x0, y0, R):
    r = np.sqrt((xy[:, 1]-x0)**2 + (xy[:, 0]-y0)**2)
    a = np.pi*r / R
    # compute both output coordinates from the original values, so the already-modified
    # column is not reused in the second assignment
    x, y = xy[:, 1].copy(), xy[:, 0].copy()
    xy[:, 1] = (x-x0)*np.cos(a) + (y-y0)*np.sin(a) + x0
    xy[:, 0] = -(x-x0)*np.sin(a) + (y-y0)*np.cos(a) + y0
    return xy
Read the input image of the lion and use the warp() function with the swirl function as an argument this time to apply the non-linear transformation to the image. The following figure shows the output of the following code snippet:
im = imread('images/Img_02_02.jpg') print(im.shape) im1 = warp(im, swirl, map_args={'x0':220, 'y0':360, 'R':650}) plt.figure(figsize=(20,10)) plt.subplot(121), plt.imshow(im), plt.axis('off'), plt.title('Input image', size=20) plt.subplot(122), plt.imshow(im1), plt.axis('off'), plt.title('Output image', size=20) plt.show() # (480, 720, 3) The following images show the input and output image generated:
Note that the x0, y0, and R parameters are passed to the swirl function. Change the values of these parameters and observe the impact on the output image.
Implementing swirl transform using scipy.ndimage
In this sub-section, we shall demonstrate how we can use the geometric_transform() function from scipy.ndimage to implement the same non-linear transformation swirl as demonstrated in the last subsection.
Follow the following steps to implement a swirl transform: Let us start by importing the required libraries as follows:
from scipy import ndimage as ndi from skimage.io import imread from skimage.color import rgb2gray import matplotlib.pylab as plt, numpy as np
Define the following function that implements the swirl transform:
def apply_swirl(xy, x0, y0, R): r = np.sqrt((xy[1]-x0)**2 + (xy[0]-y0)**2) a = np.pi*r / R return ((xy[1]-x0)*np.cos(a) + (xy[0]-y0)*np.sin(a) + x0, -(xy[1]-x0)*np.sin(a) + (xy[0]-y0)*np.cos(a) + y0) Read the Lena image and convert it to grayscale
Invoke the geometric_transform() function from scipy.ndimage and pass the apply_swirl() (that will do the actual transformation) as defined earlier as an argument to the function.
The function apply_swirl() accepts the parameters x0, y0, and R; the values of these parameters are passed from the geometric_transform() function via its extra_arguments argument.
Finally, plot the original and the transformed image by using the following code snippet:
im = rgb2gray(imread('images/Img_02_06.jpg')) print(im.shape) im1 = ndi.geometric_transform(im, apply_swirl, extra_arguments= (100, 100, 250)) plt.figure(figsize=(20,10)) plt.gray() plt.subplot(121), plt.imshow(im), plt.axis('off'), plt.title('Input image', size=20) plt.subplot(122), plt.imshow(im1), plt.axis('off'), plt.title ('Output image', size=20) plt.show() (220, 220) The following image shows the input and output images generated:
Implementing elastic deformation
As we have seen, image distortions can be generated by applying displacement fields to images. This is done by computing, for every pixel, a new target location with respect to its original location. The new target location at position (x, y) is given relative to the previous position. For instance, if Δx(x, y) = 1 and Δy(x, y) = 0, this means that the new location of every pixel is shifted by 1 to the right.
Elastic image deformations are created using the following steps: First, generate random displacement fields, that is, Δx(x, y) = rand(−1, +1) and Δy(x, y) = rand(−1, +1), where rand(−1, +1) is a random number between −1 and +1, generated with a uniform distribution.
The fields Δx and Δy are then convolved with a Gaussian of standard deviation σ (in pixels). If σ is large, the resulting values are very small because the random values average 0. If we normalize the displacement field (to a norm of 1), the field is then close to a constant with a random direction. If σ is small, the field looks like a completely random field after normalization (as depicted in Figure 2, top right).
For intermediate σ values, the displacement fields look like elastic deformation, where σ is the elasticity coefficient. The displacement fields are then multiplied by a scaling factor α that controls the intensity of the deformation.
Let us start the implementation of the elastic deformation function by importing the required libraries first, as usual, using the following code snippet:

import numpy as np
import matplotlib.pylab as plt
from skimage.color import rgb2gray
from scipy.ndimage import gaussian_filter, map_coordinates

Define the following function to implement the elastic deformation.
The function map_coordinates() maps the input array to new coordinates by interpolation. The array of coordinates is used to find, for each point in the output, the corresponding coordinates in the input. The value of the input at those coordinates is determined by spline interpolation of the requested order.

def elastic_transform(image, alpha, sigma):
    random_state = np.random.RandomState(None)
    h, w = image.shape
    # random displacement fields in [-1, 1], smoothed with a Gaussian of std sigma
    # and scaled by the intensity factor alpha
    dx = gaussian_filter((random_state.rand(*image.shape) * 2 - 1), sigma, mode="constant", cval=0) * alpha
    dy = gaussian_filter((random_state.rand(*image.shape) * 2 - 1), sigma, mode="constant", cval=0) * alpha
    x, y = np.meshgrid(np.arange(w), np.arange(h))
    indices = np.reshape(y+dy, (-1, 1)), np.reshape(x+dx, (-1, 1))
    distorted_image = map_coordinates(image, indices, order=1, mode='reflect')
    return distorted_image.reshape(image.shape)

Use Matplotlib's imread() function to read a digit image (9) and convert it to grayscale with skimage.color's rgb2gray() function. Call the elastic_transform() function to apply the elastic deformation to the image. Plot the original input and the distorted output image as follows:
img = rgb2gray(plt.imread('images/Img_02_22.png'))
img1 = elastic_transform(img, 100, 4)
plt.figure(figsize=(20,10))
plt.subplot(121), plt.imshow(img), plt.axis('off'), plt.title('Original', size=20)
plt.subplot(122), plt.imshow(img1), plt.axis('off'), plt.title('Deformed', size=20)
plt.tight_layout()
plt.show()
The following image shows the original and deformed images:
Image projection with homography using scikit-image
The goal of the perspective (projective) transform is to estimate the homography (a 3 x 3 matrix) from point correspondences between two images. As the matrix has eight degrees of freedom (DoF), you need at least four pairs of corresponding points to compute the homography matrix between two images as follows:
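The mapping itself, in the standard homogeneous-coordinate form (reproduced here for reference), is:

[x']   [h11 h12 h13] [x]
[y'] ∼ [h21 h22 h23] [y]
[w']   [h31 h32 h33] [1]

with x' and y' recovered after dividing by the third homogeneous coordinate. Because the matrix is only defined up to scale (for example, h33 can be fixed to 1), eight unknowns remain, and each point correspondence contributes two equations, one for x and one for y.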
The matrix H can be computed using the estimate() method from the ProjectiveTransform class. The following are the steps required to solve the problem:
To start with, load all the required libraries using the following code snippet:
from skimage.transform import ProjectiveTransform
from skimage.io import imread
import numpy as np
import matplotlib.pylab as plt
from matplotlib.path import Path
Read the source and destination images:
im_src = imread('images/Img_02_04.jpg')
im_dst = imread('images/Img_02_03.jpg')
print(im_src.shape, im_dst.shape)
# (379, 262, 3) (321, 450, 3)
Create an instance of the ProjectiveTransform class, specify the four corners from the source image and corresponding points (the corners of the canvas) from the destination image. Estimate the homography matrix H with the source and destination points by using the estimate() method. How the method works is explained as follows (reference: scikit-image documentation):
skimage.transform.ProjectiveTransform.estimate(src, dst)
Estimate the transformation from a set of corresponding points, computed using the total least-squares method. The number of source and destination coordinates must match. A homogeneous system of equations is formed, and the solution of this system is the right singular vector of A that corresponds to the smallest singular value.
pt = ProjectiveTransform()
height, width = im_src.shape[0], im_src.shape[1]
# source points: the four corners of the source image, as (x, y) = (column, row)
src = np.array([[0., 0.], [width-1, 0.], [width-1, height-1], [0., height-1]])
# destination points: the four corners of the canvas in the destination image
dst = np.array([[74., 41.], [272., 96.], [272., 192.], [72., 228.]])
pt.estimate(src, dst)
# True
Create an instance of the matplotlib.path module's Path class with the destination points provided, to obtain the quadrilateral enclosed by the points. Then, use the method contains_points() to find all the pixel locations (in the destination image) that are inside the canvas (by creating a Boolean mask) as follows:

h, w = im_dst.shape[0], im_dst.shape[1]
polygon = dst
poly_path = Path(polygon)
# x is the column index and y is the row index of every pixel in the destination image
x, y = np.mgrid[:w, :h]
coors = np.hstack((x.reshape(-1, 1), y.reshape(-1, 1)))
mask = poly_path.contains_points(coors)
mask = mask.reshape(w, h)
dst_indices = np.array([list(x) for x in list(zip(*np.where(mask > 0)))])
#print(dst_indices)
Use the inverse() method to obtain the pixel locations in the source image corresponding to the destination pixel locations (indices) in the canvas and copy the corresponding pixel values from the source image. The method is described as follows (in the documentation):
skimage.transform.ProjectiveTransform.inverse(coords) Apply inverse transformation.
src_indices = np.round(pt.inverse(dst_indices), 0).astype(int)
src_indices[:,0], src_indices[:,1] = src_indices[:,1], src_indices[:,0].copy()
im_out = np.copy(im_dst)
im_out[dst_indices[:,1], dst_indices[:,0]] = im_src[src_indices[:,0], src_indices[:,1]]

Finally, plot the source, destination, and the output images obtained using the following code; you will get a figure like the following one:
plt.figure(figsize=(30,10))
plt.subplot(131), plt.imshow(im_src, cmap='gray'), plt.axis('off'), plt.title('Source image', size=30)
plt.subplot(132), plt.imshow(im_dst, cmap='gray'), plt.axis('off'), plt.title('Destination image', size=30)
plt.subplot(133), plt.imshow(im_out, cmap='gray'), plt.axis('off'), plt.title('Output image', size=30)
plt.tight_layout()
plt.show()
Detecting colors and changing colors of objects with OpenCV-Python

A simple way to detect and change the color of an object in an image is to transform the image from the RGB to the HSV color space and then use a range of hues to detect the object; this can be done easily with OpenCV-Python. We need to specify a range of color values with which the object we are interested in will be identified and extracted. We can then change the color of the detected object, or even make it transparent. For this problem, the input image will be a brown horse in a field, and the object of interest will be the horse. We shall detect the brown horse and change it to a black horse, keeping everything else in the image as it is, by working in the HSV space. The following are the steps:
Load the required libraries and read the input image by using the following code:
import cv2
import numpy as np
import matplotlib.pylab as plt

img = cv2.imread("images/Img_02_05.png")
Convert the input image from BGR to HSV color space as follows:
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
Create a mask for the horse by selecting a possible range of HSV colors that the brown horse can have.
Extract the brown horse from the input image, using the following code snippet:
mask = cv2.inRange(hsv, (0, 70, 25), (15, 255, 255))
imask = mask > 0
brown = np.zeros_like(img)
brown[imask] = img[imask]
Change the color of the brown horse to black by reducing all the HSV channel values and then converting the image back to the BGR space. The function used for extracting the pixels corresponding to the colored object (by checking whether a pixel value is within a range of values specified by a lower bound and an upper bound) is described as follows (from the OpenCV documentation):

cv2.inRange(src, lowerb, upperb)
Checks if array elements lie between the elements of two other arrays.
black = img.copy()
hsv[..., 0:3] = hsv[..., 0:3] / 3   # darken by scaling down all the HSV channels
black[imask] = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)[imask]
black = np.clip(black, 0, 255)

Finally, plot the input image, the extracted brown horse, and the output image with the color-changed horse:

plt.figure(figsize=(20,10))
plt.subplots_adjust(0,0,1,0.9,0.01,0.075)
plt.subplot(131), plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)), plt.axis('off'), plt.title('original', size=20)
plt.subplot(132), plt.imshow(cv2.cvtColor(brown, cv2.COLOR_BGR2RGB)), plt.axis('off'), plt.title('only horse', size=20)
plt.subplot(133), plt.imshow(cv2.cvtColor(black, cv2.COLOR_BGR2RGB)), plt.axis('off'), plt.title('horse color changed', size=20)
plt.suptitle('Detecting and changing object colors with OpenCV-Python', size=25)
plt.show()
Detecting Covid-19 virus objects with colors in the HSV colorspace
Here, we shall see another example on how to detect objects using colors in the HSV colorspace using OpenCV-Python. You need to specify a range of color values by means of which the object you are interested in will be identified and extracted. We shall use an image of Covid-19 virus inside blood cells and identify the virus objects with their green color as follows:
First, read the image with OpenCV-Python’s imread() function and convert the BGR image to an RGB image (so that we can plot it with imshow() function properly).
Next, let us convert the image from the RGB to HSV colorspace using the cv2.cvtColor() function as follows:
img = cv2.cvtColor(cv2.imread('covid_19_blood.jpg'), cv2.COLOR_BGR2RGB)
img_hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
Create a mask for the virus object by selecting a possible range of HSV colors that the virus object in the image can have, using the cv2.inRange() function. The following figure shows an HSV colormap for fast color lookup: the x-axis denotes hue, with values in (0, 180), the y-axis denotes saturation, with values in (0, 255), and the colors shown correspond to S = 255 and V = 255.
To locate a particular color in the colormap, just look up the corresponding H and S range, and then set the range of V as (25, 255). For example, the green color of the Covid-19 virus we are interested in can be searched for in the HSV range from (30, 25, 10) to (80, 255, 255), as observed here.
The inRange() function from OpenCV-Python will be used for color detection. It accepts the HSV input image along with the color range (defined previously) as parameters.
It returns a binary mask, where white pixels represent the pixels within the range, and black pixels represent the ones outside the specified range, as follows:
low_green = np.array([30, 25, 10])
high_green = np.array([80, 255, 255])
green_mask = cv2.inRange(img_hsv, low_green, high_green)
green = cv2.bitwise_and(img, img, mask=green_mask)
Slice the green virus objects using the binary mask by creating an output image and setting it to zero everywhere except the mask region as follows:

output_img = img.copy()
output_img[np.where(green_mask==0)] = (0,0,0)
Finally, plot the input image, the green mask image created, and the output image with the Covid-19 virus objects detected. You will get a figure like the output if you run the following code:

plt.figure(figsize=(20, 8))
plt.gray()
plt.subplots_adjust(0,0,1,0.975,0.05,0.05)
plt.subplot(131), plt.imshow(img), plt.axis('off'), plt.title('original', size=20)
plt.subplot(132), plt.imshow(green_mask), plt.axis('off'), plt.title('mask', size=20)
plt.subplot(133), plt.imshow(output_img), plt.axis('off'), plt.title('covid-19 virus cells', size=20)
plt.suptitle('Filtering out the covid-19 virus cells', size=30)
plt.show()
Notice, from the preceding figure, that a few virus objects are occluded by blood cells, so they are only visible partially in the output image.
Finding duplicate and similar images with hashing
In this section, we shall discuss two related problems in image searching, both solved with hash-function based approaches, although the hash functions used are quite different in nature.
The first problem is to find duplicate images in a given collection, where you will learn how to use the MD5 cryptographic hash function. The second problem is to find images similar to a given image; here, you will learn how to use a perceptual hash function.

Using cryptographic (MD5) hash functions to find duplicate images with hashlib
A hash function h maps arbitrary strings of data to a fixed-length output. The function is deterministic and public, but the mapping should look “random.” In other words, h : {0, 1}* → {0, 1}^d, for a fixed d. In practice, hash functions are used for digesting large data. It is desirable that the function is collision-resistant, that is, it should be hard to find two inputs m1 and m2 such that h(m1) = h(m2).
The MD5 (message digest 5) is a widely used hash function producing a 128-bit hash value (that is, d = 128 for the MD5 message-digest algorithm). The algorithm takes as input a message of arbitrary length and produces as output a 128-bit fingerprint or message digest of the input. Although MD5 was initially designed to be used as a cryptographic hash function, it has been found to suffer from extensive vulnerabilities. However, the algorithm is still used to verify the authenticity of a file and to detect unintentional corruption. The benefit of MD5 over other cryptographic hash functions is that it can be implemented faster and can provide an impressive performance increase for verification. In this problem, you will learn how to use the hashlib implementation of the algorithm to detect duplicate images as follows: We shall use the image content as the key to the MD5 hash function, and then compute the 128-bit hex digests (hash values) of all the images (of any size) in our image collection. If any two images in our collection are identical, the same hex digest will be generated for them; by comparing the digests, we shall know whether they are duplicates. However, MD5 is vulnerable to collisions: it is possible that two images are not duplicates but still generate the same digest (a false positive), although this is extremely rare.
For example, with the MD5 signature being 128 bits long, the probability that the MD5 digests of two different images are equal by chance is 1/2^128; by the birthday bound, we would only expect around a 50% chance of seeing a collision once we have roughly 2^64 images. In other words, if you have 9 trillion MD5s, there is only about one chance in 9 trillion that there will be a collision by chance. So, for all practical purposes, it can be used to find image duplicates.
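As a rough back-of-the-envelope check (a standard birthday-bound approximation, not taken from the book), the probability of at least one collision among n random 128-bit digests is approximately:

P(collision) ≈ 1 − e^(−n(n−1)/2^129) ≈ n² / 2^129

which stays negligible for any realistic image collection (for n = 10^9 images, it is on the order of 10^−21) and only approaches 50% around n ≈ 2^64.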
Follow these steps to find duplicate images with MD5. Let us start by importing the required libraries using the following code:

import hashlib, os
from glob import glob
import matplotlib.pylab as plt
from skimage.io import imread
from time import time
The hashlib module implements the interface to RSA's MD5 message-digest algorithm. Its use is quite straightforward: create an MD5 object with the image content (read as a binary file) as the key, and ask it for the digest (a strong kind of 128-bit checksum, also known as a fingerprint) using the hexdigest() method. Given a key of any arbitrary length (that is, an image of any size), the MD5 hash function will always return a fixed-length (128-bit) digest.
Let us compute a hex digest for an image, print the digest in hexadecimal/binary form, and compute the length of the digest in bits by using the following code snippet:

hex_digest = hashlib.md5(open('images/Img_02_01.jpg', 'rb').read()).hexdigest()
bin_digest = format(int(str(hex_digest), 16), "040b")
print('MD5 digest = {} ({})'.format(hex_digest, bin_digest))
print('length of hex digest = {} bytes'.format(len(hex_digest)))
print('length of bin digest = {} bits'.format(len(bin_digest)))

MD5 digest = 8335f826c77f68640f21bbf9ac784bad (10000011001101011111100000100110110001110111111101101000011001000000111100100001101110111111100110101100011110000100101110101101)
length of hex digest = 32 bytes
length of bin digest = 128 bits

As you can see from the preceding output, the length of the digest is 128 bits.
Now, let us implement the following function, which takes a directory name as input, finds all images (with .jpg and .png extensions) in the directory, and returns the list of duplicate images found (as a list of lists, each inner list containing two or more images that are identical). As described earlier, it computes the hex digest of every image.
It uses a Python dictionary, with the hex digest as the key and, as the value, the list of file names that have that hex digest. If an item with the same hex digest already exists, the image is a duplicate, and its file name is appended to the corresponding list pointed to by the hex digest key. Finally, the dictionary value-lists having more than one file name are returned as duplicate images as follows:

def find_duplicates(dir_name):
    def is_image(file_name):
        f = file_name.lower()
        return f.endswith(".png") or f.endswith(".jpg")
    hash_keys = dict()
    for file_name in glob(dir_name):
        if os.path.isfile(file_name) and is_image(file_name):
            with open(file_name, 'rb') as f:
                # MD5 digest of the raw file content
                file_hash = hashlib.md5(f.read()).hexdigest()
            if file_hash not in hash_keys:
                hash_keys[file_hash] = [file_name]
            else:
                hash_keys[file_hash].append(file_name)
    return [hash_keys[file_hash] for file_hash in hash_keys if len(hash_keys[file_hash]) > 1]
Let us define the following function to show all the duplicate images, by iterating over the lists of duplicate images, printing the duplicate images of each list on the same row, and showing the number of times each image is duplicated:

def show_duplicates(duplicates):
    for duplicated in duplicates:
        try:
            plt.figure(figsize=(20,10))
            plt.subplots_adjust(0,0,1,0.9,0.05,0.05)
            for (i, file_name) in enumerate(duplicated):
                plt.subplot(1, len(duplicated), i+1)
                plt.imshow(imread(file_name))
                plt.title(file_name, size=20)
                plt.axis('off')
            plt.suptitle('{} duplicate images found with MD5 hash'.format(len(duplicated)), size=30)
            plt.show()
        except OSError as e:
            continue

Finally, call the preceding functions to find the duplicate images and show them by using the following lines of code. The next figure shows the output of the code, when executed:
duplicates = find_duplicates('images/*.*')
print(duplicates)
show_duplicates(duplicates)
[['images\\Img_02_11.jpg', 'images\\Img_02_15.jpg', 'images\\Img_ 02_29.jpg'], ['images\\Img_02_13.jpg', 'images\\Img_02_30.jpg']]
Using Perceptual Hash function (pHash) to find similar images using imagehash

In this problem, you will learn how to find images similar to a given image from a collection of images by using image hashing. However, we shall use a different hash function this time, known as the perceptual hash function, as cryptographic hash functions such as MD5 are not suitable for this purpose.
A key feature of conventional cryptographic hashing algorithms such as MD5 is that they are extremely sensitive to the input; that is, changing even one bit of the input will change the output dramatically. With cryptographic hashes, the hash values look random (they can be thought of as outputs of pseudo-random generators); the data used to generate the hash value acts like a random seed, so the same data will generate the same digest, but different data will create a different digest.
However, the images go through various manipulations such as compression, enhancement, cropping, and scaling. Hence, to find images similar to an image, an image hash function should instead take into account the changes in the visual domain and produce hash values based on the image’s visual appearance. Such a hash function will be useful in identifying images similar to a given image from a collection, as all the images possibly undergo incidental changes (such as compression and format changes, common signal processing operations, scanning, or watermarking).
Perceptual hash algorithms describe a class of comparable hash functions. Features in the image are used to generate a distinct (but not unique) fingerprint, and these fingerprints are comparable. In other words, the perceptual hashes of two images are close to one another if the image features are similar (whereas cryptographic hashing relies on the avalanche effect: a small change in the input value creates a drastic change in the output value).
The pHash uses a robust algorithm. It uses a discrete cosine transform (DCT) to reduce the image to its low frequencies (we shall discuss more on the DCT in a later chapter). The following are the steps to compute a pHash (see the sketch after this list):

Reduce size: shrink the image to simplify the DCT computation.
Reduce color: convert the image to grayscale.
Compute the DCT: although JPEG uses an 8x8 DCT, this algorithm uses a 32x32 DCT.
Reduce the DCT: while the DCT is 32x32, just keep the top-left 8x8 block and compute the mean DCT value (excluding the first, flat DC coefficient).
Further reduce the DCT: set the 64 hash bits to 0 or 1 depending on whether each of the 64 DCT values is above or below the average value (to survive gamma and color histogram adjustments).
Construct the hash: pack the 64 bits into a 64-bit integer.
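The following is a minimal sketch of these steps (a hypothetical helper written only for illustration, not the imagehash implementation; it assumes PIL and SciPy are available), together with the bit-counting Hamming distance used to compare two fingerprints:

import numpy as np
from PIL import Image
from scipy.fftpack import dct

def phash_sketch(image_file, hash_size=8, dct_size=32):
    # reduce size and color: a 32x32 grayscale thumbnail
    img = Image.open(image_file).convert('L').resize((dct_size, dct_size))
    pixels = np.asarray(img, dtype=np.float64)
    # 2D DCT: apply the 1D DCT along the rows and then along the columns
    coeffs = dct(dct(pixels, axis=0), axis=1)
    # keep only the top-left 8x8 block of low-frequency coefficients
    low_freq = coeffs[:hash_size, :hash_size]
    # compare each coefficient with the mean (excluding the DC term)
    mean = low_freq.flatten()[1:].mean()
    bits = low_freq.flatten() > mean
    # pack the 64 bits into a 64-bit integer fingerprint
    return sum(1 << i for (i, b) in enumerate(bits) if b)

def hamming_distance(h1, h2):
    # number of bit positions in which the two fingerprints differ
    return bin(h1 ^ h2).count('1')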
pHash values can be compared using the Hamming distance (just compare each bit position and count the number of differences). In this problem, we shall use the imagehash library's implementation of the pHash algorithm to find similar images. Follow these steps to implement it:
Let us start by importing the required libraries by using the following code. A few of the images we shall use for demonstration are taken from the Caltech 101 dataset (download it from the following link: http://www.vision.caltech.edu/Image_Datasets/Caltech101/#Download):

# install imagehash with pip, if not already installed
#!pip install imagehash
from PIL import Image
import imagehash
from time import time
import os
from glob import glob
import matplotlib.pylab as plt

Let us implement the following function with the following code snippet:
It accepts two input images and a hash function (defaults to pHash).
It uses the implementation of the hash function to compute the 64-bit fingerprint of each of the input images.
It plots the images side by side, with their pHash values as titles.
It computes the Hamming distance between the fingerprints of the images.

def plot_images_to_compare(imfile1, imfile2, hashfunc=imagehash.phash):
    img1, img2 = Image.open(imfile1), Image.open(imfile2)
    print('sizes of images = {}, {}'.format(img1.size, img2.size))
    hash1 = hashfunc(img1)
    hash2 = hashfunc(img2)
    plt.figure(figsize=(20,10))
    plt.subplots_adjust(0,0,1,0.95,0.01,0.01)
    plt.subplot(121), plt.imshow(img1), plt.title(str(hash1), size=20), plt.axis('off')
    plt.subplot(122), plt.imshow(img2), plt.title(str(hash2), size=20), plt.axis('off')
    plt.show()
    print('hash1 = {} ({}), length = {} bits'.format(format(int(str(hash1), 16), "040b"), str(hash1), len(format(int(str(hash1), 16), "040b"))))
    print('hash2 = {} ({}), length = {} bits'.format(format(int(str(hash2), 16), "040b"), str(hash2), len(format(int(str(hash2), 16), "040b"))))
    print('hamming distance =', hash1 - hash2)
Call the preceding function to compare two different pigeon images, the second image being created by adding a few strokes on the first one, as follows:

plot_images_to_compare('images/Img_02_31.jpg', 'images/Img_02_32.jpg')

sizes of images = (300, 258), (300, 258)
hash1 = 1001101101001000011001110110011010010100100110011011001101100011 (9b4867669499b363), length = 64 bits
hash2 = 1001101101001000011001110110011010010100100110011011001101100011 (9b4867669499b363), length = 64 bits
hamming distance = 0
As expected, the images are very similar: the exact same pHash fingerprint is returned for both, with a Hamming distance of 0. Now, let us compute the pHash values of the original pigeon input image and its contrast-enhanced version, with the following line of code:

plot_images_to_compare('images/Img_02_31.jpg', 'images/Img_02_43.png')

sizes of images = (300, 258), (300, 258)
hash1 = 1001101101001000011001110110011010010100100110011011001101100011 (9b4867669499b363), length = 64 bits
hash2 = 1001101101001000011001110110011010010100100110011101001101100011 (9b4867669499d363), length = 64 bits
hamming distance = 2
Again, the images being very similar, the Hamming distance is 2 (the pHash fingerprints differ in 2 bits only). Now, let us compare the pHash fingerprint of a nature image with that of its watermarked version (note that the images are of different sizes) as follows:

plot_images_to_compare('images/similar/Img_02_41.jpg', 'images/similar/Img_02_41.png')

sizes of images = (1024, 683), (574, 383)
hash1 = 1001010110000001011010111101001010010000111100100111001001111110 (95816bd290f2727e)
hash2 = 1001010110000001011010101101001010010010111100100111001001111110 (95816ad292f2727e)
hamming distance = 2
Again, as you can see from the preceding output, the signatures differ in 2 bits only. Finally, let us compare the pHash fingerprints for two completely different images using the following line of code:
plot_images_to_compare('images/Img_02_31.jpg', 'images/Img_02_35.jpg')

sizes of images = (300, 258), (399, 174)
hash1 = 1001101101001000011001110110011010010100100110011011001101100011 (9b4867669499b363)
hash2 = 1111111110011110001100000011010010011010001000110010010110000111 (ff9e30349a232587)
hamming distance = 32
As can be seen earlier, the fingerprints are very different (as expected); they differ in half of the fingerprint bits’ values.
Now, let us implement the following function to preprocess the images from a directory and generate their pHash fingerprints. As earlier, a dictionary is created, with each key corresponding to a unique fingerprint (hash value) and the corresponding value being the list of images (later converted to a NumPy array of file names) having that fingerprint value. The function returns the dictionary of fingerprints as follows:

import numpy as np

def preprocess_images(dir_name, hashfunc=imagehash.phash):
    image_filenames = sorted(glob(dir_name))
    print('number of images to process = {}'.format(len(image_filenames)))
    images = {}
    for img_file in sorted(image_filenames):
        hash = hashfunc(Image.open(img_file))
        images[hash] = images.get(hash, []) + [img_file]
    for hash in images:
        images[hash] = np.array(images[hash])
    return images

Finally, let us implement the following function: query_k_similar_images()
It accepts a query image filename, the dictionary of image fingerprints to search from, the number of similar images to return, and the hash function to use (pHash by default).
It first computes the fingerprint for the query image. It then computes the hamming distance of the query fingerprint with all other fingerprints in our collection.
It finally sorts the images in collection with increasing hamming distance (the less hamming distance between fingerprints, the more similar the images are) and returns the k most similar images.
def query_k_similar_images(image_file, images, k=3, hashfunc=imagehash.phash):
    hash = hashfunc(Image.open(image_file))
    hamming_dists = np.zeros(len(images))
    image_files = np.array(list(images.values()))
    hash_values = list(images.keys())
    for i in range(len(image_files)):
        hamming_dists[i] = hash - hash_values[i]
    indices = np.argsort(hamming_dists)
    return np.hstack(image_files[indices][:k]), hamming_dists[indices][:k]
Preprocess all the input images we have for this chapter (there are 42 of them) and measure the time taken for preprocessing the images as follows:
start = time()
images = preprocess_images('images/*.*')
end = time()
print('processing time = {} seconds'.format(end-start))

number of images to process = 42
processing time = 0.3982961177825928 seconds

Now, define the following function to plot the query image along with the similar images returned by our function as follows:
def plot_query_returned_images(query, returned):
    n = 1 + len(returned)
    plt.figure(figsize=(20,8))
    plt.subplots_adjust(0,0,1,0.95,0.05,0.05)
    plt.subplot(1,n,1), plt.imshow(Image.open(query)), plt.title('query image', size=20), plt.axis('off')
    for i in range(len(returned)):
        plt.subplot(1,n,i+2), plt.imshow(Image.open(returned[i])), plt.title('returned image {}'.format(i+1), size=20)
        plt.axis('off')
    plt.show()
Use an airplane image (from Caltech 101) as the query image.
Find top four similar images with the query image, using the following lines of code and check the hamming distances between the fingerprints of the images returned:
query = 'images/Img_02_39.jpg'
found, dists = query_k_similar_images(query, images, k=4)
dists
# array([ 0., 10., 24., 24.])

Plot the query image along with the returned images as follows:

plot_query_returned_images(query, found)

The following figure shows the images found:
Summary
In this chapter, we covered different geometric image transformation (using warping/homography) and image hashing techniques. By now, you should be able to understand and apply different types of geometric transforms (namely, Euclidean, affine, and non-linear transformations) to an image, using Python libraries such as NumPy, SciPy, scikit-image, and OpenCV-Python. There are a few more Python image processing libraries; try to implement image warping using the functions from those libraries as well. We also discussed image hashing techniques to find duplicate and similar images using the libraries hashlib and imagehash, respectively. In the next chapter, we shall discuss a few classical image processing techniques (from the signal processing counterpart of image processing), such as sampling, convolution, and the discrete Fourier transform, and solve problems based on their applications.
Questions
Use the ndimage module from the library SciPy with the function affine_transform() and provide the 3 x 3 transformation matrix (in homogeneous coordinates) and the offset to rotate an image by an angle of 45° clockwise (with respect to the center of the image), as shown in the following figure:
Hint: The transformation can be thought of as a composite one consisting of three sequential transformations as shown in the following figure:
Use geometric_transform() function to apply the following wave transform to the input grayscale Lena image:
The output should look like the following images:
Use scipy.ndimage.affine_transform() function to reflect an image horizontally (flop) and vertically (flip), with the appropriate transformation matrix. Repeat the same with OpenCV-Python’s warpAffine() function.
Estimating 2D geometric transformation parameters: estimate the parameters for the affine transformation shown in the following image (you will find both images in the images folder for this chapter). Remember that the affine transformation has six DoF, so you will need at least three matching points from the source and the destination images to estimate the parameters (you will have 3x2 = 6 equations to solve for the six unknowns, one equation for each of the x and y coordinates of each point). The following figure shows three pairs of possible points:
With the skimage.transform module, estimate the transform with the source and destination points. Then, use the inverse() method along with warp() to apply the inverse of the transformation to the destination image. Did you get back the original source image?
Extend the elastic deformation function to a colored image and obtain a deformed Lena image like the following one:
Use homography (ProjectiveTransform) to replace the graffiti canvas in the following image:
And with the following painting:
(Hint: use a mask image like the following one)
You should get an output image like the following one:
Use different hash functions (for example, average_hash and others available in the imagehash library) to find similar images and compare the quality of the top three images returned. Can we use these functions for fingerprint image recognition?
Implement an image search engine with the Caltech 101 images as your image database. Use pHash fingerprints to return similar images once a search is executed with a query image. How can you increase the speed of the search (think of the data structures you can use)? Compute the accuracy of your search engine using the Precision@10 and Recall@10 metrics.
Key terms
Homography, Affine transformation, Flip, Swirl, Image Hashing, phash, MD5, Object Detection with Colors
References
http://www.vision.caltech.edu/Image_Datasets/Caltech101/
http://cognitivemedium.com/assets/rmnist/Simard.pdf
https://docs.scipy.org/doc/scipy/reference/tutorial/ndimage.html
http://www.cs.cornell.edu/courses/cs1114/2013sp/lectures/CS1114lec14.pdf
https://www.youtube.com/watch?v=lF0aOM3WJ74/
http://drone.sjtu.edu.cn/dpzou/teaching/course/lecture04projective_geometry.pdf
https://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions
https://tools.ietf.org/html/rfc1321
https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-design-and-analysis-of-algorithms-spring-2015/lecture-notes/MIT6_046JS15_lec21.pdf
http://users.ece.utexas.edu/~bevans/projects/hashing/introduction.html
http://www.phash.org/
CHAPTER 3 Sampling, Convolution, Discrete Fourier, Cosine and Wavelet Transform
Introduction
Sampling is a spatial operation for the selection/rejection of image pixels, typically used to increase/reduce the size of an image. Convolution, on the other hand, is a local mathematical operation implemented by multiplying a pixel's and its neighboring pixels' intensity values by a kernel (usually a small support window/matrix); the convolution of an image with different kernels results in different effects in the output image (for example, blurring, sharpening, extraction of the edges, and so on). The basic idea behind the Discrete Fourier Transform (DFT) is that an image can be thought of as a 2D function that can be expressed as a weighted sum of sines and cosines (Fourier basis/coefficients) along two dimensions. The DFT is an extremely useful algorithm that transforms an image from the spatial to the frequency domain; as we shall see, operations such as convolution can be executed much faster in the frequency domain.
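As a quick illustration of that last point (a minimal sketch, not from the book), the convolution theorem can be verified numerically: a linear convolution computed with zero-padded FFTs matches the direct spatial-domain convolution, and the same idea extends to 2D images with fft2()/ifft2():

import numpy as np

x, h = np.random.rand(16), np.random.rand(8)
n = len(x) + len(h) - 1                      # length of the full linear convolution
# multiply in the frequency domain (zero-padded to length n), then invert
fft_conv = np.real(np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(h, n)))
print(np.allclose(fft_conv, np.convolve(x, h)))   # True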
In this chapter, we shall work on solving image processing problems based on the sampling, convolution, and DFT theorems, not necessarily in the same order. These operations will be implemented using functions from popular Python libraries such as NumPy, SciPy, scikit-image, and OpenCV-Python.
Structure
This chapter is organized as follows:
Objectives
Problems
Fourier transform basics
Sampling to increase/decrease the resolution of an image
Up-sampling an image by using the DFT and a low pass filter
Down-sampling with anti-aliasing using the Gaussian filter
Denoising an image with an LPF/notch filter in the frequency domain
Removing periodic noise with the notch filter
Removing salt and pepper noise using the Gaussian LPF with scipy fftpack
Blurring an image with an LPF in the frequency domain
Different blur kernels and convolution in the frequency domain
Blurring with scipy.ndimage frequency-domain filters
Gaussian blur lowpass filter with scipy.fftpack
Convolution in the frequency domain with a colored image using fftconvolve from SciPy signal
Edge detection with high pass filters in the frequency domain
Implementation of homomorphic filters
Summary
Questions
Key terms
References
Objectives
After studying this chapter, you should be able to:
Apply up/down sampling to an image to increase/reduce the image size without artifacts.
Apply spatial and frequency domain convolution to an image with different kernels.
Apply different types of blur kernels to an image.
Detect edges in image with a frequency domain HPF.
Apply a frequency domain filter to an image (such as Gaussian/Butterworth LPF/HPF filters).
Remove noise from an image with a notch/band-stop filter.
Problems
Fourier Transform Basics
This is a warmup section where we shall try to understand the basics of the DFT and how to apply it on images using Numpy’s fft module (alternatively you could use SciPy’s fftpack module too). Let us first start with the mathematical definition of the 2D-DFT and its inverse, as shown in the following figure:
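For reference, the standard definitions (in the same notation used in the rest of this section) are:

F(u, v) = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) · e^{−j2π(ux/M + vy/N)}

f(x, y) = (1/(MN)) · Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} F(u, v) · e^{+j2π(ux/M + vy/N)}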
As described earlier, a (grayscale) image can be thought of as a 2D function f(x, y), where (x, y) ∈ {0, . . ., M−1} × {0, . . ., N−1}. As can be seen from the previous definitions, the DFT changes the representation of the image from its spatial form f(x, y) to its frequency domain representation F(u, v), where (u, v) ∈ {0, . . ., M−1} × {0, . . ., N−1} indexes the frequency components/Fourier basis vectors.
Now, let us proceed toward the implementation as follows:
First, import all the required libraries by using the following code snippet:
# comment the next line only if you are not running this code from jupyter notebook
%matplotlib inline
import numpy as np
import numpy.fft as fp
from scipy import signal
import scipy.fftpack
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.metrics import peak_signal_noise_ratio
from scipy.ndimage import convolve
import cv2
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 unused import
from mpl_toolkits.axes_grid1 import make_axes_locatable
from matplotlib.ticker import LinearLocator, FormatStrFormatter
Define the following function to plot an image by using the matplotlib.pylab's imshow() function with a suitable title:

def plot_image(im, title):
    plt.imshow(im, cmap='gray')
    plt.axis('off')
    plt.title(title, size=20)
Next, define the following function to plot the frequency (power) spectrum of an image. The function optionally shows the color bar (values of the color mapping of the power spectrum) and axis ticks.
Notice that you should visualize the logarithm of the power spectrum instead, because the coefficient value of the DC component (which corresponds to the average intensity of the image) and those of the other very low frequency components are usually much higher than the coefficients of the comparatively higher frequency components in natural images.
You must apply the numpy.fft.fftshift() function to shift the spectrum to place the (0,0) coefficient at the centre of the power spectrum and then visualize the spectrum; the following code shows the relevant part from the function’s signature from NumPy’s documentation: numpy.fft.fftshift(x, axes=None)
Shift the zero-frequency component to the center of the spectrum. This function swaps half-spaces for all axes listed (defaults to all).
Also, the result returned from the Fourier transform is a complex array, encoding both the magnitude and the phase of each frequency component; after the logarithmic scaling we keep only the real part for display in the following code:

def plot_freq_spectrum(F, title, cmap=plt.cm.gray, show_axis=True, colorbar=False):
    plt.imshow((20*np.log10(0.1 + fp.fftshift(F))).real.astype(int), cmap=cmap)
    if not show_axis:
        plt.axis('off')
    if colorbar:
        plt.colorbar()
    plt.title(title, size=20)

Now, let us create a few very basic images to start with and obtain their power spectrum with the DFT. First, generate a couple
of periodic images (of size 100x100) with (equispaced) horizontal and vertical bars, respectively, using the following code block:

h, w = 100, 100
images = list()
im = np.zeros((h,w))
for x in range(h):
    im[x,:] = np.sin(x)
images.append(im)
im = np.zeros((h,w))
for y in range(w):
    im[:,y] = np.sin(y)
images.append(im)
Next, let us generate another periodic image, this time with diagonal lines. Use a circular mask of radius 10 (centered at the centre of the image) to create another image from this image, which masks out all the pixel values outside the radius, using the following piece of code:

im = np.zeros((h,w))
for x in range(h):
    for y in range(w):
        im[x,y] = np.sin(x + y)
images.append(im)
im = np.zeros((h,w))
for x in range(h):
    for y in range(w):
        if (x-h/2)**2 + (y-w/2)**2 < 100:
            im[x,y] = np.sin(x + y)
images.append(im)

Finally, generate another couple of images: the first one with a filled circle (of radius 5) and the second one with a filled square (of side 10), both centered at the image center, with the next code snippet:

im = np.zeros((h,w))
for x in range(h):
    for y in range(w):
        if (x-h/2)**2 + (y-w/2)**2 < 25:
            im[x,y] = 1
images.append(im)
im = np.zeros((h,w))
im[h//2 - 5:h//2 + 5, w//2 - 5:w//2 + 5] = 1
images.append(im)

As you may have noticed, all the images are stored inside a list as follows:
Now, use the numpy.fft's fft2() function to compute the DFT using the Fast Fourier transform algorithm and plot the output power spectrums obtained for each of the input images generated using the previous code block; the relevant part from the NumPy documentation for the function is shown in the following table:
numpy.fft.fft2(a, s=None, axes=(-2, -1), norm=None) Compute the 2-dimensional discrete Fourier Transform using the Fast Fourier Transform (FFT).
plt.figure(figsize=(25,10))
i = 1
for im in images:
    plt.subplot(2,6,i), plot_image(im, 'image {}'.format(i))
    plt.subplot(2,6,i+6), plot_freq_spectrum(fp.fft2(im), 'DFT {}'.format(i), show_axis=False)
    i += 1
plt.tight_layout()
plt.show()
The following image shows the DFTs for the corresponding images:
Ideally, for Image 1 and 2, the DFT of the horizontal and the vertical periodic patterns should be two vertically and horizontally aligned dots, respectively (corresponding to the frequencies). However, as can be seen in the previous figure, due to edge effects, we obtain a vertical and horizontal line instead (with the bright points on the lines corresponding to the respective frequencies).
Similarly, for Image 3, the periodic diagonal pattern should result in two diagonal dots in the perpendicular diagonal direction, but again due to edge effects, we obtain some more additional lines. The edge effect is reduced if a binary circular mask is used to mask all but a circle around the center for the Image 3, resulting in input Image 4, for which we can see the prominent white dots along the principal diagonal, corresponding to the frequency components. Now, let us try to understand the contributions of different frequency components to an image. As you might already have guessed, if we apply an inverse DFT on the power spectrum (obtained from an image with the DFT), we shall get back the
same image (within numerical accuracy). Following diagram shows the application of Fourier transform on an image:
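A quick numerical check (a minimal sketch; it assumes a grayscale float image im is already loaded) confirms this round trip:

import numpy as np
import numpy.fft as fp

im_rec = fp.ifft2(fp.fft2(im)).real   # DFT followed by the inverse DFT
print(np.allclose(im, im_rec))        # True, up to floating-point accuracy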
However, what if we deliberately remove a few frequency components from the spectrum and take the IDFT to obtain the reconstructed image? The following code block shows what happens if we keep the first few frequency components.
We shall use the function numpy.fft.ifft2() to take the inverse Fourier transform; the function has the following signature (taken from the NumPy documentation):
numpy.fft.ifft2(a, s=None, axes=(-2, -1), norm=None)
Compute the 2-dimensional inverse discrete Fourier Transform.
Let us first read the Lena image and convert it to grayscale. Then, compute the DFT of the image and center the frequency components by using the following code snippet:
im = rgb2gray(imread("images/Img_03_01.jpg")) h, w = im.shape
F = fp.fft2(im) F_shifted = fp.fftshift(F)
Now, to observe the impact of eliminating frequency components on the output obtained with the IDFT, let us only keep a few frequencies with small (absolute) values around (0,0) and block all other frequencies. In other words, iteratively allow only the (2k1 + 1) × (2k2 + 1) frequencies between (−k1, −k2) and (+k1, +k2), for different values of k1 and k2; that is, allow the frequencies (Fx, Fy) such that |Fx| ≤ k1 and |Fy| ≤ k2, and reject all other frequencies outside this interval before transforming back to the spatial domain with the IDFT. The following code block implements exactly this:

xs = list(map(int, np.linspace(1, h//5, 10)))
ys = list(map(int, np.linspace(1, w//5, 10)))
plt.figure(figsize=(20,8))
plt.gray()
for i in range(10):
    F_mask = np.zeros((h, w))
    # unblock only a (2*xs[i]+1) x (2*ys[i]+1) window of frequencies around the center
    F_mask[h//2-xs[i]:h//2+xs[i]+1, w//2-ys[i]:w//2+ys[i]+1] = 1
    F1 = F_shifted*F_mask
    im_out = fp.ifft2(fp.ifftshift(F1)).real  # np.abs()
    plt.subplot(2,5,i+1), plt.imshow(im_out), plt.axis('off')
    plt.title('{}x{}, PSNR={}'.format(2*xs[i]+1, 2*ys[i]+1, round(peak_signal_noise_ratio(im, im_out), 2)), size=15)
plt.suptitle('Fourier reconstruction by keeping first few frequency basis vectors', size=25)
plt.show()

The following image shows the Fourier reconstructions:
As can be seen in the previous figure, the coefficients corresponding to the lower frequencies retain the average (coarse) information, and as we unblock higher and higher frequencies, the edges and the finer details in the image are captured and the PSNR increases. From the first 89x89 frequencies onward, we can reconstruct the image without much visible loss of information. If you plot the PSNR of the IDFT-reconstructed image against the number of frequency components kept from the image's power spectrum, you will get a figure like the following one (left as an exercise for the reader):
Alternatively, you could have used fft2() and ifft2() functions from the scipy.fftpack module; as we shall see later.
Sampling to increase/decrease the resolution of an image
As described in the introduction section, sampling can be used to increase/decrease the resolution of an image by the selection/rejection of image pixels. In this section, we shall work on a couple of problems: the first one implements up-sampling and the second one implements down-sampling.
Up-sampling an image by using the DFT and a low pass filter (LPF)

In this problem, we shall see how to up-sample and increase the resolution of an image using the DFT. Up-sampling is generally done in the spatial domain, by guessing the unknown pixel values with nearest-neighbour or bilinear/bi-cubic interpolation. On the contrary, our attempt in this problem will be to implement up-sampling using the DFT, followed by an application of an LPF in the frequency domain. To get the most out of this approach, let us first understand how an LPF can be implemented with convolution in the spatial and frequency domains. As discussed earlier, filtering refers to transforming pixel intensity values to reveal certain image characteristics, such as smoothing or sharpening.
An LPF allows only the low frequencies from the frequency domain representation of the image (obtained with the DFT) and blocks all high frequencies beyond a cut-off value. The LPF can be implemented using convolution with a suitable kernel (for example, a box or Gaussian kernel) in the spatial domain, where the kernel window is slid over the (grayscale) image (the image is assumed to be at least as large as the kernel in both dimensions). The pixel values in the output image are computed by traversing the kernel window through the input image, as shown in the following figure:
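For comparison (a minimal sketch, not part of the book's solution; it assumes a grayscale float image im is loaded), this is what such a spatial-domain low-pass filter looks like with a simple averaging kernel:

import numpy as np
from scipy.ndimage import convolve

box = np.ones((3, 3)) / 9.0                      # 3x3 box (averaging) kernel
im_smooth = convolve(im, box, mode='reflect')    # slide the kernel over the image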
Here, we shall implement the LPF in the frequency domain, as it is fast (by the convolution theorem, all we need to perform is an element-wise multiplication) as follows:
For this problem, we shall use the Lena input image of size (220x220) and obtain an image double in size (440x440).
First, read the image, convert it to grayscale and double the size of the image by padding zero rows/columns at every alternate
position using the following code snippet:
im = 255*rgb2gray(imread('images/Img_03_01.jpg'))
im1 = np.zeros((2*im.shape[0], 2*im.shape[1]))
print(im.shape, im1.shape)
# (220, 220) (440, 440)
for i in range(im.shape[0]):
    for j in range(im.shape[1]):
        im1[2*i, 2*j] = im[i,j]

The LPF kernel that we shall use is the following one. As can be seen, it has the highest weight at the center (allowing the lowest frequency component the most), and the weights gradually decrease toward the boundary of the matrix:
kernel = [[0.25, 0.5, 0.25], [0.5, 1, 0.5], [0.25, 0.5, 0.25]]
Now, to implement the filter in the frequency domain, we need to multiply the image with the kernel, and for that, the kernel's shape needs to be exactly equal to the image's shape. Use the function pad() from NumPy to pad the kernel with additional zeros, with a padding function pad_with_zeros(), as follows:

def pad_with_zeros(vector, pad_width, iaxis, kwargs):
    vector[:pad_width[0]] = 0
    vector[-pad_width[1]:] = 0
    return vector

# Now enlarge the kernel to the shape of the image
kernel = np.pad(kernel, (((im1.shape[0]-3)//2, (im1.shape[0]-3)//2+1),
                         ((im1.shape[1]-3)//2, (im1.shape[1]-3)//2+1)), pad_with_zeros)
Now, compute the power spectrum of the input image and the expanded kernel with the following lines of code. Note that the kernel is already centered; so, we need to apply the inverse of fftshift() before applying the DFT as follows:
freq = fp.fft2(im1)
freq_kernel = fp.fft2(fp.ifftshift(kernel))

Now, compute the LPF by an element-wise multiplication in the frequency domain as follows:
freq_LPF = freq*freq_kernel # by the Convolution theorem

Finally, use the inverse DFT to obtain the output image. Note that you need to extract the real part from the complex output (and ignore the imaginary artefact) as follows:
im2 = fp.ifft2(freq_LPF).real

Let us plot the input, the kernel, and the output images along with their power spectrums using the following code block:

plt.figure(figsize=(15,10))
plt.gray() # show the filtered result in grayscale
cmap = 'nipy_spectral' #'viridis'
plt.subplot(231), plot_image(im, 'Original Input Image')
plt.subplot(232), plot_image(im1, 'Padded Input Image')
plt.subplot(233), plot_freq_spectrum(freq, 'Original Image Spectrum', cmap=cmap)
plt.subplot(234), plot_freq_spectrum(freq_kernel, 'Image Spectrum of the LPF', cmap=cmap)
plt.subplot(235), plot_freq_spectrum(fp.fft2(im2), 'Image Spectrum after LPF', cmap=cmap)
plt.subplot(236), plot_image(im2.astype(np.uint8), 'Output Image')
plt.show()
Down-sampling with anti-aliasing using the Gaussian filter
To decrease the size of an image, we need to down-sample it. Each pixel in the new, smaller image corresponds to multiple pixels in the original, larger image. Dropping pixels from the original image reduces the resolution, but it can introduce spatial aliasing (for example, Moire patterns) and result in a poorly pixelized output image. To prevent this, an anti-aliasing filter needs to be applied (before down-sampling) to remove the frequency components above the Nyquist frequency. In this problem, you will learn a simple anti-aliasing technique that can be implemented by applying an LPF before down-sampling. The following are the steps:
Import the additional libraries required. Read the input image and convert it to float (so that all the pixel values are between 0 and 1). Here, we shall use an RGB color image of a brick wall as input. Our goal will be to reduce both the height and the width of the image by a factor of n (n = 4 in the code below, making the output image 16 times smaller in area), as follows:
from skimage.filters import gaussian
from skimage import img_as_float

#im = rgb2gray(imread('images/Img_03_03.jpg'))
im = img_as_float(imread('images/Img_03_08.jpg'))
print(im.shape)
# (480, 720, 3)
Before down-sampling, apply a Gaussian filter (to smooth the image) for anti-aliasing, as a pre-processing step, using the following line of code. We shall use the skimage.filters.gaussian() function to apply Gaussian blur to the image, but you could use scipy.ndimage.gaussian_filter() function instead, try it on your own.
im_blurred = gaussian(im, sigma=1.25, multichannel=True)
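One possible way to do the same thing with scipy.ndimage (a sketch; the per-axis sigma keeps the color channels from being blurred into each other):

from scipy.ndimage import gaussian_filter

# blur only the two spatial axes of the RGB image, not across the channels
im_blurred_alt = gaussian_filter(im, sigma=(1.25, 1.25, 0))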
Select every n-th pixel in the x and the y directions from the original image to compute the values of the pixels in the smaller image. Compare the quality with the output image obtained by down-sampling without the Gaussian filter (with aliasing). The following code block performs the preceding steps:

n = 4 # create an image 16 times smaller in size
h, w = im.shape[0] // n, im.shape[1] // n
im_small = np.zeros((h, w, 3))
for i in range(h):
    for j in range(w):
        im_small[i,j] = im[n*i, n*j]
im_small_aa = np.zeros((h, w, 3))
for i in range(h):
    for j in range(w):
        im_small_aa[i,j] = im_blurred[n*i, n*j]
Plot the original input image as follows:

plt.figure(figsize=(15,15))
plt.imshow(im), plt.title('Original Image', size=15)
plt.show()
Plot the down-sampled image without anti-aliasing as follows:

plt.figure(figsize=(15,15))
plt.imshow(im_small), plt.title('Resized Image (without Anti-aliasing)', size=15)
plt.show()
Plot the down-sampled image with anti-aliasing as follows:

plt.figure(figsize=(15,15))
plt.imshow(im_small_aa), plt.title('Resized Image (with Anti-aliasing)', size=15)
plt.show()
As we can see in the preceding figures, the down-sampled image without anti-aliasing shows Moire patterns that are not present in the original image, whereas in the down-sampled image with anti-aliasing the patterns are almost absent. Since the Gaussian blur is an LPF, it removes the high frequencies from the original input image; this lowers the maximum frequency present, so the reduced sampling rate can still satisfy the Nyquist criterion (by the sampling theorem) and aliasing is avoided.
Denoising an image with LPF/Notch filter in the Frequency domain
In this problem, we shall learn how to use a few frequency domain filters (for example, the Gaussian LPF and notch filter) to remove noise from an input image. We shall start by removing periodic noise with the notch filter and then see how LPFs like Gaussian can be used to remove impulse noise from an image.
Removing periodic noise with Notch filter
The notch filter, also known as the band-stop/band-reject filter blocks/rejects a few chosen frequencies from the frequency domain representation of the image (obtained with the DFT), and hence, the name. It is useful for removing periodic noise from images. In this section, you will experience how a notch filter can be used to remove periodic noise from an image as follows:
For this section, we shall use a chest X-Ray image corrupted with sinusoidal noise. First, read the image and convert it to grayscale, as usual, to start with. Next, use the numpy.fft.fft2() function to obtain the power spectrum with the DFT by using the following code block:
im_noisy = rgb2gray(imread("images/Img_03_23.jpg")) F_noisy = fp.fft2((im_noisy)) print(F_noisy.shape) (380, 400)
To find the offending frequencies in the power spectrum, we need to search for the unusual bright spots; hence, we plot the spectrum with denser axis ticks to locate them. The following code shows a slightly modified version of the plot_freq_spectrum() function for this purpose:
def plot_freq_spectrum(F, title, cmap=plt.cm.gray):
    plt.imshow((20*np.log10(0.1 + fp.fftshift(F))).real.astype(int), cmap=cmap)
    plt.xticks(np.arange(0, im.shape[1], 25))
    plt.yticks(np.arange(0, im.shape[0], 25))
    plt.title(title, size=20)
Plot the original noisy image and its power spectrum using the following few lines of code:
plt.figure(figsize=(20,10))
plt.subplot(121), plot_image(im_noisy, 'Noisy Input Image')
plt.subplot(122), plot_freq_spectrum(F_noisy, 'Noisy Image Spectrum')
plt.tight_layout()
plt.show()
From the preceding figure, observe that there are two unusual bright spots in the power spectrum that may be causing the sinusoidal noise. Enlarge the plot and use the axis ticks to mark and locate these points. One such point is located at around (180, 210) in the shifted spectrum (the coefficient zeroed out in the next step); can you guess the location of the second one?
Now, let us block those two frequencies (by setting the corresponding coefficients to zero) and go back to the spatial domain from the modified spectrum, to check whether this removes the periodic noise, by using the following code:
F_noisy_shifted = fp.fftshift(F_noisy) F_noisy_shifted[180,210] = F_noisy_shifted[200,190] = 0
im_out = fp.ifft2(fp.ifftshift(F_noisy_shifted)).real #np.abs()
Let us plot the output image obtained, to see if the image is recovered, as follows:
#print(signaltonoise(im1, axis=None))
plt.figure(figsize=(10,8)) plot_image(im_out, 'Output Image') plt.show()
As you can see from the preceding output, we have removed the periodic noise by stopping those two frequencies (thereby, implementing a notch filter).
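The recipe above zeroes the two offending coefficients by hand. As a generalization (a sketch, not the book's code), the following helper zeroes a small neighborhood around each chosen frequency and around its conjugate-symmetric counterpart; here fp refers to the FFT module already imported in this chapter:
import numpy as np
def notch_reject(F_shifted, points, half_width=2):
    # Zero out small square neighborhoods around the given (row, col) points of a
    # centered (fftshift-ed) spectrum, together with their mirrored counterparts,
    # so that the inverse transform stays (almost) real-valued.
    F = F_shifted.copy()
    h, w = F.shape
    cy, cx = h // 2, w // 2
    for (r, c) in points:
        r_sym, c_sym = 2 * cy - r, 2 * cx - c   # point mirrored through the center
        for (rr, cc) in [(r, c), (r_sym, c_sym)]:
            F[max(rr - half_width, 0):rr + half_width + 1,
              max(cc - half_width, 0):cc + half_width + 1] = 0
    return F
# usage with the noisy X-ray spectrum from above; note that (180, 210) and
# (200, 190) happen to be each other's mirrored counterparts for a 380x400 image
F_clean = notch_reject(fp.fftshift(F_noisy), [(180, 210), (200, 190)])
im_out = fp.ifft2(fp.ifftshift(F_clean)).real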
Removing salt and pepper noise using the Gaussian LPF with scipy fftpack
In this problem, you will learn how to remove impulse (salt and pepper) noise from an image using an LPF in the frequency domain. We shall use the fft2() function from the scipy.fftpack module to compute the power spectrum of the image and then use the Gaussian LPF to remove the noise (which generally corresponds to the high-frequency components) as follows:
Let us start by importing additional libraries we need by using the following code:
from scipy import ndimage from scipy import fftpack from skimage.util import random_noise
Read the input lion image, convert it to grayscale, and add random impulse noise to the image to obtain the noisy image by using the following code snippet. The random_noise() function from the util module is used; the description of the function is shown as follows (taken from the documentation):
skimage.util.random_noise(image, mode='gaussian', seed=None, clip=True, **kwargs) Adds random noise of various types to a floating-point image.
im = rgb2gray(imread('images/Img_03_02.jpg')) noisy = random_noise(im, mode='s&p')
Now, compute the DFT of the noisy image and apply the Gaussian LPF with standard deviation σ in the frequency domain, with a subsequent IDFT to obtain the smoothed image, as shown in the following code block. The fourier_gaussian() function from scipy.ndimage module is used for this purpose. The following is the description of the function:
scipy.ndimage.fourier.fourier_gaussian(input, sigma, n=-1, axis=-1, output=None) Multidimensional Gaussian Fourier filter: the (Fourier-transformed) input array is multiplied with the Fourier transform of a Gaussian kernel of the given standard deviation.
im_freq = fftpack.fft2(im)
noisy_freq = fftpack.fft2(noisy)
sigma = 1 #0.1
noisy_smoothed_freq = ndimage.fourier_gaussian(noisy_freq, sigma=sigma)
noisy_smoothed = fftpack.ifft2(noisy_smoothed_freq)
Now, plot all the images along with their power spectrums by using the following lines of code:
fig, ((ax1, ax2), (ax3, ax4), (ax5, ax6)) = plt.subplots(3, 2, figsize=(20,20))
plt.gray() # show the filtered result in grayscale
ax1.imshow(im), ax1.axis('off'), ax1.set_title('Original Image', size=20)
ax2.imshow((20*np.log10(0.1 + fftpack.fftshift(im_freq))).real.astype(int))
ax2.set_title('Original Image (Freq Spec)', size=20)
ax3.imshow(noisy), ax3.axis('off'), ax3.set_title('Noisy Image', size=20)
ax4.imshow((20*np.log10(0.1 + fftpack.fftshift(noisy_freq))).real.astype(int))
ax4.set_title('Noisy Image (Freq Spec)', size=20)
# the imaginary part is an artifact
ax5.imshow(noisy_smoothed.real), ax5.axis('off'), ax5.set_title('Output Image (with LPF)', size=20)
ax6.imshow((20*np.log10(0.1 + fftpack.fftshift(noisy_smoothed_freq))).real.astype(int))
ax6.set_title('Output Image (Freq Spec)', size=20)
plt.tight_layout()
plt.show()
As can be seen from the preceding figure, most of the salt and pepper noise (the white/black dots) is removed with the Gaussian LPF, since the filter blocks the high frequencies that correspond to the noise; however, it also smooths out some fine details in the image.
We shall see in the later chapters that a better-suited filter for removing impulse noise is the median filter.
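Since the text points ahead to the median filter, here is a minimal spatial-domain sketch for comparison (reusing the noisy image created above):
from scipy import ndimage
# Each pixel is replaced by the median of its 3x3 neighborhood, which removes
# isolated salt-and-pepper spikes while preserving edges better than a Gaussian LPF.
denoised_median = ndimage.median_filter(noisy, size=3)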
Blurring an image with an LPF in the frequency domain
An LPF blocks the high frequency components and allows only the low frequency components to pass through it. Hence, applying an LPF to an image removes fine details/edges as well as noise/outliers, and the result is a blurred (or smoothed) image. Blurring can be a part of preprocessing an image before a complex image processing task (for example, image augmentation prior to classification). In this problem, you will learn about different types of blur kernels and how they can be convolved with an image in the frequency domain to generate blurred images.
Different blur kernels and convolution in the frequency domain
A blur applied to an image can be of the following types:
Edge blur: This type of blur is applied to an image explicitly with convolution; examples include linear filter kernels such as the box-blur kernel or the Gaussian kernel. They are applied to smooth out unnecessary details/noise in an image.
Motion blur: This blur is created when the camera is shaken during capture, that is, when the camera and/or the objects being captured are moving. We can use a line-shaped kernel (point spread function) to simulate this blur.
Out-of-focus (de-focus) blur: This blur is created when the object to be captured by the camera is out of focus; we shall simulate this blur too, using a circular blur kernel.
Now, follow these steps to create the three different types of kernels and apply them to images to see the results:
Let us first define the function to return a 2D Gaussian (edge) blur kernel to be used for edge blurring by using the following code block shown. Notice that it accepts the standard deviation (σ) of the Gaussian along with the size of the 2D kernel be created (for example, sz=15 will create a 15x15 kernel) as parameters to the function. As can be seen, first, a 1D Gaussian
kernel is created and then the outer product of two such kernels are taken to return a 2D kernel as follows:
def get_gaussian_edge_blur_kernel(sigma, sz=15):
    # First create a 1-D Gaussian kernel
    x = np.linspace(-10, 10, sz)
    kernel_1d = np.exp(-x**2/sigma**2)
    kernel_1d /= np.trapz(kernel_1d) # normalize the sum to 1.0
    # create a 2-D Gaussian kernel from the 1-D kernel
    kernel_2d = kernel_1d[:, np.newaxis] * kernel_1d[np.newaxis, :]
    return kernel_2d
Next, define the following function to generate a motion blur kernel. A line of a given length and oriented at a particular direction (angle) is to be used as the convolution kernel to create a motion-blurred version of the input image.
The function takes as arguments the length and angle of the blur, along with the size of the blur kernel. It uses OpenCV-Python's warpAffine() function to return a kernel matrix containing a line (drawn with pixels of value 1) of the given length, oriented at the given angle and centered at the middle of the matrix, as follows:
def get_motion_blur_kernel(ln, angle, sz=15):
    kern = np.ones((1, ln), np.float32)
    angle = -np.pi*angle/180
    c, s = np.cos(angle), np.sin(angle)
    A = np.float32([[c, -s, 0], [s, c, 0]])
    sz2 = sz // 2
    A[:,2] = (sz2, sz2) - np.dot(A[:,:2], ((ln-1)*0.5, 0))
    kern = cv2.warpAffine(kern, A, (sz, sz), flags=cv2.INTER_CUBIC)
    return kern
Finally, define the following function to generate an out-of-focus kernel (to simulate de-focusing of an image). A circle of a given radius is to be used as the convolution kernel.
This function accepts the radius r (the de-focus radius) and the size of the kernel to be generated as input parameters, as follows:
def get_out_of_focus_kernel(r, sz=15):
    kern = np.zeros((sz, sz), np.uint8)
    cv2.circle(kern, (sz, sz), r, 255, -1, cv2.LINE_AA, shift=1)
    kern = np.float32(kern) / 255
    return kern
Next, implement the following function that performs the convolution in the frequency domain using element-wise multiplication of the image and the convolution kernel in the frequency domain (remember the convolution theorem). The function also plots the input image, the kernel PSF, and the output images obtained with convolution as follows:
def dft_convolve(im, kernel):
    F_im = fp.fft2(im)
    #F_kernel = fp.fft2(kernel, s=im.shape)
    F_kernel = fp.fft2(fp.ifftshift(kernel), s=im.shape)
    F_filtered = F_im * F_kernel
    im_filtered = fp.ifft2(F_filtered)
    cmap = 'RdBu'
    plt.figure(figsize=(20,10))
    plt.gray()
    plt.subplot(131), plt.imshow(im), plt.axis('off'), plt.title('input image', size=20)
    plt.subplot(132), plt.imshow(kernel, cmap=cmap), plt.title('kernel', size=20)
    plt.subplot(133), plt.imshow(im_filtered.real), plt.axis('off'), plt.title('output image', size=20)
    plt.tight_layout()
    plt.show()
Let us apply the edge-blur kernel function to the image and plot the input, kernel, and output blurred images using the following lines of code:
im = rgb2gray(imread('images/Img_03_03.jpg'))
kernel = get_gaussian_edge_blur_kernel(25, 25)
dft_convolve(im, kernel)
You will obtain the following figure:
Now, apply the get_motion_blur_kernel() function to the image and plot the input, kernel, and output blurred images using the following lines of code:
kernel = get_motion_blur_kernel(30, 60, 25)
dft_convolve(im, kernel)
You will obtain the following figure:
Finally, apply the get_out_of_focus_kernel function to the image and plot the input, kernel, and output blurred images using the following lines of code. You will obtain a figure that follows the code block: kernel = get_out_of_focus_kernel(15, 20) dft_convolve(im, kernel)
Blurring with scipy.ndimage frequency-domain filters
The scipy ndimage module provides a bunch of functions to apply low pass filters on the image in the frequency domain. The following few subsections demonstrate some of these filters with examples.
With fourier_gaussian
Let us use the fourier_gaussian() function from the scipy ndimage library to run convolution with Gaussian kernel in the frequency domain as follows:
First, read the input Lena grayscale image and get its frequency domain representation with FFT by using the following lines of code: im = imread('images/Img_03_31.png') freq = fp.fft2(im)
Now, use the fourier_gaussian() function to obtain the blurred version of the Lena image, with a couple of different values for the standard deviation (σ) of the Gaussian kernel, using the following code block. Plot the input and the output images along with the power spectrums.
fig, axes = plt.subplots(2, 3, figsize=(20,15))
plt.subplots_adjust(0,0,1,0.95,0.05,0.05)
plt.gray() # show the filtered result in grayscale
axes[0, 0].imshow(im), axes[0, 0].set_title('Original Image', size=20)
axes[1, 0].imshow((20*np.log10(0.1 + fp.fftshift(freq))).real.astype(int)), axes[1, 0].set_title('Original Image Spectrum', size=20)
i = 1
for sigma in [3,5]:
    convolved_freq = ndimage.fourier_gaussian(freq, sigma=sigma)
    convolved = fp.ifft2(convolved_freq).real # the imaginary part is an artifact
    axes[0, i].imshow(convolved)
    axes[0, i].set_title(r'Output with FFT Gaussian Blur, $\sigma$={}'.format(sigma), size=20)
    axes[1, i].imshow((20*np.log10(0.1 + fp.fftshift(convolved_freq))).real.astype(int))
    axes[1, i].set_title(r'Spectrum with FFT Gaussian Blur, $\sigma$={}'.format(sigma), size=20)
    i += 1
for a in axes.ravel():
    a.axis('off')
plt.show()
The following figure shows the output of the preceding code block:
As you can notice from the preceding figure, the more you increase the value of σ for the Gaussian kernel, the more the output image gets blurred, as the LPF blocks the high frequencies more.
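This behaviour follows from the standard Fourier-transform pair for a Gaussian (stated here for reference; it is not derived in the book):
\mathcal{F}\{e^{-x^2/(2\sigma^2)}\}(u) \propto e^{-2\pi^2\sigma^2 u^2}
A larger spatial σ therefore gives a narrower Gaussian in the frequency domain, which attenuates the high frequencies more strongly.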
With fourier_uniform
The function fourier_uniform() from the scipy ndimage module implements a multi-dimensional uniform Fourier filter: the frequency array is multiplied with the Fourier transform of a box kernel of a given size. The following code block demonstrates how to use this LPF (an average filter) to blur the Lena grayscale image as follows:
As done earlier, read the image and get its frequency domain representation with the DFT.
Then, use the function fourier_uniform() to apply a 10x10 box kernel (specified by the size argument) to the power spectrum, to obtain the smoothed output as follows:
im = imread('images/Img_03_31.png') freq = fp.fft2(im) freq_uniform = ndimage.fourier_uniform(freq, size=10)
Plot the input and the blurred image using the following code snippet: fig, (axes1, axes2) = plt.subplots(1, 2, figsize=(20,10)) plt.gray() # show the result in grayscale im1 = fp.ifft2(freq_uniform) axes1.imshow(im), axes1.axis('off')
axes1.set_title('Original Image', size=20) axes2.imshow(im1.real) # the imaginary part is an artifact axes2.axis('off') axes2.set_title('Blurred Image with Fourier Uniform', size=20)
plt.tight_layout() plt.show()
The following figure shows the output of the preceding code block:
Use the next code block to display the power spectrum of the image after the box kernel is applied as follows: plt.figure(figsize=(10,10)) plt.imshow((20*np.log10(0.1 + fp.fftshift(freq_uniform))).real.astype(int)) plt.title('Frequency Spectrum with fourier uniform', size=20) plt.show()
The following figure shows the output of the preceding code block:
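A small variation, not used in the recipe above but supported by the function (per SciPy's documentation, size may also be a sequence with one value per axis): passing different box sizes per axis produces an anisotropic blur. A minimal sketch, reusing the freq spectrum computed above (fp is the FFT module imported earlier):
from scipy import ndimage
# blur much more strongly along the horizontal axis than along the vertical axis
freq_aniso = ndimage.fourier_uniform(freq, size=(5, 25))
im_aniso = fp.ifft2(freq_aniso).real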
With fourier_ellipsoid
By reusing the code from the previous section and changing a single line of code to replace the box kernel with an ellipsoid kernel, as shown in the following code block, you can generate the blurred output image with an ellipsoid kernel:
As before, apply the function fourier_ellipsoid() on the power spectrum of the image and use the IDFT to obtain the blurred output image in the spatial domain by using the following lines of code:
freq_ellipsoid = ndimage.fourier_ellipsoid(freq, size=10) im1 = fp.ifft2(freq_ellipsoid)
Plot the input and the blurred images using the following code snippet:
fig, (axes1, axes2) = plt.subplots(1, 2, figsize=(20,10)) axes1.imshow(im), axes1.axis('off') axes1.set_title('Original Image', size=20) axes2.imshow(im1.real) # the imaginary part is an artifact axes2.axis('off') axes2.set_title('Blurred Image with Fourier Ellipsoid', size=20) plt.tight_layout() plt.show()
The following figure shows the output of the preceding code block:
Again, use the following code snippet to create a figure like the following one displaying the frequency spectrum of the image after the ellipsoid kernel is applied: plt.figure(figsize=(10,10)) plt.imshow((20*np.log10(0.1 + fp.fftshift(freq_ellipsoid))).real.astype(int)) plt.title('Frequency Spectrum with Fourier ellipsoid', size=20) plt.show()
The following figure shows the output of the preceding code block:
Gaussian blur LPF with scipy.fftpack
Up until now, we have been using numpy.fft module’s 2D-FFT implementation. In this section, you will see how scipy.fftpack module’s fft2() function can be used for the same purpose. The following are the steps:
We shall use a grayscale CT-scan image of the human brain as an input here. Use FFT to create the 2D frequency response array from the image as follows: #im = rgb2gray(imread('images/Img_03_04.jpg')) im = rgb2gray(imread('images/Img_03_11.jpg')) freq = fp.fft2(im)
Create a Gaussian 2D Kernel (to be used as an LPF) in the spatial domain by taking the outer product of two 1D Gaussian kernels as follows:
kernel = np.outer(signal.gaussian(im.shape[0], 1), signal.gaussian(im.shape[1], 1))
#assert(freq.shape == kernel.shape)
Use the DFT to obtain the frequency response of the Gaussian kernel as follows:
freq_kernel = fp.fft2(fp.ifftshift(kernel))
Use the convolution theorem to convolve the LPF with the input image in the frequency domain by element-wise multiplication:
convolved = freq*freq_kernel # by the Convolution theorem
Use the IFFT to obtain the output image. Notice that to display the output image properly, you may need to scale it, as follows:
im_blur = fp.ifft2(convolved).real
im_blur = 255 * im_blur / np.max(im_blur)
Plot the power spectrum of the image, the Gaussian kernel, and the image obtained after convolution in the frequency domain by using the following code block. Note that by using a Matplotlib colormap (for example, coolwarm) and the associated colorbar to the right side of the plot, we can get an idea of the value of the frequency response at different coordinates, as follows:
plt.figure(figsize=(20,20))
plt.subplot(221), plt.imshow(kernel, cmap='coolwarm'), plt.colorbar()
plt.title('Gaussian Blur Kernel', size=20)
# center the frequency response
plt.subplot(222)
plt.imshow((20*np.log10(0.01 + fp.fftshift(freq_kernel))).real.astype(int), cmap='inferno')
plt.colorbar()
plt.title('Gaussian Blur Kernel (Freq. Spec.)', size=20)
plt.subplot(223), plt.imshow(im, cmap='gray'), plt.axis('off'), plt.title('Input Image', size=20)
plt.subplot(224), plt.imshow(im_blur, cmap='gray'), plt.axis('off'), plt.title('Output Blurred Image', size=20)
plt.tight_layout()
plt.show()
The following figure shows the output of the preceding code block:
To plot the power spectrum of the input/output images and the kernel in 3D, let us define the following function. We shall use the plot_surface() function from mpl_toolkits.mplot3d to obtain a 3D plot of the power spectrum, given the corresponding X, Y, and Z values passed as 2D arrays, as follows:
def plot_3d(X, Y, Z, cmap=plt.cm.seismic):
    fig = plt.figure(figsize=(20,20))
    ax = fig.gca(projection='3d')
    # Plot the surface
    surf = ax.plot_surface(X, Y, Z, cmap=cmap, linewidth=5, antialiased=False)
    #ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10)
    ax.zaxis.set_major_locator(LinearLocator(10))
    ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))
    ax.set_xlabel('F1', size=30)
    ax.set_ylabel('F2', size=30)
    ax.set_zlabel('Freq Response', size=30)
    #ax.set_zlim((-40,10))
    # Add a color bar which maps values to colors
    fig.colorbar(surf) #, shrink=0.15, aspect=10)
    #plt.title('Frequency Response of the Gaussian Kernel')
    plt.show()
Plot the frequency response of the Gaussian kernel in 3D using the following code, with the plot_3d() function as defined earlier:
Y = np.arange(freq.shape[0])
X = np.arange(freq.shape[1])
X, Y = np.meshgrid(X, Y)
Z = (20*np.log10(0.01 + fp.fftshift(freq_kernel))).real
plot_3d(X,Y,Z)
The following figure shows how the power spectrum of the Gaussian LPF kernel looks in 3D:
Plot the power spectrum of the CT input image in 3D using the following lines of code:
Z = (20*np.log10(0.01 + fp.fftshift(freq))).real
plot_3d(X,Y,Z)
Finally, plot the power spectrum of the output image (obtained by convolving the Gaussian kernel with the input image) in 3D using the following lines of code:
Z = (20*np.log10(0.01 + fp.fftshift(convolved))).real plot_3d(X,Y,Z)
As can be seen from the preceding frequency response of the output image, the high frequency components are attenuated, causing the smoothing/loss of fine details and resulting in the blurred output image.
Convolution in the frequency domain with a colored image using fftconvolve from scipy signal
In this section, you will learn how the scipy.signal module's fftconvolve() function can be used for frequency-domain convolution with an RGB color input image, producing an RGB color blurred output image. The following describes the function (relevant parts taken from SciPy's documentation):
scipy.signal.fftconvolve(in1, in2, mode='full', axes=None) Convolve two N-dimensional arrays in1 and in2 using FFT, with the output size determined by the mode argument.
The modes of convolution are as follows:
full: The output is the full discrete linear convolution of the inputs (default).
valid: The output consists only of those elements that do not rely on zero-padding. Either in1 or in2 must be at least as large as the other in every dimension.
same: The output is the same size as in1 and centered with respect to the full output.
Now, let us follow these steps to implement a Gaussian low-pass filter and a Laplacian high-pass filter with fftconvolve():
First, import the required packages and read the input RGB image of a tiger, using the following lines of code:
from skimage import img_as_float
from scipy import signal
im = img_as_float(plt.imread('images/Img_03_07.jpg'))
Create a Gaussian kernel of size 15x15 with σ = 10 using the function get_gaussian_edge_blur_kernel() we implemented earlier. Reshape the kernel to 15x15x1 using np.newaxis and use signal.fftconvolve() with mode='same', as follows:
kernel = get_gaussian_edge_blur_kernel(sigma=10, sz=15)
im1 = signal.fftconvolve(im, kernel[:, :, np.newaxis], mode='same')
im1 = im1 / np.max(im1)
As mentioned earlier, mode='same' used in the preceding code snippet forces the output shape to be the same as the input array shape (that is, it avoids border effects).
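To make the three modes concrete, here is a tiny sketch with toy arrays (not the book's images) that only prints the output shapes:
import numpy as np
from scipy import signal
a = np.ones((10, 10))
k = np.ones((3, 3)) / 9.0
print(signal.fftconvolve(a, k, mode='full').shape)   # (12, 12): N + M - 1 per axis
print(signal.fftconvolve(a, k, mode='same').shape)   # (10, 10): same shape as a
print(signal.fftconvolve(a, k, mode='valid').shape)  # (8, 8):  N - M + 1 per axis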
Now, let us use the Laplacian HPF kernel and apply the frequency domain convolution with the same function by using the following code block.
Note that you may need to scale/clip the output image to keep it in the range [0,1], required for plotting an image with floating-point pixel values, as follows:
kernel = np.array([[0,-1,0],[-1,4,-1],[0,-1,0]])
im2 = signal.fftconvolve(im, kernel[:, :, np.newaxis], mode='same')
im2 = im2 / np.max(im2)
im2 = np.clip(im2, 0, 1)
Finally, use the following code snippet to plot the input and the output images created using convolution:
plt.figure(figsize=(20,10)) plt.subplot(131), plt.imshow(im), plt.axis('off'), plt.title ('original image', size=20) plt.subplot(132), plt.imshow(im1), plt.axis('off'), plt.title ('output with Gaussian LPF', size=20)
plt.subplot(133), plt.imshow(im2), plt.axis('off'), plt.title ('output with Laplacian HPF', size=20) plt.tight_layout() plt.show()
The next figure shows the original image along with the output images:
As expected, the Gaussian LPF blurs the image whereas the Laplacian HPF extracts the finer details (edges) in the image (corresponding to high frequencies).
Edge detection with high pass filters (HPF) in the frequency domain
An HPF refers to a family of filters that passes only the high frequencies of the frequency response of the image (obtained with the DFT) and blocks all frequencies below a cut-off value. The image is then reconstructed with the inverse DFT, and since the high frequency components correspond to edges, details, noise, and so on, HPFs tend to extract or enhance them. In this problem, you will learn how to implement a few HPFs, such as the ideal, Gaussian, and Butterworth HPFs, with OpenCV-Python's implementation of the FFT.
The following figure formulates the frequency response function H for a few popular HPFs:
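For reference, the standard textbook forms of these transfer functions, written here to be consistent with the kernel functions implemented below (D(u,v) is the distance of the frequency (u,v) from the center of the spectrum, D_0 the cut-off, and n the Butterworth order), are:
H_{ideal}(u,v) = \begin{cases} 0, & D(u,v) \le D_0 \\ 1, & D(u,v) > D_0 \end{cases}
H_{gaussian}(u,v) = 1 - e^{-D^2(u,v)/(2D_0^2)}
H_{butterworth}(u,v) = \dfrac{1}{1 + \left(D_0/D(u,v)\right)^{2n}}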
Now, let us implement the preceding HPFs using OpenCV-Python's FFT/IFFT functions, following these steps:
Let us first define the following function to implement the 2D DFT in Python. You need to do the following:
Convert the input image to float
Apply the DFT with cv2.dft() function to obtain a complex output
Shift the origin from the upper-left corner to center of the image
Extract magnitude and phase images and return as follows:
def dft2(im):
    freq = cv2.dft(np.float32(im), flags=cv2.DFT_COMPLEX_OUTPUT)
    freq_shift = np.fft.fftshift(freq)
    mag, phase = freq_shift[:,:,0], freq_shift[:,:,1]
    return mag + 1j*phase
Now, define the following function to compute the 2D-IDFT. The following are the steps:
Separate the magnitude and phase from the complex power spectrum input
Raise the magnitude to some power near 1; values larger than one increase contrast, and values smaller than one decrease contrast
Convert magnitude and phase into Cartesian real and imaginary components
Combine Cartesian components into one complex image
Shift origin from the center to the upper-left corner
Compute the IDFT using cv2.idft() to obtain the complex output Combine complex components into the spatial domain image again as follows: cv2.idft(src, flags, nonzeroRows) Calculates the inverse Discrete Fourier Transform of a 1D or 2D array.
Neither cv2.dft() nor cv2.idft() scales the result by default. So, you should pass the DFT_SCALE flag to one of dft or idft explicitly to make the two transforms mutually inverse, as follows:
def idft2(freq):
    real, imag = freq.real, freq.imag
    back = cv2.merge([real, imag])
    back_ishift = np.fft.ifftshift(back)
    im = cv2.idft(back_ishift, flags=cv2.DFT_SCALE)
    im = cv2.magnitude(im[:,:,0], im[:,:,1])
    return im
Now, let us implement the preceding HPFs using the formulas described earlier. Each function accepts a size parameter sz for the kernel size to be returned, for a given cut-off frequency D0, as follows:
def ideal(sz, D0):
    h, w = sz
    u, v = np.meshgrid(range(-w//2,w//2), range(-h//2,h//2)) #, sparse=True)
    return np.sqrt(u**2 + v**2) > D0
def gaussian(sz, D0):
    h, w = sz
    u, v = np.meshgrid(range(-w//2,w//2), range(-h//2,h//2)) #, sparse=True)
    return 1 - np.exp(-(u**2 + v**2)/(2*D0**2))
def butterworth(sz, D0, n=1):
    h, w = sz
    u, v = np.meshgrid(range(-w//2,w//2), range(-h//2,h//2)) #, sparse=True)
    return 1 / (1 + (D0/(0.01 + np.sqrt(u**2 + v**2)))**(2*n))
Implement the following function that accepts an input image, an HPF function, and a list of cut-off frequencies, and plots the input/output images obtained by applying the HPF to the input image, along with the power spectrums of the HPFs for the different cut-off values, as follows:
def plot_HPF(im, f, D0s):
    freq = dft2(im)
    fig = plt.figure(figsize=(20,20))
    plt.subplots_adjust(0,0,1,0.95,0.05,0.05)
    i = 1
    for D0 in D0s:
        freq_kernel = f(im.shape, D0)
        convolved = freq*freq_kernel # by the Convolution theorem
        im_convolved = idft2(convolved).real
        im_convolved = (255 * im_convolved / np.max(im_convolved)).astype(np.uint8)
        plt.subplot(2,2,i)
        last_axes = plt.gca()
        img = plt.imshow((20*np.log10(0.01 + freq_kernel)).astype(int), cmap='coolwarm')
        divider = make_axes_locatable(img.axes)
        cax = divider.append_axes("right", size="5%", pad=0.05)
        fig.colorbar(img, cax=cax)
        plt.sca(last_axes), plt.title('{} HPF Kernel (freq)'.format(f.__name__), size=20)
        plt.subplot(2,2,i+2), plt.imshow(im_convolved), plt.axis('off')
        plt.title(r'output with {} HPF ($D_0$={})'.format(f.__name__, D0), size=20)
        i += 1
    plt.show()
Implement the following function to plot the 3D frequency response corresponding to an output image obtained by applying a given HPF function, parameterized by a set of cut-off frequencies, to an input image, as follows:
def plot_HPF_3d(im, f, D0s):
    freq = dft2(im)
    fig = plt.figure(figsize=(20,10))
    plt.subplots_adjust(0,0,1,0.95,0.05,0.05)
    i = 1
    for D0 in D0s:
        freq_kernel = f(im.shape, D0)
        convolved = freq*freq_kernel # by the Convolution theorem
        Y = np.arange(freq_kernel.shape[0])
        X = np.arange(freq_kernel.shape[1])
        X, Y = np.meshgrid(X, Y)
        Z = (20*np.log10(0.01 + convolved)).real
        ax = fig.add_subplot(1, 2, i, projection='3d')
        surf = ax.plot_surface(X, Y, Z, cmap=plt.cm.coolwarm, linewidth=0, antialiased=False)
        ax.zaxis.set_major_locator(LinearLocator(10)), ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))
        ax.set_xlabel('F1', size=30), ax.set_ylabel('F2', size=30)
        plt.title(r'output with {} HPF (freq)'.format(f.__name__, D0), size=20)
        fig.colorbar(surf, shrink=0.5, aspect=10)
        i += 1
    plt.show()
Implement the following function to plot the 3D power spectrum of the HPF kernel, given the kernel function, its size, and the cut-off frequencies:
def plot_filter_3d(sz, f, D0s, cmap=plt.cm.coolwarm):
    fig = plt.figure(figsize=(20,10))
    plt.subplots_adjust(0,0,1,0.95,0.05,0.05)
    i = 1
    for D0 in D0s:
        freq_kernel = f(sz, D0)
        Y = np.arange(freq_kernel.shape[0])
        X = np.arange(freq_kernel.shape[1])
        X, Y = np.meshgrid(X, Y)
        Z = (20*np.log10(0.01 + freq_kernel)).real
        ax = fig.add_subplot(1, 3, i, projection='3d')
        surf = ax.plot_surface(X, Y, Z, cmap=cmap, linewidth=0, antialiased=False)
        ax.zaxis.set_major_locator(LinearLocator(10))
        ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))
        ax.set_xlabel('F1', size=30)
        ax.set_ylabel('F2', size=30)
        ax.set_title('{} HPF Kernel (freq)'.format(f.__name__), size=20)
        fig.colorbar(surf, shrink=0.5, aspect=10)
        i += 1
We shall use the grayscale cameraman image in this problem, as follows:
im = plt.imread('images/Img_03_12.png') im = rgb2gray(im)
plt.figure(figsize=(7,12))
plt.imshow(im), plt.axis('off'), plt.title('original image')
plt.show()
The preceding lines of code display the following image:
We shall demonstrate the HPFs for a couple of different cut-off frequency values, as follows:
D0 = [10, 30]
Plot the frequency response of the ideal HPF and the input/output images with the plot_HPF() function as defined earlier using the following code snippet:
plot_HPF(im, ideal, D0)
As can be seen in the preceding figure, the ideal HPF attenuates the low-frequency components from the image; at the same time (due to sharp 0-1 transition), it introduces ringing artefacts (spurious patterns near sharp transitions in the image).
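The ringing has a simple explanation (a standard result, stated here for reference): the spatial-domain counterpart of a sharp frequency cut-off is an oscillating sinc-like function. In 1D,
\mathcal{F}^{-1}\{\mathbf{1}_{|u|\le D_0}\}(x) = \frac{\sin(2\pi D_0 x)}{\pi x}
and the ideal HPF is simply the all-pass filter minus this ideal LPF, so every sharp edge in the image gets convolved with these side lobes, which shows up as ripples.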
Next, let us plot the frequency response of the output image for the different cut-off values with the function plot_HPF_3d(), as defined earlier, using the following code snippet:
plot_HPF_3d(im, ideal, D0)
Now, let us plot the power spectrum of the ideal HPF kernel for the different cut-off values using the function plot_filter_3d(), as follows:
plot_filter_3d(im.shape, ideal, D0)
Next, plot the frequency response of the Gaussian HPF and the input/output images with the plot_HPF() function as follows: plot_HPF(im, gaussian, D0)
As can be seen from the preceding figure, the Gaussian HPF attenuates the low frequency components from the image; at the same time, it removes the ringing artefacts that were introduced in the case of the ideal HPF. Next, let us plot the frequency response of the output image for different cut-off values with the function plot_HPF_3d() as defined earlier using the following: plot_HPF_3d(im, gaussian, D0)
Now, let us plot the power spectrum of the Gaussian HPF kernel for the different cut-off values using the function plot_filter_3d(), as follows:
plot_filter_3d(im.shape, gaussian, D0)
Next, plot the frequency response of the Butterworth HPF and the input/output images with the plot_HPF() function as follows: plot_HPF(im, butterworth, D0)
Next, let us plot the frequency response of the output image for different cut-off values with the function plot_HPF_3d() as defined earlier using the following:
plot_HPF_3d(im, butterworth, D0)
Now, let us plot the power spectrum of the Butterworth HPF kernel for the different cut-off values using the function plot_filter_3d(), as follows:
plot_filter_3d(im.shape, butterworth, D0)
Implementation of homomorphic filters
Homomorphic filtering is a technique for removing multiplicative noise from an image; it is most commonly used for correcting non-uniform illumination in images. As per the illumination-reflectance model of image formation, an image f(x, y) can be characterized by the following two components:
The amount of source light incident on the scene being viewed (the illumination). The amount of light reflected by the objects in the scene (the reflectance).
According to this model, the intensity at any pixel in an image, which is the amount of light reflected by a point on the object, is the product of the illumination of the scene and the reflectance of the object(s) in the scene. The Fourier transform is linear, so it distributes over addition, but it does not distribute over multiplication. Thus, Fourier methods are suitable for removing noise from images only when the noise can be modelled as a term added to the original image.
However, if defects of the image, for example uneven lighting, have to be modelled as multiplicative rather than additive, the direct application of Fourier methods is inappropriate. This is where homomorphic filtering comes in: first, the multiplicative components are transformed into additive components by moving to the log domain. Then, an HPF in the log domain is used to remove the low-frequency illumination component while preserving the high-frequency reflectance component.
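In symbols (a compact restatement of the model just described, with i denoting illumination and r reflectance):
f(x,y) = i(x,y)\,r(x,y), \qquad \ln f(x,y) = \ln i(x,y) + \ln r(x,y)
g(x,y) = \exp\Big(\mathcal{F}^{-1}\big\{H(u,v)\,\mathcal{F}\{\ln f(x,y)\}\big\}\Big)
Since the illumination i varies slowly (low frequencies) and the reflectance r varies quickly (high frequencies), a high-pass-like H(u,v) suppresses the uneven illumination while preserving the detail.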
The basic steps in homomorphic filtering are shown in the following diagram, the input image being f(x, y) and the output from the filter being g(x, y):
In this problem, we shall learn how to implement a homomorphic filter using a Butterworth HPF. The following are the steps you need to follow:
First, let us import the additional functions that will be required to implement homomorphic filters by using the following line of code: from skimage.filters import sobel, threshold_otsu
Let us define the following function that implements homomorphic filtering. In the frequency domain, the homomorphic filter H(u,v) is represented as
H(u,v) = (γ_H - γ_L) · H_hp(u,v) + γ_L
where H_hp(u,v) is a high-pass (here, Butterworth) transfer function and γ_L and γ_H are two tunable parameters with γ_L < γ_H. To avoid an undefined operation in the log domain, a constant 1 is added to the input so that the input to log is always ≥ 1, and at the end that 1 is subtracted from the output, as follows:
def homomorphic_filter(im, D0, g_l=0, g_h=1, n=1):
    im_log = np.log(im.astype(float)+1)
    im_fft = dft2(im_log)
    H = (g_h - g_l) * butterworth(im.shape, D0, n) + g_l
    #H = np.fft.ifftshift(H)
    im_fft_filt = H*im_fft
    #im_fft_filt = np.fft.ifftshift(im_fft_filt)
    im_filt = idft2(im_fft_filt)
    im = np.exp(im_filt.real)-1
    im = np.uint8(255*im/im.max())
    return im
Read the input image (with non-uniform illumination), convert it to grayscale (ensure that the pixel values are in the range 0-255), and then apply the homomorphic filter by invoking the preceding function. Here, the cut-off frequency D0 = 30 and the degree n = 2 are used for the Butterworth filter. The γ_L and γ_H parameters are set to 0.3 and 1, respectively (try other values, but note that γ_H > γ_L), as follows:
image = rgb2gray(imread('images/Img_03_13.jpg'))
image_filtered = homomorphic_filter(image, D0=30, n=2, g_l=0.3, g_h=1)
Use the sobel filter to extract the edges from the original image, and create a binary edges image using Otsu's optimal thresholding, as follows:
image_edges = sobel(image)
image_edges = image_edges > threshold_otsu(image_edges)
im = im * 1.0
im = cv2.GaussianBlur(im, (feather_amount, feather_amount), 0)
return im
Define the following function to correct the color of the warped second face, given the first face along with its landmarks. The norm of the difference between the mean positions of the two eyes (the inter-eye distance) in the first image is used to compute the blur amount. The color of the warped second face is then corrected to match the color of the first face, as follows:
def correct_colours(im1, im2, landmarks1):
    mean_left = np.mean(landmarks1[left_eye_points], axis=0)
    mean_right = np.mean(landmarks1[right_eye_points], axis=0)
    blur_amount = color_correction_blur * np.linalg.norm(mean_left - mean_right)
    blur_amount = int(blur_amount)
    if blur_amount % 2 == 0: # make the blur kernel size odd
        blur_amount += 1
    im1_blur = cv2.GaussianBlur(im1, (blur_amount, blur_amount), 0)
    im2_blur = cv2.GaussianBlur(im2, (blur_amount, blur_amount), 0)
    # avoid division errors
    im2_blur += (128 * (im2_blur