Siddhartha Bhattacharyya, Indrajit Pan, Abhijit Das, Shibakali Gupta (Eds.) Intelligent Multimedia Data Analysis De Gruyter Frontiers in Computational Intelligence
De Gruyter Frontiers in Computational Intelligence
Edited by Siddhartha Bhattacharyya
Volume 2 Already published in the series Volume 1: Machine Learning for Big Data Analysis S. Bhattacharyya, H. Baumik, A. Mukherjee, S. De (Eds.) ISBN 978-3-11-055032-0, e-ISBN (PDF) 978-3-11-055143-3, e-ISBN (EPUB) 978-3-11-055077-1
Intelligent Multimedia Data Analysis | Edited by Siddhartha Bhattacharyya, Indrajit Pan, Abhijit Das, Shibakali Gupta
Editors Prof. (Dr.) Siddhartha Bhattacharyya RCC Institute of Information Technology, Canal South Road, Beliaghata, Kolkata-700015, India [email protected] Dr. Indrajit Pan RCC Institute of Information Technology Canal South Road Kolkata-700015, India [email protected] Dr. Abhijit Das RCC Institute of Information Technology Canal South Road Kolkata-700015, India [email protected] Dr. Shibakali Gupta University Institute of Technology The University of Burdwan Burdwan, India [email protected]
ISBN 978-3-11-055031-3 e-ISBN (PDF) 978-3-11-055207-2 e-ISBN (EPUB) 978-3-11-055033-7 ISSN 2512-8868 Library of Congress Control Number: 2018954507 Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2019 Walter de Gruyter GmbH, Berlin/Munich/Boston Typesetting: le-tex publishing services GmbH, Leipzig Printing and binding: CPI books GmbH, Leck www.degruyter.com
Preface
Multimedia data is a combination of different discrete and continuous content forms such as text, audio, images, videos, animations and interactional data. It differs from media that employ only text or traditional forms of printed or hand-produced data; the presence of at least one continuous medium in the transmitted information makes it multimedia information. Owing to these different varieties, multimedia data present varied degrees of uncertainty and imprecision that are not easy to deal with using the conventional computing paradigm. Multidimensional data like true color images, which show a wide variety of color content over the entire color gamut, and video data, which includes elements of synchronization apart from the curse of dimensionality, are not easy to handle, and the classification of these types of data is an important task as far as intelligent analysis of true color image data and video content is concerned. Broadly, multimedia data analysis handles feature extraction, image/video coding and transmission, image/video processing, image/video storage, retrieval and authentication, video summarization, video indexing, etc. Given the limitations and shortcomings of classical platforms of computation, particularly for handling uncertainty and imprecision in image/video data, intelligence technology, as an alternative and extended computation paradigm, is bringing in a new era of technology. Intelligence technology is mainly dependent on the soft computing paradigm, which implies a synergistic integration of essentially four computing paradigms, viz., neural networks, fuzzy logic, rough sets and evolutionary computation, incorporating probabilistic reasoning (belief networks, genetic algorithms and chaotic systems). Soft computing technologies are quite efficient at handling the imprecision and uncertainty of multimedia data, and they are flexible enough to process real-world information. In spite of the several advantages of soft computing techniques, individual soft computing tools are often unable to provide a solution to a problem on their own. As a result, different hybridizations of soft computing techniques, such as neuro-fuzzy, neuro-genetic, fuzzy-genetic, neuro-fuzzy-genetic and rough-fuzzy approaches, have been proposed to provide more robust and failsafe solutions. Hybrid soft computing stems from the synergistic integration of the different soft computing tools and techniques. The fusion of these techniques to enhance performance and produce more robust solutions can be achieved through proper hybridization. An intelligent machine inherits the boon of intelligence by virtue of the various methodologies offered by the soft computing paradigm, encompassing fuzzy and rough set theory, artificial neurocomputing, evolutionary computing, as well as approximate reasoning. At times reality presents situations where no single technique listed above provides a comprehensible solution; an effective symbiosis of more than one of the above techniques then offers a formidable solution. This gives rise to the advent of several hybrid methodologies. Of late, there has been an enormous growth in the exploration of injecting elements of intelligence using efficient hybrid
techniques. All these initiatives indicate that the individual soft computing techniques do not behave in conflicting manners, but rather are complementary to one another. In fact, recent reports reveal the inherent strength of such hybridization of computation methods. This volume comprises eight well-crafted contributed chapters devoted to reporting the latest findings on intelligent multimedia data analysis.
Medical color image enhancement is an emerging area of color image processing research where digital image processing algorithms are used to increase the quality of medical images for better analysis, and to smooth the medical diagnosis process, which in turn increases the accuracy and efficiency of the overall analysis. Usually, enhancement techniques used for gray images are extended to color images. But color images need special treatment for quality improvement, as brightness issues need to be dealt with very carefully; otherwise the enhancement may cause deterioration of the images. Also, if hue values are altered during enhancement, it will result in loss of color information. So, there are various problems and challenges involved in color image enhancement techniques. In Chapter 1, an effort has been made to shed light on these issues. Also, a study of recent techniques for medical color image enhancement is carried out to examine their pros and cons. Finally, a few suggestions are proposed to increase the efficiency of such techniques.
Large and complex multimedia data network analysis through the community detection approach is one of the recent trends of research. Community detection helps to group multiple members with similar characteristics within such a network. It has many prospective applications, which makes it relevant to multi-disciplinary research domains. Initial work on community detection started with static and disjoint community identification, but recently that trend has evolved into the exploration of dynamic and overlapping community structures. Chapter 2 explores many recent research reports on a wide range of community detection related works. Those works mainly explore different primitive graph clustering methods for the said purpose. However, the application of different intelligent algorithms, meta-heuristic techniques, hybrid intelligent approaches and biologically inspired intelligent techniques is yet to be considered on a large scale. This chapter leaves some potential clues for research on applying these intelligent mechanisms in community detection.
Chapter 3 presents a novel unsupervised framework for the graph-based representation of coronary arteries in X-ray angiograms. The framework consists of vessel detection, segmentation, and skeleton simplification steps. Vessel detection is performed by Gaussian matched filters (GMF) trained by the univariate marginal distribution algorithm for continuous domains. The detection results are evaluated in terms of the area under the receiver operating characteristic curve. The second step is focused on the binary classification of the response obtained from the GMF method. In the final step, the vessel skeleton simplification is carried out by using the Ramer–Douglas–Peucker algorithm. During the computational experiments, the proposed framework obtained a detection performance of 0.926. The inter-class variance method was
selected from five state-of-the-art thresholding methods according to its classification accuracy on the segmentation of the detection response (0.923). In the final step, a graph-based structure of the coronary arteries with a data compression ratio of 0.954 is obtained. Based on the experimental results, the proposed method has demonstrated that it is suitable for a variety of applications in computer-aided diagnosis.
Feature vector extraction has represented a deeply ingrained component for grouping images into meaningful categories. Two novel variants of the feature extraction technique, using global, local and mean threshold selection for feature extraction, are introduced using eight different methods in Chapter 4. The techniques are implemented on even and odd variations of generic images for feature extraction. Quantitative comparison of the proposed techniques is made with reference to existing methods of feature extraction using even and odd images. The misclassification rate (MR) and the F1 score are considered as parameters of comparison. The comparisons revealed the superior performance of the proposed techniques with respect to existing methods. The superiority of the proposed techniques is further established especially for methods adopting local threshold approaches.
Chapter 5 proposes a feasible implementation of a student-specific automated system for learning and tutoring of school-level geometry concepts. Unlike standard geometry drawing or tutoring software, the proposed system can intelligently assess a student's understanding of a geometry concept by testing the student with relevant problems and accordingly takes him or her to a higher or lower level of concepts. A diagram drawn by a student in a digital interface is evaluated to find the degree of geometric correctness, which is otherwise a challenging task. For tutoring primary school level geometry, a figure database with different difficulty levels is maintained, while for secondary school level tutoring, earlier developed text-to-diagram drawing functionality is incorporated. The novelty of the tutoring system lies in the fact that no such system exists that can draw, recognize and compare digital geometry diagrams and adjust the diagram-based content/problems through different learning and testing stages based on dynamic student-specific responses. The representative test cases cited in this study clearly demonstrate the usefulness and intelligent treatment of the system.
Image processing is used to extract useful information from images. It is a rapidly growing technology today and forms a core research area within the engineering and computer science disciplines. Uncertainty-based models play major roles in image processing in general and in image segmentation in particular, leading to applications in medical image processing and satellite image processing. From among the uncertainty models, namely the fuzzy set, rough set, intuitionistic fuzzy set, soft set and their hybrid models, the authors deal with only two in Chapter 6, as far as their role in image processing is concerned. These are the rough set and the soft set, introduced by Pawlak in 1982 and Molodtsov in 1999, respectively. The authors also deal with some hybrid models of these two, and the models mentioned above are applied to image segmentation.
Clustering is a popular data mining tool whose aim is to divide a data set into a number of groups, or clusters. The aim of this work is to develop a quantum inspired algorithm which is capable of automatically finding the optimal number of clusters in an image data set. Chapter 7 focuses on a quantum inspired automatic clustering technique based on a meta-heuristic algorithm named simulated annealing. The quality of this clustering algorithm has been measured by two separate fitness functions, called the DB index and the I index. A comparison is made between the quantum inspired algorithm and its classical counterparts on the basis of the mean of the fitness, standard deviation, standard error and computational time. Finally, the superiority of the proposed technique over its classical counterparts is established by a statistical superiority test, namely the t-test. The proposed technique is used to cluster four publicly available real-life image data sets and four Berkeley image data sets of different dimensions.
Nowadays, research on influence propagation and influence maximization in various multimedia data networks and social networks is gaining immense attention. Chapter 8 introduces two new heuristic metrics, in the form of diffusion propagation and diffusion strength. The diffusion propagation capacity of the connecting edges of a network is measured on the basis of diffusion strength. Diffusion strength and diffusion propagation are further utilized to design a greedy method for seed selection within a k-budgeting scheme, where the k-value is derived from the cardinality of the vertex set. This approach makes the seed selection process more intelligent and practically consolidated. These sorted seeds are further used to maximize influence diffusion within the non-seed nodes of the network. The proposed algorithm is assessed on standard benchmark data sets and the experimental findings are compared with other state-of-the-art methods. Comparative analysis reveals better computational time and higher influence spread using this method.
This volume is intended to be used as a reference by undergraduate and postgraduate students of the disciplines of computer science, electronics and telecommunication, information science and electrical engineering.
May 2018
Kolkata, India
Siddhartha Bhattacharyya Indrajit Pan Abhijit Das Shibakali Gupta
| Siddhartha Bhattacharyya would like to dedicate this book to his late father Ajit Kumar Bhattacharyya, his late mother Hashi Bhattacharyya, his beloved wife Rashni Mukherjee, his cousin and sisters-in-law Hena Banerjee, Paplu Banerjee, Ruma Banerjee and Papiya Banerjee Indrajit Pan would like to dedicate this book to his beloved wife Subhamita Abhijit Das would like to dedicate this book to his dearest student the late Indradip Shibakali Gupta would like to dedicate this book to his daughter Prishita, his father the late Sunil Kumar Gupta, his mother Juthika Gupta and his loving wife Ayana Gupta Chakraborty
Contents
Preface | V
Dedication | IX
Dibya Jyoti Bora
1 Medical color image enhancement: Problems, challenges & recent techniques | 1
Sudeep Basu, Indrajit Pan, and Siddhartha Bhattacharyya
2 Exploring the scope of intelligent algorithms for various community detection techniques | 19
Sebastian Salazar-Colores, Fernando Cervantes-Sanchez, Arturo Hernandez-Aguirre, and Ivan Cruz-Aceves
3 An unsupervised graph-based approach for the representation of coronary arteries in X-ray angiograms | 43
Rik Das, Sourav De, and S. N. Singh
4 A study of recent trends in content based image classification | 65
Arindam Mondal, Anirban Mukherjee, and Utpal Garain
5 Intelligent monitoring and evaluation of digital geometry figures drawn by students | 95
B. K. Tripathy, T. R. Sooraj, and R. K. Mohanty
6 Rough set and soft set models in image processing | 123
Alokananda Dey, Sandip Dey, Siddhartha Bhattacharyya, Vaclav Snasel, and Aboul Ella Hassanien
7 Quantum inspired simulated annealing technique for automatic clustering | 145
Mithun Roy, Indrajit Pan, and Siddhartha Bhattacharyya
8 Intelligent greedy model for influence maximization in multimedia data networks | 167
Index | 183
Dibya Jyoti Bora
1 Medical color image enhancement: Problems, challenges & recent techniques
Abstract: Medical color image enhancement is an emerging area in color image processing research where digital image processing algorithms are used to increase the quality of medical images for better analysis and to smooth the medical diagnosis process, which in turn increases the accuracy and efficiency of the overall analysis process. Usually, enhancement techniques used for gray images are extended to color images. But color images need special treatment for their quality improvement, as the brightness issue needs to be dealt with very carefully; otherwise it may cause deterioration of the images. Also, if hue values are altered during enhancement, it will result in loss of color information. So, there are various problems and challenges involved in color image enhancement techniques. In this chapter, an effort has been made to shed light on these issues. Also, a study of recent techniques for medical color image enhancement is carried out to examine their pros and cons. Finally, a few suggestions are proposed to increase the efficiency of such techniques.
Keywords: Color Image Processing, Color Space Conversion, fuzzy logic, HSV Color space, Medical image enhancement, Soft Computing, Type-2 Fuzzy Set
1.1 Introduction
Image enhancement is a crucial task in medical image processing, as the accurate result of any diagnosis process depends on the input image's quality. By image enhancement, we mean any process or technique through which the visual quality of the image can be increased. These techniques generally comprise noise removal, contrast improvement, edge sharpening, etc. Of these, contrast improvement is the mandatory one for medical image enhancement, since medical images often suffer from poor contrast and poor visual quality. Contrast improvement is needed to highlight the lower intensity areas, while edge sharpening is needed for areas where edges are not clearly visible [1]. Before this, noise removal techniques should be applied to remove any noise present in the image, so that it is not amplified during contrast enhancement [2]. In the case of color medical image enhancement, little research has been done. While doing color image enhancement, some important things should be noted, of which color space selection is among the most important. In the remaining portion of this chapter, a discussion of medical color image enhancement and its problems and challenges is presented in Section 1.2.
Dibya Jyoti Bora, School of Computing Sciences, Assam Kaziranga University, Jorhat (Assam)-785006, [email protected] https://doi.org/10.1515/9783110552072-001
A review of literature is presented in Section 1.3. In Section 1.4, a study on recent techniques for color image enhancement is provided. Section 1.5 presents a flowchart of the proposed approach. Experiments for comparative research are discussed in Section 1.6. Finally, in Section 1.7, suggestions are put forward on how to develop an efficient color image enhancement technique.
1.2 Medical color image enhancement
To improve the visual quality, resolution, and accuracy of the original image, we need an enhancement technique for medical color images. Enhancement helps to get accurate results in the further processing of the medical diagnosis process. Depending on the type of image, enhancement techniques can be either gray image enhancement techniques or color image enhancement techniques. Gray image techniques involve comparatively less complexity than color image enhancement techniques. In this chapter, our focus will be on color image enhancement. These techniques are mainly classified into two domains: the spatial domain and the frequency domain. In the case of the spatial domain, enhancement techniques operate directly on the pixels, while in the case of the frequency domain, the techniques are applied to the Fourier transform of the image. Both domains are equally effective, although many popular enhancement techniques are within the spatial domain.
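To make the distinction between the two domains concrete, the short MATLAB fragment below applies a comparable smoothing operation once directly on the pixels and once on the Fourier transform of the image; the input file name and the cut-off radius are arbitrary choices used only for illustration.

% The same image smoothed in the spatial domain (pixel-wise averaging)
% and in the frequency domain (low-pass filtering of the Fourier transform).
I = double(imread('input_image.png'));        % replace with any test image
if size(I, 3) == 3, I = mean(I, 3); end       % collapse RGB to gray if needed
% Spatial domain: 3x3 mean filter applied directly to the pixels.
spatial = conv2(I, ones(3) / 9, 'same');
% Frequency domain: keep only the low-frequency coefficients of fft2(I).
F = fftshift(fft2(I));
[rows, cols] = size(I);
[u, v] = meshgrid(1:cols, 1:rows);
mask = hypot(u - cols/2, v - rows/2) < min(rows, cols) / 8;  % circular low-pass
frequency = real(ifft2(ifftshift(F .* mask)));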
1.2.1 Why enhancement is needed for medical images
Medical images often suffer from low contrast issues. Contrast is the factor by which we can recognize an object present in an image separately from its background. This property is totally based on human perception [3]. The contrast C can be defined as:
C = (G_x − G_y) / (G_x + G_y)   (1.1)
where G_x and G_y are the mean gray levels of the two regions for which the contrast needs to be calculated. This is known as relative brightness contrast [4], and this definition is used throughout this chapter. For many reasons, the contrast is degraded, and this occurs during the image acquisition process. In any diagnosis process, physicians want a good-contrast image so that different cells and nuclei are easily distinguishable in the image. A good-contrast image generally refers to an increase in the range distribution of the pixels within the allowable range [5], while low contrast means the pixel distribution is confined to a very small range. The use of fewer sensors during the image acquisition process and poor illumination result in a poor contrast image. Such poor contrast images do not help in further analysis processes like segmentation and make it difficult for physicians to distinguish the different objects present in the image. So, the contrast of such images should be improved before proceeding to further
analysis processes. In the case of medical color images, this is even more important than for gray ones, as poor contrast may lead to misclassification of colors and indirectly fuels inaccurate diagnosis results. For example, staining images are the most commonly used images for the medical diagnosis of different serious diseases. HE staining is one such staining technique, regarded as the gold standard in histology. In this chapter, we consider HE stain images for the experimental evaluation of different enhancement techniques. HE-stained images are popular and very widely used medical color images. These images contribute to the timely discovery of many serious diseases, including cancer. They are the input for histological image segmentation techniques, which help an expert to make the analysis process more objective and less time consuming for detecting specific neoplastic regions in cancer diagnosis processes [5]. But this can be expected only when the input HE stain color images are of good quality. So, contrast improvement is one of the mandatory steps that should be included in every color medical imaging pipeline to make the results more accurate and the diagnosis process less time consuming.
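As a small numerical illustration of the relative brightness contrast of equation (1.1), the following MATLAB fragment computes C for two regions of a synthetic gray-scale image; the image and the region boundaries are arbitrary and serve only as an example.

% Relative brightness contrast (equation 1.1) between two image regions.
I = repmat(linspace(0.2, 0.8, 256), 256, 1);   % synthetic gray-scale gradient
regionA = I(:, 193:256);                       % brighter region
regionB = I(:, 1:64);                          % darker region
Gx = mean(regionA(:));                         % mean gray level of region A
Gy = mean(regionB(:));                         % mean gray level of region B
C = (Gx - Gy) / (Gx + Gy);                     % relative brightness contrast
fprintf('Relative brightness contrast C = %.3f\n', C);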
1.2.2 Challenges in medical color image enhancement
Although image enhancement makes the diagnosis process easy and accurate, it is not an easy task to improve the quality of color medical images. A color medical image enhancement technique has to overcome many challenges to be efficient. Two major challenges that every such technique has to face are mentioned below:
1. Selection of a proper color space for the color computations that will be involved in the concerned technique. This is very important, as our main aim is to improve the contrast but not to change the hue and saturation values. Otherwise, alteration of hue values may result in color misclassification and thereby lead to the wrong identification or misinterpretation of the objects for which the diagnosis is conducted.
2. Secondly and most importantly, while dealing with medical images, we have to consider the vague nature of such images. Most medical images are fuzzy in nature and the boundaries surrounding the region of interest are very hard to identify. So, traditional algorithms almost always fail to deal with this fuzzy behavior of medical images.
1.2.3 Solutions for these challenges
1. When developing a technique for medical color image enhancement, the first thing we need to consider is a proper color space [6]. A color space is a mathematical model describing the organization of different colors of an image. Usually, a color space characterizes different color attributes with respect to three or more components that facilitate learning what each color spectrum looks like [7, 8]. Red, green and blue (RGB) is the default color space in most cases. But RGB color space is not suitable for color image enhancement purposes, as hue values are associated with each of the three channels R (red), G (green) and B (blue). There is no particular channel devoted to luminance value adjustment. For an efficient result, we have to apply the enhancement task to the intensity channel so that the hue values are not affected. So, instead of using RGB color space, it is recommended to use another color space like LAB or HSV, where there is a devoted intensity channel (the L channel in the case of LAB and the V channel in the case of HSV); a short sketch of this idea is given after this list.
2. Since the traditional techniques are not able to deal with the fuzzy behavior of medical images, we try to adopt a fuzzy-based method while developing a technique for medical color image enhancement. Advanced fuzzy-based techniques like the intuitionistic fuzzy set and the type-2 fuzzy set have proven to be more beneficial in this case.
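A minimal MATLAB sketch of the first recommendation — enhancing only the intensity channel so that hue and saturation remain untouched — is given below. The input file name is a placeholder, and the simple linear stretch of the V channel merely stands in for whichever enhancement method is actually adopted.

% Contrast enhancement applied only to the intensity (V) channel of HSV,
% leaving hue (H) and saturation (S) untouched. The linear stretch is only
% a placeholder for the actual enhancement method.
rgb = double(imread('input_image.png')) / 255;  % replace with the medical image
hsv = rgb2hsv(rgb);                             % move to HSV color space
V = hsv(:, :, 3);                               % intensity channel
hsv(:, :, 3) = (V - min(V(:))) / (max(V(:)) - min(V(:)) + eps);  % stretch V
enhanced = hsv2rgb(hsv);                        % back to RGB for display
figure;
subplot(1, 2, 1); image(rgb); axis image off; title('original');
subplot(1, 2, 2); image(enhanced); axis image off; title('enhanced');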
1.3 Literature review
In this section, a review of a few noteworthy contributions in the field of color image enhancement is presented.
Gu et al. [1] suggested a new technique for medical color image enhancement based on the Y-H (Young–Helmholtz) transformation with adaptive equalization of the intensity numbers matrix histogram. The contrast is enhanced by adaptive histogram equalization, which also suppresses the noise present in the original image. The transformation from Y-H back to RGB is carried out to show the enhanced color image without affecting the hue and saturation values. Experiments prove that the proposed technique is well suited for the diagnosis of medical images, as it carries low computational complexity.
Lin et al. [9] put forward a novel method, FACE (Fuzzy Automatic Contrast Enhancement). FACE first performs fuzzy clustering to segment the input image; the pixels with similar colors in the CIELAB color space are then classified into similar clusters with smaller characteristics.
Hsu et al. [10] introduced a medical image enhancement technique based on a modified color histogram equalization, where they focused on hue preservation while doing the enhancement. For this, they employed two methods. The first one tries to obtain an equalized color image preserving hue by using the ratio of the original grayscale image and the equalized image. The other achieves hue preservation by applying the difference between the original gray version of the image and the equalized one, thus obtaining the final equalized color image.
Khatkar et al. [11] presented a method to enhance biomedical images on the basis of a combination of wavelets. They first applied the SIFT (Scale Invariant Feature Transform) algorithm on the image, then the first wavelet, D'Mayer, is applied. After that, the image is extracted and the second wavelet, Coiflet, is applied, which produces
the enhanced version. The results are compared with those of other wavelets using different metrics like the PSNR (peak signal-to-noise ratio) and the Beta coefficient, and it is found that the proposed method succeeds in producing better results than other state-of-the-art techniques.
Kaur et al. [12] performed a comparative study of different histogram equalization based techniques, like histogram equalization (HE), local histogram equalization (LHE), adaptive histogram equalization (AHE) and contrast limited adaptive histogram equalization (CLAHE), for MRI image enhancement. They employed different metrics for this comparative study. What they found can be briefly stated as follows: HE, although a simple and effective enhancement technique, changes the brightness of the image; CLAHE is better than LHE with respect to the time involved in the process; and CLAHE may be recommended for contrast improvement but still has some tendency to amplify noise.
A dualistic sub-image histogram equalization based enhancement and segmentation technique for medical images is proposed by Mohan et al. in [13]. In short, the proposed technique can be illustrated as follows: first, compute the histogram of the original image and calculate its median; then divide the histogram at the median. Each partition is equalized independently using a probability distribution function and a cumulative distribution function. The two equalized sub-images are composed into one image to obtain the enhancement result of the dualistic sub-image histogram. After the enhancement process, the authors segment the enhanced image using a proposed segmentation technique based on directional homogeneity using the modified metric of D. Chaudhuri [14]. They consider eight directions, as eight directions are sufficient to provide efficient and accurate extraction of brain image pixels. Then, after the segmentation process, the holes in the segmented image that generally arise due to noise are removed by the inversion technique. Finally, a technique for minor branch removal is carried out, where the regions connected to the original image with lengths less than a threshold value are removed. The authors tested the proposed method on several medical images and proved its efficiency by comparing the results with a hierarchical grouping technique through different performance measures such as completeness and clearness.
Stephanakis et al. [15] proposed a novel model for fuzzy equalization of medical images, where a preprocessing step is carried out to accomplish a fuzzy segmentation of the image into N segments with overlapping fuzzy boundaries. Then, histogram equalization is performed according to individual equalization functions derived from each segment of the image. Here, a fuzzy membership function is connected with each segment. The authors claim through experiments that the proposed technique performs better than the global histogram equalization technique.
Puniani et al. [16] proposed an improved fuzzy image enhancement technique using the L*a*b* color space and edge preservation. They stretched the L component and evaluated the fuzzy membership value on it, thereby preserving the chromatic information in a and b. An edge-preserving smoothing is also integrated with the fuzzy image
enhancement to preserve the edges. The experimental results prove that the proposed technique outperforms existing techniques like histogram equalization, adaptive histogram equalization, and fuzzy based enhancement.
Zhao et al. [17] proposed a medical image enhancement technique for suppressing background noise, where they embed the PLIP multiplication into the unsharp masking framework. The results are found to be efficient with respect to medical images.
So, we have seen from the review of the literature that there are a number of techniques for medical color image enhancement, and that the technical bases of these techniques differ from each other. Hence, while choosing a particular base or domain for developing an enhancement technique for medical images, researchers should have an idea of which base will be beneficial and offer a better enhancement result. So, we have selected only the most efficient recent techniques and introduce them in Section 1.4, where the methodologies involved in these techniques are thoroughly discussed. Our proposed technique for medical color image enhancement is also presented in that section. Then, for a clear illustration, a flowchart of the proposed technique is presented in Section 1.5. After that, in the experimental section, a comparative study is done among these recent techniques to find an optimal one for this problem.
1.4 Recent techniques for color image enhancement
In this section, a few recent state-of-the-art techniques for medical color image enhancement proposed by prominent researchers in this area are selected and explained. Although there are many state-of-the-art algorithms, we have picked four techniques from four different domains: classical color histogram equalization, adaptive equalization of the intensity numbers matrix histogram, fuzzy logic, and the type-2 fuzzy set. As all four domains are frequently adopted for color medical image enhancement, a comparative study of them is also presented in the experimental section, taking HE stain images as the color medical image data.
Hsu et al. [10] proposed a medical image enhancement technique based on color histogram equalization. The proposed technique tries to deal with a major drawback of histogram equalization. Although histogram equalization is the classical and most widely adopted technique for image enhancement, it has the drawback that it does not consider hue preservation, which is very important in the case of color medical image enhancement. Hue refers to colors (e.g., blue and red) that can be perceived by the human eye in the course of light irradiation at different wavelengths [10]. The steps involved in the proposed approach [10] can be simply stated as follows:
Step 1: The color image is first converted to grayscale.
Step 2: The values of the probability density function (pdf) and cumulative distribution function (cdf) are calculated from the converted grayscale image.
Step 3: Perform grayscale histogram equalization using the pdf and cdf calculated in Step 2.
Step 4: This is the most important step and deals with hue preservation. The authors employed two methods for this purpose. The first one tries to obtain an equalized color image by using the ratio of the original grayscale image and the equalized image. The second one applies the difference between the original grayscale image and the equalized image to obtain an equalized color image that achieves hue preservation.
Step 5: Output the equalized, enhanced color image.
They conducted experiments on retina and prostate cancer images to assess the effectiveness of various image contrast enhancement methods. The results obtained are verified with MSE and PSNR evaluations and are found to be superior to those produced by the traditional histogram equalization method.
Gu et al. [1] introduced a new enhancement technique for medical color images. This technique is developed by combining the Young–Helmholtz (Y-H) transformation with the adaptive equalization of the intensity numbers matrix histogram. They employed adaptive histogram equalization to reinforce the details, increase the contrast, and curb the noise of the original image effectively. Then the inverse Y-H transformation is performed to display the enhanced color image; the hue and saturation values are not altered. The proposed method is found to be efficient and computationally inexpensive, and hence can be recommended as a preprocessing technique in the computerized medical diagnosis process. The idea of an intensity numbers matrix histogram of a color image was first put forward by Hua et al. in [18]. They also established the actual range of intensity as 0 to √(3 × 255²), which after rounding off becomes 0–442. The intensity numbers matrix histogram of the color image can be expressed as the following discrete function [1, 19]:
g(K) = m_K ,   K = 0, 1, . . . , 442   (1.2)
where m_K is the number of pixels with intensity level K in f(x, y). The authors deduced the following mathematical expression for the adaptive intensity numbers matrix histogram with complete detail enhancement:
x̂_{i,j} = T(x_{i,j}) + k (σ²_{i,j} / σ²_n − 1)(x_{i,j} − m_{i,j}) ,   if 0 ≤ x_{i,j} ≤ 442
x̂_{i,j} = T(x_{i,j}) ,   otherwise   (1.3)
where m_{i,j} is the neighborhood average value of the window centered at the current pixel, T is its transformation function, k(x_{i,j} − m_{i,j}) plays the role of enhancing local intensity levels, the coefficient k is used to amplify the value of details and reduce background noise, and σ²_{i,j} and σ²_n are the intensity level variance of window W and the noise variance of the whole image intensity level, respectively. The results of the proposed technique are verified with both subjective and objective evaluations and found to be quite promising for color medical image enhancement.
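The exact intensity definition of [18] is not reproduced in this chapter, but the 0–442 range quoted above is consistent with taking the intensity of an RGB pixel as √(R² + G² + B²), since √(3 × 255²) ≈ 441.7. Under that assumption, the intensity numbers matrix histogram g(K) of equation (1.2) can be sketched in MATLAB as follows; the input file name is a placeholder.

% Intensity numbers matrix histogram g(K) of equation (1.2), assuming the
% intensity of a pixel is sqrt(R^2 + G^2 + B^2), which matches the 0-442
% range mentioned in the text.
rgb = double(imread('input_image.png'));          % 8-bit RGB image (placeholder)
inten = round(sqrt(sum(rgb .^ 2, 3)));            % intensity numbers matrix
g = histcounts(inten(:), -0.5:1:442.5);           % g(K) = m_K, K = 0..442
bar(0:442, g);
xlabel('Intensity level K'); ylabel('Number of pixels m_K');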
Another very good contribution to medical color image enhancement is by Raju et al. [20]. In that work, a fuzzy logic and histogram based technique for the enhancement of low contrast color images is introduced. The technique is based on two parameters, M and K, where M is the average intensity value of the image, calculated from the histogram, and K is the contrast intensification parameter. First, the RGB image is converted into HSV color space. M is calculated using the following equation:
M = Σ_X X · h(X) / Σ_X h(X)   (1.4)
where h(X) indicates the number of pixels in the image with intensity value X. Here the histogram is that of the V channel of the HSV-converted image. M divides the histogram h(X) into two classes, C1 and C2, where C1 contains the pixel values in the range [0, M − 1] and C2 those in the range [M, 255]. The V channel is stretched with two fuzzy membership values, μ_D1 and μ_D2, calculated for the C1 and C2 classes of pixels, respectively:
μ_D1(X) = 1 − (M − X) / M   (1.5)
where X ∈ C1. The above membership equation stands for the fuzzy rule [20]: "If the difference between X and M is LARGE, then the intensity of stretching should be SMALL." The contrast-enhanced, or intensified, value for class C1 is obtained using the following equation:
X_e = X + μ_D1(X) · K   (1.6)
The other membership function is calculated using the following equation:
μ_D2(X) = (E − X) / (E − M)   (1.7)
where E is the extreme value, i.e., E = 255 for eight bits. The above equation stands for the fuzzy rule [20]: "If the difference between X and E is LARGE, then the intensity of stretching should be LARGE." The contrast-enhanced value using this membership function can be obtained using the following equation:
X_e = X · μ_D2(X) + (E − μ_D2(X) · K)   (1.8)
So, finally, the old X values of the V channel are replaced by the enhanced values to obtain an enhanced HSV version. The conversion back to RGB is done to obtain the final version of the enhanced color image. The results of this technique are claimed, through proper experiments, to be superior to those of other state-of-the-art algorithms. The only drawback of this method is that it can be applied only to low contrast and low brightness color images [20].
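A minimal MATLAB sketch of this fuzzy stretching scheme, following equations (1.4)–(1.8) as stated above, is given below. The input file name, the assumption that the V channel is handled on a 0–255 scale, and the value of the intensification parameter K are illustrative choices rather than settings taken from [20].

% Fuzzy contrast stretching of the V channel following equations (1.4)-(1.8).
rgb = double(imread('input_image.png')) / 255;    % low-contrast RGB image (placeholder)
hsv = rgb2hsv(rgb);
V = round(hsv(:, :, 3) * 255);                    % V channel on a 0-255 scale
h = histcounts(V(:), -0.5:1:255.5);               % histogram h(X), X = 0..255
X = 0:255;
M = round(sum(X .* h) / sum(h));                  % average intensity, eq. (1.4)
E = 255;                                          % extreme value for eight bits
K = 50;                                           % illustrative intensification parameter
Xe = zeros(size(V));
C1 = V < M;                                       % class C1: values in [0, M-1]
C2 = ~C1;                                         % class C2: values in [M, 255]
muD1 = 1 - (M - V(C1)) / M;                       % eq. (1.5)
Xe(C1) = V(C1) + muD1 * K;                        % eq. (1.6)
muD2 = (E - V(C2)) / (E - M);                     % eq. (1.7)
Xe(C2) = V(C2) .* muD2 + (E - muD2 * K);          % eq. (1.8)
hsv(:, :, 3) = min(max(Xe, 0), 255) / 255;        % clamp and rescale to [0, 1]
enhanced = hsv2rgb(hsv);                          % final enhanced color image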
In intuitionistic fuzzy (IF) set based enhancement, both the membership and nonmembership values of an IF image are required to be determined. Some intuitionistic fuzzy set based techniques for medical image enhancement can be found in [21–23] and [24]. These techniques are good for medical image enhancement, but involve more complexity than their counterpart techniques. The author also proved through her proposed type-2 fuzzy set based technique that its enhancement ability is better than that of the intuitionistic fuzzy (IF) set based technique. So, we focus on the type-2 based enhancement technique. In [25], we have introduced a new efficient type-2 fuzzy set based approach for color medical image enhancement. The steps involved in the proposed approach are illustrated below:
Step 1. Input a poor contrast medical color image. The RGB image is converted into a LAB one and the L channel is extracted from the converted image. The L channel is used for luminance level measurement.
Step 2. The L channel of the LAB-converted image is then forwarded to Type-2_Fuzzy_Enhnc() to deal with the vagueness and thereby improve the contrast. In this proposed type-2 based fuzzy technique, two membership functions are involved. The second membership function is especially good for dealing with the fuzziness involved in the first membership function. A definition of the type-2 fuzzy set can be put forward as follows [26–29]:
Ã_TYPEII = {(x, μ_Ã(x, u)) | ∀x ∈ X, ∀u ∈ J_x ⊆ [0, 1]}   (1.9)
where μ_Ã(x, u) is the type-II membership function and J_x is the primary membership function of x. The upper and lower limits are defined as:
μ_upper = [J_x]^α ,   μ_lower = [J_x]^(1/α)   (1.10)
where 0 < α ≤ 1.
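A small MATLAB illustration of the upper and lower bounds in equation (1.10) is given below; the Gaussian primary membership function and the value of α are arbitrary choices made only for the example.

% Upper and lower type-2 membership bounds of equation (1.10) computed from
% a primary (type-1) membership function.
x = linspace(0, 1, 256);                   % normalized intensity levels
Jx = exp(-((x - 0.5) .^ 2) / (2 * 0.15^2)); % primary membership J_x in [0, 1]
alpha = 0.8;                               % 0 < alpha <= 1 (illustrative value)
mu_upper = Jx .^ alpha;                    % [J_x]^alpha
mu_lower = Jx .^ (1 / alpha);              % [J_x]^(1/alpha)
plot(x, Jx, x, mu_upper, '--', x, mu_lower, ':');
legend('primary J_x', '\mu_{upper}', '\mu_{lower}');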
    if dist > distmax then
        index = i; distmax = dist;
if distmax > ϵ then
    r1 = RDP(points(:, 1:index), ϵ);
    r2 = RDP(points(:, index:last), ϵ);
    r = [r1(:, 1:length(r1) − 1)  r2(:, 1:length(r2))];
else
    r = [p(:, 1)  p(:, last)];
3.3 Proposed unsupervised framework
In general, coronary artery detection and segmentation represent the first and second stages of different medical information systems. The proposed unsupervised framework is useful for the shape representation of the coronary arteries as a graph-based approach. This framework consists of three steps for the automatic vessel detection, segmentation and parametric graph-based representation of vessels in X-ray angiograms. In the first step, Gaussian matched filters trained by the UMDAc strategy are used to detect blood vessels in the angiograms. To this end, the tuning of the GMF parameters is defined as a numeric optimization problem and the UMDAc algorithm is used to solve it. The use of UMDAc instead of the classical UMDA with binary encoding allows a better exploration of the search space of the three GMF parameters, since the individuals are not discretized, which is a great advantage for obtaining a high detection rate. Each individual is a set of parameters x = {T, L, σ} to be used by the GMF method. The parameter κ is set to 12 orientations for all the individuals. For the UMDAc algorithm, each parameter T, L and σ is modelled as an independent feature. The evaluation of the population consists of filtering the input images using the GMF method with the parameters of each individual, and the detection performance is assessed in terms of the area (Az) under the receiver operating characteristic (ROC) curve [34, 35]. Because the objective function is the maximization of Az, the evolutionary process performed by UMDAc is able to find the fittest parameters for the GMF method as xbest = {Tbest, Lbest, σbest}. The second step, consisting of the binary classification (vessel and nonvessel pixels) of the gray-scale filter response, is evaluated using the accuracy metric as follows:
Accuracy = (TP + TN) / (TP + FP + TN + FN) ,   (3.4)
where TP and TN are the pixels correctly classified as such by the computational method and FP and FN are the incorrectly classified pixels. In the final step, a parametric line simplification method is applied to determine the optimal nodes over the skeleton of a segmented coronary artery tree. This step is evaluated in terms of computational time and by the data compression ratio (CR) as follows:
CR = 1 − (Compressed data / Uncompressed data) ,   (3.5)
where compressed data is the number of interest points obtained from the parametric line simplification method and uncompressed data represents the number of pixels of the automatically segmented coronary artery skeleton.
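Returning to the first step of the framework, the chapter describes the UMDAc search over the GMF parameters only at a high level, so the following MATLAB outline is an assumption-laden sketch rather than the authors' implementation. It assumes a user-supplied fitness function evaluate_Az(T, L, sigma) that filters the training angiograms with the corresponding GMF bank (κ = 12 orientations) and returns the area Az under the ROC curve; the population size, selection size and number of generations are illustrative values.

% Outline of UMDAc for tuning the GMF parameters {T, L, sigma}.
% evaluate_Az is an assumed fitness function returning the ROC area Az.
popSize = 30; nSel = 15; nGen = 30;               % illustrative settings
lb = [8 8 1]; ub = [15 15 5];                     % bounds for T, L and sigma
pop = lb + rand(popSize, 3) .* (ub - lb);         % random initial population
for gen = 1:nGen
    fit = zeros(popSize, 1);
    for i = 1:popSize
        fit(i) = evaluate_Az(pop(i, 1), pop(i, 2), pop(i, 3));  % assumed fitness
    end
    [~, order] = sort(fit, 'descend');
    elite = pop(order(1:nSel), :);                % truncation selection
    mu = mean(elite, 1);                          % univariate Gaussian model:
    sg = std(elite, 0, 1) + 1e-6;                 % one mean/std per parameter
    pop = mu + randn(popSize, 3) .* sg;           % sample the next population
    pop = min(max(pop, lb), ub);                  % keep parameters in bounds
end
xbest = elite(1, :);                              % fittest {T, L, sigma} from the last selection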
3.4 Computational experiments
In this section, the steps of vessel segmentation and the graph-based representation approach are analyzed and discussed in different sections. The computational experiments were performed on a computer with an Intel Core i3, 2.13 GHz processor, and 4 GB of RAM using the Matlab software version 2016b. The database used in the present work consists of 80 (8-bit) X-ray coronary angiograms of size 300 × 300 pixels. Each angiogram was outlined by a specialist and ethics approval was provided by the Mexican Social Security Institute. The training set consists of 40 angiograms and the testing set of the remaining 40 angiograms.
3.4.1 Results of blood vessel detection and segmentation
As part of the unsupervised framework for the graph-based representation of coronary arteries in X-ray angiograms, the vessel detection and segmentation steps are analyzed below. Due to the design of the Gaussian matched filters, an optimization strategy for selecting an optimal set of parameters (L, T, κ, σ) is required. This optimization stage was carried out by using UMDAc on the predefined search space L = {8, 9, . . . , 15}, T = {8, 9, . . . , 15}, σ = [1, 5], keeping the number of oriented filters constant as κ = 12. The optimal set of parameters was obtained using the training set of 40 images and the Az value of the ROC curve. Since the UMDAc strategy only requires two parameters, the population was set as 30 individuals and the selection rate was set as 0.7. The set of parameters for the GMF method was established as L = 15, T = 15, and σ = 2.4 for further analysis.
In Table 3.1, the proposed strategy (GMF + UMDAc) for vessel detection is compared in terms of the Az value using the test set of images. This comparative analysis was performed with two state-of-the-art GMF-based methods. In this comparison, the proposed strategy outperforms the comparative methods, which is useful to obtain suitable results in the following vessel segmentation step.

Tab. 3.1: Performance analysis between two GMF-based techniques and the proposed method using the test set of angiograms

Method              Az value
Proposed method     0.926
Chaudhuri et al.    0.905
Kang et al.         0.887

In order to visualize the vessel detection results obtained from the proposed strategy, a subset of X-ray angiograms along with the Gaussian filter response is presented in Figure 3.7. These gray-scale response images present the problem of nonuniform illumination and weak contrast.
Fig. 3.7: First row: subset of X-ray coronary angiograms. Second row: hand-labeled images of angiograms in first row. Last row: Gaussian response from the images of the first row using the proposed method
The vessel-like structures are detected while the background of the image is set to a uniform intensity value. In the vessel segmentation step, the gray-scale intensities of the Gaussian response must be classified into two categories: vessel pixels and nonvessel pixels. Consequently, five state-of-the-art thresholding methods are compared in this stage. In Table 3.2, the thresholding comparative analysis in terms of segmentation accuracy is introduced. This comparison shows that the classical method proposed by Otsu [20] is the most appropriate for the classification of vessel pixels from the Gaussian filter response. In order to illustrate the vessel segmentation step, a subset of segmentation results obtained by applying the proposed method is introduced in Figure 3.8.

Tab. 3.2: Performance analysis of five state-of-the-art thresholding methods using the Gaussian response image on the test set

Method                            Accuracy
Otsu [20]                         0.923
Ridler and Calvard [18]           0.920
Pal and Pal [17]                  0.915
Rosenfeld and De la Torre [19]    0.910
Kapur et al. [16]                 0.892
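In MATLAB, the inter-class variance classification selected above reduces to a call to graythresh (Otsu's method, Image Processing Toolbox); the helper below is only a sketch, and the last line restates the accuracy of equation (3.4) for a hand-labeled ground-truth mask.

function [vessel_mask, acc] = segment_gmf_response(gmf_response, ground_truth)
% Binary classification of a gray-scale GMF response with Otsu's method
% (inter-class variance), followed by the accuracy of equation (3.4).
% gmf_response: filter response scaled to [0, 1]; ground_truth: logical mask.
level = graythresh(gmf_response);               % inter-class variance threshold
vessel_mask = imbinarize(gmf_response, level);  % vessel / nonvessel pixels
acc = sum(vessel_mask(:) == ground_truth(:)) / numel(ground_truth);
end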
Fig. 3.8: First row: subset of X-ray coronary angiograms. Second row: hand-labeled images of angiograms in first row. Last row: Segmentation of vessel-like structures by using the GMF optimized by UMDAc for detection and the inter-class variance method for classification
3.4.2 Graph-based representation of coronary arteries
The final stage of the proposed automatic framework consists of the graph-based representation of the segmented coronary artery tree. In this step, the RDP method is used to reduce the number of pixels of the coronary artery skeleton. Since the RDP method is a parametric strategy, the performance of different values of the epsilon parameter is presented in Table 3.3. The data compression ratio shows that the optimal epsilon value is ϵ = 4 for the test set of angiograms. Finally, before applying the line simplification step, the skeletonization of the segmented vessels has to be performed, which is illustrated in Figure 3.9.

Tab. 3.3: Performance analysis for different values of the RDP method using the segmented angiograms

Epsilon value    CR       time (in seconds)
ϵ = 5            0.931    0.0085
ϵ = 4            0.954    0.0311
ϵ = 3            0.951    0.0307
ϵ = 2            0.942    0.0318
ϵ = 1            0.911    0.0261
Fig. 3.9: Vessel detection, segmentation, and skeletonization results of the proposed unsupervised framework
Fig. 3.10: First row: subset of X-ray coronary angiograms. Second row: skeletons of segmented arteries. Third row: Graph representation on the skeleton artery. Last row: Graph-based approach of the angiograms in first row using the proposed unsupervised framework
The results of the graph-based representation stage are illustrated in Figure 3.10. The main advantage of this stage is the compression ratio CR, which achieves suitable results in shape preservation while reducing the number of points needed to represent the segmented vessels. In addition, to illustrate the RDP method over an input angiogram, the data compression result is presented in Figure 3.11.
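Using the simplify_artery_RDP function listed in Appendix A, the data compression ratio of equation (3.5) for a single segmented artery tree can be reproduced along the following lines; the helper below is only a sketch, with ϵ = 4 taken from Table 3.3.

function CR = compression_ratio(skeleton, epsilon)
% Data compression ratio of equation (3.5) for one segmented artery tree,
% using simplify_artery_RDP from Appendix A. 'skeleton' is a binary image
% of the artery skeleton; epsilon = 4 is the value selected in Table 3.3.
branches = simplify_artery_RDP(skeleton, epsilon);
nNodes = 0;
for b = 1:length(branches)
    nNodes = nNodes + size(branches{b}, 2);    % graph nodes kept per branch
end
CR = 1 - nNodes / nnz(skeleton);               % compressed vs. skeleton pixels
end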
Fig. 3.11: Results of the vessel skeleton reduction using the Ramer–Douglas–Peucker algorithm
3.5 Concluding remarks
In this chapter, a novel unsupervised framework for the segmentation and graph-based representation of coronary arteries in X-ray angiograms has been proposed. In the first stage, the automatic vessel detection was performed using Gaussian matched filters trained by a population-based optimization strategy (UMDAc). This method was compared with two state-of-the-art GMF-based methods, achieving superior performance in terms of the area under the ROC curve (Az = 0.926). In the second step, a comparative analysis of five thresholding methods to segment the gray-scale Gaussian filter response was performed. The inter-class variance strategy obtained a segmentation accuracy of 0.923 on a test set of 40 angiograms. In the final step of the proposed framework, the Ramer–Douglas–Peucker line simplification method was used to reduce the number of skeleton pixels and to obtain a graph representation of the coronary artery tree. This stage was evaluated taking into account the data compression ratio, where the best result was 0.954, obtained in 0.0311 seconds. Moreover, the experimental results suggest that the proposed framework can be suitable for computer-aided diagnosis and useful for further purposes in medical image analysis such as vessel width estimation, image registration and 3-D reconstruction.
Acknowledgment: The present research has been supported by the National Council of Science and Technology of Mexico under the Project Catedras-CONACYT (No. 31503097).
Appendix A. Ramer–Douglas–Peucker
The source code in Matlab language to perform the Ramer–Douglas–Peucker (RDP) algorithm over X-ray angiograms is presented below. This Matlab code is used to obtain the graph-based representation of segmented coronary arteries. The function main is used to execute the RDP algorithm; it requires a binary image of the skeleton of coronary arteries. The function simplify_artery_RDP divides the arteries into branches to be individually processed by the parametric_RDP function. Finally, the function p_to_line_distance is used to compute the distance from a point to the line defined by two other points.

function [] = main()
[file, absPath] = uigetfile('*.tif;*.bmp;*.png;*.jpeg;*.jpg;*.gif', 'Images files');
im_name = [absPath file];
im = imread(im_name);   % A binary image with the artery skeleton
[result] = simplify_artery_RDP(im, 2);
figure(1)
clf
imshow(im);
hold on
for i = 1:length(result)
    res = result{i};
    plot(res(2,:), res(1,:), '--o', 'LineWidth', 0.4, 'MarkerSize', 2, ...
        'MarkerEdgeColor', 'r', 'MarkerFaceColor', 'r');
    hold on
end
end
function [result] = simplify_artery_RDP(im_sekeleton, epsilon)
[a, b] = size(im_sekeleton);
im1 = zeros(a, b);
stre = strel('square', 3);
[x_branch, y_branch] = find(bwmorph(im_sekeleton, 'branchpoints'));
for i = 1:length(x_branch)
    im1(x_branch(i), y_branch(i)) = 1;
end
[im2, num] = bwlabeln(im_sekeleton .* imcomplement(imdilate(im1, stre)));
result = cell(num, 1);
counter = 0;
for i = 1:num
    counter = counter + 1;
    im3 = zeros(a, b);
    [x_end, y_end] = find(bwmorph(im2 == i, 'endpoints'));
    for j = 1:length(x_end)
        im3(x_end(j), y_end(j)) = 1;
    end
    im3 = imdilate(im3, stre);
    im3 = im3 .* im_sekeleton;
    im3 = im3 + (im2 == i) + im1;
    im4 = bwareafilt(logical(im3), 1);
    im5 = im4 .* im1;
    [x, y] = find(im5, 1);
    try
        contour = bwtraceboundary(im4, [x y], 'N', 8, Inf);
    catch
        contour = bwtraceboundary(im4, [x y], 'N', 8, Inf, ...
            'counterclockwise');
    end
    contour = unique(contour, 'rows', 'stable');
    res = parametric_RDP(contour', epsilon);
    result{counter} = res;
end
end
function [result] = parametric_RDP(p, epsilon)
max_distance = 0;
index = 0;
last = length(p);
for i = 2:last-1
    dist = p_to_line_distance(p(:,i), p(:,1), p(:,last));
    if dist > max_distance
        index = i;
        max_distance = dist;
    end
end
if max_distance > epsilon
    res1 = parametric_RDP(p(:, 1:index), epsilon);
    res2 = parametric_RDP(p(:, index:last), epsilon);
    result = [res1(:, 1:length(res1)-1) res2(:, 1:length(res2))];
else
    result = [p(:,1) p(:,last)];
end
end

function [distance] = p_to_line_distance(point, p1r, p2r)
distance = ...
    abs(((p2r(2,1)-p1r(2,1))*point(1,1)) + ((p1r(1,1)-p2r(1,1))*point(2,1)) ...
    + (p2r(1,1)*p1r(2,1) - p2r(2,1)*p1r(1,1))) ...
    / sqrt((p2r(2,1)-p1r(2,1))^2 + (p2r(1,1)-p1r(1,1))^2);
end
References
[1] S. Chaudhuri, S. Chatterjee, N. Katz, M. Nelson, and M. Goldbaum. Detection of blood vessels in retinal images using two-dimensional matched filters. IEEE Trans. Med. Imaging, 8:263–269, 1989.
[2] W. Kang, K. Wang, W. Cheng, and W. Kang. Segmentation method based on fusion algorithm for coronary angiograms. In 2nd International Congress on Image and Signal Processing, pp. 1–4, 2009.
[3] W. Kang, W. Kang, Y. Li, and Q. Wang. The segmentation method of degree-based fusion algorithm for coronary angiograms. In 2nd International Conference on Measurement, Information and Control, pp. 696–699, 2013.
[4] T. Chanwimaluang and G. Fan. An efficient blood vessel detection algorithm for retinal images using local entropy thresholding. Proc. IEEE International Symposium on Circuits and Systems, 5:21–24, May 2003.
[5] T. Chanwimaluang, G. Fan, and S. R. Fransen. Hybrid retinal image registration. IEEE Transactions on Information Technology in Biomedicine, 10(1):129–142, 2006.
[6] M. G. Cinsdikici and D. Aydin. Detection of blood vessels in ophthalmoscope images using MF/ant (matched filters and ant colony) algorithm. Computer Methods and Programs in Biomedicine, 96:85–96, 2009.
[7] M. Al-Rawi and H. Karajeh. Genetic algorithm matched filter optimization for automated detection of blood vessels from digital retinal images. Computer Methods and Programs in Biomedicine, 87:248–253, 2007.
[8] M. Al-Rawi, M. Qutaishat, and M. Arrar. An improved matched filter for blood vessel detection of digital retinal images. Computers in Biology and Medicine, 37(2):262–267, 2007.
[9] I. Cruz-Aceves, A. Hernandez-Aguirre, and I. Valdez-Pena. Automatic coronary artery segmentation based on matched filters and estimation of distribution algorithms. In Proceedings of the 2015 International Conference on Image Processing, Computer Vision, Pattern Recognition, pp. 405–410, 2015.
[10] I. Cruz-Aceves, A. Hernandez-Aguirre, and S. Ivvan-Valdez. On the performance of nature inspired algorithms for the automatic segmentation of coronary arteries using Gaussian matched filters. Applied Soft Computing, 46:665–676, 2016.
[11] M. Hauschild and M. Pelikan. An introduction and survey of estimation of distribution algorithms. Swarm and Evolutionary Computation, 1(3):111–128, 2011.
[12] I. Cruz-Aceves, J. G. Avina-Cervantes, J. M. Lopez-Hernandez, et al. Automatic image segmentation using active contours with univariate marginal distribution. Mathematical Problems in Engineering, ID 419018:1–9, 2013.
[13] F. Cervantes-Sanchez, I. Cruz-Aceves, A. Hernandez-Aguirre, and J. G. Aviña-Cervantes. Segmentation of coronary angiograms using Gabor filters and Boltzmann univariate marginal distribution algorithm. Computational Intelligence and Neuroscience, 2016:1–9, 2016.
[14] J. de Jesus Guerrero-Turrubiates, I. Cruz-Aceves, S. Ledesma, et al. Fast parabola detection using estimation of distribution algorithms. Computational and Mathematical Methods in Medicine, ID 6494390:1–13, 2017.
[15] M. Sezgin and B. Sankur. Survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 13(1):146–165, 2004.
[16] J. N. Kapur, P. K. Sahoo, and A. K. C. Wong. A new method for gray-level picture thresholding using the entropy of the histogram. Computer Vision, Graphics, and Image Processing, 29:273–285, 1985.
[17] N. R. Pal and S. K. Pal. Entropic thresholding. Signal Processing, 16:97–108, 1989.
64 | 3 Unsupervised graph-based approach for coronary arteries
[18] T. W. Ridler and S. Calvard. Picture thresholding using an iterative selection method. IEEE Transactions on Systems, Man and Cybernetics, 8:630–632, 8 1978. [19] A. Rosenfeld and P. D. la Torre. Histogram concavity analysis as an aid in threshold selection. IEEE Transactions on Systems, Man and Cybernetics, 13:231–235, 1983. [20] M. Otsu. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics, 9:62–66, 1979. [21] U. Ramer. An iterative procedure for the polygonal approximation of plane curves. Computer Graphics and Image Processing, 1(3):244–256, 1972. [22] D. Douglas and T. Peucker. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. The Canadian Cartographer, 10(2):112–122, 1973. [23] A. Salhi, B. Minaoui, and M. Fakir. Robust Automatic Traffic Signs Detection using fast polygonal approximation of digital curves. In International Conference on Multimedia Computing and Systems -Proceedings, vol. 1, pp. 433–437, 2014. [24] J. Muckell, J.-H. Hwang, C. T. Lawson, and S. S. Ravi. Algorithms for compressing GPS trajectory data. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems - GIS ’10, p. 402, 2010. [25] H. Neidhart and M. Sester. Extraction of building ground plans from LiDAR data. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 37(Part 2):405–410, 2008. [26] P. Gil, J. Pomares, S. Diaz, C. Puente, F. Candelas, and F. Torres. Flexible multi-sensorial system for automatic disassembly using cooperative robots. International Journal of Computer Integrated Manufacturing, 20(8):757–772, 2007. [27] C. Lupascu and D. Tegolo. Graph-based minimal path tracking in the skeleton of the retinal vascular network. In 25th International Symposium on Computer-Based Medical Systems (CBMS), pp. 1–6, 2012. [28] M. Khakzar and H. Pourghassem. A retinal image authentication framework based on a graphbased representation algorithm in a two-stage matching structure. Biocybernetics and Biomedical Engineering, 37:742–759, 2017. [29] A. Reyes-Figueroa, J. Camacho-Gutierrez, F. Cervantes-Sanchez, S. Salazar-Colores, and I. CruzAceves. An automatic framework for the graph-based representation of coronary arteries. In International Conference on Applied Electronics (ICApplE-2017), pp. 1–5, 2017. [30] H. Mühlenbein and G. Paaß. From recombination of genes to the estimation of distributions I. binary parameters. Parallel Problem Solving from Nature, pp. 178–187, 1996. [31] P. Larrañaga and J. Lozano. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer, Boston, MA, 2002. [32] M. Pelikan, D. Goldberg, and F. Lobo. A survey of optimization by building and using probabilistic models. Computational Optimization and Applications, 21:5–20, 2002. [33] P. S. Heckbert and M. Garland. Survey of polygonal surface simplification algorithms. Multiresolution Surface Modeling Course SIGGRAPH, 97:1–31, 1997. [34] D. A. Turner. An intuitive approach to receiver operating characteristic curve analysis. Journal of Nuclear Medicine, 19(2):213–220, 1979. [35] C. E. Metz. Basic principles of ROC analysis. Seminars of Nuclear Medicine, 8(4):283–298, 10 1978.
Rik Das*, Sourav De, and S. N. Singh
4 A study of recent trends in content based image classification
Abstract: Feature vector extraction is a deeply ingrained component of grouping images into meaningful categories. Two novel varieties of the feature extraction technique, using global, local and mean threshold selection for feature extraction, are introduced through eight different methods. The techniques are implemented on even and odd variations of generic images for feature extraction. A quantitative comparison of the proposed techniques is made with reference to existing methods of feature extraction using even and odd images, with the misclassification rate (MR) and F1 score as the parameters of comparison. The comparison reveals the superior performance of the proposed techniques with respect to the existing methods, which is established especially for the methods adopting local threshold approaches.
Keywords: Feature Extraction; Image Variance; Global Threshold; Adaptive Threshold; Even and Odd Image; Classifier
4.1 Introduction Feature extraction by binarization of images is facilitated by the selection of an appropriate threshold for intensity images. Various approaches of content based image classification have considerably differentiated the gray levels of the object pixels from their backgrounds [1]. Thresholding has become a simple and efficient tool in such situations for distinguishing the object of interest from its background. Applying binarization of images as a precursor for feature extraction for content based image classification decreases the computational load for the entire application [2–4]. Different techniques for determining the threshold for binarization have been enlisted in the literature for graphic image and document image binarization [5–12]. Feature extraction with image binarization has its breadth of applications for document analysis, optical character recognition (OCR), scene matching, medical imaging, etc. Conventional techniques of feature extraction yields huge feature dimensions that increase the convergence time and computational overhead. In this paper, we have attempted to design feature vectors with reduced dimensions and increased robustness for minimized convergence time and enhanced classification accuracy. *Corresponding author: Rik Das, Dept. of Information Technology, Xavier Institute of Social Service (XISS), Ranchi, Jharkhand Sourav De, Dept. of Computer Science and Engineering, Cooch Behar Government Engineering College, Cooch Behar, West Bengal S. N. Singh, Dept. of Information Technology, Xavier Institute of Social Service (XISS), Ranchi, Jharkhand https://doi.org/10.1515/9783110552072-004
4.2 Related work This work presents a simple but effective modification of the technique proposed in [13]. The feature extraction techniques proposed in [13] involved the even and odd variations of generic images for thresholding using the mean threshold method. One limitation of [13] is not considering image variance as a factor to facilitate the feature extraction process. Modification of [13] by applying global and local threshold techniques based on image variance has outclassed the previous results. Traditionally Otsu’s method of global threshold selection is based on image variance [12]. Adaptive threshold techniques including Niblack [10], Sauvola [7] and Bernsen [11] have considered image variance and contrast as factors for image binarization. The proposed techniques have implemented the methods of image variance based threshold calculations to binarize the generic even and odd variations of images for feature extraction. The techniques have been compared with the existing scheme presented in [13] to prove the efficiency of the novel methods for content based image classification.
4.3 Methodology The proposed methodologies are divided primarily into two different parts. The first part is the generation of even and odd images and the second part is the calculation of threshold for binarization as explained in the following subsections.
4.3.1 Generation of even and odd images
A generic image is flipped horizontally across the X and Y axis. The flipped image is added to the generic image to create an even image. The odd image is created by subtracting the flipped image from the generic image. The flipped version of the same image is used in order to explore the option of extracting more information from the same given data.

4.3.1.1 Threshold calculation with Otsu's method (Global threshold)
Otsu's algorithm considers two classes of pixels for thresholding of each color component. The pixels are termed as foreground and background pixels. Calculation of the optimum threshold is done to separate the two classes of pixels for each color component in such a way that their combined intra-class variance is minimal. An exhaustive search is performed for a threshold that minimizes the intra-class variance given by the weighted sum of variances of the two classes of pixels for each of the three color components. The weighted within-class variance is given in equation (4.1):

σ²w(t) = q1(t)·σ²1(t) + q2(t)·σ²2(t)    (4.1)
where the class probabilities of the different gray level pixels are estimated as

q1(t) = ∑_{i=0}^{t} P(i)    (4.2)

q2(t) = ∑_{i=t+1}^{255} P(i)    (4.3)

and the class means are given by

μ1(t) = ∑_{i=0}^{t} i·P(i) / q1(t)    (4.4)

μ2(t) = ∑_{i=t+1}^{255} i·P(i) / q2(t)    (4.5)
The total variance σ² is the sum of the within-class variance σ²w(t) and the between-class variance σ²b(t). Since the total variance is constant and independent of t, the effect of changing the threshold is purely to shift the contributions of the two terms back and forth. The between-class variance is given in equation (4.6):

σ²b(t) = q1(t)[1 − q1(t)][μ1(t) − μ2(t)]²    (4.6)
Thus, minimizing the within-class variance is the same as maximizing the between-class variance.

4.3.1.2 Threshold calculation with Niblack's method (adaptive threshold)
Niblack's method is a popular local thresholding method for image binarization. The method calculates a pixel-wise threshold for each color component by sliding a rectangular window of size b × b over the component. The threshold is computed from the local mean m(i, j) and the local standard deviation σ(i, j), as given in equation (4.7):

T(i, j) = m(i, j) + k·σ(i, j)    (4.7)
where k is a constant with a value between 0 and 1.

4.3.1.3 Threshold calculation with Sauvola's method (adaptive threshold)
Sauvola's threshold is a local-variance based method. It is an improvement over Niblack's method, and the threshold for each color component is given in equation (4.8), where R denotes the maximum value of the local standard deviation:

T(i, j) = m(i, j)·[1 + k·(σ(i, j)/R − 1)]    (4.8)
4.3.1.4 Threshold calculation with Bernsen's method (adaptive threshold)
The technique maintains a local window of size w = 31. The threshold is calculated as the mean of the minimum and the maximum gray value within the local window specified for each color component. The contrast is defined as the difference between the maximum and the minimum gray value. If the contrast is below a certain contrast threshold k, the pixels inside the window are set to 0 or 1 according to the class that most suitably describes the window. Thus the algorithm depends on the value of k and the size of the window.
4.4 Proposed methodologies

4.4.1 Method one
We have used the traditional Otsu global thresholding algorithm to calculate the threshold value for the even and odd images. The proposed methodology assumes that the images cluster toward two major gray levels, so that the histogram analysis results in two sharp peaks and the application of Otsu's method is logical. The images are separated into R, G and B components and Otsu's method is applied on each of the R, G and B components separately for feature vector extraction. The procedure is given in Algorithm 4.1.

Algorithm 4.1: Method one
1: Input an image I with three color components denoted as cc1, cc2 and cc3, each of size m ∗ n. /* {(cc1), (cc2), (cc3)} = {(R), (G), (B)} for the RGB color space */
2: Perform the horizontal flip of I across the X and Y axis and name it I′, of dimension m ∗ n.
3: Create the even image Ieven and the odd image Iodd of dimension m ∗ n:
   Ieven = (I + I′)/2  and  Iodd = (I − I′)/2
4: Calculate the threshold value Tx for both the even and the odd image for each color component R, G and B, and binarize them directly using Otsu's method. /* x = R, G and B */
   BitMapx(i, j) = 1 if x(i, j) ≥ Tx, and 0 otherwise
5: Generate the image features for binary BTC for the given image. /* Xupmean is calculated from the pixel values that are assigned 1 and Xlomean is calculated from the pixel values assigned 0 */

Xupmean = (1 / ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)) · ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)·X(i, j)    (4.9)

Xlomean = (1 / (m·n − ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j))) · ∑_{i=1}^{m} ∑_{j=1}^{n} (1 − BitMapx(i, j))·X(i, j)    (4.10)
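As a concrete illustration of Method one, the following MATLAB sketch generates the even and odd images, computes an Otsu threshold by exhaustively maximizing the between-class variance of equations (4.1)-(4.6), and extracts the upper/lower mean feature pair of equations (4.9)-(4.10) for a single color component. The function and variable names are our own and the code is a simplified reading of Algorithm 4.1, not the authors' implementation.

% Minimal sketch of Method one for one color component X (m-by-n, 0..255).
function [feat_even, feat_odd] = method_one_sketch(X)
    X  = double(X);
    Xf = fliplr(X);                    % horizontally flipped copy I'
    Xeven = (X + Xf) / 2;              % even image
    Xodd  = (X - Xf) / 2;              % odd image
    feat_even = btc_features(Xeven, otsu_threshold(Xeven));
    feat_odd  = btc_features(Xodd,  otsu_threshold(Xodd));
end

function T = otsu_threshold(X)
    % Exhaustive search for the threshold maximizing the between-class
    % variance of the (shifted) grey-level histogram; assumes X is not constant.
    g = round(X - min(X(:)));          % shift so that levels start at 0
    L = max(g(:)) + 1;
    P = histc(g(:), 0:L-1) / numel(g); % grey-level probabilities
    best = -inf; T = 0;
    for t = 0:L-2
        q1 = sum(P(1:t+1));  q2 = 1 - q1;
        if q1 == 0 || q2 == 0, continue; end
        mu1 = sum((0:t)'     .* P(1:t+1)) / q1;
        mu2 = sum((t+1:L-1)' .* P(t+2:L)) / q2;
        sb  = q1 * q2 * (mu1 - mu2)^2;     % between-class variance
        if sb > best, best = sb; T = t; end
    end
    T = T + min(X(:));                 % undo the shift
end

function f = btc_features(X, T)
    bitmap = X >= T;                   % binarization with threshold T
    up = mean(X(bitmap));              % Xupmean: mean of pixels mapped to 1
    lo = mean(X(~bitmap));             % Xlomean: mean of pixels mapped to 0
    f  = [up lo];
end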
4.4.2 Method two
The proposed technique uses Otsu's method of thresholding on each of the color components R, G and B of a given generic image. The color components are extracted separately from the image. The feature extraction technique is implemented by Algorithm 4.2.

Algorithm 4.2: Method two
1: Input an image I with three color components denoted as cc1, cc2 and cc3, each of size m ∗ n. /* {(cc1), (cc2), (cc3)} = {(R), (G), (B)} for the RGB color space */
2: Calculate the threshold value Tx for each color component R, G and B and binarize them directly using Otsu's method. /* x = R, G and B */
   BitMapx(i, j) = 1 if x(i, j) ≥ Tx, and 0 otherwise
3: Generate the image features for binary BTC for the given image. /* Xupmean is calculated from the pixel values that are set to 1 and Xlomean is calculated from the pixel values assigned 0 */

Xupmean = (1 / ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)) · ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)·X(i, j)    (4.11)

Xlomean = (1 / (m·n − ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j))) · ∑_{i=1}^{m} ∑_{j=1}^{n} (1 − BitMapx(i, j))·X(i, j)    (4.12)
4.4.3 Method three
This method calculates the threshold at each pixel based on the local statistics of the even and odd image varieties. The feature extraction technique implements Niblack's well-known adaptive threshold method for image binarization. Three separate color components R, G and B are extracted for the purpose of calculating the individual thresholds for each color component, as in Algorithm 4.3. The size of the neighbourhood should be small enough to reflect the local illumination level and large enough to include both objects and the background.

Algorithm 4.3: Method three
1: Input an image I with three color components denoted as cc1, cc2 and cc3, each of size m ∗ n. /* {(cc1), (cc2), (cc3)} = {(R), (G), (B)} for the RGB color space */
2: Perform the horizontal flip of I across the X and Y axis and name it I′, of dimension m ∗ n.
3: Create the even image Ieven and the odd image Iodd of dimension m ∗ n:
   Ieven = (I + I′)/2  and  Iodd = (I − I′)/2
4: Generate bitmaps for each color component R, G and B of both the even and the odd image directly using Niblack's method. /* performs local thresholding with an M-by-N neighbourhood (default is 3-by-3); the default value of k is −0.2, the default value of OFFSET is 0, and the border pixels are padded with 0 */
5: if x(i, j) > (m(i, j) + k·σ(i, j)) then
6:    Bitmapx(i, j) = 1
7: else
8:    Bitmapx(i, j) = 0
9: end if
10: Generate the image features for binary BTC for the given image. /* Xupmean is calculated from the pixel values that are set to 1 and Xlomean is calculated from the pixel values assigned 0 */

Xupmean = (1 / ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)) · ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)·X(i, j)    (4.13)

Xlomean = (1 / (m·n − ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j))) · ∑_{i=1}^{m} ∑_{j=1}^{n} (1 − BitMapx(i, j))·X(i, j)    (4.14)
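A compact sketch of the Niblack binarization used in Methods three and four is given below; it estimates the local mean and standard deviation with box filtering (conv2) rather than an explicit sliding window, which is equivalent up to border handling. The 3-by-3 window and k = −0.2 echo the defaults quoted in the algorithm; the function name is illustrative.

% Niblack binarization (equation (4.7)) for one color component X.
function bitmap = niblack_bitmap(X, b, k)
    if nargin < 2, b = 3;    end       % window size b-by-b
    if nargin < 3, k = -0.2; end
    X   = double(X);
    box = ones(b) / b^2;
    m   = conv2(X,    box, 'same');    % local mean m(i,j), zero-padded border
    s2  = conv2(X.^2, box, 'same') - m.^2;
    s   = sqrt(max(s2, 0));            % local standard deviation sigma(i,j)
    T   = m + k .* s;                  % pixel-wise Niblack threshold
    bitmap = X > T;
end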
4.4.4 Method four
Each color component R, G and B of a generic image is extracted as a block by the proposed methodology. Image bitmaps are generated by implementing Niblack's adaptive thresholding algorithm as given in Algorithm 4.4. Feature vectors are calculated from the generated bitmaps.

Algorithm 4.4: Method four
1: Input an image I with three color components denoted as cc1, cc2 and cc3, each of size m ∗ n. /* {(cc1), (cc2), (cc3)} = {(R), (G), (B)} for the RGB color space */
2: Generate bitmaps for each color component R, G and B of the given image directly using Niblack's method. /* performs local thresholding with an M-by-N neighbourhood (default is 3-by-3); the default value of k is −0.2, the default value of OFFSET is 0, and the border pixels are padded with 0 */
3: if x(i, j) > (m(i, j) + k·σ(i, j)) then
4:    Bitmapx(i, j) = 1
5: else
6:    Bitmapx(i, j) = 0
7: end if
8: Generate the image features for binary BTC for the given image. /* Xupmean is calculated from the pixel values that are assigned 1 and Xlomean is calculated from the pixel values assigned 0 */

Xupmean = (1 / ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)) · ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)·X(i, j)    (4.15)

Xlomean = (1 / (m·n − ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j))) · ∑_{i=1}^{m} ∑_{j=1}^{n} (1 − BitMapx(i, j))·X(i, j)    (4.16)
4.4.5 Method five
We have considered Sauvola's method of adaptive thresholding for binarizing the even and odd images in this method. This method of threshold selection is an improvement over Niblack's method for backgrounds with light texture and large variations, as it adapts the contribution of the standard deviation. The algorithm is given in Algorithm 4.5.

Algorithm 4.5: Method five
1: Input an image I with three color components denoted as cc1, cc2 and cc3, each of size m ∗ n. /* {(cc1), (cc2), (cc3)} = {(R), (G), (B)} for the RGB color space */
2: Perform the horizontal flip of I across the X and Y axis and name it I′, of dimension m ∗ n.
3: Create the even image Ieven and the odd image Iodd of dimension m ∗ n:
   Ieven = (I + I′)/2  and  Iodd = (I − I′)/2
4: Generate bitmaps for each color component R, G and B of both the even and the odd image directly using Sauvola's method. /* performs local thresholding with an M-by-N neighbourhood (default is 3-by-3) and a threshold parameter between 0 and 1 (default is 0.34); the border pixels are padded with 0s */
5: if x(i, j) > m(i, j)·(1 + k·(σ(i, j)/R − 1)) then
6:    Bitmapx(i, j) = 1
7: else
8:    Bitmapx(i, j) = 0
9: end if /* R is the maximum value of the standard deviation and k takes positive values in the range [0.2, 0.5] */
10: Generate the image features for binary BTC for the given image. /* Xupmean is calculated from the pixel values that are assigned 1 and Xlomean is calculated from the pixel values assigned 0 */

Xupmean = (1 / ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)) · ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)·X(i, j)    (4.17)

Xlomean = (1 / (m·n − ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j))) · ∑_{i=1}^{m} ∑_{j=1}^{n} (1 − BitMapx(i, j))·X(i, j)    (4.18)
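The corresponding sketch for Sauvola's threshold (equation (4.8)), as used in Methods five and six, differs from the Niblack sketch above only in the threshold formula; R is taken as the maximum local standard deviation, as stated in the algorithm, and the function name and default k = 0.34 are illustrative.

% Sauvola binarization (equation (4.8)) for one color component X.
function bitmap = sauvola_bitmap(X, b, k)
    if nargin < 2, b = 3;    end
    if nargin < 3, k = 0.34; end
    X   = double(X);
    box = ones(b) / b^2;
    m   = conv2(X,    box, 'same');                       % local mean
    s   = sqrt(max(conv2(X.^2, box, 'same') - m.^2, 0));  % local std. deviation
    R   = max(s(:));                                      % maximum of sigma
    if R == 0, R = 1; end                                 % guard for flat images
    T   = m .* (1 + k .* (s ./ R - 1));                   % Sauvola threshold
    bitmap = X > T;
end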
4.4.6 Method six
The proposed methodology extracts the R, G and B components separately as blocks from a generic image. Bitmaps are generated from each of the color components by using Sauvola's threshold as given in Algorithm 4.6. The upper and the lower mean are calculated from the bitmaps generated after the thresholding operation.

Algorithm 4.6: Method six
1: Input an image I with three color components denoted as cc1, cc2 and cc3, each of size m ∗ n. /* {(cc1), (cc2), (cc3)} = {(R), (G), (B)} for the RGB color space */
2: Generate bitmaps for each color component R, G and B of the given image directly using Sauvola's method. /* performs local thresholding with an M-by-N neighbourhood (default is 3-by-3) and a threshold parameter between 0 and 1 (default is 0.34); the border pixels are padded with 0s */
3: if x(i, j) > m(i, j)·(1 + k·(σ(i, j)/R − 1)) then
4:    Bitmapx(i, j) = 1
5: else
6:    Bitmapx(i, j) = 0
7: end if /* R is the maximum value of the standard deviation and k takes positive values in the range [0.2, 0.5] */
8: Generate the image features for binary BTC for the given image. /* Xupmean is calculated from the pixel values that are assigned 1 and Xlomean is calculated from the pixel values assigned 0 */

Xupmean = (1 / ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)) · ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)·X(i, j)    (4.19)

Xlomean = (1 / (m·n − ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j))) · ∑_{i=1}^{m} ∑_{j=1}^{n} (1 − BitMapx(i, j))·X(i, j)    (4.20)
4.4.7 Method seven
The contrast of the even and odd images is considered as the measure for calculating the threshold in this proposed method. We have used Bernsen's local adaptive method for determining the threshold value. The feature extraction procedure with the Bernsen threshold technique is given in Algorithm 4.7.

Algorithm 4.7: Method seven
1: Input an image I with three color components denoted as cc1, cc2 and cc3, each of size m ∗ n. /* {(cc1), (cc2), (cc3)} = {(R), (G), (B)} for the RGB color space */
2: Perform the horizontal flip of I across the X and Y axis and name it I′, of dimension m ∗ n.
3: Create the even image Ieven and the odd image Iodd of dimension m ∗ n:
   Ieven = (I + I′)/2  and  Iodd = (I − I′)/2
4: The threshold is set at the midrange value, which is the mean of the minimum Ilow(i, j) and maximum Imax(i, j) gray values in a local window of suggested size w = 31, for each color component R, G and B.
5: If the contrast C(i, j) = Imax(i, j) − Ilow(i, j) is below a certain contrast threshold k, the Bitmapx(i, j) within the window may be set to 0 or to 1 according to the class that most suitably describes the window, for each color component R, G and B.
6: Generate the image features for binary BTC for the given image. /* Xupmean is calculated from the pixel values that are assigned 1 and Xlomean is calculated from the pixel values assigned 0 */

Xupmean = (1 / ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)) · ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)·X(i, j)    (4.21)

Xlomean = (1 / (m·n − ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j))) · ∑_{i=1}^{m} ∑_{j=1}^{n} (1 − BitMapx(i, j))·X(i, j)    (4.22)
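A sketch of the Bernsen binarization used in Methods seven and eight is shown below. The local minimum and maximum are computed over a w = 31 window as in Algorithm 4.7; the contrast limit k = 15 and the rule used for low-contrast windows (comparing the pixel with the global mean) are illustrative choices, since the chapter leaves both to the user.

% Bernsen binarization for one color component X.
function bitmap = bernsen_bitmap(X, w, k)
    if nargin < 2, w = 31; end
    if nargin < 3, k = 15; end
    X = double(X);
    r = floor(w / 2);
    [mRows, nCols] = size(X);
    lo = inf(size(X));  hi = -inf(size(X));
    % Running min/max over all offsets of the w-by-w neighbourhood,
    % replicating the border pixels.
    for di = -r:r
        for dj = -r:r
            rows = min(max((1:mRows) + di, 1), mRows);
            cols = min(max((1:nCols) + dj, 1), nCols);
            shifted = X(rows, cols);
            lo = min(lo, shifted);
            hi = max(hi, shifted);
        end
    end
    T = (lo + hi) / 2;                 % mid-range threshold
    C = hi - lo;                       % local contrast
    bitmap = X > T;
    % Low-contrast pixels: assign one class, here by comparing with the
    % global mean (one simple way to pick the class describing the window).
    bitmap(C < k) = X(C < k) >= mean(X(:));
end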
4.4.8 Method eight
The proposed technique is implemented separately after extracting the R, G and B components of a generic image. Bitmaps for the image are generated after thresholding the image with the Bernsen algorithm as shown in Algorithm 4.8. The feature vectors, that is, the upper and the lower mean, are calculated from the generated bitmaps.

Algorithm 4.8: Method eight
1: Input an image I with three color components denoted as cc1, cc2 and cc3, each of size m ∗ n. /* {(cc1), (cc2), (cc3)} = {(R), (G), (B)} for the RGB color space */
2: The threshold is set at the midrange value, which is the mean of the minimum Ilow and maximum Imax gray values in a local window of suggested size w = 31, for each color component R, G and B.
3: If the contrast C(i, j) = Imax(i, j) − Ilow(i, j) is below a certain contrast threshold k, the Bitmapx(i, j) within the window may be set to 0 or to 1 according to the class that most suitably describes the window, for each color component R, G and B.
4: Generate the image features for binary BTC for the given image. /* Xupmean is calculated from the pixel values that are assigned 1 and Xlomean is calculated from the pixel values assigned 0 */

Xupmean = (1 / ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)) · ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j)·X(i, j)    (4.23)

Xlomean = (1 / (m·n − ∑_{i=1}^{m} ∑_{j=1}^{n} BitMapx(i, j))) · ∑_{i=1}^{m} ∑_{j=1}^{n} (1 − BitMapx(i, j))·X(i, j)    (4.24)
The eight different algorithms for feature extraction discussed in this section deploy global, local and mean threshold selection for feature extraction using image binarization. The extracted features have diversified robustness based on the type of threshold used to extract them. Each technique first divides the images into the three basic color components, namely red (R), green (G) and blue (B). Furthermore, the even and odd varieties of the images are created. The threshold selection techniques (global, local and mean) are applied separately to binarize the even and odd image varieties. The grey values higher than the threshold are clustered into one group and the values lower than or equal to the threshold are clustered into a different group. The means of the higher and lower intensity groups are taken as the feature vectors for the classification process.
4.5 K-nearest neighbor (KNN) classifier (distance based classifier)
The KNN is an instance based classifier that operates on the resemblance between two instances. The nearest neighbour in the instance space is identified to carry out the classification, and the unknown instance is assigned the class of its nearest neighbor [14]. The distance measure used for comparing the query image and the database images is the mean square error (MSE), as in equation (4.25):

MSE = (1 / (M·N)) ∑_{y=1}^{N} ∑_{x=1}^{M} [I(x, y) − I′(x, y)]²    (4.25)

where I and I′ are the two images used for the comparison.
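A minimal sketch of the nearest-neighbour decision with the MSE distance of equation (4.25) is given below. The chapter compares the query and database images; applying the same squared-error distance to the extracted feature vectors is the assumption made in this sketch, and all names (trainFeat, trainLabel, queryFeat) are illustrative.

% Nearest-neighbour classification with a mean-squared-error distance.
% trainFeat: N-by-d feature matrix, trainLabel: N-by-1 numeric class indices,
% queryFeat: 1-by-d feature vector of the query image.
function label = nn_classify(queryFeat, trainFeat, trainLabel)
    d = mean((trainFeat - repmat(queryFeat, size(trainFeat,1), 1)).^2, 2);
    [~, idx] = min(d);                 % nearest training instance
    label = trainLabel(idx);
end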
4.6 Experimental verification
The proposed techniques are evaluated on a subset of the Corel stock photo database known as the Wang dataset [15], a widely used public dataset. The dataset is divided into nine categories, each comprising 100 images. A sample of the dataset is shown in Figure 4.1. An n-fold cross validation scheme [16] is applied to assess the classification performance of the different feature vector extraction techniques, with n = 10. The entire dataset is divided into 10 subsets; one subset is used as the testing set and the remaining nine subsets form the training set. The procedure is repeated for 10 trials and the performance of the classifier is evaluated by combining the results obtained over the 10 folds.
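The 10-fold protocol described above can be sketched as follows; 'features', 'labels' and the nn_classify routine from the previous sketch are illustrative names, and the random fold assignment is one simple way of realizing the partition.

% n-fold cross validation (n = 10) around the nearest-neighbour classifier.
function acc = ten_fold_cv(features, labels, n)
    if nargin < 3, n = 10; end
    N    = size(features, 1);
    fold = mod(randperm(N), n) + 1;    % random assignment of samples to folds
    correct = 0;
    for f = 1:n
        test  = (fold == f);           % current fold is the testing set
        train = ~test;                 % remaining folds form the training set
        idxTest = find(test);
        for t = idxTest(:)'
            pred = nn_classify(features(t,:), features(train,:), labels(train));
            correct = correct + (pred == labels(t));
        end
    end
    acc = correct / N;                 % accuracy combined over the 10 folds
end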
Fig. 4.1: Sample images of different categories from the Wang Database
4.7 Measures of evaluation The following two different parameters are considered for the quantitative comparison of the proposed techniques with the existing methodologies [16].
4.7.1 Misclassification rate (MR)
The misclassification rate is the error rate of the classifier, indicating the proportion of instances that have been wrongly classified, as in equation (4.26):

MR = (FP + FN) / (TP + TN + FP + FN)    (4.26)

where
True Positive (TP) = number of positive instances classified correctly
True Negative (TN) = number of negative results produced for negative instances
False Positive (FP) = number of erroneous positive results produced for negative instances
False Negative (FN) = number of erroneous negative results produced for positive instances
The misclassification rate (MR) for classification with feature extraction using global threshold and adaptive threshold techniques on generic image and its even and odd varieties are given in Table 4.1 and 4.2 respectively. Graphical representation for image category wise comparative measures for the misclassification rate (MR) using global threshold on different image varieties are given in Figure 4.2. Figure 4.3 is for an average misclassification rate (MR) of each proposed technique using the global threshold. Figure 4.4 represents the category wise comparison for MR for feature extraction techniques applying different adaptive thresholds and the comparison of average MR for each proposed technique applying different adaptive thresholds are shown in Figure 4.5.
4.7.2 F1 Score
Precision and recall (the TP rate) can be combined to produce a metric known as the F1 Score, the harmonic mean of precision and recall, as in equation (4.27). A higher F1 Score indicates better classification results.

F1 Score = 2·Precision·Recall / (Precision + Recall)    (4.27)

Here, precision is the probability that an object classified as positive is actually positive, and recall is the probability that the classifier produces a true positive result for an actual positive instance. The F1 Scores for classification with feature extraction using global and adaptive threshold techniques on the generic image and its even and odd varieties are given in Table 4.3 and Table 4.4 respectively. Graphical representations of the image category wise comparison of the F1 Score using the global threshold on different image varieties are given in Figure 4.6, and Figure 4.7 shows the average F1 Score of each proposed technique using the global threshold. Figure 4.8 presents the category wise comparison of the F1 Score for the feature extraction techniques applying different adaptive thresholds, and the comparison of the average F1 Score for each proposed technique applying different adaptive thresholds is shown in Figure 4.9. The binarization results of applying the different thresholds are shown in Figure 4.10 to Figure 4.25.
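For completeness, the two measures can be computed from the confusion counts as in the following sketch; the counts used here are made-up numbers for illustration, not results from the chapter.

% Misclassification rate and F1 Score from confusion counts, equations (4.26)-(4.27).
TP = 90; TN = 780; FP = 12; FN = 18;   % illustrative counts for one category

MR        = (FP + FN) / (TP + TN + FP + FN);   % misclassification rate
precision = TP / (TP + FP);
recall    = TP / (TP + FN);
F1        = 2 * precision * recall / (precision + recall);

fprintf('MR = %.3f, F1 = %.3f\n', MR, F1);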
Tab. 4.1: Misclassification Rate (MR) for feature extraction using global threshold methods.

Techniques                                Tribals  Sea Beach  Gothic Structure  Bus   Dinosaur  Elephant  Roses  Horses  Mountains  Average
Generic + Even + Odd Image using Otsu     0.14     0.10       0.16              0.11  0.00      0.09      0.03   0.04    0.12       0.085
Even Image using Otsu                     0.12     0.10       0.16              0.11  0.01      0.09      0.02   0.03    0.13       0.086
Odd Image using Otsu                      0.12     0.11       0.17              0.12  0.00      0.09      0.03   0.05    0.12       0.091
Generic Image using Otsu                  0.14     0.15       0.21              0.13  0.00      0.10      0.03   0.05    0.16       0.107
Tab. 4.2: Misclassification Rate (MR) for feature extraction using adaptive threshold methods.

Techniques                                     Tribals  Sea Beach  Gothic Structure  Bus   Dinosaur  Elephant  Roses  Horses  Mountains  Average
Generic + Even + Odd Image using Sauvola       0.07     0.11       0.12              0.07  0.00      0.06      0.03   0.04    0.11       0.067
Odd Image using Sauvola                        0.08     0.13       0.13              0.08  0.00      0.06      0.02   0.03    0.11       0.071
Even Image using Sauvola                       0.07     0.12       0.15              0.09  0.00      0.08      0.03   0.03    0.10       0.076
Odd Image using Bernsen                        0.12     0.11       0.14              0.09  0.00      0.07      0.02   0.03    0.11       0.078
Generic Image using Bernsen                    0.11     0.12       0.14              0.11  0.00      0.07      0.03   0.03    0.11       0.080
Generic + Even + Odd Image using Bernsen       0.11     0.10       0.16              0.11  0.00      0.07      0.03   0.03    0.12       0.080
Even Image using Niblack                       0.11     0.12       0.15              0.10  0.00      0.07      0.03   0.02    0.12       0.081
Generic Image using Sauvola                    0.08     0.15       0.14              0.10  0.00      0.08      0.03   0.03    0.12       0.081
Generic + Even + Odd Image using Niblack       0.10     0.11       0.16              0.12  0.00      0.08      0.02   0.02    0.13       0.082
Even Image using Bernsen                       0.10     0.11       0.16              0.12  0.00      0.07      0.03   0.03    0.12       0.083
Odd Image using Niblack                        0.11     0.14       0.15              0.12  0.00      0.09      0.04   0.04    0.13       0.091
Generic Image using Niblack                    0.13     0.15       0.15              0.12  0.00      0.11      0.04   0.03    0.13       0.096
4.7 Measures of evaluation
| 79
Fig. 4.2: Image category wise comparison of Misclassification Rate (MR) for proposed techniques applying global threshold
Fig. 4.3: Average misclassification rate (MR) comparison for proposed techniques applying global threshold
Fig. 4.4: Image Category wise comparison of Misclassification Rate (MR) proposed techniques applying adaptive thresholds
Fig. 4.5: Average misclassification rate (MR) comparison for proposed techniques applying adaptive threshold
Tab. 4.3: F1 Score for feature extraction using global threshold methods.

Techniques                                Tribals  Sea Beach  Gothic Structure  Bus   Dinosaur  Elephant  Roses  Horses  Mountains  Average
Generic + Even + Odd Image using Otsu     0.50     0.51       0.29              0.49  1.00      0.63      0.88   0.83    0.41       0.616
Even Image using Otsu                     0.53     0.53       0.25              0.51  0.97      0.61      0.90   0.84    0.40       0.615
Odd Image using Otsu                      0.48     0.44       0.26              0.50  1.00      0.60      0.86   0.77    0.42       0.594
Generic Image using Otsu                  0.39     0.29       0.15              0.44  0.98      0.57      0.88   0.75    0.26       0.522
Tab. 4.4: F1 Score for feature extraction using adaptive threshold methods.

Techniques                                     Tribals  Sea Beach  Gothic Structure  Bus   Dinosaur  Elephant  Roses  Horses  Mountains  Average
Generic + Even + Odd Image using Sauvola       0.70     0.45       0.48              0.69  0.99      0.74      0.88   0.84    0.45       0.692
Odd Image using Sauvola                        0.66     0.37       0.43              0.67  0.99      0.75      0.89   0.88    0.47       0.678
Even Image using Sauvola                       0.68     0.38       0.39              0.61  0.99      0.66      0.85   0.87    0.50       0.659
Odd Image using Bernsen                        0.51     0.47       0.35              0.60  1.00      0.67      0.90   0.87    0.48       0.648
Generic Image using Bernsen                    0.54     0.43       0.38              0.55  1.00      0.67      0.86   0.88    0.46       0.640
Generic + Even + Odd Image using Bernsen       0.55     0.53       0.27              0.55  1.00      0.71      0.87   0.86    0.40       0.637
Even Image using Niblack                       0.53     0.44       0.32              0.55  0.99      0.67      0.86   0.91    0.45       0.636
Generic Image using Sauvola                    0.67     0.26       0.41              0.58  0.99      0.64      0.85   0.84    0.44       0.631
Generic + Even + Odd Image using Niblack       0.56     0.49       0.30              0.47  0.99      0.68      0.90   0.90    0.40       0.631
Even Image using Bernsen                       0.55     0.45       0.31              0.50  0.99      0.69      0.88   0.88    0.42       0.630
Odd Image using Niblack                        0.52     0.38       0.33              0.46  1.00      0.62      0.82   0.83    0.35       0.590
Generic Image using Niblack                    0.43     0.34       0.29              0.44  1.00      0.53      0.81   0.88    0.39       0.567
Fig. 4.6: Image category wise comparison of F1 Score for proposed techniques applying global threshold
Fig. 4.7: Average F1 Score comparison for proposed techniques applying global threshold
Fig. 4.8: Image category wise comparison of F1 Score for proposed techniques applying adaptive thresholds
Fig. 4.9: Average F1 Score comparison for proposed techniques applying adaptive thresholds
Fig. 4.10: Binarization of even image using Otsu’s threshold
Fig. 4.11: Binarization of odd image using Otsu’s threshold
Fig. 4.12: Binarization of generic + even + odd image combinations using Otsu’s threshold
Fig. 4.13: Binarization of generic image using Otsu’s threshold
Fig. 4.14: Binarization of even image using Niblack’s threshold
Fig. 4.15: Binarization of odd image using Niblack’s threshold
Fig. 4.16: Binarization of generic + odd + even image combination using Niblack’s threshold
Fig. 4.17: Binarization of generic image using Niblack’s threshold
Fig. 4.18: Binarization of even image using Sauvola’s threshold
Fig. 4.19: Binarization of odd images using Sauvola’s threshold
Fig. 4.20: Binarization of generic + odd + even image combination using Sauvola’s threshold
Fig. 4.21: Binarization of generic image using Sauvola’s threshold
Fig. 4.22: Binarization of even image using Bernsen’s threshold
Fig. 4.23: Binarization of odd image using Bernsen’s threshold
Fig. 4.24: Binarization of generic + odd + even image combination using Bernsen’s threshold
Fig. 4.25: Binarization of generic image using Bernsen’s threshold
4.8 Experimental results Table 4.1 and Table 4.2 show the results of the misclassification rate (MR) of the nine different image categories after applying global thresholds and adaptive threshold techniques respectively. The results in Table 4.1 show that feature extraction by applying Otsu’s method on a combined set of generic, even and odd images have a minimum misclassification rate (MR). An ascending order of misclassification rate (MR) is observed by applying Otsu’s method separately on each variation viz., even images, odd images and generic images respectively. The graphical representation of the comparisons is shown in Figure 4.2 and 4.3 respectively. Table 4.2 shows the minimum misclassification rate (MR) for feature extraction on applying Sauvola’s threshold on the combination of even, odd and generic images, followed by increasing misclassification rate (MR) for each variation viz., odd image, and even image respectively on applying Sauvola’s threshold. The fourth and fifth highest misclassification rate (MR) is shown by feature extraction on applying Bernsen’s threshold on odd images and on generic images respectively. The increasing order of misclassification rate is continued with feature extraction on applying Bernsen’s threshold on a combination of generic, even and odd images, Niblack’s threshold on even images, generic images using Sauvola’s threshold, a combination of generic, even and odd images using Niblack’s threshold, an even image using Bernsen’s threshold, odd images using Bernsen’s threshold and generic images using Niblack’s threshold respectively. The comparisons for the misclassification rates (MR) with techniques applying adaptive thresholds are shown in Figure 4.4 and 4.5. Table 4.3 and 4.4 show the results for the F1 Score of classification done by feature extraction using global and adaptive threshold techniques respectively on nine different image categories of the Wang Dataset. It is observed in Table 4.3 and in Figure 4.6 and 4.7 that feature extraction by applying Otsu’s method on a combination of generic, even and odd images has the highest F1 Score compared to the rest three techniques on applying Otsu’s method of thresholding as it has the least misclassification rate (MR). Similarly, in Table 4.4 feature extraction on applying Sauvola’s threshold on a combination of even, odd and generic images has shown the maximum F1 Score compared to the rest of the feature extraction techniques on applying different adaptive threshold methods. The comparisons for techniques with adaptive thresholds are given in Figure 4.8 and 4.9. Table 4.5 compares the misclassification rate (MR) and F1 Score of the proposed techniques and the existing techniques simultaneously. It is observed that the proposed feature extraction techniques applying Sauvola’s threshold separately on a combination of even, odd and generic images, on odd images and on even images have outperformed all other proposed techniques applying global and adaptive thresholds, as well as the existing techniques applying a mean threshold on even images, on odd images and on a combination of even, odd and
Tab. 4.5: Comparison of Proposed Techniques and Existing Techniques.

Techniques                                                  MR     F1 Score
Generic + Even + Odd Image using Sauvola (Proposed)         0.067  0.692
Odd Image using Sauvola (Proposed)                          0.071  0.678
Even Image using Sauvola (Proposed)                         0.076  0.659
Even Image using Mean Threshold (Existing)                  0.078  0.648
Odd Image using Bernsen (Proposed)                          0.078  0.646
Generic Image using Bernsen (Proposed)                      0.080  0.640
Generic + Even + Odd Image using Bernsen (Proposed)         0.080  0.637
Even Image using Niblack (Proposed)                         0.081  0.636
Generic Image using Sauvola (Proposed)                      0.081  0.631
Generic + Even + Odd Image using Niblack (Proposed)         0.082  0.631
Even Image using Bernsen (Proposed)                         0.083  0.630
Generic + Even + Odd Image using Mean Threshold (Existing)  0.083  0.627
Generic + Even + Odd Image using Otsu (Proposed)            0.085  0.616
Even Image using Otsu (Proposed)                            0.086  0.615
Odd Image using Mean Threshold (Existing)                   0.090  0.598
Odd Image using Otsu (Proposed)                             0.091  0.594
Odd Image using Niblack (Proposed)                          0.091  0.590
Generic Image using Niblack (Proposed)                      0.096  0.567
Generic Image using Otsu (Proposed)                         0.107  0.522
generic images. The techniques have shown the minimum misclassification rate (MR) and a higher F1 Score compared to the rest of the proposed and existing techniques in Table 4.5. The first technique applying the mean threshold on even images occupies the fourth position in performance having a higher MR and a lesser F1 Score compared to the preceding three proposed techniques. The proposed feature extraction using Bernsen’s threshold separately on odd images, generic images and on a combination of even, odd and generic images respectively, as well as feature extraction using Niblack’s threshold on even images, Sauvola’s threshold on generic image, Niblack’s threshold on combination of even, odd and generic images and Bernsen’s threshold on even images, show an increasing rate of MR and a decreasing rate of F1 Score compared to the performance of the prior technique in fourth position. This is an existing technique but has a lower MR and a higher F1 Score when referred to the second and third set of existing techniques applying mean threshold on a combination of even, odd and generic images and on odd images respectively. Two of the proposed techniques involving feature extraction with Otsu’s threshold separately on a combination of even, odd and generic images and on even images respectively have the highest F1 Score and a lower MR. This is in comparison to the third technique of existing feature extraction method, using a mean threshold with odd images but it has an inferior performance with reference to the prior two techniques in the table for existing feature vector extraction mechanisms. The least effective performances are exhibited by
feature extraction with odd images using Otsu's and Niblack's thresholds separately, followed by feature extraction using Niblack's method and Otsu's method individually on generic images. Superior classification results are established by the proposed techniques, which show a higher F1 Score, the metric for judging better classification results. The reduced misclassification rate (MR) obtained while classifying with the proposed feature vector extraction techniques reveals the robustness of the extracted features. Thus, it is observed that the adaptive threshold using Sauvola's method applied on a combination of generic, even and odd images is the most successful feature extraction technique for facilitating content based classification among the discussed categories of techniques. It is further observed that the other two proposed methods implementing feature extraction using Sauvola's threshold, on odd images and on even images respectively, have also outclassed all the existing techniques. Therefore, feature extraction with Sauvola's adaptive threshold technique applied independently on a combination of generic, even and odd images, on odd images and on even images is the best alternative compared to the rest in its class. These techniques have also shown better results compared to the existing mean threshold technique for feature extraction with odd and even images, according to Table 4.5. A comparison of the MR and F1 Score of the proposed techniques and the existing techniques is given in Figure 4.26 and Figure 4.27.
Fig. 4.26: Comparison of average misclassification rate (MR) of proposed and existing techniques
Fig. 4.27: Comparison of average F1 Score of proposed and existing techniques
4.9 Conclusion
The work has presented new techniques of feature vector extraction from even and odd variations of generic images using global and adaptive threshold techniques. The methods calculate image variance based thresholds to perform image binarization for feature extraction. The results have established the proposed methods as better performing feature extraction techniques for content based image classification, especially with adaptive thresholding using Sauvola's method. The work may be extended to feature extraction for content based image retrieval, medical image analysis, etc.
References
[1] M. Sezgin and B. Sankur. Survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 13(1):146–165, 2004.
[2] R. Raveaux, J. C. Burie, and J. M. Ogier. Structured representations in a content based image retrieval context. Journal of Visual Communication and Image Representation, 24(8):1252–1268, 2013.
[3] S. D. Thepade, R. Das, K. Kumar, and S. Ghosh. Image classification using advanced block truncation coding with ternary image maps, vol. 361 of Communications in Computer and Information Science, pp. 500–509. Springer, Berlin, Heidelberg, 2013.
[4] H. B. Kekre, S. D. Thepade, R. K. K. Das, and S. Ghosh. Multilevel block truncation coding with diverse colour spaces for image classification. In IEEE International Conference on Advances in Technology and Engineering (ICATE 2013), pp. 1–7, 2013.
[5] K. Ntirogiannis, B. Gatos, and I. Pratikakis. An objective evaluation methodology for document image binarization techniques. In 8th IAPR Workshop on Document Analysis Systems, pp. 1–6, 2008.
[6] M. Sezgin and B. Sankur. Survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 13(1):146–165, 2004.
[7] J. Sauvola and M. Pietikainen. Adaptive document image binarization. Pattern Recognition, 33(4):225–236, 2000.
[8] Y. Yang and H. Yan. An adaptive logical method for binarization of degraded document images. Pattern Recognition, 33:787–807, 2000.
[9] E. A. Savakis. Adaptive document image thresholding using foreground and background clustering. In Int. Conf. on Image Processing (ICIP 98), 1998.
[10] W. Niblack. An Introduction to Digital Image Processing, pp. 115–116. Prentice Hall, Englewood Cliffs, NJ, 1998.
[11] J. Bernsen. Dynamic thresholding of gray level images. In ICPR 86: Proc. Intl. Conf. Patt. Recog., pp. 1251–1255, 1986.
[12] N. Otsu. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9:62–66, 1979.
[13] S. Thepade, R. Das, and S. Ghosh. Performance comparison of feature vector extraction techniques in RGB color space using block truncation coding for content based image classification with discrete classifiers. In 2013 Annual IEEE India Conference (INDICON), pp. 1–6, 2013.
[14] S. B. Kotsiantis. Supervised machine learning: A review of classification techniques. Informatica, 31:249–268, 2007.
[15] J. Li and J. Z. Wang. Automatic linguistic indexing of pictures by a statistical modeling approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1075–1088, 2003.
[16] S. Sridhar. Image Features Representation and Description. In Digital Image Processing, pp. 483–486. Oxford University Press India, New Delhi, 2011.
Arindam Mondal*, Anirban Mukherjee, and Utpal Garain
5 Intelligent monitoring and evaluation of digital geometry figures drawn by students Abstract: This chapter proposes a feasible implementation of a student-specific automated system of learning and tutoring of school level geometry concepts. Unlike the usual geometry drawing or tutoring software, the proposed system can intelligently assess the student’s understanding of a geometry concept by testing the student with relevant problems and accordingly takes him to a higher or lower level of concepts. The diagram drawn by a student in a digital interface is evaluated to find the degree of geometric correctness which is otherwise a challenging task. For tutoring primary school level geometry, a figure database with different difficulty levels is maintained while for secondary school level tutoring, earlier developed text-to-diagram drawing functionality is incorporated. The novelty of the tutoring system lies in the fact that no such system exists that can draw, recognize and compare digital geometry diagrams and adjust the diagram based content/problems through different learning and testing stages based on dynamic student-specific responses. The representative test cases cited in this study clearly demonstrates the usefulness and intelligent treatment of the system. Keywords: pattern recognition; geometry drawing; computer-based teaching-learning; artificial intelligence
5.1 Introduction

5.1.1 Background
By-hand constructions and drawing using ruler and compass on paper play an integral role in constructive geometry; such tasks are included in all secondary school level geometry syllabi. With computers being increasingly used as a teaching-learning medium, some software has been developed that replaces the ruler and compass as tools for geometric construction. Such software provides an interface where the students draw geometry patterns using a mouse. Students at primary school level can learn geometry by constructing basic shapes like lines, squares, rectangles, triangles, etc. using dynamic mathematics software tools like GeoGebra. There are other geo-
*Corresponding author: Arindam Mondal, Department of Computer Application, RCC Institute of information Technology, Kolkata, India Anirban Mukherjee, Department of Engineering Science & Management, RCC Institute of information Technology, Kolkata, India Utpal Garain, CVPR Unit, Indian Statistical Institute, Kolkata, India https://doi.org/10.1515/9783110552072-005
metric drawing software tools like Geometria, Sketchometry, Sketchpad, Geomspace, etc. which are useful for practicing geometry in higher classes. Though these types of software seem to be useful in learning geometry shapes, those are not student specific. They provide a predefined learning path through a fixed set of diagrams and diagram construction stages which cannot be adjusted according to the student’s ability or ease of understanding as a teacher does. Student-specific tutoring (of geometry concepts) demands that a teacher should gradually take a student from easier to harder concepts by dynamically assessing his understanding level and if required, by redirecting the student to relearn the basics and consolidate the weaker conceptual areas. This is usually done by continuously testing a student with problems related to a given learning input. Depending on the understanding of a given problem/figure/concept the student is tested with more complex figures or with easier problems/figures representing more fundamental concepts. This chapter describes the working of an intelligent geometry tutoring prototype. The prototype is designed to work in two modes: lesson mode and test mode. In the lesson mode, the learning inputs i.e., the geometric figures of increasing conceptual level difficulty will all be presented in a step-by-step manner. The key geometric concepts related to a figure will also be presented in a context-sensitive and object-sensitive manner. In the test mode, the system will test the students’ understanding of geometry concepts by asking him to draw figures in order of increasing complexity. In the background, the drawing steps and coordinates of key points or vertices of figures drawn by a student will be recorded and the entire figure will be analyzed and evaluated i.e., matched with the corresponding correct figure (definition) stored in or generated by the system. Based on the match percentage i.e., correctness of the figures drawn by a student, he will be further tested with harder or easier figures automatically. Thus, there will be a dynamic assessment of a student’s performance or understanding and also re-learning of related basics to help the students achieve complete understanding of entire learning content through varying time and steps.
5.1.2 Motivation
When a teacher physically interacts with a student to teach a subject like geometry, he gets the required feedback from the student to dynamically vary the pace or content of teaching to best suit the student. Here the teacher's experience and knowledge in testing a student's understanding come into play. Often the teacher has to break up a geometry problem or figure into its constituent parts or simpler sub-problems to help the student's understanding. On the contrary, a student with a better grasp of a given concept is usually given harder problems to solve. In a human-computer interaction, normally, the computer is not expected to behave differently with different students and guide the student according to his ability. That is why a computer has not yet been used successfully as an alternative to a teacher.
To replace the teacher’s role with intelligent geometry tutoring software is a challenging task and requires computational intelligence to take the human factors into consideration to some reasonable extent. A basic software module should primarily demonstrate the concepts of geometry through simple to harder geometry problems and corresponding figures interactively. Besides this, computational intelligence should be applied to monitor the drawing exercise and analyze the diagram drawn by a student to figure out the geometric errors or the overall correctness of the diagram and subsequently decide on the next level of a problem to be given to the student. This requires real time automatic diagram recognition which is not a trivial problem; identifying geometrical correctness of a figure is far more critical than matching of two images or patterns as a particular geometric configuration or a concept can be represented correctly by more than one figure. To the best of our knowledge no such software is available that can intelligently assess the understanding level of a user by evaluating digitally drawn figures corresponding to a given problem or description.
5.1.3 Contribution Primary school level geometry is mostly considered in the present study. Starting from elementary concepts like a horizontal line, vertical line, slant line, perpendicular lines, parallel lines, intersecting lines, etc., simple to complex entities like a rectangle, square, rhombus, parallelogram, pentagon, hexagon, different types of triangle and then such entities with bisectors, diagonals, medians, etc. are demonstrated in the lesson mode. During demonstration the geometric characteristics of the concerned entities are highlighted. The demonstration can be repeated or started from any point as may be desired by a student. In the test mode, which forms the most important part of the proposed system, the student will be asked to draw geometric figures chosen randomly from the set of figures stored in the system corresponding to each difficulty level. The idea of difficulty level of the geometry figures or concepts is taken from the sequential organization of geometry chapters or exercises, in order of increasing complexity, in the geometry text books or work books followed in the schools. The categorization of the geometry figures and concepts by assigning difficulty level is implemented through a figure database in the present system. In primary school level geometry syllabi, there is limited number of figures/concepts that can be stored in the system and can be used as ground-truth references to compare or match with student-drawn figures. The coordinates and different parameter values of the ground-truth figures are also stored and their values are matched with the corresponding values of each parameter of the student-drawn figure. The maximum possible match score against each of these parameter is one which corresponds to a 100% match. The correctness of the diagram drawn by the student is the percentage of the cumulative match score of all the parameters of the student-drawn figure. This
technique of intelligent evaluation of the geometric correctness of a digital diagram was first proposed in [1] by Mukherjee et al. (including two of the present authors). Depending on the percentage match score or correctness, the system will further test the student with a problem chosen randomly from the next higher or lower difficulty level. This process will continue till a student successfully clears all the difficulty levels or gets stuck at a particular level, i.e., clears the lower level but cannot clear that particular level. In the second case, the system would recommend that the student take a lesson at the particular level where he got stuck. Unlike the primary school level, high school geometry books contain an unlimited variety of geometry problems based on a limited number of concepts defined in the syllabus. It is not possible to store figures representing all types of problems. In the lesson mode, a particular concept (pertaining to a certain difficulty level) may be illustrated through a word problem describing a particular geometric configuration. Again the same concept may be tested with a different word problem representing a different configuration. Therefore, the intelligent monitoring and evaluation system needs to first construct a conceptually correct geometry figure dynamically in the background upon understanding the textual description of the geometry problem; then it should match the student-drawn figure with the system-generated figure (definition) and determine the correctness. Natural language understanding is a challenging problem in artificial intelligence (AI) research. This functionality of automatic diagram drawing from a natural language description has been implemented partially in the present system using the concept of GeometryNet based diagram drawing introduced by Mukherjee et al. [2, 3]. For high school level geometry tutoring it is again a challenge to assign the difficulty level of the geometry problems. An attempt has been made in the present system to work with a limited pool of problems categorized according to their order of occurrence in text book chapters dealing with different geometrical concepts. Thus the system under the present study is a unique combination of three elements of functional intelligence: it can (1) automatically evaluate the geometrical correctness of a school level geometry diagram drawn digitally, (2) automatically construct a correct representation of a school level geometry word problem or concept description and (3) change the learning content and test problems in real time according to the student’s response. The present system can potentially help in examining those children who suffer from visual perceptual disorders. This type of disorder affects the understanding of information that a person sees and/or affects his ability to draw or copy something. The system as described here can monitor the drawing of a simple geometric figure by a child and measure the extent to which he has drawn it correctly, which might help identify the extent of the child’s visual perceptual disorder. The rest of this chapter is divided into the following five sections. Section 5.2 discusses some of the software available in the market for geometry diagram drawing and intelligent math tutoring. The systems earlier proposed by us for automated geometry
diagram drawing from word problems and for automated evaluation of such diagrams are also described. Section 5.3 explains the interface and the working modes of the present system. The figure database and the student-specific navigation between difficulty levels are also explained here. Section 5.4 deals with the dynamic diagram assessment methodology and substantiates it with different test cases with sample problems taken from primary and high school level geometry books. Sections 5.5 and 5.6 conclude this chapter, highlighting the limitations and the future scope of research on the present system.
5.2 Literature survey
In Section 5.1 it was mentioned that software to practice and learn geometric constructions is already available in the market. Here, we provide a brief overview of a few of these software packages. GeoGebra (Figure 5.1) is software that explicitly links (in a bidirectional combination) geometry and algebra. It is an interactive program for learning and teaching mathematics, including geometry, up to a higher level. Constructions can be made with points, vectors, segments, lines, polygons, conic sections and higher order polynomials. Another popular package, GNU Dr. Geo (Figure 5.2), is interactive geometry software that allows its users to design and manipulate interactive geometric sketches. In primary education Dr. Geo is a nice tool to explore the geometric properties of triangles, squares, etc. Advanced users can also design interactive activities with a combined use of interactive sketches and programmed scripts.
Fig. 5.1: GeoGebra Software
Fig. 5.2: Dr. Geo Software
Fig. 5.3: Sketchometry Software
Sketchometry is dynamic mathematics software for doing Euclidean geometry and function plotting. The user sketches points, circles and lines on the screen, and Sketchometry (Figure 5.3) tries to identify these strokes and generates exact geometric objects out of them.
All of these software packages are good for drawing geometric figures or polynomial sections/curves on the screen, but they are unable to recognize and analyze a newly drawn diagram and assess its geometrical correctness. Coming now to mathematical tutoring software, Mindspark [4], developed by Educational Initiatives, India, is an adaptive online intelligent tutoring system that helps students study mathematics effectively through questions, practice, games, etc. In this software, if the student answers five consecutive questions correctly, he receives a challenge question which is harder than the normal questions. If the student answers the challenge question correctly, he receives extra points. When a student answers a question incorrectly, he may be provided with a simple or detailed explanation, or be redirected to questions that strengthen his basic understanding. These decisions are taken by an adaptive logic which is expected to get better and better with increased student usage. It thus aims to use not just the interactivity of the computer, but its intelligence, and to mimic the diagnostic capabilities of a good teacher. But Mindspark does not deal with drawing geometrical figures. The Mathematics Tutor [5] helps students solve word problems using fractions, decimals and percentages. The tutor records the success rates while a student is working on problems and subsequently provides appropriate problems for the student to work on. The subsequent problems are selected based on the student’s ability, and a desirable time is estimated in which the student should solve the problem. Image processing to identify and mark objects or areas of different geometrical shapes in an image is a new field of study. Much research has been carried out on accurate recognition of object edges that represent some geometric shape. Garg et al. [6] present a method of shape recognition for different regular geometrical shapes using morphological operations. Zakaria et al. [7] proposed another shape recognition method where circular, square and triangular shaped objects/areas within an image can be identified. Their method utilizes intensity values from the input image followed by thresholding with Otsu’s method to obtain a binary image. They use median filtering to eliminate noise and a Sobel operator to find the edges. Rege et al. [8] have discussed an approach involving digital image processing and geometric logic for recognition of two dimensional objects such as squares, circles, rectangles and triangles, as well as the color of the object. Their method involves converting a three dimensional RGB image to a two dimensional black and white image, color pixel classification for object-background separation, area based filtering and the use of a bounding box and its properties for calculating object metrics. Abbadi et al. [9] have introduced a new approach for recognizing two dimensional shapes in an image that also recognizes the shape type. Their algorithm recognizes all the known shapes based on segmentation of the image into regions corresponding to individual objects and then determines a shape factor which is used to recognize the shape type. The algorithm proposed by Chhaya et al. [10] gives an approach to identify basic geometric shapes like squares, circles, triangles, etc. The algorithm involves conversion of an RGB image to a gray scale image and then to a black and white image. They have
used the thresholding concept. The area of the minimum bounding rectangle is calculated irrespective of the angle of rotation of the object, and the ratio of this area to the area of the object is calculated and compared to a predefined ratio to determine the shape of the given object. The above mentioned image processing approaches recognize standard geometrical shapes, but they treat the shapes as areas of contiguous same-color pixels rather than as geometric objects. Therefore the geometric properties of the shapes are not analyzed or extracted for any kind of comparative study. As per the research literature there are few intelligent geometry diagram drawing software packages or prototypes that use AI to provide a self-learning platform to students of geometry. One such system implemented by Mukherjee and Garain [3, 11], based on the knowledge base GeometryNet [2] developed by them, can automatically draw the equivalent diagram of a school level geometry problem to help students who cannot understand the problem and draw the diagram correctly. Here natural language processing (NLP) is used to analyze a geometry word problem and collect the entities and their relations as described in the problem, to connect to the knowledge base, and to find the values of parameters needed to draw the described entities using standard graphics functions. LIM-G is another knowledge base used by Wong et al. [12] to create a prototype that understands elementary school level geometry problems (which are about the area and circumference of various shapes) using prebuilt geometry knowledge and constructs a figure for the problem. The LIM-G knowledge base is constructed with an ontology based knowledge engineering tool called InfoMap [13]. Another tool, DG, for drawing dynamic geometric figures on a web page by understanding the text of school level geometry problems with the help of the knowledge base engine InfoMap, has been developed by Wong et al. [14]. Geometric concepts extracted from an input text are used to output a multistep JavaSketchpad (JSP) script, which constructs the dynamic geometry figure on a webpage. With this tool, teachers and students can construct dynamic geometric figures on a web page for their learning activities, such as making geometry conjectures and proving a theorem. In the domain of vector graphics recognition, several methods have been proposed [15, 16]. These methods evaluate the quality of empirical matches between the entities detected in vector form by the graph recognition algorithm and the ground-truth entities, based on a pre-defined set of physical matching criteria for elementary graphic entities like lines, circles and arcs. But these methods are not suitable for evaluating the geometrical correctness of a diagram freely drawn on the screen, as explained in [1]. The empirical method proposed in [1] by Mukherjee et al., though not foolproof, seems to be the only available method to rationally quantify the extent of the match of a digitally drawn geometry diagram with the corresponding ground-truth diagram. This method, explained in Section 5.4, has been adopted in the test mode of the present system to find the correctness of a geometry diagram digitally drawn by a student.
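As an illustration of the bounding-box-ratio idea used by several of the cited approaches, a minimal sketch in Python/OpenCV (our own, assuming OpenCV 4; the ratio cut-offs are illustrative assumptions, not values taken from [7–10]) could look as follows. It classifies regions of contiguous pixels, which is precisely why, as noted above, such methods cannot judge the geometric correctness of a freely drawn diagram.

    import cv2

    def classify_shapes(image_path):
        """Label filled shapes by the ratio of object area to the area of its
        minimum (rotation-independent) bounding rectangle."""
        img = cv2.imread(image_path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Otsu thresholding to obtain a binary image (cf. Zakaria et al. [7])
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        binary = cv2.medianBlur(binary, 5)            # median filter removes noise
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        labels = []
        for c in contours:
            area = cv2.contourArea(c)
            (_, (w, h), _) = cv2.minAreaRect(c)       # rotated minimum bounding box
            if area == 0 or w * h == 0:
                continue
            ratio = area / float(w * h)               # object area / box area
            # Illustrative cut-offs: a rectangle nearly fills its box, a circle
            # fills about pi/4 of it, a triangle about half of it.
            if ratio > 0.9:
                labels.append("rectangle/square")
            elif ratio > 0.7:
                labels.append("circle")
            else:
                labels.append("triangle")
        return labels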
5.3 System functions
The present system for intelligent tutoring of geometry is designed in such a manner that it helps the student learn the concepts gradually in a self-paced manner and also helps in monitoring or testing his progress of learning. Thus the system interacts with the student in two modes, one is the lesson mode and the other is the test mode. Let us first discuss the lesson mode. Here the lesson content is divided into two classes – primary and secondary. The first step in tutoring geometry at the primary class is to demonstrate the fundamental geometrical concepts regarding lines like horizontal, vertical, slanted, acute, obtuse, parallel, perpendicular, intersecting, bisecting and then the other closed entities like rectangle, square, parallelogram, rhombus, polygon with n sides, diagonal, different varieties of triangles, altitude, median, etc. in a step by step manner. Each of these concepts or entities will be demonstrated with a figure along with its geometrical properties. The relevant properties for a given figure will appear as pop-ups. The entities are made hyper-objects wherever relevant. When a parallelogram or rectangle is demonstrated, if the mouse is placed over any of the four edges, the system automatically highlights the edge parallel to it by changing colour and also shows a pop-up message “These lines are parallel and equal”. It also shows the equal angles by putting angle marks “)” of the same colour when the mouse is pointed near any of the corners. If, for example, a right angled triangle is demonstrated, then along with a figure of the right angled triangle (Figure 5.4), pop-up information like the following will appear: A right angled triangle is a triangle in which one angle is a right angle, that is, a 90° angle. The side opposite the right angle is called the hypotenuse. The sides adjacent to the right angle are called the base and the height. The Pythagoras formula that applies for a right angled triangle is: base² + height² = hypotenuse². When the student points the mouse pointer near the right angle of the triangle, a square angle mark “⌞” pops up near the angle to show that the angle is 90°. Again the pop-ups ‘hypotenuse’, ‘base’ and ‘height’ appear when the pointer touches the respective sides of the right angled triangle. The system will thus demonstrate all the concepts or figures pertaining to a level and will then automatically move up to the next higher level to demonstrate the figures stored at that level. Likewise, in increasing order of complexity, all the figures of all the levels will be demonstrated interactively. If a student wants to go back to a certain level or repeat a demonstration of a particular figure or pause at a point, the system allows that. For the secondary class module, a very limited number of diagrams corresponding to representative geometry word problems, categorized according to the complexity of the underlying central concept, is stored in the system and used in the lesson mode.
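As an illustration only, a lesson-mode entry of the kind described above might be represented as follows; the field names and the Python representation are our own assumptions, not the system’s actual data model.

    from dataclasses import dataclass, field

    @dataclass
    class LessonEntry:
        level: int                    # difficulty level in the figure database
        title: str                    # e.g., "Right angled triangle"
        popup_text: str               # shown when the figure is demonstrated
        element_popups: dict = field(default_factory=dict)   # hover pop-ups

    right_triangle = LessonEntry(
        level=3,
        title="Right angled triangle",
        popup_text=("A right angled triangle is a triangle in which one angle is "
                    "a right angle, i.e., a 90 degree angle. "
                    "Pythagoras: base^2 + height^2 = hypotenuse^2."),
        element_popups={"AB": "base", "BC": "height", "CA": "hypotenuse",
                        "angle B": "90 degree angle"},
    )

    # The Pythagoras relation quoted in the pop-up can be checked numerically:
    base, height, hypotenuse = 3.0, 4.0, 5.0
    assert abs(base**2 + height**2 - hypotenuse**2) < 1e-9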
Fig. 5.4: Lesson mode display screen
Fig. 5.5: A complex geometry figure
For example, consider the word problem: “PQ is perpendicular to RS such that Q is the midpoint of RS. PT is parallel to RS. Find angle RQT”. Figure 5.5 is the diagram drawn by the system to represent the above problem. Here the geometrical solution of the question asked in the problem is not within the purview of the system function; it can only draw the equivalent diagram to correctly illustrate the problem, which many students would find difficult to do. In the test mode, which forms the most important part of the proposed system, the students’ understanding of the geometry lessons will be tested and monitored. The student would be asked to draw geometric figures chosen randomly from the set of figures stored for each difficulty level. The correctness of the geometrical figure drawn by a student on the computer can be evaluated by the system following the method proposed by Mukherjee et al. in their earlier work [1, 17]. The system tracks the score of the student at different levels. Wherever the student fails (i.e., scores below the threshold level), he will again be tested at the previous level with different figures; depending on the outcome the system would recommend the student to take further lessons at that level or the previous level, thereby helping him to understand the concept fundamentally. On the other hand, if the student can draw figures reasonably correctly (scoring more than a threshold value) at a given level then he would be asked to draw
figures from the next higher level, and likewise till the last level is reached. Let us consider a student who is asked to draw a geometric figure, e.g., ‘Draw a rectangle’. When the student draws a figure on screen using the mouse, the system automatically checks whether the figure drawn is correct, i.e., a true rectangle or not, by matching it with the properties of a rectangle already stored in the figure database. If the figure is correct then it asks the student to draw a figure of a higher level of difficulty, e.g., ‘Draw a parallelogram and one bisector of opposite angles’. If, against the earlier question, the rectangle drawn is partially correct then it asks another question of the same level of difficulty, e.g., ‘Draw a square’. If this time it is totally incorrect or again partially correct then it asks a lower level question, e.g., ‘Draw two parallel lines’. To make the tutoring of geometry really student-specific, the difficulty level of the pre-selected figures or concepts or problems is ascertained first, then the items are categorized according to difficulty level and finally connected in a hierarchy of related figures of increasing difficulty level. This yields a figure database covering the entire learning content for primary school level geometry and partial content of secondary school level geometry. In the figure database, each figure contains metadata like the figure description, related geometrical elements and the properties and parameters linked with the elements. With the help of primary class mathematics text books and some websites we designed a simple geometry figure database. Initially, we consider only five different levels. Level 1 is for figures of some basic concepts of lines. Levels 2 to 5 contain basic entities of increasing complexity. For the lesson mode as well as the test mode, the system displays, or asks the student to draw, figures from this database. A pictorial representation of the figure database is shown in Figure 5.6, though the metadata connected to each figure is not shown here. For a primary class student, as the variety of geometrical concepts is limited, the system can test a student from the same figure database which is used to teach the student in lesson mode. But for higher class geometry there is a wide variety of geometry word problems that carry a central geometrical concept – hence to test a secondary class student, the figure database containing a limited pool of geometry word problems and corresponding diagrams is not enough. The present system allows connecting to a concept-wise (difficulty level-wise) list of geometry word problems; either the system can choose randomly from that list or the student can himself select any problem from that list. While the student draws the diagram in the given interface, the system automatically constructs the equivalent diagram using the GeometryNet based text-to-drawing (t2d) tool (an earlier work by Mukherjee et al. [2, 3]). This machine drawn diagram is optionally displayed on screen and matched with the student-drawn diagram to evaluate the correctness of the latter. Accordingly, further testing is done by moving to the next higher or lower level. GeometryNet is a knowledge base which forms the backbone of the t2d tool [2]. Basically, it builds a domain-specific ontology for geometrical terminology, where the semantics of certain terms are expressed in terms of equations involving their arguments.
[Figure 5.6 shows the figure database organized by level of geometry drawing: Level 1: horizontal line, vertical line, slanting line (right to left), slanting line (left to right), intersecting lines, perpendicular lines; Level 2: square, rectangle, triangle, parallelogram, rhombus, pentagon; Level 3: equilateral, right, acute and obtuse triangles, trapezium; Level 4: rectangle, parallelogram and triangle with an angular bisector, rectangle with two angular bisectors, equilateral triangle with all angular bisectors; Level 5: composite constructions such as “Draw square ABCD. Join AC and extend to E. Join BE”.]
Fig. 5.6: Pictorial representation of figure database (partial)
GeometryNet lists 51 geometric entities (e.g., line, parallelogram, circle, triangle, quadrilateral, etc.) and entity parameters (e.g., coordinates, angle, slope, length, etc.), 50 entity attributes (e.g., isosceles, concentric, common, etc.), and 35 interaction types or relations (e.g., produce, intersect, bisect, etc.) between the entities. The t2d module is intelligent enough to automatically analyze, comprehend and diagrammatically represent any new problem (stated in natural language) chosen by the student. Its NLP unit (comprising POS tagger, parser and translator) does the syntactic and semantic analysis of a problem description and figures out the geometrical terms and relations mentioned therein, resolving all the lexical complexities. Upon consulting the GeometryNet knowledge base it forms the parse graph first and subsequently a language-independent intermediate representation of the problem [3].
Finally, the line elements to be drawn along with their parameter values are intelligently figured out (by machine understanding of the NLP output) to generate a diagrammatic representation of the input problem. The functional units of the t2d module along with their outputs are illustrated in Figure 5.7. Thus this tool is flexible and intelligent enough to potentially accept any secondary school level geometry word problem as input and draw any complex combination of geometric entities described in the problem. This is unlike other available geometry diagram drawing software or tutoring tools that use stored figures and fixed patterns or a predesigned structured input mechanism to draw figures without requiring any language comprehension module or domain knowledge. This novel text-to-diagram conversion mechanism thus forms a basic intelligent framework towards developing the present system.
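For concreteness, the kind of information such an intermediate representation could carry for the word problem of Figure 5.5 quoted earlier is sketched below. This is a purely hypothetical illustration of the idea, not the actual data structures of [2, 3].

    # Hypothetical illustration of what the parse-graph / intermediate
    # representation stage could capture; the real t2d structures differ.
    problem = ("PQ is perpendicular to RS such that Q is the midpoint of RS. "
               "PT is parallel to RS. Find angle RQT.")

    intermediate_representation = {
        "entities": ["line PQ", "line RS", "line PT", "point Q"],
        "relations": [
            ("perpendicular", "PQ", "RS"),
            ("midpoint", "Q", "RS"),
            ("parallel", "PT", "RS"),
        ],
        "query": ("angle", "RQT"),
    }

    # A drawing back end would then resolve concrete endpoint coordinates,
    # e.g. fix RS on the horizontal axis, place Q at its midpoint, erect PQ
    # vertically and draw PT through P parallel to RS, before rendering the
    # line elements with standard graphics calls.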
[Figure 5.7 depicts the t2d pipeline: the natural language geometry problem passes through Parser 1 to a parse graph; the translator produces the intermediate representation; Parser 2 converts it into a draw-able representation; and the graphics module renders the digital diagram. GeometryNet and the NLP tools support these stages.]
Fig. 5.7: Functional units of text-to-diagram conversion tool [3]
5.3.1 System interface
The interface of the system is designed to display the title of the geometrical entity or the textual description of a geometrical configuration to be demonstrated by the system or to be drawn by the student. To make things easier for primary school students the interface provides a dotted grid at integer coordinates so that figures can be drawn using straight lines from dot to dot. When the student clicks with the mouse near or over a grid point, it is selected and a line starting from that point is shown as a drag line (like the
rubber band line in popular geometry tools). When he again clicks near another grid point, a firm line is drawn up to it. Thus the student can draw a figure with a set of straight line segments. Once the figure is completely drawn, the student needs to click a ‘submit’ button for the program to evaluate the correctness of the figure and return the feedback ‘correct’, ‘partially correct’ or ‘incorrect’ for scores of 100, 50–99 and 0–49 respectively. The above process is illustrated in Figure 5.8, where a student is tested by being asked to ‘Draw a parallelogram’. The student completes the drawing in Figure 5.8(c) and presses the submit button. Immediately the score is displayed along with a comment that the figure drawn is partially correct. For cases like this, where the student-drawn figure is incorrect or partially correct, the system will display the correct figure on the screen as shown in Figure 5.8(d).
[Figure 5.8 shows the interface for the test question “Q1. Draw one parallelogram”; in the final panel the system displays “A parallelogram looks like this” together with the feedback “You are partially correct (75%)”.]
Fig. 5.8: Drawing and display interface: (a) dotted grid, (b) drawing in progress, (c) drawing complete, score and comment displayed, (d) correct figure displayed
As the grid points are at predefined locations, the system can extract the coordinates of the vertices or line end points of the figure drawn by the student in the order the points are connected by line segments. The point coordinates of the correct geometrical figures, to be displayed when the student draws incorrectly, are pre-selected and stored in the figure database.
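A minimal sketch of this capture step is given below; the grid spacing and snap tolerance are assumed values, while the feedback bands follow the 100 / 50–99 / 0–49 scheme mentioned above.

    GRID_STEP = 10        # grid dots every 10 pixels (assumed spacing)
    SNAP_RADIUS = 4       # clicks this close to a dot snap to it (assumed)

    def snap_to_grid(px, py):
        """Return the nearest grid point, or None if the click is too far away."""
        gx = round(px / GRID_STEP) * GRID_STEP
        gy = round(py / GRID_STEP) * GRID_STEP
        if abs(px - gx) <= SNAP_RADIUS and abs(py - gy) <= SNAP_RADIUS:
            return (gx, gy)
        return None

    def capture_figure(clicks):
        """Collect snapped vertices in the order the student connects them."""
        vertices = []
        for px, py in clicks:
            point = snap_to_grid(px, py)
            if point is not None and (not vertices or point != vertices[-1]):
                vertices.append(point)
        return vertices

    def feedback(score_percent):
        if score_percent == 100:
            return "correct"
        if score_percent >= 50:
            return "partially correct"
        return "incorrect"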
For primary students the grid of dots is useful for drawing, but for students of higher classes, who have already learned the basic geometry concepts and figures, the system will provide a blank drawing space without any dot grid. This is required to draw lines between points that do not fall exactly on any of the dots of the grid. On the other hand, drawing in a blank space is more difficult than drawing on a dot grid because one has to draw parallel, perpendicular or equal lines based on eye estimation. However, the system will have built-in tolerance for minor errors due to eye estimation. Even without the grid points the system can extract the coordinates of the points of the diagram drawn; in that case tracking the pen-up and pen-down events will enable capture of the key points or vertices.
5.4 Assessment of digital diagram
An assessment module renders the essential intelligent functionality of the present system – automated evaluation and monitoring of student-drawn geometry diagrams. The module is an adaptation of an earlier work by Mukherjee et al. [17]. Unlike other available geometry tutoring tools, which do not attempt to measure the accuracy of a freely drawn digital geometry figure, the present system can judge the geometric correctness of a diagram just the way a teacher checks a diagram drawn (practically at any size and orientation) on paper by a student. Such evaluation is fundamental in nature as it checks the conceptual correctness of a geometry diagram. The method is based on graph adjacency matrices and geometric properties (e.g., parallel, perpendicular, equal) of basic entities rather than their physical properties (e.g., length, angle, coordinates). The quantitative evaluation of the diagram drawn by a student gives a measure of how much of the geometry a student is able to understand and represent properly, thereby meeting the requirement of an intelligent geometry tutor. In this method the system evaluates the geometrical similarity of the student-drawn figure and the system-stored or system-drawn figure by comparing the values of the following parameters of both figures:
(i) Adjacency matrix (for vertices)
(ii) Total number of points
(iii) Total number of edges
(iv) Total number of angles between edges
(v) Sets of same-length edges
(vi) Sets of same angles between edges
Once the coordinates of the figure drawn by the student are extracted in the order they were connected, the adjacency matrix of the figure is automatically generated.
Subsequently, the system calculates the length matrix and the slope matrix for the edges of the figure drawn. From the slope matrix the system computes the smaller (acute) angle between every pair of edges connected at some point. Then it searches for angles of the same value and forms different sets of same angles (if more than one). Similarly, sets are formed for same-length edges of the figure drawn. Values of the same parameters corresponding to the correct figure (whose coordinates are preselected) are stored in the figure database. While comparing the parameters, if the adjacency matrices (of the same order) of the two figures do not match element by element, it is checked whether they match when one of the matrices is read in reverse row and/or reverse column direction. For example, consider the two rectangles with one diagonal in Figure 5.9. The respective adjacency matrices in the order A-B-C-D are shown beside the diagrams. The matrices are of the same order but apparently they do not match. If we read the second matrix in a bottom to top and right to left direction and compare it with the first matrix read normally in a top to bottom and left to right direction, then the matrices match. This implies that the diagrams are geometrically equivalent. Even after this effort, if the adjacency matrices (of the same order) do not match, then a match score is awarded based on the number of rows completely matched. If only 2 out of 4 rows match, the score is 2/4 or 1/2 out of a maximum possible score of 4/4 or 1. If the orders of the adjacency matrices are not the same, implying an unequal number of points, then the score is 0 as no row will match. With regard to parameters (ii) to (vi), the scores are simply calculated as the ratio of the parameter value (say x) for the student-drawn figure to that (say y) of the system-stored or system-drawn figure, provided x ≤ y. The score x/y should be equal to 1 for all the parameters if the figure is 100% correct. If x > y for any parameter then the corresponding score is calculated as 1 − (x − y)/y, thus assigning a penalty for the (x − y) additional elements generated wrongly against the expected y elements.
Fig. 5.9: Geometrically similar diagrams
For example, if x is 3 (for a given parameter) and y is 4 then the score will be 3/4, but if x is 5 and y is 4 then the score will be 1 − ((5 − 4)/4), i.e., 3/4. There will be multiple values of parameters (v) and (vi). The cumulative score of the student-drawn figure over the grid points, considering all the parameters and their multiple values, is calculated. The sum of the maximum possible scores against each of these parameter values is stored in the database. The maximum possible score corresponds to a 100% correctly drawn figure. The final output of the evaluation is the percentage cumulative score of the student-drawn figure, calculated as

    correctness (%) = ( ∑ scores of parameters (i) to (vi) / ∑ max. scores of parameters (i) to (vi) ) × 100 .

Frequently, there will be multiple values of parameter (v) or (vi) for both diagrams. Suppose for the student-drawn diagram the values against (vi) are ⟨3, 2, 1⟩ and for the system-stored or system-drawn diagram the corresponding values are ⟨4, 2⟩. Now for each element of ⟨4, 2⟩ the system will try to find a perfect match in ⟨3, 2, 1⟩. A match is obtained for 2, and 4 has no match. Then, keeping aside the perfectly matched element 2, the nearest match of 4 is searched for and found to be 3 among the rest of the elements (3 and 1) in ⟨3, 2, 1⟩. Finally the match score is calculated for the ⟨x, y⟩ pairs ⟨3, 4⟩, ⟨2, 2⟩ and ⟨1, 0⟩. The scores are 0.75 and 1 for the first two pairs. For the third pair x > y, but y being 0, the score 1 − (x − y)/y cannot be evaluated. Therefore in such cases, i.e., when there is no set of angles left in the system-stored or system-drawn diagram to match, the score is set to −0.5. This is done to impose a reasonable penalty for an undesired set of same angles that occurs due to incorrect student drawing. However, if there is an ⟨x, 0⟩ matching pair for parameter (v), the penalty is not imposed, because a diagram may be geometrically correct with some edges becoming equal which may not be equal in the system-stored or system-drawn diagram. The maximum possible score for each ⟨x, y⟩ pair of these parameters (ii) to (vi) is 1 (i.e., y/y), except 0 in the case of an ⟨x, 0⟩ match for parameter (v).
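The sketch below is our own simplified reconstruction of this scoring procedure, not the implementation of [1, 17]: it uses degree-valued angles, checks only a single reverse reading of the adjacency matrix, applies the −0.5 penalty uniformly to parameters (v) and (vi), and normalises each parameter to a maximum of 1 before averaging.

    import math
    from itertools import combinations

    def figure_parameters(vertices, edges):
        """vertices: ordered list of (x, y); edges: list of (i, j) index pairs."""
        n = len(vertices)
        adj = [[0] * n for _ in range(n)]
        for i, j in edges:
            adj[i][j] = adj[j][i] = 1
        lengths = [round(math.dist(vertices[i], vertices[j]), 2) for i, j in edges]
        angles = []                  # acute angle between edges sharing a vertex
        for (a, b), (c, d) in combinations(edges, 2):
            if len({a, b} & {c, d}) != 1:
                continue
            v1 = (vertices[b][0] - vertices[a][0], vertices[b][1] - vertices[a][1])
            v2 = (vertices[d][0] - vertices[c][0], vertices[d][1] - vertices[c][1])
            cos_t = abs(v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
            angles.append(round(math.degrees(math.acos(min(1.0, cos_t))), 1))
        def repeat_sets(values):     # sizes of the sets of equal values (size > 1)
            return sorted((values.count(v) for v in set(values) if values.count(v) > 1),
                          reverse=True)
        return {"adj": adj, "points": n, "edges": len(edges), "angles": len(angles),
                "len_sets": repeat_sets(lengths), "ang_sets": repeat_sets(angles)}

    def ratio_score(x, y):
        """x/y when x <= y, otherwise the 1 - (x - y)/y penalty."""
        if y == 0:
            return 1.0 if x == 0 else -0.5
        return x / y if x <= y else 1.0 - (x - y) / y

    def set_score(student_sets, reference_sets):
        """Parameters (v)/(vi): pair each reference set size with the nearest
        unused student set size; unmatched student sets attract -0.5 each."""
        if not reference_sets:
            return 0.0 if student_sets else 1.0
        remaining, total = list(student_sets), 0.0
        for y in reference_sets:
            if remaining:
                x = min(remaining, key=lambda v: abs(v - y))
                remaining.remove(x)
                total += ratio_score(x, y)
        total -= 0.5 * len(remaining)
        return total / len(reference_sets)

    def adjacency_score(a, b):
        if len(a) != len(b):
            return 0.0
        if a == b or a == [row[::-1] for row in b[::-1]]:   # reverse reading
            return 1.0
        return sum(ra == rb for ra, rb in zip(a, b)) / len(a)

    def correctness(student, reference):
        s, r = figure_parameters(*student), figure_parameters(*reference)
        scores = [adjacency_score(s["adj"], r["adj"]),
                  ratio_score(s["points"], r["points"]),
                  ratio_score(s["edges"], r["edges"]),
                  ratio_score(s["angles"], r["angles"]),
                  set_score(s["len_sets"], r["len_sets"]),
                  set_score(s["ang_sets"], r["ang_sets"])]
        return 100.0 * sum(scores) / len(scores)

    # Test case I of Section 5.4.1 evaluated with this simplified scorer:
    reference = ([(10, 10), (10, 40), (30, 50), (30, 20)],
                 [(0, 1), (1, 2), (2, 3), (3, 0)])
    student = ([(10, 40), (10, 60), (40, 40), (40, 20)],
               [(0, 1), (1, 2), (2, 3), (3, 0)])
    print(correctness(student, reference))   # 100.0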
5.4.1 Test cases
The process of diagram matching described in the assessment section is illustrated below for five test cases, and the comparison summary with respect to the two sets of values of the six parameters is shown in tabular form for each test case. For the first three test cases we consider that three students are asked to draw a parallelogram, but each student draws it differently. The system assesses each figure and gives the score.
Test case I: Draw a parallelogram. In this case the cumulative score of the student-drawn figure is 6 while the maximum possible score is 6. This implies that the match percentage is [(6/6) × 100], i.e., 100%. The score sheet in Table 5.2 is computed based on the intermediate output given by the system in Table 5.1.

Tab. 5.1: Analysis of Test case I

System-stored figure (parallelogram ABCD)
  Coordinates:           A (10, 10), B (10, 40), C (30, 50), D (30, 20)
  Adjacency matrix:      [0 1 0 1; 1 0 1 0; 0 1 0 1; 1 0 1 0]
  Total no. of points:   4
  Edge slope matrix:     [999 0 999 0.5; 0 999 0.5 999; 999 0.5 999 0; 0.5 999 0 999]
  Edge length matrix:    [0 30 0 22.36; 30 0 22.36 0; 0 22.36 0 30; 22.36 0 30 0]
  Total no. of edges:    4
  Same length edge sets: 30 ⇒ 2, 22.36 ⇒ 2
  Total no. of angles:   4
  Same angle sets:       0.54 = 4

Student-drawn figure (parallelogram ABCD)
  Coordinates:           A (10, 40), B (10, 60), C (40, 40), D (40, 20)
  Adjacency matrix:      [0 1 0 1; 1 0 1 0; 0 1 0 1; 1 0 1 0]
  Total no. of points:   4
  Edge slope matrix:     [999 0 999 −0.67; 0 999 −0.67 999; 999 −0.67 999 0; −0.67 999 0 999]
  Edge length matrix:    [0 20 0 36.06; 20 0 36.06 0; 0 36.06 0 20; 36.06 0 20 0]
  Total no. of edges:    4
  Same length edge sets: 36.06 ⇒ 2, 20 ⇒ 2
  Total no. of angles:   4
  Same angle sets:       0.67 = 4

Tab. 5.2: Score sheet for Test case I

             Adjacency matrix   Points   Edges   Angles   Same length edge   Same angle
  y-value    4×4                4        4       4        2, 2               4
  x-value    4×4                4        4       4        2, 2               4
  Score      1                  1        1       1        1                  1
  Max score  1                  1        1       1        1                  1
Test case II: Draw a parallelogram. In this case (Table 5.3 and Table 5.4) the cumulative score of the student-drawn figure is 4.5 while the maximum possible score is 6. So as evaluated by the machine, the student is partially correct (the score is within 50%–99%). Hence the student gets a new question (figure) of the same level, which the system finds from the figure database. It is evident that in this case the student will be given a figure of the same level (level 2), i.e., asked to draw a rectangle, square or rhombus.

Tab. 5.3: Analysis of Test case II

System-stored figure (parallelogram ABCD)
  Coordinates:           A (10, 10), B (10, 40), C (30, 50), D (30, 20)
  Adjacency matrix:      [0 1 0 1; 1 0 1 0; 0 1 0 1; 1 0 1 0]
  Total no. of points:   4
  Edge slope matrix:     [999 0 999 0.5; 0 999 0.5 999; 999 0.5 999 0; 0.5 999 0 999]
  Edge length matrix:    [0 30 0 22.36; 30 0 22.36 0; 0 22.36 0 30; 22.36 0 30 0]
  Total no. of edges:    4
  Same length edge sets: 30 ⇒ 2, 22.36 ⇒ 2
  Total no. of angles:   4
  Same angle sets:       0.54 = 4

Student-drawn figure (quadrilateral ABCD)
  Coordinates:           A (10, 20), B (10, 40), C (30, 50), D (30, 5)
  Adjacency matrix:      [0 1 0 1; 1 0 1 0; 0 1 0 1; 1 0 1 0]
  Total no. of points:   4
  Edge slope matrix:     [999 0 999 −0.75; 0 999 0.5 999; 999 0.5 999 0; −0.75 999 0 999]
  Edge length matrix:    [0 20 0 22.91; 20 0 22.36 0; 0 22.36 0 45; 22.91 0 45 0]
  Total no. of edges:    4
  Same length edge sets: 0
  Total no. of angles:   4
  Same angle sets:       0.75 = 2, 0.5 = 2

Tab. 5.4: Score sheet for Test case II

             Adjacency matrix   Points   Edges   Angles   Same length edge   Same angle
  y-value    4×4                4        4       4        2, 2               4
  x-value    4×4                4        4       4        0                  2, 2
  Score      1                  1        1       1        0                  0.5
  Max score  1                  1        1       1        1                  1
Test case III: Draw a parallelogram. In this case (Table 5.5 and Table 5.6) the cumulative score of the student-drawn figure is 2.25 while the maximum possible score is 6.

Tab. 5.5: Analysis of Test case III

System-stored figure (parallelogram ABCD)
  Coordinates:           A (10, 10), B (10, 40), C (30, 50), D (30, 20)
  Adjacency matrix:      [0 1 0 1; 1 0 1 0; 0 1 0 1; 1 0 1 0]
  Total no. of points:   4
  Edge slope matrix:     [999 0 999 0.5; 0 999 0.5 999; 999 0.5 999 0; 0.5 999 0 999]
  Edge length matrix:    [0 30 0 22.36; 30 0 22.36 0; 0 22.36 0 30; 22.36 0 30 0]
  Total no. of edges:    4
  Same length edge sets: 30 ⇒ 2, 22.36 ⇒ 2
  Total no. of angles:   4
  Same angle sets:       0.54 = 4

Student-drawn figure (pentagon ABCDE)
  Coordinates:           A (10, 30), B (20, 60), C (40, 60), D (50, 40), E (30, 10)
  Adjacency matrix:      [0 1 0 0 1; 1 0 1 0 0; 0 1 0 1 0; 0 0 1 0 1; 1 0 0 1 0]
  Total no. of points:   5
  Edge slope matrix:     [999 3 999 999 −1; 3 999 0 999 999; 999 0 999 −2 999; 999 999 −2 999 −1.5; −1 999 999 −1.5 999]
  Edge length matrix:    [0 31.62 0 0 28.28; 31.62 0 20 0 0; 0 20 0 22.36 0; 0 0 22.36 0 36.06; 28.28 0 0 36.06 0]
  Total no. of edges:    5
  Same length edge sets: 0
  Total no. of angles:   5
  Same angle sets:       0

Tab. 5.6: Score sheet for Test case III

             Adjacency matrix   Points   Edges   Angles   Same length edge   Same angle
  y-value    4×4                4        4       4        2, 2               4
  x-value    5×5                5        5       5        0                  0
  Score      0                  0.75     0.75    0.75     0                  0
  Max score  1                  1        1       1        1                  1
So as evaluated by the machine, the student is wrong (the score is below 50%). Hence the student gets a new question (figure) of a lower level. It is evident that in this case the student will be asked to draw a basic concept of level 1, i.e., to draw parallel and equal lines. Three screenshots (Figure 5.10) of the system interface in the test mode (T) are given below, which show that the system displays the scores of the figures drawn by the student. Given below are two more test cases (IV and V) which concern more complicated figures including multiple entities. Of these two, Test case V concerns a secondary class geometry problem; here the reference diagram is drawn by the system from the problem text using the GeometryNet based text-to-diagram conversion tool.
Test case IV: Draw one triangle and any one median of the triangle. In this case (Table 5.7 and Table 5.8) the cumulative score of the student-drawn figure is 4 while the maximum possible score is 6. So the student is partially correct; actually he has drawn an altitude of the triangle, not the median.
Test case V: BE bisects angle ABC and EB is produced to D. Prove that angle ABD = angle CBD. In this case (Table 5.9 and Table 5.10) the cumulative score of the student-drawn figure is 4.72 while the maximum possible score is 6. So the student is partially correct.
5.4.2 Comparison
The results of the above test cases, which concern the diagram assessment or recognition module of the present system, could not be compared with the output of other similar systems because, to the best of our knowledge, no such geometry diagram assessment system or method is available. Nor is any benchmark test data set available for comparing the evaluation of digital diagram drawing from textual input. Jiang et al. [18] propose a sketch recognition method to understand hand-drawn geometry graphs and handwritten geometry proof scripts. The graph recognition is based on the detection of shape primitives using turning points of strokes and angles between strokes.
Fig. 5.10: Screenshots of the interface in the test mode
Tab. 5.7: Analysis of Test case IV

System-stored figure (triangle ABC with segment AD to side BC)
  Coordinates:           A (10, 30), B (40, 60), C (40, 20), D (40, 30)
  Adjacency matrix:      [0 1 1 1; 1 0 0 1; 1 1 0 1; 1 1 1 0]
  Total no. of points:   4
  Edge slope matrix:     [999 1 −0.33 0.33; 1 999 999 0; −0.33 999 999 0; 0.33 0 0 999]
  Edge length matrix:    [0 42.43 31.62 31.62; 42.43 0 0 20; 31.62 0 0 20; 31.62 20 20 0]
  Total no. of edges:    5
  Same length edge sets: 20 ⇒ 2, 31.62 ⇒ 2
  Total no. of angles:   8
  Same angle sets:       0

Student-drawn figure (triangle ABC with segment AD to side BC)
  Coordinates:           A (10, 40), B (50, 50), C (50, 30), D (50, 40)
  Adjacency matrix:      [0 1 1 1; 1 0 0 1; 1 1 0 1; 1 1 1 0]
  Total no. of points:   4
  Edge slope matrix:     [999 0.25 −0.5 0; 0.25 999 999 0; −0.5 999 999 0; 0 0 0 999]
  Edge length matrix:    [0 41.23 44.72 40; 41.23 0 0 10; 44.72 0 0 20; 40 10 20 0]
  Total no. of edges:    5
  Same length edge sets: 0
  Total no. of angles:   8
  Same angle sets:       0.25 ⇒ 2, 0.5 ⇒ 2

Tab. 5.8: Score sheet for Test case IV

             Adjacency matrix   Points   Edges   Angles   Same length edge   Same angle
  y-value    4×4                4        5       8        2, 2               0
  x-value    4×4                4        5       8        0                  2, 2
  Score      0                  1        1       1        0                  0
  Max score  1                  1        1       1        1                  1
Tab. 5.9: Analysis of Test case V

System-stored figure (segments BA, BC, BD and BE meeting at B)
  Coordinates:           A (30, 40), B (40, 20), C (30, 50), D (50, 10), E (20, 40)
  Adjacency matrix:      [0 1 0 0 0; 1 0 1 1 1; 0 1 0 0 0; 0 1 0 0 0; 0 1 0 0 0]
  Total no. of points:   5
  Edge slope matrix:     [999 −2 999 999 999; −2 999 −0.5 −1 −1; 999 −0.5 999 999 999; 999 −1 999 999 999; 999 −1 999 999 999]
  Edge length matrix:    [0 22.36 0 0 0; 22.36 0 22.36 14.14 28.28; 0 22.36 0 20 0; 0 14.14 0 0 0; 0 28.28 0 0 0]
  Total no. of edges:    4
  Same length edge sets: 22.36 ⇒ 2
  Total no. of angles:   6
  Same angle sets:       0.33 ⇒ 4

Student-drawn figure (segments AB, BC, BD and DE)
  Coordinates:           A (40, 20), B (40, 40), C (20, 40), D (20, 20), E (10, 10)
  Adjacency matrix:      [0 1 0 0 0; 1 0 1 1 0; 0 1 0 0 0; 0 1 0 0 1; 0 0 0 1 0]
  Total no. of points:   5
  Edge slope matrix:     [999 0 999 999 999; 0 999 0 1 999; 999 0 999 999 999; 999 1 999 999 1; 999 999 999 1 999]
  Edge length matrix:    [0 20 0 0 0; 20 0 20 28.28 0; 0 20 0 0 0; 0 28.28 0 0 14.14; 0 0 0 14.14 0]
  Total no. of edges:    4
  Same length edge sets: 20 ⇒ 2
  Total no. of angles:   4
  Same angle sets:       0 ⇒ 2, 1 ⇒ 2

Tab. 5.10: Score sheet for Test case V

             Adjacency matrix   Points   Edges   Angles   Same length edge   Same angle
  y-value    5×5                5        4       6        2                  4
  x-value    5×5                5        4       4        2                  2, 2
  Score      0.4                1        1       0.66     1                  0
  Max score  1                  1        1       1        1                  1
Though Wong et al. [14] developed an application for drawing dynamic geometry figures by understanding the natural language text of geometry problems, the correctness of the figures so drawn by the system is judged manually, so it is not used as a tool for intelligent tutoring, although the application closely resembles the t2d module of the present system. Intelligent math tutoring software like Mindspark and Math Tutor is available, but it does not deal with tutoring based on digital geometry diagram drawing. However, like the present system, the Mindspark software uses the idea of varying the number and complexity of the questions posed to the students depending upon their response pattern.
5.5 Limitations and future scope
Coming to the limitations of the present system, the secondary class module, which requires the GeometryNet knowledge base and the text-to-diagram conversion tool (developed by Mukherjee & Garain) as essential components, is not yet fully implemented. Presently it works for a limited number of geometry word problems of the secondary classes. GeometryNet needs to be expanded and some fine tuning of the NLP parsers is required for this module to function at full scale. The figure database also needs to be expanded and optimized for both the primary and secondary modules. There is a lot of scope for future research on the present system to make it a comprehensive automated intelligent solution for self-learning geometry. If, using computer vision techniques, the exact erroneous elements (lines, points or angles) in a diagram drawn by a student could be detected and displayed on the diagram, this would be much more useful for the student than merely quantifying the error percentage, which is what is presently done. Elements of machine learning or adaptive logic can be incorporated into the system so that improved decisions are taken with more and more student usage, thereby better simulating the discourse pattern and pedagogical strategies of an experienced teacher. Moreover, a trend analysis can be done over student-drawn diagrams to find the conceptual lacunae of a student and accordingly provide exercises that can help consolidate his understanding.
5.6 Conclusion
Though the learning mode of the system as described here doesn’t feature intelligent image analysis, it is a good application of the hierarchical organization of geometric figures having hyperactive elements linked with their properties. In effect a student is given a self-learning opportunity by exploring the figures and progressing through a guided path. The test mode of the system involves real-time intelligent image analysis and diagram recognition. The concept of attaching a difficulty level to the drawing
exercises and automatically guiding a student back and forth between easier and harder figures by dynamically analyzing and evaluating his performance emulates the act of a human teacher. The earlier method applied by two of the present authors (Mukherjee & Garain) for evaluating a diagram drawn by a machine from its textual description is adopted in the present system, and a reasonably good result is achieved – this is evident from the five test cases presented in Section 5.4. However, the system is yet to establish reliability in the case of complex diagram drawing, as depicted in Test case V, the reason being that in such cases automated diagram drawing is involved besides automated diagram evaluation, thereby demanding more computational intelligence. The results achieved so far are quite significant and indicate that the proposed model can be used as an intelligent automated solution for teaching school level geometry to students of varying ability.
References
[1] A. Mukherjee, U. Garain, and A. Biswas. Evaluation of the graphical representation for text-to-graphic conversion systems. In 10th IAPR International Workshop on Graphics Recognition GREC 2013, Bethlehem PA, USA, 2013.
[2] A. Mukherjee, U. Garain, and M. Nasipuri. On construction of a GeometryNet. In IASTED International Conference on Artificial Intelligence and Applications, Calgary, Canada, ACTA Press, pp. 530–536, 2007.
[3] A. Mukherjee, S. Sengupta, D. Chakraborty, et al. Text to diagram conversion: A method for formal representation of natural language geometry problems. In IASTED International Conference on Artificial Intelligence and Applications, Austria, pp. 137–144, 2013.
[4] R. Ramkumar. Automatic identification of affective states using student log data in ITS. In G. Biswas et al., eds., Artificial Intelligence in Education, p. 612. Springer, 2011.
[5] C. R. Beal, J. Beck, and B. Woolf. Impact of intelligent computer instruction on girls’ math self concept and beliefs in the value of math. In Paper presented at the annual meeting of the American Educational Research Association, 1998.
[6] S. Garg and G. S. Sekhon. Shape recognition based on features matching using morphological operations. International Journal of Modern Engineering Research (IJMER), 2(4):2290–2292, 2012.
[7] M. F. Zakaria, H. S. Choon, and S. A. Suandi. Object shape recognition in image for machine vision application. International Journal of Computer Theory and Engineering, 4(1):76–80, 2012.
[8] S. Rege, R. Memane, M. Phatak, and P. Agarwal. 2D geometric shape and color recognition using digital image processing. International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, 2(6):2479–2486, 2013.
[9] N. El Abbadi and L. Al Saadi. Automatic detection and recognize different shapes in an image. IJCSI International Journal of Computer Science Issues, 10(1):162–166, 2013.
[10] S. V. Chhaya, S. Khera, and P. K. S. Basic geometric shape and primary colour detection using image processing on MATLAB. IJRET: International Journal of Research in Engineering and Technology, 4(5):505–509, 2015.
[11] A. Mukherjee and U. Garain. Intelligent tutoring of school level geometry using automatic text to diagram conversion utility. In 2nd International Conference on Emerging Applications of Information Technology (EAIT), Kolkata, India, IEEE, 2011.
[12] W. K. Wong, S. C. Hsu, S. H. Wu, and W. L. Hsu. LIM-G: Learner-initiating instruction model based on cognitive knowledge for geometry word problem comprehension. Computers and Education Journal, 48(4):582–601, 2007.
[13] W. L. Hsu, S. H. Wu, and Y. S. Chen. Event identification based on the information map – INFOMAP. In IEEE Systems, Man, and Cybernetics Conference, Tucson, Arizona, USA, pp. 1661–1672, 2001.
[14] W. K. Wong, S. K. Yin, and C. Z. Yang. Drawing dynamic geometry figures online with natural language for junior high school geometry. IRRODL, 13(5):126–147, 2012.
[15] I. T. Phillips and A. K. Chhabra. Empirical performance evaluation of graphics recognition systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(9):849–870, 1999.
[16] L. Wenyin and D. Dori. A protocol for performance evaluation of line detection algorithms. Machine Vision and Applications, 9(5-6):240–250, 1997.
[17] A. Mukherjee, U. Garain, and A. Biswas. Evaluation of diagrams produced by text-to-graphic conversion systems. Graphics Recognition. Current Trends and Challenges, 20:252–265, 2014.
[18] Y. Jiang, F. Tian, H. Wang, et al. Intelligent understanding of handwritten geometry theorem proving. In Proceedings of the 15th International Conference on Intelligent User Interfaces, China, pp. 119–128, 2010.
B. K. Tripathy*, T. R. Sooraj, and R. K. Mohanty
6 Rough set and soft set models in image processing
Abstract: Image processing is used to extract useful information from images. It is among the rapidly growing technologies today and forms a core research area within the engineering and computer science disciplines. Uncertainty based models play major roles in image processing in general and image segmentation in particular, leading to their applications in medical image processing and satellite image processing. From among the uncertainty models, namely fuzzy sets, rough sets, intuitionistic fuzzy sets, soft sets and their hybrid models, we shall deal with only two as far as their role in image processing is concerned. These are rough sets and soft sets, introduced by Pawlak in 1982 and Molodtsov in 1999 respectively. We shall also deal with some hybrid models of these two and of the other models mentioned above. Our special attention will be on the application of these models in image segmentation.
Keywords: Image processing, rough set, soft set, image segmentation, feature extraction, data clustering, texture classification
6.1 Introduction
The process of transforming images, putting them in digital form, improving their quality through enhancement by performing some operations and drawing useful information from them is termed image processing [1]. It can also be described as a process for which the inputs are in the form of images, say for instance a video frame or a photograph, and the output is generated in the form of either some other images or properties associated with the input image. Normally, under image processing, images are treated as two-dimensional signals whenever signal processing techniques are applied. Uncertainty is inevitable because it occurs in every sphere of real life. Several models dealing with uncertainty have been proposed in the literature. Probability theory is perhaps the first of such models, followed by a long list including fuzzy sets, intuitionistic fuzzy sets, rough sets, soft sets, their hybrid models and many others [2–5]. Image segmentation is a highly fruitful branch of image processing [6]. In this process, images are divided into small groups of pixels, where elements of each such group have more similar features than those in different groups. There are two major directions under segmentation of images. These are termed pixel classification and gray-level thresholding. In the first approach, the characteristic space of multiple
image bands is used to determine the similarity, whereas in the second, a range of gray values within a certain threshold is used for the purpose. However, for all the images obtained from remote sensing, pixel classification is used as the measure. Uncertainties creep into the study of images due to factors like mixed pixels and approximations in the measures. So, statistical methods are used in the classification of images using unsupervised approaches. Among the various other models available to take care of uncertainty are neural networks, fuzzy sets, rough sets, soft sets, intuitionistic fuzzy sets and their hybrid models, to name a few. The notion of fuzzy sets was introduced by Zadeh in 1965 and it remains to date the most popular model used by researchers. It is an extension of the basic notion of crisp sets, and the notion of a membership function is used to define the belongingness of an element to a collection. The membership is gradual by nature. The rough set model was proposed by Pawlak in 1982 and is another model which complements fuzzy sets in the scientific study of uncertainties. In fact, this model truly follows the idea of an uncertainty region. In this model, uncertainty comes in the form of a region called the boundary, and we cannot say definitely whether the elements of this region belong to the set. There are two other sets associated with a set in the rough set context; the first one is called the lower approximation, containing certain elements, and the second one is called the upper approximation, containing possible elements. However, the three concepts of lower approximation, upper approximation and the boundary of a set are with respect to an equivalence relation defined over the universe of discourse, and these sets change when we change the equivalence relation. Pawlak believed that human knowledge is directly proportional to the capability to classify objects. It is well known that equivalence relations and classifications are interchangeable notions, and so for mathematical purposes he considered equivalence relations defined over a universe. It was observed by Molodtsov that most of the uncertainty based models found so far (before 1999), and specifically fuzzy sets and their variants, lack the scope for parameterization. So, he proposed a new model called soft sets in 1999, which represents a parameterized family of sets [7]. It has been observed theoretically and verified practically that hybrid models have better modeling power than the individual models. In fact, this was illustrated by Dubois and Prade in 1990 by taking fuzzy sets and rough sets together and forming the models of fuzzy rough sets and rough fuzzy sets. Thereafter several such models have been framed and proposed in the literature. Rough sets have been one of the major components of soft computing and have the capability of dealing with uncertainty. They have wide applicability in the area of image processing [8–10]. In the fields of computer vision and information retrieval, image segmentation plays a significant role, and so during the past few years both academia and industry have been keen on research on it. Reddy et al. [11] provided an approach using the rough fuzzy k-means algorithm in order to segment images. Spatial image classification is the mechanism for extracting meaningful information classes
from spatial image datasets. Many traditional pixel based image classification techniques such as support vector machines (SVM), ANN, fuzzy methods, decision trees (DT), etc. exist. The performance and accuracy of these image classification methods depend upon the network structure and the number of inputs. In 2016, Vasundhara et al. [12] discussed rough set and artificial neural network based image classification. Suseendran et al. [13] discussed lung cancer image segmentation using rough set theory, where they applied rough k-means clustering. In this chapter, we discuss the role of soft sets and rough sets in image processing and also discuss some applications of these models in image processing. Nowadays many researchers are applying soft sets in the fields of computer science, and image processing is one of the areas where they are applied. Lashari et al. [14] applied soft sets to medical image classification. Topics like analysis, diagnosis and teaching in the context of medicine are discussed in this paper. As aids to achieving the above goals, many modalities for medical imaging and their use in real life situations, which employ techniques from data mining, are proposed. In fact, the main goal of classification of medical images is to study the anatomy of the human body affected by the disease, and not exactly the accuracy of the results obtained therein. This provides an opportunity for the clinicians to arrive at the path of progress of the disease. Sreedevi and Eliabeth [15] used the fuzzy soft set model to develop a method which detects affected areas in an image in digital mammograms. Much work has been done on image segmentation [16], feature selection, etc. In the following sections we shall elaborate on the applications of rough sets, soft sets and some of their hybrid models in image processing. In the next section we introduce some of the basic definitions to be used in the following presentation and discussion.
6.2 Basic definitions and notations We introduce the definitions of the basic models like fuzzy set, intuitionistic fuzzy set, rough set, soft set, rough fuzzy set and rough intuitionistic fuzzy set and some other hybrid models. First, we start with the definition of a fuzzy set. We take U to be a universe of discourse in all discussions below. Definition 6.2.1. A fuzzy set A is determined by a function associated with it, called the membership function being denoted by μ A and is defined by μ A : U → [0, 1] such that for any x in U, μ A associates a real number μ A (x) in the interval [0, 1]. The membership function of a fuzzy set is the extended version of the notion of characteristic function for a crisp set. Definition 6.2.2. Let R be an equivalence relation defined over U such that the equivalence classes generated by it are denoted by [x]R for any x in U. Then for a subset X of U, we associate two crisp sets called the lower and upper approximations of X with
respect to R, denoted by R̲X and R̄X respectively, defined as
R̲X = {y ∈ U | [y]_R ⊆ X}  and  R̄X = {y ∈ U | [y]_R ∩ X ≠ ∅} .   (6.1)
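To make the construction in (6.1) concrete, the following short Python sketch computes the lower and upper approximations of a target set from the equivalence classes induced by a chosen attribute subset. It is an illustrative example only; the toy universe, attribute table and helper names are assumptions and are not taken from the chapter.

    def equivalence_classes(universe, attrs, table):
        """Group objects that agree on all attributes in `attrs` (the relation IND(attrs))."""
        classes = {}
        for obj in universe:
            key = tuple(table[obj][a] for a in attrs)
            classes.setdefault(key, set()).add(obj)
        return list(classes.values())

    def approximations(universe, attrs, table, target):
        """Return (lower, upper) approximations of `target` w.r.t. IND(attrs), as in (6.1)."""
        lower, upper = set(), set()
        for cls in equivalence_classes(universe, attrs, table):
            if cls <= target:          # class entirely inside the target set
                lower |= cls
            if cls & target:           # class intersects the target set
                upper |= cls
        return lower, upper

    # Toy example: pixels described by a quantised gray level band.
    table = {1: {'band': 'low'}, 2: {'band': 'low'}, 3: {'band': 'high'}, 4: {'band': 'high'}}
    low, up = approximations({1, 2, 3, 4}, ['band'], table, target={1, 2, 3})
    print(low, up)    # the boundary region is up - low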
The set X is said to be rough if R̲X ≠ R̄X; otherwise it is crisp. The uncertainty region associated with X is called the boundary region, denoted by BN_R(X) and given by BN_R(X) = R̄X − R̲X. Rough set theory has found applications in data reduction and in discovering relationships, dependencies, similarities and differences in data, and the rough set approach has found interesting applications in various branches of science including image processing and artificial intelligence. Rough set based algorithms also have the capability to be extended to parallel computers.
In the case of fuzzy sets, the non-membership value of an element is the one's complement of its membership value. This does not hold in many real life situations, and so the intuitionistic fuzzy set model was introduced in 1986. We define it as follows:
Definition 6.2.3. An intuitionistic fuzzy set A over U is determined by two functions, called its membership and non-membership functions and denoted by μ_A and ν_A, such that both are mappings from U to the interval [0, 1] and their sum for any element x lies in the interval [0, 1]. Intuitionistic fuzzy sets are better models as they also involve a hesitation (uncertainty) function determined by the one's complement of (μ_A + ν_A).
Definition 6.2.4 (Fig. 6.1). A soft set defined over U is a pair (F, E), where E is a set of parameters defined over U and F is a map from E to P(U), the power set of U, i.e.
F : E → P(U) .   (6.2)
Let (F, E) be a soft set over U. Then we define the family of characteristic functions χ_(F,E) = {χ^a_(F,E) | a ∈ E} of (F, E) as follows:
Definition 6.2.5. Let (F, E) be a soft set over U. Then for any a ∈ E, the characteristic function χ^a_(F,E) : U → {0, 1} is defined by
χ^a_(F,E)(x) = 1 if x ∈ F(a), and 0 otherwise.   (6.3)
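As an illustration of Definitions 6.2.4 and 6.2.5, a soft set can be held as a plain mapping from parameters to subsets of the universe, and its characteristic functions fall out directly. The parameter names and the universe below are invented for the example and are not part of the original text.

    U = {'x1', 'x2', 'x3', 'x4'}

    # Soft set (F, E): each parameter is mapped to its set of approximate elements F(e).
    F = {
        'blurred':  {'x1', 'x3'},
        'noisy':    {'x2'},
        'textured': set(),          # F(e) may be empty for some parameters
    }

    def chi(a, x):
        """Characteristic function of (F, E) for parameter a, as in (6.3)."""
        return 1 if x in F[a] else 0

    print([chi('blurred', x) for x in sorted(U)])   # e.g. [1, 0, 1, 0]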
Definition 6.2.6 (Fuzzy soft set). Let U be an initial universal set and E be a set of parameters. Let I U denote the set of all fuzzy subsets of U. A pair (F, E) is called a fuzzy soft set over U, where F is a mapping given by F : E → IU .
(6.4)
Let (F, E) be a fuzzy soft set (FSS) over the universal set U. In [3], the set of parametric membership functions of (F, E) is defined as μ_(F,E) = {μ^a_(F,E) | a ∈ E}.
Fig. 6.1: Diagrammatic representation of a soft set
Definition 6.2.7. For all a ∈ E, the parametric membership function is defined as
μ^a_(F,E)(x) = α ,  α ∈ [0, 1] .   (6.5)
Definition 6.2.8. A fuzzy relation S on U is said to be a fuzzy equivalence relation iff it is
Fuzzy reflexive: for all x ∈ U, μ_S(x, x) = 1;
Fuzzy symmetric: for any two elements x, y ∈ U, μ_S(x, y) = μ_S(y, x); and
Fuzzy transitive: for all x, y, z ∈ U, μ_S(x, z) ≥ min{μ_S(x, y), μ_S(y, z)}.
Let F_i denote any fuzzy equivalence class on U with respect to a fuzzy equivalence relation S. Then for any fuzzy subset X of U, the lower and upper approximations X̲ and X̄ of X with respect to S are given through their membership functions
μ_X̲(F_i) = inf_x max{1 − μ_{F_i}(x), μ_X(x)}   (6.6)
and
μ_X̄(F_i) = sup_x min{μ_{F_i}(x), μ_X(x)} .   (6.7)
Definition 6.2.9. The pair (X̲, X̄) is said to be the fuzzy rough set associated with X.
Let us denote by F(U) the set of all fuzzy subsets of U. Let S be an equivalence relation defined over U and let X_i, i = 1, 2, ..., n, be the equivalence classes of U with respect to S. Then for any fuzzy subset X of U, the lower and upper approximations of X, denoted by μ_R̲X and μ_R̄X respectively, are given by
μ_R̲X(X_j) = inf_{y ∈ X_j} μ_X(y)   (6.8)
and
μ_R̄X(X_j) = sup_{y ∈ X_j} μ_X(y) .   (6.9)
Definition 6.2.10. The rough fuzzy set associated with X is the pair (R̲X, R̄X) when the two components of the pair are unequal.
Definition 6.2.11. Let U, R and X_i, i = 1, 2, ..., n, be as above. Let IF(U) be the set of all intuitionistic fuzzy sets defined over U. For every X ∈ IF(U) we define R̲X and R̄X as the lower and upper approximations of X with respect to R respectively; these are intuitionistic fuzzy sets such that
μ_R̲X(X_i) = inf_{y ∈ X_i} μ_X(y)  and  ν_R̲X(X_i) = sup_{y ∈ X_i} ν_X(y)   (6.10)
μ_R̄X(X_i) = sup_{y ∈ X_i} μ_X(y)  and  ν_R̄X(X_i) = inf_{y ∈ X_i} ν_X(y)   (6.11)
The pair (R̲X, R̄X) is called a rough intuitionistic fuzzy set [17] associated with X.
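A compact way to see how Definitions 6.2.9–6.2.11 behave is to evaluate the approximations on a small numeric example. The sketch below is purely illustrative (the membership values and class structure are made up); it computes the rough fuzzy approximations of equations (6.8)–(6.9) and the rough intuitionistic fuzzy approximations of (6.10)–(6.11).

    # Equivalence classes X_1, ..., X_n of the universe under a crisp relation R.
    classes = [['u1', 'u2'], ['u3', 'u4', 'u5']]

    # A fuzzy subset X given by its membership values, and an intuitionistic fuzzy
    # subset given by (membership, non-membership) values.
    mu = {'u1': 0.9, 'u2': 0.4, 'u3': 0.7, 'u4': 0.8, 'u5': 0.2}
    nu = {'u1': 0.05, 'u2': 0.5, 'u3': 0.2, 'u4': 0.1, 'u5': 0.7}

    # Rough fuzzy set, equations (6.8) and (6.9): one value per equivalence class.
    lower_mu = [min(mu[y] for y in c) for c in classes]
    upper_mu = [max(mu[y] for y in c) for c in classes]

    # Rough intuitionistic fuzzy set, equations (6.10) and (6.11).
    lower_nu = [max(nu[y] for y in c) for c in classes]
    upper_nu = [min(nu[y] for y in c) for c in classes]

    print(lower_mu, upper_mu)   # e.g. [0.4, 0.2] [0.9, 0.8]
    print(lower_nu, upper_nu)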
6.3 Applications of rough set theory in image processing In this section we will discuss applications of image processing using rough sets. A granule is defined as the collection of a few information units forming large units and a granulation operation is the decomposition of the universe into several parts. Information granulation helps to represent objects and solve problems in real life, which is essential as a tool for image processing in particular, and pattern recognition in general. One of the major steps in image processing is segmenting the images. This process in case of medical images increases the challenge involved because of factors like unclear contrast in images and the blurriness occurring at the boundaries. The idea of segmentation is that some of the pixels will be belonging to the positive region and will be included in a set X. Some other pixels will not be included in the set X and are called negative regions. The remaining pixels will be in the boundary case. Automatic and computationally efficient identification of the cellular automata for image processing systems is another application where the rough set is being applied [5]. An integration of rough set theoretic knowledge extraction, the EM algorithm and MST clustering for segmentation of multispectral satellite images is another interesting application. Image classification and retrieval is another important task in image processing. Some breakthrough achievements in this field using rough set theory are mammogram classification, object-based image retrieval and image retrieval method insensitive to location variations. Rough set can also be combined with various computational intelligence techniques to provide better results and future research will concentrate on these issues and its implementation. Operations based on images mostly use image segmentation as a key step in the field of image processing. Partitioning an image into several components is termed image segmentation. Similarity and discontinuity of pixels are used in most of the methods developed for segmentation. Edge based techniques are those which use the second characteristic of pixels whereas region based techniques use the first characteristic. The reasons for the majority of problems faced in image segmentation
arise as a result of factors like noise, object regions not having homogeneity properties, blurred boundaries between objects, low contrast and other issues related to these. One of the major hurdles in processing of satellite images is the determination of class numbers. One such problem is the initial guess to set the parameters in the beginning although several methods are based upon clustering in two stages and voting using sub samples. But, these techniques are over sensitive to noise or expensive from the computation point of view. However, the stochastic EM (SEM) algorithm developed under this approach does not have most of these drawbacks as characterized by small changes in the values of the initial parameters, the termination of the process and the constraint on the maximum number of clusters. Techniques based upon rough set theory, construct approximations of data sets acquired, and synthesize them that leads to efficient analysis of data. The notions of “reducts” and granulation are helpful in achieving this. The universe under study is divided into granules which help in the precise representation of objects in real situations. The information reduction in the form of selecting important attributes is achieved through the concept of reducts so that only a portion of a database needs to be examined instead of the high volume of the original table. Rule generation is an important characteristic of rough set theory, which more often uses the process of breaking the space into granules. These rules have the additional advantage of predicting future events and makings the knowledge base dynamic. Rules generated are region specific, which are parts of the feature space represented through clusters.
6.3.1 Image segmentation using rough set theory In [11], the rough fuzzy C-means (RFCM) was modified to rough fuzzy k-means (RFKM) to make it applicable for segmentation of images. It has two stages. In the first phase cluster centers are identified. In the second phase the number of centres is minimised using rough set theory. These optimized cluster centers and hence clusters are used for segmentation of images by using the k-means clustering algorithm. The advantage of this approach is that it does not require that the cluster centres be initialized and so the number of clusters is also not pre-fixed. The power of rough sets in managing uncertainty is explored here in segmenting images through dimensionality reduction leading to pattern classification. Finally through experimentation it has been established that RFKM generates better image segmentation than RFCM and FCM by using the Xie–Beni index. A review of the techniques for segmenting images using rough set theory is presented in [8]. In [18] a new concept called the “histon” was introduced which is a form of encrustation of a histogram. This concept is used for capturing information related to their colour in multiple dimensions in an integrated manner. The histon finds the relation
of all elements that possibly belong to a single segment, in the sense that elements with similar colour values constitute the upper approximation of the segment. Following this, in [8] the voxels of an image are divided into three categories, corresponding to the lower approximation, the complement of the upper approximation and the boundary of a set called the region of interest (ROI); these contain the voxels which are certainly in the ROI, the voxels which are definitely not in the ROI, and the voxels which are possibly in the ROI, respectively. In [8] a new method based upon rough set theory and neural networks was also put forth: rough set theory is used to filter out the important attributes in the form of reducts, leading to the extraction of rules. The important parts of an image are then extracted and given as input to a neural network, the number of neurons in the intermediate layer is determined as per the rules obtained, and the weights of the network are modified using the essential attributes determined by the rough set technique. The concept of a fuzzy rough set (FRS) introduced by Dubois and Prade [5] was used by Li et al. in [19] to segment tongue images by means of a new grid based fuzzy rough set clustering approach. This algorithm first extracts condensation points using the FRS and divides the data space into compartments through layering; in the process the edges of the dense blocks are refined by adding condensation points on the border. Another approach was proposed in [20], where the dissimilarity is derived using the vector angle and the Euclidean distance. Next, the binary matrices comprising the similar colour sphere and the histon for each colour component are computed, and segmentation of the colour image is performed on the basis of the resulting roughness. Following their earlier approach to segmenting white blood cells, which combined rough set based clustering with colour based segmentation [21], Mohapatra et al. devised an integrated method combining fuzzy sets and rough sets to segment leukocytes in a clustering framework; the strong points of both models are selected and used so as to attain high performance. A more efficient technique for segmenting gray scale images using rough sets, in which the process is performed without any human intervention, was proposed in [22].
6.3.2 Intuitionistic fuzzy rough sets in colour image segmentation
The intuitionistic fuzzy rough set is a better model than the fuzzy rough set model, and it was used in [6, 23] for the segmentation of colour images. As mentioned there, a problem with the basic rough set based measures is that enough importance is not given to the homogeneity of the regions. This problem is handled efficiently by the new model [18]. In addition, to avoid the disturbances created by trivial regions, a multiscale representation is preferred to the normal representation. The combined approach is therefore well suited to handle both the homogeneity problem and the trivial regions.
Segmenting gray scale images is much easier than segmenting colour images, and the difference is related to human perception: human beings can distinguish only around a dozen gray levels, but they can distinguish colours to the extent of thousands. From the application point of view, segmentation of colour images plays a very important role in pattern recognition and computer vision. It has been observed that no particular algorithm works well for all images; an algorithm that is good for one class of images may not be suitable for other classes. Histogram techniques are used in the segmentation of both kinds of images, but whereas two dimensional histogram thresholding is enough for gray scale images, three dimensional histogram thresholding is needed for colour images. Histogram based methods have low computational requirements but provide less accuracy, as they do not consider spatial correlation; region based methods are therefore preferred to histogram based methods. As mentioned above, the use of the histon developed using rough sets is an ideal solution: the histon represents the upper approximation of the ROI in terms of rough set theory and takes care of the correlation among neighbouring points and planes, while the histogram takes care of the lower approximation. In [24], the advantages of both multiscale roughness and IFS are exploited in segmenting colour images, and the threshold is selected optimally so that one obtains non-overlapping segments.
6.3.2.1 Linear scale space theory
This is a framework proposed by researchers from the fields of computer vision, signal processing and image processing, and it has derived support from other fields like physics and biological vision. In order to describe real world objects by their properties, the entities must exist over certain ranges of scale for their meaningful existence. This also implies that the appearance of the objects depends upon the scale of observation, which stresses the importance of multiscale representation; examples from cartography, and from physics in the branches of thermodynamics and solid mechanics, can be considered in this context. Gaussian scale space, also called linear scale space, is the most important of the scale spaces. The kernel in Gaussian scale space controls the smoothness of an image: the width of the Gaussian kernel is proportional to the value of the scale, and taking larger values removes the noise present in an image.
6.3.2.2 Representation of images using intuitionistic fuzzy sets [4]
The intuitionistic fuzzy set model was used in [24] to develop a representation of images and further segment them, making use of the intensity levels of the image pixels. The representation of an image G using IFS is mathematically described as
G = {(z_ij, μ_G(z_ij), ν_G(z_ij), π_G(z_ij))} ,   (6.12)
where i and j vary over the pixel dimensions of G, say i = 1, 2, ..., P and j = 1, 2, ..., Q. Following the intuitionistic fuzzy set notation, μ_G(z_ij), ν_G(z_ij) and π_G(z_ij) represent the membership, non-membership and hesitation values of belongingness of the pixel z_ij in the image, respectively. The Sugeno non-membership function is used to derive the non-membership values, and hence the hesitancy values, of the pixels from their membership values. So, Sugeno's complement function is used to generate an intuitionistic fuzzy image.
6.3.2.3 Roughness index
The roughness index depends upon the lower and upper approximations of the image G(p, q, i) for the i-th colour band with intensity ℓ, denoted by H_i(ℓ) and H̄_i(ℓ) respectively:
H_i(ℓ) = ∑_{p=1}^{K} ∑_{q=1}^{L} β(G(p, q, i) − ℓ) ,  0 ≤ ℓ ≤ M − 1 ,  i ∈ {R, G, B}   (6.13)
H̄_i(ℓ) = ∑_{p=1}^{K} ∑_{q=1}^{L} (1 + s_t(p, q)) ⋅ β(G(p, q, i) − ℓ) ,  0 ≤ ℓ ≤ M − 1 ,  i ∈ {R, G, B}   (6.14)
Here,
G(p, q, i) represents the gray level of the pixels,
β(⋅) represents the impulse function,
M represents the number of intensity levels, and
K × L is the dimension of the image.
The data set for an image may comprise the object and the background; for both of these components the lower and upper approximations given by (6.13) and (6.14) are evaluated. The roughness of the image G(p, q, i) is then given by
φ_i(ℓ) = 1 − |H_i(ℓ)| / |H̄_i(ℓ)| .   (6.15)
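The histogram, histon and roughness index of equations (6.13)–(6.15) are easy to prototype with NumPy. The sketch below is a simplified illustration only: the similarity weight s_t(p, q) is replaced here by a crude neighbourhood-similarity count, which is an assumption made for this example and not the exact formulation used in [18, 24].

    import numpy as np

    def histogram_histon_roughness(channel, levels=256, thresh=10):
        """channel: 2-D uint8 array for one colour band (K x L)."""
        K, L = channel.shape
        hist = np.bincount(channel.ravel(), minlength=levels).astype(float)

        # Crude stand-in for s_t(p, q): fraction of 4-neighbours whose gray value
        # lies within `thresh` of the centre pixel (illustrative assumption).
        padded = np.pad(channel.astype(int), 1, mode='edge')
        similar = np.zeros_like(channel, dtype=float)
        for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            neigh = padded[1 + dy:1 + dy + K, 1 + dx:1 + dx + L]
            similar += (np.abs(neigh - channel) <= thresh)
        s_t = similar / 4.0

        histon = np.zeros(levels)
        np.add.at(histon, channel.ravel(), (1.0 + s_t).ravel())   # eq. (6.14)

        roughness = 1.0 - hist / np.maximum(histon, 1e-12)        # eq. (6.15)
        return hist, histon, roughness

    channel = (np.random.rand(64, 64) * 255).astype(np.uint8)
    h, H, rho = histogram_histon_roughness(channel)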
The notion of approximation space can be used to relate the two concepts of the histogram and the IF based histon. If the roughness is too small, it may lead to data redundancy. The key point is therefore to determine an optimal scale at which the roughness of the colour distribution is measured: when the scale is too low a large amount of redundant information is retained, while too high a scale leads to the loss of important information. Many researchers have studied the properties of scale space using entropies; these depend upon the notions of fuzzy sets, rough sets and roughness, which are used to measure data uncertainties. In [25] it is proposed to use the classical rough entropy.
6.3.2.4 Image segmentation
There are four stages in this process. In the first stage the image is represented using IFS so that boundary ambiguities can be handled. In the next stage, a Gaussian filter is used to obtain a multiscale representation: the approximations of each colour component are first obtained at a scale s, and then the roughness index at that scale is derived. The two parameters involved in computing the multiscale roughness are the neighbourhood size and the adaptive level. In the third stage the bands of each colour segment are obtained by using the valleys and peaks of the roughness index, and depending upon the colour bands an initial segmentation of the image is performed; an algorithm called the "peak selection algorithm" is used for this purpose. In the fourth stage post processing is used to merge colours. This is important to restrict unwanted segmentation beyond the required stage, and it removes unwanted colours by merging similar colours within a certain threshold.
The selection of the peaks plays an important role in getting improved segmentation results. Two factors which affect the selection of peaks are the height of the peaks and the distance between adjacent peaks. A fixed threshold is not an efficient approach, due to the diversity in the distribution of colours and the variation of roughness, so an efficient approach is to make the threshold value self-adaptive. This idea is used to develop the peak selection algorithm in [24] as follows (a small illustrative sketch is given after the steps).
Step 1: For the individual roughness index, generate the local peaks.
Step 2: Find the maximum and minimum peak values V_max and V_min. Let mean_m be the mean of V_max and V_min. The standard deviation is given by
SD_m = √( ∑_{i=1}^{k} (P_{l_i} + mean_m) / k )   (6.16)
and the height of the peak is P_h = mean_m − SD_m.
Step 3: The significant peaks are selected with respect to the height and width thresholds.
Step 4: Remove the local peaks by applying the above filter; the output is the sequence of significant peaks.
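The following Python fragment sketches the self-adaptive peak selection described above. It is only an approximation of the procedure in [24]: the way local peaks are detected and the spacing criterion used in place of the width test in Step 3 are assumptions made for the sake of a runnable example.

    import numpy as np

    def select_significant_peaks(roughness, min_gap=3):
        """roughness: 1-D array, one value per intensity level (Steps 1-4, sketch)."""
        # Step 1: local peaks (strictly greater than both neighbours).
        idx = [i for i in range(1, len(roughness) - 1)
               if roughness[i] > roughness[i - 1] and roughness[i] > roughness[i + 1]]
        peaks = np.array([roughness[i] for i in idx])
        if peaks.size == 0:
            return []

        # Step 2: self-adaptive height threshold from V_max and V_min (cf. eq. (6.16)).
        mean_m = (peaks.max() + peaks.min()) / 2.0
        sd_m = np.sqrt(np.sum(peaks + mean_m) / peaks.size)
        p_h = mean_m - sd_m

        # Steps 3-4: keep peaks above the height threshold and not too close together.
        selected = []
        for i, v in zip(idx, peaks):
            if v >= p_h and (not selected or i - selected[-1] >= min_gap):
                selected.append(i)
        return selected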
to its original representation in RGB form. While merging a region having the nearest colour is given preference in addition to the distance. This algorithm when applied to images from the Berkeley segmentation database, which has the results for segmentation performed manually, is found to provide matching segments. The Even Rand Index is used to provide a quantitative account of the effectiveness of the algorithm [24]. For this purpose random images are taken as inputs.
6.4 Soft sets and their applications in image processing A soft set is nothing but a family of sub sets associated with a parameter set. It was introduced by Molodtsov (1999) to provide adequate parameterization tools in an uncertainty model. A soft set associates every parameter e of a parameter set E with a subset of U, called as the e-approximate elements of U and we denote this set by F(e). F(e) may be empty for some elements e and also the intersection of subsets associated with different parameters may be non-empty. Several applications of the soft set are found in the literature [2].
6.4.1 Similarity measures
A similarity measure is a methodology to quantify the similarity between two sets. The concept of similarity measures is applied in pattern recognition, region extraction, coding theory, and extensively in image processing. Majumdar and Samanta [26] introduced a similarity measure between two soft sets. They take the tabular representation of a soft set as a matrix: if an element maps to a particular parameter then the corresponding value is 1, otherwise it is 0.
Definition 6.4.1. Let E be a parameter set. Then the similarity between two soft sets (F_1, E) and (F_2, E) is given by
S(F_1, F_2) = ∑_i F⃗_1(e_i) ⋅ F⃗_2(e_i) / ∑_i [F⃗_1(e_i)² ∨ F⃗_2(e_i)²] .   (6.17)
6.4.1.1 Similarity measures between two fuzzy soft sets
In [26], Majumdar and Samanta extensively discussed similarity measures for soft sets. Let (F, E) and (G, E) be two fuzzy soft sets, where E is a parameter set, and let the similarity between these two sets be denoted by S(F, G). Then
S(F, G) = max_i {S_i(F, G)} ,  where  S_i(F, G) = 1 − ∑_{j=1}^{n} |F_ij − G_ij| / ∑_{j=1}^{n} |F_ij + G_ij| ,   (6.18)
in which G_ij = μ_{G(e_i)}(x_j).
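A direct transcription of equations (6.17) and (6.18) into Python is shown below; the soft sets are represented by their tabular (matrix) form, with rows indexed by parameters and columns by objects. The example data are invented for illustration.

    import numpy as np

    def soft_set_similarity(F1, F2):
        """Equation (6.17): F1, F2 are 0/1 matrices (parameters x objects)."""
        num = np.sum(F1 * F2)
        den = np.sum(np.maximum(F1 ** 2, F2 ** 2))
        return num / den if den else 1.0

    def fuzzy_soft_set_similarity(F, G):
        """Equation (6.18): F, G hold membership grades in [0, 1]."""
        num = np.abs(F - G).sum(axis=1)
        den = np.abs(F + G).sum(axis=1)
        S_i = 1.0 - np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        return S_i.max()

    F1 = np.array([[1, 0, 1], [0, 1, 0]])
    F2 = np.array([[1, 1, 1], [0, 1, 0]])
    print(soft_set_similarity(F1, F2))

    F = np.array([[0.8, 0.1, 0.5], [0.2, 0.9, 0.4]])
    G = np.array([[0.7, 0.2, 0.6], [0.1, 0.8, 0.5]])
    print(fuzzy_soft_set_similarity(F, G))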
6.4.2 Feature selection
Feature selection refers to the process of selecting a subset of relevant attributes (features, variables, or predictors) from a given large set of attributes. It is a significant process for data mining, as it lowers the complexity while retaining a similar accuracy in the result [27, 28]: the most relevant features are retained as input and the less relevant features, without which almost the same accuracy can be obtained, are discarded, which makes data mining faster by reducing the size of the input. A related concept, feature extraction, refers to the process of extracting the most relevant features from the given data. Feature selection plays a very important role in building a good model for several reasons. One is that it provides some degree of cardinality reduction by reducing the number of attributes. For example, if there are 1000 attributes in a data set, not all of them may be necessary for a particular model; it is better to remove as many of the less significant attributes as possible, as long as the result is not affected, which improves the quality of the model and makes it faster and more efficient. Feature selection is thus one of the significant facets of knowledge discovery: it facilitates better classification accuracy with a reduced computational time for the classification algorithms. Feature selection can be categorised as supervised or unsupervised. If all class labels of the data are known, supervised feature selection should be used; if the class labels are not available, unsupervised feature selection is useful. Since missing or unknown class labels are common in most data sets, unsupervised feature selection is of particular importance [29]. With respect to the method of evaluation, feature selection can be further divided into the filter and wrapper categories. If the features are evaluated on the basis of general characteristics of the data, without referring to any data mining algorithm, the process is known as a filter model of feature selection; if a data mining algorithm is required for the evaluation of the features, the process is known as a wrapper model of feature selection.
6.4.3 Soft set based feature selection approach for lung cancer images Lung cancer is the most fatal malignant tumour for both men and women. Every year, the number of people dying due to lung cancer is more than breast, colon and prostate cancers altogether. Presently CT-scanning is one of the most used diagnostic procedures for lung cancer, because of its machine learning, pattern recognition and image processing capability [14, 20, 22]. Jothi and Inbarani (2012) investigated the feature selection process and have shown the way in use of soft set theory in computed tomography. Feature selection has an important role for detection of cancer cells by image
classification. Based on soft set theory, they proposed the soft set based unsupervised quick reduct (SSUSQR) feature selection algorithm. They applied the algorithm to data of nineteen features extracted from segmented lung images, for which two different techniques were used: the grey level co-occurrence matrix (GLCM) and the grey level difference matrix (GLDM). The algorithm is used for feature selection from the given data set, and its performance was evaluated by clustering the data with the K-means and self organizing map (SOM) clustering algorithms, which gave positive results. To analyse gene expression data for the differentiation of malignant and non-malignant lung tumour samples, a biomarker identifier (BMI) is used in [29]; BMI is based on a filter model of feature selection. Amer et al. [16] built a computer aided system to recognise abnormalities in the lungs and experimented with CT images of the chest, investigating texture, wavelet as well as Fourier based features. Aliferis et al. developed machine learning techniques to differentiate tumour from non-tumour, and metastatic from non-metastatic, lung cancer. These methods can also be used to differentiate between histological subtypes of lung cancer, and to identify small sets of gene predictors through which properties such as stability, size and their relation to lung cancer can be studied. A new approach to attribute reduction using soft sets and the AND operation was introduced by Herawan et al. [12]; the results obtained are similar to those of Pawlak's rough set [30] based attribute reduction.
6.4.3.1 Attribute reduction
Definition 6.4.2. Let (F, E) = {(F, a_i) : i = 1, 2, ..., |A|} be a multi soft set over U representing a multi-valued information system S = (U, A, V, F). A set of attributes B ⊆ A is called a reduct for A if C_F(b_1 × b_2 × ... × b_|B|) = C_F(a_1 × a_2 × ... × a_|A|) [12].
Definition 6.4.3. Assume a parameter subset X ⊆ A and a parameter x ∈ A; the significance of x in X is denoted by Sig_X(x) and is defined as Sig_X(x) = 1 − |X ∪ {x}| / |X|, where |X| = |IND(X)|. If U/IND(X) = U/X = {X_1, X_2, ..., X_n}, then |X| = |IND(X)| = ∑_{i=1}^{n} |X_i|². The quantity |X| − |X ∪ {x}| represents the reduction or increment of indiscernibility, and the indiscernibility increment can be expressed as (|X| − |X ∪ {x}|) / |X| = 1 − |X ∪ {x}| / |X|.
6.4.3.2 Processing model
Any image processing model will in general have components such as image acquisition, enhancement, image segmentation, feature extraction, feature selection and clustering or classification (Fig. 6.2).
Fig. 6.2: Steps in the Lung Image Processing Model
6.4.3.3 Image enhancement
The usefulness of CT scan images is reduced if the image is of low quality or contains abnormalities. By the image enhancement process we can increase the quality of an image. Image enhancement is nothing but improving the image quality by removing noise using one or more suitable image filtering techniques such as a Gaussian, average or median filter [1]. The resultant images obtained by applying different filters to the noisy image shown in Figure 6.3(b) are shown in Figure 6.4.
Fig. 6.3: (a) Original image and (b) Noisy image
The signal to noise ratio (SNR) is the measure used to quantify the quality of an image; an increased SNR value signifies an increase in image quality. The SNR is computed using the formula
SNR = 10 log₁₀ ( ∑_{i=1}^{K} S_i² / ∑_{i=1}^{K} (Ŝ_i² − S_i²) ) ,   (6.19)
where Ŝ is the given image, S is the original image and K is the image size.
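Equation (6.19) can be evaluated directly with NumPy; the snippet below is a minimal sketch in which random arrays stand in for the original and filtered CT data, and the denominator follows the chapter's formula literally.

    import numpy as np

    def snr(original, given):
        """Signal to noise ratio as in eq. (6.19); both arrays must have the same shape."""
        s = original.astype(float).ravel()
        s_hat = given.astype(float).ravel()
        signal = np.sum(s ** 2)
        noise = np.sum(s_hat ** 2 - s ** 2)   # denominator exactly as written in (6.19)
        return 10.0 * np.log10(signal / noise)

    original = np.random.rand(128, 128)
    noisy = original + 0.1 * np.random.randn(128, 128)
    print(snr(original, noisy))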
Fig. 6.4: (a) Filtered image using average filter, (b) Filtered image using Gaussian filter, (c) Filtered image using median filter
6.4.3.4 Feature extraction
Feature extraction techniques extract the most significant features as representatives of different classes of objects by analysing images and objects. Jothi and Inbarani (2012) used the gray level co-occurrence matrix (GLCM) in four directions (0°, 45°, 90°, 135°) and the gray level difference matrix (GLDM) to extract textures from the given CT images [9]. In Table 6.1 the SNR is given for the images in Figure 6.4.
Tab. 6.1: SNR of filtered images given in Figure 6.4
Filter Used        SNR Value of resultant Image
Mean Filter        4.44
Gaussian Filter    7.5788
Median Filter      11.2069
Jothi and Inbarani (2012) proposed a soft set based unsupervised quick reduct algorithm. The AND operation in soft set theory is used to achieve dimensionality reduction. The algorithm starts with an empty reduct set and finds the indiscernibility value of each feature over the universal set; the feature with the highest indiscernibility value is taken as the core of the reduct set. If multiple features have the same highest indiscernibility score, the core feature is selected using Sig(x). CORE(x) is then combined with other significant attributes to obtain different reducts, and the reduct with the highest indiscernibility is considered for further processing. This procedure continues until the best reduct is obtained, which has the same indiscernibility score as the universal set. The algorithm proposed by Jothi and Inbarani is given below.
Algorithm 6.1: SSUSQR (C)
U – Universal Set, C – the set of attributes
1: R ← {}
2: do
3:    T ← R
4:    Find S_T(U) = |IND(U)|
5:    ∀x ∈ (C − R), find maximum |IND(x)|
6:    if multiple attributes have the same maximum |IND(x)| then
7:       Compute Sig(x) for all the attributes having the same maximum |IND(x)|
8:       x ← CORE(x)
9:    end if
10:   T ← R ∪ {x}
11:   R ← T
12: while |R| != S_T(U)
13: Return R
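A rough Python rendering of Algorithm 6.1 is given below. The indiscernibility value |IND(X)| follows Definition 6.4.3, and the loop mirrors the pseudocode; the data layout (a dictionary of feature columns) and the way candidate features are scored (on the partial reduct plus the candidate) are assumptions made for this sketch rather than part of the original algorithm.

    def ind(features, data, n_objects):
        """|IND(X)| = sum of squared sizes of the classes induced by the feature subset."""
        classes = {}
        for obj in range(n_objects):
            key = tuple(data[f][obj] for f in features)
            classes[key] = classes.get(key, 0) + 1
        return sum(c * c for c in classes.values())

    def ssusqr(data, n_objects):
        """Unsupervised quick reduct sketch (cf. Algorithm 6.1)."""
        all_features = list(data)
        target = ind(all_features, data, n_objects)   # indiscernibility of the full attribute set
        reduct = []
        while not reduct or ind(reduct, data, n_objects) != target:
            remaining = [f for f in all_features if f not in reduct]
            if not remaining:
                break
            # Pick the feature whose addition gives the highest indiscernibility value.
            best = max(remaining, key=lambda f: ind(reduct + [f], data, n_objects))
            reduct.append(best)
        return reduct

    data = {'f1': [1, 1, 2, 2], 'f2': [0, 1, 0, 1], 'f3': [5, 5, 5, 5]}
    print(ssusqr(data, 4))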
In Jothi and Inbarani (2012) this algorithm was applied and it was shown experimentally that it gives better results than the earlier algorithms.
6.4.4 Fuzzy soft rough k-means clustering
In Dhanalakshmi and Inbarani (2012), a rough k-means clustering approach is used for clustering gene expression data. Gene expression is the process that happens in the synthesis of DNA and RNA so that damaged cells can be replaced with a healthy copy of the same cell. The expression of a gene reflects the activity of that gene under different conditions; gene expression refers to the degree of activation of a gene of a certain organism at a particular time. Gene expression data can be expressed as an expression matrix M = {w_ij : 1 ≤ i ≤ n, 1 ≤ j ≤ m}, as shown in Figure 6.5. In Figure 6.5, the rows represent the patterns of gene expression G = {g⃗_1, ..., g⃗_n}, the columns S = {S_1, ..., S_m} of the matrix M represent the expression profiles of the samples, and each cell w_ij represents the degree of expression of gene i in sample j. The gene expression data can be clustered using any of the various existing clustering algorithms so that related genes fall in the same cluster: genes clustered in the same cluster have a higher degree of relationship, and vice versa. The methodology for processing gene expression data is given in Fig. 6.6.
Fig. 6.5: Gene expression matrix
Fig. 6.6: Methodology for processing gene expression data
6.4.4.1 Entropy filtering
Entropy filtering can be used for univariate gene selection and to calculate the significance of the genes. Entropy measures the uncertainty of a random variable, and Shannon's information theory can be used to measure the interdependency of two random genes X and Y [1]. For each individual gene X, the entropy value of X is
H(X) = − ∑_i P(X_i) log P(X_i) .   (6.20)
The information gain of the random genes can be computed using this filter, and the genes with higher gain values are selected:
IG(X, Y) = H(X) + H(Y) − H(X, Y) .   (6.21)
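The two quantities in (6.20) and (6.21) are simple to compute once the continuous expression values have been discretised into bins; the quantile-based discretisation into three levels below is an assumption of this sketch, not part of the original procedure.

    import numpy as np
    from collections import Counter

    def entropy(values):
        """Shannon entropy H(X) of a discrete sequence, eq. (6.20)."""
        counts = Counter(values)
        n = len(values)
        return -sum((c / n) * np.log2(c / n) for c in counts.values())

    def information_gain(x, y):
        """IG(X, Y) = H(X) + H(Y) - H(X, Y), eq. (6.21)."""
        joint = list(zip(x, y))
        return entropy(x) + entropy(y) - entropy(joint)

    def discretise(expr, bins=3):
        """Map continuous expression values to integer levels (illustrative)."""
        edges = np.quantile(expr, np.linspace(0, 1, bins + 1)[1:-1])
        return list(np.digitize(expr, edges))

    gene = discretise(np.random.randn(100))
    label = discretise(np.random.randn(100))
    print(entropy(gene), information_gain(gene, label))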
6.4.4.2 Entropy based gene selection procedure
Step 1: Input the gene expression matrix G = {g_1, g_2, ..., g_n}, where g_1, g_2, ..., g_n are the given genes and k = {c_1, c_2, ..., c_n} are the classes in the given data.
Step 2: Using equation (6.20), compute the entropy of every class.
Step 3: Using equation (6.21), compute the information gain value of every gene.
Step 4: Select the genes having the highest entropy (gain) values.
6.4.4.3 Fuzzy soft rough k-means clustering
In this section the fuzzy soft rough k-means clustering algorithm mentioned in Dhanalakshmi and Inbarani (2012) is discussed. In that article the authors show how fuzzy soft set similarity can be used for clustering cancerous genes from a set of samples of gene expression values.
Algorithm 6.2: This algorithm consists of multiple phases for the different processes.
1. Pre-processing Phase
Step 1: Select significant genes using the entropy filter approach; remove the genes with lower entropy values.
Step 2: Fuzzify the attribute vector values using an S-shaped or Z-shaped membership function.
2. Clustering Phase
K – number of clusters; n – total number of genes; m – number of samples; W_lower, W_upper – weights; ε – threshold.
Output: K gene clusters.
Step 1: Randomly assign each object to exactly one lower approximation C̲_K; the object then also belongs to the upper approximation C̄_K, whose boundary region is C_KB.
Step 2: Compute the cluster centroids Z_K, for i = 1, 2, ..., n, j = 1, 2, ..., m and h = 1, 2, ..., K:
if C_KB = C̄_K − C̲_K ≠ ∅ then
Z_K = W_lower × ( ∑_{X ∈ C̲_K} X_i / |C̲_K| ) + W_upper × ( ∑_{X ∈ C̄_K − C̲_K} X_i / |C̄_K − C̲_K| )
else compute the new centroids as Z_K = ∑_{X ∈ C̲_K} X_i / |C̲_K|
end if
Step 3: Find the similarity S_i using the formula
S_i(X, Z) = 1 − ∑_{j=1}^{n} |X_ij − Z_ij| / ∑_{j=1}^{n} |X_ij + Z_ij| ,   (6.22)
where X_i represents a gene and Z represents a centroid.
Step 4: Compute P_i = max(S_i)/min(S_i) and normalize the P_i values; if P_i ≥ ε, place the i-th object in the K-th cluster.
Step 5: Update the centroids. Repeat Steps 2–5 until the new centroids equal the old centroids.
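The centroid update in Step 2 is the part of the algorithm that differs from ordinary k-means, so a small sketch of just that step may help. The weights and the toy data below are invented; the function only illustrates how the lower approximation and the boundary region are weighted.

    import numpy as np

    def rough_centroid(lower, boundary, w_lower=0.7, w_upper=0.3):
        """Centroid of one cluster from its lower-approximation and boundary members.

        lower, boundary: 2-D arrays whose rows are gene expression vectors.
        Mirrors Step 2 of Algorithm 6.2: the boundary term is used only when the
        boundary region is non-empty.
        """
        if boundary.shape[0] > 0:
            return w_lower * lower.mean(axis=0) + w_upper * boundary.mean(axis=0)
        return lower.mean(axis=0)

    lower = np.array([[1.0, 2.0], [1.2, 1.8]])
    boundary = np.array([[3.0, 0.5]])
    print(rough_centroid(lower, boundary))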
Experimentally the authors have shown that the proposed clustering algorithm has a higher accuracy than two other benchmark clustering algorithms.
6.4.4.4 Framework for medical images classification using soft set
Lashari and Ibrahim proposed a framework for medical image classification using soft sets. The framework has six different phases of image processing techniques: acquisition, preprocessing, partition, soft set classifier, data analysis, and performance evaluation.
6.5 Conclusion
Pattern recognition is an important field of study dealing with patterns, their properties and applications, and uncertainty in images is a common feature nowadays. In this chapter we studied different aspects of pattern recognition in general, with an emphasis on image processing using two of the uncertainty based models: rough sets and soft sets. Rough set theory, since its inception, has been used in many fields of investigation and several application areas, pattern recognition being one of them. Recently this model and its hybrid models, like rough fuzzy sets and intuitionistic fuzzy rough sets, have been used for colour image segmentation, and we have presented one such algorithm that deals with many new concepts [24]. Soft sets form one of the growing fields of research, with wide applications in decision making, data mining, image processing, etc. In this chapter we discussed the role of soft sets in image processing; since feature selection plays a vital role in cancer classification, we also discussed the unsupervised soft set based quick reduct (SSUSQR) algorithm.
References
[1] R. C. Gonzalez, R. E. Woods, et al. Digital image processing, 2002.
[2] T. R. Sooraj, R. K. Mohanty, and B. K. Tripathy. Fuzzy soft set theory and its application in group decision making. In Advanced Computing and Communication Technologies, pp. 171–178. Springer, 2016.
[3] B. K. Tripathy and K. R. Arun. A new approach to soft sets, soft multisets and their properties. International Journal of Reasoning-based Intelligent Systems, 7(3–4):244–253, 2015.
[4] K. T. Atanassov. Intuitionistic fuzzy sets. Fuzzy Sets and Systems, 20(1):87–96, 1986.
[5] L. A. Zadeh. Fuzzy sets. Information and Control, 8:3, 1965.
[6] A. Mohabey and A. K. Ray. Fusion of rough set theoretic approximations and FCM for color image segmentation. In Systems, Man, and Cybernetics, 2000 IEEE International Conference, vol. 2, pp. 1529–1534. IEEE, 2000.
[7] D. Molodtsov. Soft set theory—first results. Computers & Mathematics with Applications, 37(4):19–31, 1999.
[8] P. Roy, S. Goswami, S. Chakraborty, A. T. Azar, and N. Dey. Image segmentation using rough set theory: a review. International Journal of Rough Sets and Data Analysis (IJRSDA), 1(2):62–74, 2014.
[9] N. M. Anupama, S. S. Kumar, and E. S. Reddy. Rough set based MRI medical image segmentation using optimized initial centroids. International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS), 6(1):90–98, 2013.
[10] X. Fu, J. Liu, H. Wang, B. Zhang, and R. Gao. Rough sets and neural networks based aerial images segmentation method. In International Conference on Neural Information Processing, pp. 123–131. Springer, 2012.
[11] E. V. Reddy and E. S. Reddy. Image segmentation using rough set based fuzzy k-means algorithm. International Journal of Computer Applications, 74(14), 2013.
[12] D. N. Vasundhara and M. Seetha. Rough-set and artificial neural networks based image classification. In 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I), pp. 35–39. IEEE, 2016.
[13] G. Suseendran and M. Manivannan. Lung cancer image segmentation using rough set theory. Indian Journal of Medicine and Healthcare, 4(6), 2015.
[14] S. A. Lashari and R. Ibrahim. A framework for medical images classification using soft set. Procedia Technology, 11:548–556, 2013. 4th International Conference on Electrical Engineering and Informatics, ICEEI 2013.
[15] S. Sreedevi and E. Sherly. A new approach to microcalcification detection using fuzzy soft set approach. Indian Journal of Computer Science and Engineering (IJCSE), 7(2):46–53, 2016.
[16] R. V. Krishna and S. S. Kumar. Color image segmentation using soft rough fuzzy c-means clustering and SMO support vector machine. International Journal on Signal & Image Processing, 6(5):49–62, 2015.
[17] B. K. Tripathy and J. Anuradha. Soft Computing – Advances and Applications. Cengage Learning, 2015.
[18] M. M. Mushrif and A. K. Ray. A-IFS histon based multithresholding algorithm for color image segmentation. IEEE Signal Processing Letters, 16(3):168–171, 2009.
[19] C. H. Li and P. C. Yuen. Tongue image matching using color content. Pattern Recognition, 35(2):407–419, 2002.
[20] A. E. Hassanien. Intelligent data analysis of breast cancer based on rough set theory. International Journal on Artificial Intelligence Tools, 12(04):465–479, 2003.
[21] A. Halder and A. Dasgupta. Color image segmentation using rough set based k-means algorithm. International Journal of Computer Applications, 57(12), 2012.
[22] A. Hassanien. Fuzzy rough sets hybrid scheme for breast cancer detection. Image and Vision Computing, 25(2):172–183, 2007.
[23] M. M. Mushrif and A. K. Ray. Color image segmentation: Rough-set theoretic approach. Pattern Recognition Letters, 29(4):483–493, 2008.
[24] Y. K. Dubey, M. M. Mushrif, and P. R. Nehare. Multiscale intuitionistic fuzzy roughness measure for color image segmentation. In Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing, p. 14. ACM, 2014.
[25] S. K. Pal. Rough-fuzzy granulation, rough entropy and image segmentation. In First Asia International Conference on Modelling & Simulation, AMS'07, pp. 3–6. IEEE, March 2007.
[26] P. Majumdar and S. K. Samanta. Similarity measure of soft sets. New Mathematics and Natural Computation, 4(01):1–12, 2008.
[27] H. Liu and H. Motoda. Computational Methods of Feature Selection. CRC Press, 2007.
[28] X. Sun, X. Tang, H. Zeng, and S. Zhou. A heuristic algorithm based on attribute importance for feature selection. In International Conference on Rough Sets and Knowledge Technology, pp. 189–196. Springer, 2008.
[29] G. Jothi and H. H. Inbarani. Soft set based feature selection approach for lung cancer images. arXiv preprint arXiv:1212.5391, 2012.
[30] Z. Pawlak. Rough sets. International Journal of Computer & Information Sciences, 11(5):341–356, 1982.
Alokananda Dey*, Sandip Dey, Siddhartha Bhattacharyya, Vaclav Snasel, and Aboul Ella Hassanien
7 Quantum inspired simulated annealing technique for automatic clustering
Abstract: Clustering is a popular data mining tool whose aim is to divide a data set into a number of groups or clusters. The aim of this work is to develop a quantum inspired algorithm which is capable of finding the optimal number of clusters in an image data set automatically. The article focuses on a quantum inspired automatic clustering technique based on a meta-heuristic algorithm named simulated annealing. The quality of this clustering algorithm has been measured by two separate fitness functions, named the DB index and the I index. A comparison has been made between the quantum inspired algorithm and its classical counterparts on the basis of the mean of the fitness, standard deviation, standard error and computational time. Finally, the superiority of the proposed technique over its classical counterparts has been established by the statistical superiority test named the t-test. The proposed technique has been used to cluster four publicly available real life image data sets and four Berkeley image datasets of different dimensions.
Keywords: Simulated Annealing, Quantum Computing, Automatic Clustering, DB Index, I Index, t-test
7.1 Introduction
The aim of clustering is to reduce the size of a data set by categorizing similar data items together [1, 2]. The main motivation behind the use of clustering algorithms is to provide automated tools that help in constructing categories or taxonomies in the same way a human does; these methods may also be used to minimize the human effort in the process. Formally, we can define clustering as the grouping of similar objects together from a particular set of objects based on their similarity or closeness. The aggregation is carried out in such a way that the objects in the same cluster resemble each other in some way, and objects in different clusters are dissimilar in the same way.
*Corresponding author: Alokananda Dey, Siddhartha Bhattacharyya, Department of Computer Application, RCC Institute of Information Technology, Beliaghata, Kolkata-700015, India Sandip Dey, Department of Computer Science & Engineering, OmDayal Group of Institutions, Birshibpur, Howrah-711316, India Vaclav Snasel, Faculty of Electrical Engineering and Computer Science, VSB-Technical University of Ostrava, Ostrava, Czech Republic Aboul Ella Hassanien, Faculty of Computers and Information, Department of Information Technology (IT),Cairo University, Cairo, Egypt https://doi.org/10.1515/9783110552072-007
Let us consider a dataset X = {x_1, x_2, ..., x_n} consisting of n objects. The aim of a clustering algorithm is to divide the dataset into p clusters, viz. C_1, C_2, ..., C_p, which satisfy the following properties [1]:
1) C_i ≠ ∅ , for i = 1, 2, ..., p
2) C_i ∩ C_j = ∅ , for i, j = 1, 2, ..., p and i ≠ j
3) ⋃_{i=1}^{p} C_i = X
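These three partition properties translate directly into a validity check; the short sketch below (toy data, hypothetical helper name) can be used to confirm that a candidate clustering returned by any algorithm in this chapter is a proper partition of X.

    def is_valid_partition(X, clusters):
        """Check properties (1)-(3): non-empty, pairwise disjoint, covering X."""
        if any(len(c) == 0 for c in clusters):               # property 1
            return False
        seen = set()
        for c in clusters:
            if seen & c:                                      # property 2
                return False
            seen |= c
        return seen == X                                      # property 3

    X = {1, 2, 3, 4, 5, 6}
    print(is_valid_partition(X, [{1, 2}, {3, 4}, {5, 6}]))      # True
    print(is_valid_partition(X, [{1, 2}, {2, 3}, {4, 5, 6}]))   # False: clusters overlap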
This means that each of the clusters must contain at least one object, that no two clusters can share the same objects, and that the union of all the clusters should be X. A good clustering algorithm always tries to minimize the inter cluster similarity and maximize the intra cluster similarity of a set of objects. Clustering techniques have been used in different disciplines like psychiatry, financial data analysis, etc.; for example, Levine et al. used a clustering technique to develop a classification of mental depression in 1969 [3]. Several clustering techniques have been developed so far, and some of them aim at finding the number of clusters in an input image data set automatically. It has been shown that evolutionary meta-heuristic algorithms perform well in this regard, but quantum inspired evolutionary meta-heuristic algorithms outperform their classical counterparts [4]. The fundamental concept of quantum computing (QC) originates from quantum mechanics [5]. Due to its capability of executing its basic operations at the atomic level, a quantum computer can perform better than a traditional computer, which makes it attractive and appreciable; basically, quantum computers can process millions of operations at the same time, so quantum computing can be embedded into traditional algorithms to increase their efficiency to a great extent. This has influenced researchers to incorporate the exciting features of quantum computing, namely superposition, quantum orthogonality and entanglement, into meta-heuristic algorithms, since these features help the traditional algorithms to perform better. A few examples of such algorithms are presented in [6]. Meta-heuristic algorithms can be used to solve almost all kinds of optimization problems and are able to provide a sufficiently good solution to an optimization problem [7]. These algorithms are designed in such a way that they can optimize a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. They can solve both simple and complex problems efficiently by exploring the search space to determine the optimal solution within a stipulated time frame. They can also be reframed so as to adapt to any situation with relatively few modifications, using some strategic guidelines, and they generally collect information from other sources in order to direct the search process towards the global optima. Some of the meta-heuristic algorithms are ant colony optimization (ACO), simulated annealing (SA), particle
swarm optimization (PSO) [8], genetic algorithm (GA) [9], tabu search (TS) [10–12], etc. A few applications of meta-heuristic techniques are presented in [7, 13]. In this article a quantum inspired simulated annealing technique has been presented for automatic clustering of gray level images. This technique has been applied to several gray scale and Berkeley image datasets of different dimensions to find out the optimal number of clusters on the run. The chapter is organized as follows. Section 7.2 describes the basic types of clustering techniques. In Section 7.3, the basis of quantum computing is presented to provide basic concepts of quantum computing. In Section 7.4, an overview of basic simulated annealing is discussed. Section 7.5 presents the details about cluster validity indices. Two different cluster validity indices, namely DB Index and I Index are described in this section. A brief literature survey is presented in 7.6. The proposed methodology is described in Section 7.7 in detail. The section includes a discussion about the proposed algorithm, its complexity analysis, flow Charts and information regarding the dataset used for experimental purposes. In Section 7.8, the overall analysis of the experimental result is discussed. Finally, a concluding section regarding the proposed method is presented in Section 7.9. In that section, we discuss the future scope of research in the direction of quantum computing.
7.2 Type of clustering Various types of clustering methods have been developed for image segmentation, pattern recognition and exploratory data analysis. These methods can be broadly classified into two basic types; namely, hierarchical and partitioning clustering. Each of these methods can be further divided into two methods [14]. In the case of the agglomerative method, at each level the two most analogous clusters are combined together and the combined clusters remain intact in all upper levels. In the divisive method, the entire data set is initially considered as a solitary cluster which is then divided into smaller clusters at each level of the hierarchy depending on the properties of the data. In the case of partitioning clustering, the entire data set is directly decomposed into a set of disjoint clusters. There are different types of partitioning clustering methods available. Among them, the most popular is k-means clustering [1], in which each cluster is represented by the center or mean of the data points belonging to the cluster. The classification of clustering techniques is described in Figure 7.1.
Fig. 7.1: Classification of clustering (hierarchical methods: agglomerative and divisive; partitioning methods: overlapping and non-overlapping)
7.3 Basis of quantum computing
Nowadays computers play an important role in our daily life, and enormous advances have been made in their computational capability. Yet, despite all efforts, many complex problems exist that remain beyond the reach of the world's most efficient computers, and no one can guarantee that they will ever be handled easily. With technological advancement, the memory units of computers are improving almost on a daily basis, and this pushes computing in a completely different direction in order to obtain smaller and more efficient computers than the current versions. Entering the territory of atoms opens up powerful new possibilities in quantum computing, along with new processors that will be able to complete a task millions of times quicker than the computers we use today. However, quantum computing is enormously more complex than traditional computing. Quantum theory is a branch of physics that deals with the realm of atoms and the subatomic particles inside them. The atomic world does not resemble the macro world: particles behave in their own diminutive ways, the rules change at the atomic scale, and the "classical" laws of physics no longer automatically apply to them. Initially, in the 1970s, some computer scientists and physicists brought the idea of quantum computing to computational devices [15]. During bit manipulation inside a traditional computer an enormous amount of energy is wasted; by using a quantum computer this wastage can be avoided during the execution of extremely complex computations. So, in 1981, Paul Benioff from
the Argonne National Laboratory first thought about a computer which might be designed on the basis of the principles of quantum physics [16, 17]. In 1982, it became a reality when Feynman showed that fundamental computations can be carried out on a quantum system [18]. The first universal quantum Turing machine was proposed in 1985 by David Deutsch of the University of Oxford [15]. The Deutsch–Jozsa oracle [19] and Simon's oracle [20] were discovered in 1992 and 1994, respectively, and these two are basically considered as the beginning of progress in quantum algorithms. In 1994, Shor's algorithm was published [21]; this algorithm is considered to be a turning point in the development of quantum computing, and this work also strongly encouraged researchers to work on quantum computing. After 2000, enormous growth took place in the field of quantum computing [5]. Unlike a classical computer, a quantum computer uses quantum bits, or qubits for short, which work in an attractive way. In a classical computer either a zero (0) or a one (1) can be stored in a bit, but in a quantum computer a zero (0), a one (1), both zero (0) and one (1), or an infinite number of values in between can be stored in a qubit at the same time [6]. That means that in a quantum superposition state all of the possible states can co-exist simultaneously in time and space. A quantum computer (QC) can process them simultaneously as it stores multiple numbers at once; it has the ability to process more than one task in parallel instead of working in a sequential order, and due to this ability to execute computations in parallel it is able to complete its tasks millions of times faster than any conventional computer. In a two-dimensional Hilbert space a qubit can be defined as a unit vector; it is basically a two state quantum-mechanical phenomenon [4, 6, 22–26]. The superposition of the basis states in a QC can be expressed as [27, 28]
|ψ⟩ = ∑_{p=1}^{n} c_p |v_p⟩ = c_1 |v_1⟩ + c_2 |v_2⟩ + ⋅⋅⋅ + c_n |v_n⟩ ,   (7.1)
where v_p refers to the p-th state and c_p ∈ ℂ. In particular, for a two state quantum bit, equation (7.1) can be rewritten as |ψ⟩ = c_1 |0⟩ + c_2 |1⟩. The states |0⟩ and |1⟩ are referred to as the "ground state" and the "excited state", respectively, and c_p is a complex number satisfying the relationship (quantum orthogonality) given by [27, 28]
∑_{p=1}^{n} c_p² = 1 .   (7.2)
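A quick numerical illustration of equations (7.1) and (7.2): a qubit (or an n-state superposition) can be held as a complex amplitude vector whose squared magnitudes sum to one, and "observing" it collapses the state according to those probabilities. The snippet below is a classical simulation for intuition only, not an implementation of quantum hardware.

    import numpy as np

    def normalise(amplitudes):
        """Scale a complex amplitude vector so that sum |c_p|^2 = 1 (eq. (7.2))."""
        c = np.asarray(amplitudes, dtype=complex)
        return c / np.linalg.norm(c)

    def observe(amplitudes, rng=np.random.default_rng()):
        """Collapse the superposition: return a basis state index with probability |c_p|^2."""
        probs = np.abs(amplitudes) ** 2
        return rng.choice(len(amplitudes), p=probs)

    psi = normalise([1.0, 1.0])        # equal superposition of |0> and |1>
    print(np.abs(psi) ** 2)            # [0.5, 0.5]
    print([observe(psi) for _ in range(10)])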
A qubit is typically a microscopic system, such as an atom, a nuclear spin, or a polarised photon. A collection of n qubits is called a quantum register of size n. A mathematical description of a quantum register can be achieved by using the tensor products of qubit bra or ket vectors. For example, an n qubit quantum register and its equivalent decimal value can be described as |1⟩⊗|0⟩⊗⋅ ⋅ ⋅ ⊗|1⟩⊗|1⟩ ≡ |11 ⋅ ⋅ ⋅ 01⟩ ≡ |D⟩. Here, D is the identical decimal number of the qubits in a quantum register of size, n and ⊗ is called the tensor product. QC uses quantum logic gates as hardware devices. They
are used to update individual qubits using a predefined unitary operator, and they generally act over a fixed time period. Mathematically, a quantum gate with unitary operator U satisfies U† = U⁻¹ and UU† = U†U = I. Some popular quantum gates are the NOT gate, C-NOT gate, Hadamard gate, Toffoli gate, controlled phase-shift gate, Fredkin gate, etc. For example, the rotation gate responsible for updating the i-th qubit value (α_i, β_i) is

[α_i]   [cos(θ_i)  −sin(θ_i)] [α_i]
[β_i] = [sin(θ_i)   cos(θ_i)] [β_i]    (7.3)

Here, θ_i is the rotation angle of each qubit and is generally designed to be compliant with specific problems. In QC there are two stimulating features, called coherence and decoherence. Coherence can be described as a linear superposition of the basis states of |ψ⟩, as given in equation (7.1), and decoherence is achieved by destroying this linear superposition. Quantum entanglement is a quantum mechanical phenomenon that can be defined as a unique correlation between quantum systems. In QC, entangled qubit states help to accelerate the computational capability to a very large extent [27, 28].
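To make the qubit representation of equations (7.1)–(7.3) concrete, the following is a minimal Python sketch (not taken from the chapter) of a single qubit stored as an amplitude pair and updated by the rotation gate; the helper names are illustrative only.

import numpy as np

rng = np.random.default_rng(0)

def random_qubit():
    # one qubit as an amplitude pair (alpha, beta) with alpha^2 + beta^2 = 1,
    # i.e. the normalization called quantum orthogonality in the text
    alpha = rng.random()
    return np.array([alpha, np.sqrt(1.0 - alpha ** 2)])

def rotate(qubit, theta):
    # apply the rotation gate of equation (7.3) to one qubit
    gate = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
    return gate @ qubit

q = random_qubit()
q_rotated = rotate(q, theta=0.05 * np.pi)
print(np.sum(q_rotated ** 2))   # still 1.0: the rotation preserves the norm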
7.4 Overview of simulated annealing
Simulated annealing (SA) is a popular optimization algorithm. SA is stochastic in nature and this search technique is designed to avoid local optima [29]. The algorithm is inspired by the thermal process called annealing, in which low energy states of a solid are obtained by a heating and cooling process. Initially the temperature of the heat bath is increased up to a high value at which the solid melts, and then the temperature of the heat bath is decreased carefully until the particles arrange themselves in the ground state of the solid, the ground state being the lowest energy state of the solid. The process of annealing can be simulated with the Metropolis algorithm, which is based on Monte Carlo techniques. An objective function is chosen as an energy function such that a better solution has a lower energy. Initially the search starts from some randomized state. It then searches for a random solution near the current one, and the next solution is chosen with a probability that depends upon the temperature T along with the energy function value of the candidate solution. When T is high, SA generally chooses the next solution randomly, and when T is low it follows the energy function downhill. The probability of finding the global optimum increases when the cooling of T proceeds slowly [15]. The details of SA are described in Algorithm 7.1.
Algorithm 7.1: Steps of SA
Input: Initial temperature with a large value: T1
       Final temperature with a very small value: T2
       A cooling rate: ς
       Number of iterations: I
Output: Optimal solution: O_S
1: Initialize a state S by choosing a set of variables randomly.
2: E_S = Evaluate the cost of the initial state S.
3: repeat
4:    repeat
5:       Let N be a randomly generated neighbour state of S.
6:       E_N = Evaluate the cost of the neighbour state N.
7:       Compute ∆E = E_N − E_S.
8:       if (∆E < 0) then
9:          Accept E_N and update S by N
10:      else
11:         Accept E_N and update S by N with the probability P = e^(−∆E/T1).
12:      end if
13:   until I becomes 0
14:   Set T1 = ς × T1, where (0 < ς < 1)
15: until T1 becomes T2
16: Update O_S by S.
The basic SA procedure is known as Boltzmann annealing. For the efficiency of SA, the cooling schedule of T should be chosen carefully: if T is decreased very quickly then premature convergence to a local minimum may occur, and, in contrast, if it is decreased very slowly then the algorithm may converge very slowly. So the rule for decreasing the temperature is a very important parameter, and the parameters of the cooling schedule are usually set experimentally. The following parameters should be handled properly [15]:
1. the initial temperature,
2. the Markov chain length,
3. the temperature decreasing rule,
4. the final temperature.
In the case of an inhomogeneous algorithm the temperature can be reduced after each iteration, while in the case of the homogeneous algorithm it can be reduced after a certain number of iterations. Self-adaptive schedules, in which the reduction of the temperature is adjusted automatically, are also used extensively for a given problem instance. In this article a geometric reduction function has been chosen for the cooling schedule, T(t + 1) = ςT(t), where 0 < ς < 1; ς is called the reduction rate and T is the temperature. The initial temperature and the reduction rate have been chosen in several ways during the execution of the proposed algorithm. The other important parameter of SA is the neighbourhood solution, which is also chosen randomly from the solution space.
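As an illustration of Algorithm 7.1 and the geometric cooling schedule described above, a minimal Python sketch of simulated annealing is given below. The function and parameter names are illustrative assumptions, not the chapter's implementation.

import math
import random

def simulated_annealing(cost, neighbour, initial_state,
                        t_max=100.0, t_min=1e-3, cooling_rate=0.95, iterations=50):
    # minimal SA sketch with the geometric cooling T(t+1) = cooling_rate * T(t)
    state, energy = initial_state, cost(initial_state)
    best_state, best_energy = state, energy
    temperature = t_max
    while temperature > t_min:
        for _ in range(iterations):
            candidate = neighbour(state)
            candidate_energy = cost(candidate)
            delta = candidate_energy - energy
            # always accept improvements; accept worse moves with probability e^(-delta/T)
            if delta < 0 or random.random() < math.exp(-delta / temperature):
                state, energy = candidate, candidate_energy
                if energy < best_energy:
                    best_state, best_energy = state, energy
        temperature *= cooling_rate
    return best_state, best_energy

# toy usage: minimise a one-dimensional quadratic
best, _ = simulated_annealing(lambda x: (x - 3.0) ** 2,
                              lambda x: x + random.uniform(-0.5, 0.5),
                              initial_state=0.0)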
7.5 Cluster validity indices In this article, two different cluster validity indices have been used, namely the Davies–Bouldin (DB) [1] Index and the I Index [30] for experimental purposes.
7.5.1 Davies–Bouldin (DB) index
The DB index is a function of the ratio of the sum of within-cluster scatter to between-cluster separation, and it uses the cluster centroids for this purpose. A cluster similarity measure R_ij between the clusters C_i and C_j can be defined as [1]

R_ij = (S_i + S_j) / d_ij    (7.4)

and should satisfy the following conditions:
– R_ij ≥ 0
– R_ij = R_ji
– if S_i = S_j = 0 then R_ij = 0
– if S_j = S_k and d_ij < d_ik then R_ij > R_ik
– if S_j > S_k and d_ij = d_ik then R_ij > R_ik
Here, S_i is the dispersion measure of the i-th cluster and can be defined as

S_i = [ (1/n_i) ∑_{x_j ∈ C_i} ‖x_j − z_i‖² ]^(1/2)    (7.5)

where n_i and z_i represent the total number of objects in, and the centre of, the i-th cluster C_i, respectively. The cluster dissimilarity measure d_ij is the distance between the clusters C_i and C_j and can be defined as d_ij = ‖z_i − z_j‖. Finally, the Davies–Bouldin (DB) index can be defined as [1]

DB = (1/n_c) ∑_{i=1}^{n_c} R_i    (7.6)

where R_i = max_{j=1,2,...,n_c, j≠i} (R_ij), i = 1, 2, . . . , n_c. The optimal number of clusters is achieved by minimizing the value of the DB index.
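The DB index of equations (7.4)–(7.6) can be computed directly from a labelled partition. The following is a small Python sketch under the assumption that the data points, their integer cluster labels and the cluster centroids are already available as NumPy arrays; it is an illustration, not the authors' code.

import numpy as np

def db_index(data, labels, centroids):
    # data: (n, d) array, labels: (n,) integer array, centroids: (n_c, d) array
    n_c = len(centroids)
    # within-cluster dispersion S_i of equation (7.5)
    s = np.array([np.sqrt(np.mean(np.sum((data[labels == i] - centroids[i]) ** 2, axis=1)))
                  for i in range(n_c)])
    db = 0.0
    for i in range(n_c):
        r_ij = [(s[i] + s[j]) / np.linalg.norm(centroids[i] - centroids[j])
                for j in range(n_c) if j != i]
        db += max(r_ij)          # R_i of equation (7.6)
    return db / n_c              # smaller values indicate a better clustering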
7.5.2 I index Let us consider a data set X = {x1 , x2 , . . . , x n } that is to be partitioned into K number of clusters, viz., (C1 , C2 , . . ., C K ). A matrix P(X) having dimension K × n will be
responsible for the partitioning of the dataset X and can be represented as [30]
P(X) = {[p_ij]},  i = 1, 2, . . . , K and j = 1, 2, . . . , n,
where p_ij is the membership of pattern x_j to cluster C_i. In the case of a crisp partitioning of the data the following condition is satisfied: p_ij = 1 if x_j ∈ C_i, and 0 otherwise.
This index (I) [30] measures the separation based on the maximum distance between cluster centers, and measures the compactness based on the sum of the distances between the objects and their cluster centers. The I index can be defined as

I(K) = ( (1/K) × (E_1/E_K) × D_K )^r    (7.7)

where K denotes the number of clusters, E_1 = ∑_{x_j ∈ X} ‖x_j − c‖, E_K = ∑_{i=1}^{K} ∑_{j=1}^{n} p_ij ‖x_j − z_i‖, and D_K = max_{p,q ∈ {1,...,K}, q≠p} ‖z_p − z_q‖. Here, c is the center of the dataset X, n is the total number of points in the dataset and z_i is the center of the i-th cluster. It can be seen from equation (7.7) that the index I is composed of three factors, namely 1/K, E_1/E_K and D_K. If K is increased, then the first factor tries to decrease the value of the index I. In the second factor, E_1 is a constant for a given dataset while E_K decreases as K increases, so this factor increases the index I as E_K decreases. Finally, the third factor D_K increases with K. Thus the overall performance is measured by the balance of these three factors. The contrast between the different cluster configurations is controlled by the power r; for experimental purposes, the value of r was chosen as two. The optimal number of clusters can be obtained by maximizing the value of the I index [30].
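Similarly, a minimal sketch of the I index of equation (7.7) for a crisp partition is shown below, with the power r applied to the product as in [30]; the names and the NumPy-based layout are illustrative assumptions.

import numpy as np

def i_index(data, labels, centroids, r=2):
    # data: (n, d) array, labels: (n,) integer array, centroids: (k, d) array
    k = len(centroids)
    c = data.mean(axis=0)                                   # center of the whole dataset
    e1 = np.sum(np.linalg.norm(data - c, axis=1))           # E_1: constant for a given dataset
    ek = sum(np.sum(np.linalg.norm(data[labels == i] - centroids[i], axis=1))
             for i in range(k))                             # E_K: within-cluster compactness
    dk = max(np.linalg.norm(centroids[p] - centroids[q])    # D_K: maximum centre separation
             for p in range(k) for q in range(k) if p != q)
    return ((1.0 / k) * (e1 / ek) * dk) ** r                # larger values are better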
7.6 Related work
In the past few years a tremendous research effort has been devoted to discovering clusters in complex data sets through evolutionary computing techniques. Some evolutionary algorithms are available for clustering which accept the number of classes as an input instead of determining it on the run. But in some practical situations it is difficult, or often impossible, to determine the appropriate number of groups in a previously unseen data set. A review of some major nature-inspired meta-heuristic algorithms for automatic clustering is presented in [31]. Data pre-processing tasks like cleaning and normalization techniques are proposed in [32] to produce optimum quality clusters; in that paper, automatic initialization of centroids based on a modified k-means algorithm is proposed.
An automatic clustering algorithm based on differential evolution is presented in [33], where a comparison is shown between the proposed technique and two popular optimization algorithms, namely the genetic algorithm and particle swarm optimization. A relevant problem in social media, the identification of faces among a large number of unlabeled face images, was the research interest of the authors of [34]. Their goal was to develop an efficient and effective clustering algorithm, named RankOrder, to achieve the desired scalability and better clustering accuracy than other well-known algorithms such as k-means and spectral clustering. Very recently, an automatic clustering algorithm was developed which is based on an efficient meta-heuristic optimization technique called the Grey Wolf Optimizer [35]; in that paper, satellite image segmentation was considered for implementing the proposed algorithm. Finally, a quantum inspired automatic clustering technique has been presented in [26]. This paper introduced a quantum inspired evolutionary algorithm to automatically find the optimal number of clusters from image datasets; a genetic algorithm was used as the evolutionary algorithm along with features of quantum computing. A comparison between the quantum inspired technique and its classical counterpart was presented to show the accuracy and effectiveness of the proposed technique.
7.7 Proposed methodology
In this work a novel quantum inspired automatic clustering technique based on the meta-heuristic algorithm simulated annealing has been introduced. The proposed technique has been applied to find the optimum number of clusters of real-life gray scale images and Berkeley images of various dimensions. Two popular cluster validity indices, the Davies–Bouldin (DB) index [1] and the I index [30], have been applied as objective functions for assessing the performance of the proposed technique. The proposed technique is subdivided into the following parts.
1. Initialization: Initially a configuration S_c having length L is created by choosing image intensities randomly from the image dataset. Here L is taken as the square root of the maximum gray value of the test image dataset.
2. Encoding: The initial configuration is encoded with random numbers between (0,1) to produce S_c^+, and then each element of S_c^+ is passed through a quantum gate to satisfy the property of quantum orthogonality, a fundamental feature of quantum computing. By establishing quantum orthogonality, S_c^++ is created. The image dataset is also normalized between (0,1) to produce D_s.
Fig. 7.2: Centroid representation scheme in QIACSA
3. String Representation: For experimental purposes a new arrangement of strings has been introduced using the basic features of quantum computing. Consider a string having r cluster centroids. In general, a cluster centroid may be composed of multi-dimensional data points (say, of dimension d); since only image pixel intensities are considered in this article, the value of d is one. The string representation scheme is shown in Figure 7.2. In this example, the cluster centroids are created with r qubits. At the beginning the initial configuration is encoded by α_i, 1 ≤ i ≤ L; thereafter, applying quantum orthogonality, β_i, 1 ≤ i ≤ L, is produced. During the execution, a number of cluster centroids are activated at random for a particular configuration. The activation thresholds (ϕ_i, 1 ≤ i ≤ r) are set to 1 for activated cluster centers and to 0 for non-activated cluster centers. In this work, a random number R between (0,1) is drawn for activating the cluster centers: if R > |β_i|², 1 ≤ i ≤ L, then ϕ_i is set to 1, otherwise it is 0 (a minimal sketch of this encoding and activation step is given after Figure 7.4).
4. Perturbation: For a given configuration, perturbation takes place by randomly changing a value of that configuration in a meaningful way. In this article, a random number R between (0, L) is generated as the position of perturbation in the current configuration. The value at that position of S_c^++ is then changed to a random number between (0,1), and the corresponding value of S_c^+ is changed so as to satisfy the property of quantum orthogonality.
5. Fitness Function: The main aim of any clustering algorithm is to separate a given dataset into an optimal number of sub-datasets, called clusters. The quality of this kind of partitioning is generally evaluated using cluster validity indices. In this article two popular cluster validity indices, the DB index and the I index, are used to measure the quality of the optimal number of clusters from any type of image dataset. The DB index is used in QIACSADB and the I index is used in QIACSAI. The fitness value in QIACSADB is computed as

f_DB = DB    (7.8)
Fig. 7.3: Flow graph for QIACSADB
where DB indicates the computed DB value of a configuration. In QIACSADB, f_DB should be minimized to obtain the optimal number of clusters from any given input image dataset. The flowchart for QIACSADB is presented in Figure 7.3. In addition to this, the fitness value of QIACSAI is computed by

f_I = I    (7.9)

where I indicates the computed I value of a configuration. In QIACSAI, f_I should be maximized to obtain the optimal number of clusters from any given input image dataset. The flowchart for QIACSAI is presented in Figure 7.4.
6. Cooling Schedule: The components of the cooling schedule are the initial temperature (a very large value), the final temperature (say, 0), and a cooling rate ς (0 < ς < 1). Its purpose is to determine when and by how much the temperature should be lowered, and when the annealing process is finished.
Fig. 7.4: Flow graph for QIACSAI
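The qubit encoding and activation scheme of steps 2 and 3 above can be illustrated with a short Python sketch. This is only an interpretation of the description given in this section, with illustrative variable names; it is not the authors' implementation.

import numpy as np

rng = np.random.default_rng(1)

def encode_configuration(length):
    # sketch of the encoding step: qubit amplitudes alpha (S_c^+) and their
    # orthogonal counterparts beta (S_c^++) with alpha^2 + beta^2 = 1
    alpha = rng.random(length)
    beta = np.sqrt(1.0 - alpha ** 2)
    return alpha, beta

def active_centers(beta, candidate_centroids):
    # sketch of the activation rule: a centroid is activated (phi_i = 1)
    # when a random number R exceeds |beta_i|^2
    r = rng.random(len(beta))
    phi = (r > beta ** 2).astype(int)
    return [c for c, flag in zip(candidate_centroids, phi) if flag == 1]

max_gray = 255
L = int(np.sqrt(max_gray))                       # configuration length, as in step 1
candidates = rng.integers(0, max_gray + 1, L)    # pixel intensities drawn at random
alpha, beta = encode_configuration(L)
centers = active_centers(beta, candidates)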
7.7.1 Proposed algorithm
The proposed methodology is represented in an algorithmic structure as given in Algorithm 7.2.

Algorithm 7.2: Steps of the Proposed Algorithm
Input: Initial temperature with a large value: T_max
       Final temperature with a very small value: T_min
       A cooling rate: ς
       Maximum iteration number: I
Output: Optimal number of clusters: ON_c
        Fitness value: F_v
1: Initialize a current configuration S_c having length L by choosing pixel intensities randomly from the input image. Let us consider L = √M_g, where M_g is the maximum intensity value of a gray scale image.
2: Each element in S_c is encoded to a real value between (0,1) by using the concept of qubits. Let it produce S_c^+.
3: Using the quantum rotation gate, S_c^+ is updated by passing each element in S_c^+ through the feature of quantum computing called quantum orthogonality. Let it produce S_c^++.
4: Normalize the input image dataset between (0,1). Let it create D_s.
5: Generate r cluster centers from the configuration S_c.
6: Compute the fitness of the configuration S_c^++ by using the above mentioned objective functions. Let it produce X_curr.
7: Set T_t = T_max and t = 1.
8: while T_t > T_min do
9:    for i = 0 to I do
10:      Create a new configuration N_c by perturbing S_c.
11:      Repeat steps (2) and (3) to produce N_c^+ and N_c^++.
12:      Repeat step (5) to generate r cluster centers from the configuration N_c.
13:      Compute the fitness of N_c^++ by using the above mentioned objective functions. Let it produce X_next.
14:      if X_next is better than X_curr then
15:         Set S_c = N_c, S_c^++ = N_c^++, X_curr = X_next and F_v = X_next.
16:      else
17:         Set S_c = N_c, S_c^++ = N_c^++, X_curr = X_next with a probability e^(−(X_next − X_curr)/T_t).
18:      end if
19:      Now establish quantum orthogonality by using equation (7.2).
20:   end for
21:   Set t = t + 1 and T_t = T_t × ς.
22: end while
23: Return the optimal number of clusters ON_c associated with the optimal fitness value F_v.
7.7.2 Complexity analysis
The worst case time complexity of the proposed QIACSA algorithm is presented as follows.
Step 1: Since S_c contains a single configuration, the time complexity is O(L), where L represents the length of the configuration.
Steps 2–3: These steps perform the same number of computations as step 1, so their time complexity is also O(L).
Step 4: If the size of the dataset is N, the time complexity for normalizing the dataset is O(N).
Steps 8–14: If the outer "while loop" and the inner "for loop" are executed i and j times, respectively, the time complexity for executing these steps is O(i × j).
So the overall worst case time complexity of the proposed QIACSA algorithm turns out to be O(L × i × j). The proposed algorithm is also depicted through a flow diagram, as given in Figure 7.5.
Fig. 7.5: Flow graph for the proposed QISA
7.7.3 Data set used
In this paper, four real-life gray scale images and four Berkeley images of different dimensions have been selected as the test images. The original test images are shown in Figure 7.6.
Fig. 7.6: Original test images for (a) 86000 (80 × 120), (b) 92059 (80 × 120), (c) 94079 (80 × 120), (d) 97017 (80 × 120), (e) elaine_512 (512 × 512), (f) image (225 × 225), (g) image1 (225 × 225) and (h) Photo (512 × 512)
7.8 Experiments, results and analysis
The experimental results are described by the following points.
1. Implementation: A Python environment has been chosen for implementing the proposed technique and the others. The proposed technique has been applied to two real-life gray scale images of size 512 × 512, two real-life gray scale images of size 225 × 225, two Berkeley images of size 80 × 120 and two further Berkeley images of size 120 × 80. The execution of all these algorithms has been done on Windows 7 on a DELL Intel(R) Core(TM) i3, 2.00 GHz, with 4.00 GB RAM.
2. Simulation Approach: In this article a novel quantum inspired automatic clustering technique has been developed and its application demonstrated on four real-life gray-scale images and four Berkeley images. For experimental purposes, the data set comprising the pixel intensity values of the test images is required to be normalized between (0,1). The framework of the proposed technique has been designed in such a way that it can be applied successfully for clustering purposes. Two popular cluster validity indices, the DB index and the I index, have been used as the objective functions in the proposed technique. In the case of the DB index the optimal solution is obtained by minimizing the feasible solutions obtained over successive generations, whereas for the I index the optimal solution is obtained by maximizing the feasible solutions obtained over successive generations.
3. Here a comparative study has been made between the proposed technique and its classical counterparts. Experiments have been conducted for the classical technique using the same cluster validity indices as the objective function. The performance of each technique has been judged on the following criteria:
(a) the quality of the solution, measured using the DB index and I index values;
(b) the standard error;
(c) the accuracy and stability of each of the techniques;
(d) the computational time (in seconds);
(e) a statistical superiority test, the unpaired t-test.
Computational results obtained from each technique: During the experiment each of the participating techniques has been run 40 times, and five representative (most promising) solutions from these runs are presented in Tables 7.1–7.4. The normalized values of the eight test images have been used as the data sets for the experiments. The proposed technique is applied to determine the optimal number of clusters ON_c for each input image, and the two popular cluster validity indices, the DB index and the I index, have been used as the fitness/objective functions. The proposed technique has been compared with its classical counterpart in different respects. To assess the accuracy and stability of each technique, the mean (μ) and standard deviation (σ) have been calculated over the different runs, and the computed results are shown in Table 7.5. The standard error (SE) has also been computed for each of the comparable techniques; the obtained results are presented in Table 7.6. The computational time of each technique is shown in Table 7.8.
Tab. 7.1: Five representative solutions for QIACSADB

SNo.   86000            92059            94079            97017
       ONc  Fv          ONc  Fv          ONc  Fv          ONc  Fv
1      4    0.208613    4    0.268912    4    0.209054    5    0.322705
2      4    0.276964    4    0.245938    4    0.227280    4    0.235217
3      4    0.263510    6    0.202835    4    0.265982    4    0.246791
4      5    0.238577    4    0.248905    4    0.261041    5    0.284929
5      4    0.252959    4    0.206708    4    0.213916    4    0.279810

SNo.   elaine_512       Photo            image            image1
       ONc  Fv          ONc  Fv          ONc  Fv          ONc  Fv
1      5    0.296923    4    0.229096    5    0.244906    4    0.274906
2      5    0.230455    5    0.234707    5    0.210785    4    0.300785
3      5    0.228444    5    0.219671    5    0.247503    4    0.317503
4      5    0.201505    5    0.231535    5    0.271116    4    0.271116
5      5    0.249699    4    0.234707    5    0.260054    4    0.260054
Tab. 7.2: Five representative solutions for ACSADB

SNo.   86000            92059            94079            97017
       ONc  Fv          ONc  Fv          ONc  Fv          ONc  Fv
1      3    0.237880    4    0.310022    4    0.277848    4    0.296305
2      4    0.220426    4    0.430792    3    0.216124    4    0.272508
3      4    0.230212    3    0.080195    3    0.248747    4    0.312755
4      4    0.336844    3    0.102124    4    0.298685    4    0.331299
5      5    0.368286    4    0.099967    3    0.080195    4    0.302914

SNo.   elaine_512       Photo            image            image1
       ONc  Fv          ONc  Fv          ONc  Fv          ONc  Fv
1      5    0.309318    3    0.245850    4    0.267176    4    0.274906
2      5    0.277862    4    0.309318    5    0.257442    4    0.360785
3      5    0.245850    4    0.277862    5    0.206786    4    0.387503
4      5    0.309318    5    0.245850    5    0.247521    4    0.287988
5      4    0.277862    4    0.369318    4    0.250478    4    0.282139
Tab. 7.3: Five representative solutions for QIACSAI

SNo.   86000            92059            94079            97017
       ONc  Fv          ONc  Fv          ONc  Fv          ONc  Fv
1      4    0.112994    4    0.32147     4    0.624968    4    0.299575
2      4    0.125822    4    0.31176     4    0.669955    4    0.268335
3      5    0.128236    4    0.34984     4    0.613671    4    0.293349
4      4    0.117859    4    0.33529     4    0.594753    4    0.304030
5      4    0.139856    5    0.32299     4    0.648974    4    0.309932

SNo.   elaine_512       Photo            image            image1
       ONc  Fv          ONc  Fv          ONc  Fv          ONc  Fv
1      5    0.624837    4    0.619019    5    0.307629    4    0.640728
2      5    0.609295    4    0.616271    5    0.323631    4    0.555603
3      5    0.627158    5    0.626540    5    0.302916    7    0.635833
4      5    0.629821    4    0.639886    4    0.318411    4    0.590052
5      5    0.636140    4    0.622058    5    0.299128    4    0.691416
Finally, the statistical unpaired t-test has been conducted between the proposed technique and its classical counterpart to establish the superiority of the proposed technique. The test has been conducted at a 5% significance level between each pair of techniques. This test finds the p-value, which decides between the null and the alternative hypothesis: a p-value less than 0.05 (for the 5% significance level) indicates that the null hypothesis should be rejected and the alternative hypothesis accepted. The results of the unpaired t-test are presented in Table 7.7.
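For reference, such a two-tailed unpaired t-test can be computed with SciPy as sketched below; the two arrays of fitness values are hypothetical placeholders for the recorded runs of a proposed technique and its classical counterpart.

import numpy as np
from scipy import stats

# hypothetical fitness values from repeated runs on one test image
proposed = np.array([0.239, 0.245, 0.232, 0.241, 0.250])
classical = np.array([0.302, 0.281, 0.310, 0.295, 0.288])

t_stat, p_value = stats.ttest_ind(proposed, classical)   # two-tailed, unpaired
print(p_value < 0.05)   # True -> reject the null hypothesis at the 5% level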
Tab. 7.4: Five representative solutions for ACSAI

SNo.   86000            92059            94079            97017
       ONc  Fv          ONc  Fv          ONc  Fv          ONc  Fv
1      6    0.016564    6    0.278       6    0.108541    4    0.095686
2      5    0.018470    5    0.209       4    0.169557    4    0.088818
3      5    0.021779    5    0.222       5    0.129449    4    0.107724
4      7    0.012981    4    0.229       4    0.103971    4    0.101337
5      4    0.016414    4    0.221       5    0.174531    4    0.128160

SNo.   elaine_512       Photo            image            image1
       ONc  Fv          ONc  Fv          ONc  Fv          ONc  Fv
1      5    0.446596    5    0.232472    5    0.085136    4    1.801808
2      5    0.528930    7    0.088141    6    0.124338    4    1.830439
3      5    0.862211    7    0.113605    5    0.081386    4    1.709644
4      5    0.586447    6    0.132647    5    0.063218    4    1.83478
5      5    0.912759    6    0.138900    5    0.132186    4    1.77039
Tab. 7.5: Mean (μ) and standard deviation (σ) of fitness values for QIACSADB, ACSADB, QIACSAI and ACSAI

DATA SET     QIACSADB              ACSADB                QIACSAI               ACSAI
             (μ)       (σ)        (μ)       (σ)         (μ)       (σ)        (μ)       (σ)
86000        0.23909   0.016564   0.302081  0.050542    0.172584  0.031368   0.163502  0.036128
92059        0.244545  0.049449   0.204727  0.100888    0.268146  0.054644   0.232433  0.061117
94079        0.245934  0.021479   0.192704  0.102914    0.622901  0.024722   0.131421  0.026317
97017        0.257221  0.03998    0.226658  0.050757    0.303439  0.012832   0.095165  0.016211
elaine_512   0.242463  0.023315   0.276095  0.026205    0.614902  0.018237   0.662001  0.189482
Photo        0.22872   0.006384   0.279167  0.045018    0.615167  0.031776   0.129287  0.039783
image        0.2368    0.01444    0.269092  0.067496    0.32057   0.016042   0.09187   0.024176
image1       0.284518  0.015887   0.29206   0.039175    1.793336  0.047702   0.632693  0.050379
From Tables 7.5 and 7.6 it can be seen that, for almost all test images, the proposed technique has better values of the mean, standard deviation and standard error than its classical counterparts, from which it can be concluded that the proposed technique is more accurate and more stable. Table 7.7 shows that, when a proposed technique is compared with its classical counterpart, it mostly gives a p-value < 0.05, which indicates that the proposed technique is better than the other technique at the given confidence level. Table 7.8 also allows a comparison between the two cluster validity indices, and it shows that the proposed techniques take less computational time to execute than the others. So, on the basis of all these experiments, the proposed system can claim superiority over its classical counterparts.
Tab. 7.6: Standard error (SE) of the mean for QIACSADB, ACSADB, QIACSAI and ACSAI for the test data set

DATA SET     QIACSADB   ACSADB     QIACSAI    ACSAI
86000        0.005645   0.009228   0.005727   0.006596
92059        0.009028   0.01842    0.009977   0.011158
94079        0.004296   0.020583   0.004944   0.005263
97017        0.007299   0.009267   0.002343   0.00296
elaine_512   0.005213   0.00586    0.004078   0.042369
Photo        0.001427   0.010066   0.007105   0.008896
image        0.003229   0.015093   0.003587   0.005406
image1       0.003177   0.007835   0.00954    0.010076
Tab. 7.7: Results of two-tailed (unpaired) t-test (p-value) between each quantum inspired technique and its respective classical technique

DATA SET     a & b        (SL)   c & d        (SL)
86000        < 0.00001    1      < 0.00001    1
92059        0.057109     3      0.02032      2
94079        0.014682     2      < 0.00001    1
97017        0.012091     2      < 0.00001    1
elaine_512   0.000991     2      0.275463     3
Photo        0.000015     2      < 0.00001    1
image        0.043147     2      < 0.00001    1
image1       0.376775     3      < 0.00001    1

Significance Level (SL): 1 → Extremely Significant, 2 → Significant, 3 → Not Significant
a → QIACSADB, b → ACSADB, c → QIACSAI, d → ACSAI

Tab. 7.8: Computation time (in seconds) of QIACSADB, ACSADB, QIACSAI and ACSAI

DATA SET     QIACSADB   ACSADB   QIACSAI   ACSAI
86000        15.38      28.72    10.48     19.41
92059        20.44      29.23    12.36     21.35
94079        23.11      35.34    13.28     20.65
97017        19.32      32.27    10.32     20.11
elaine_512   28.91      35.11    23.89     32.70
Photo        32.10      39.07    28.45     36.25
image        24.34      33.32    26.11     31.22
image1       22.63      30.00    22.67     28.45
7.9 Conclusion
In this paper a quantum inspired simulated annealing technique has been introduced for automatic image clustering, which finds the number of clusters of any gray scale image dynamically. The proposed technique can be considered superior to its classical counterpart as it discovers the optimal number of clusters of an image data set automatically, and within a shorter time than the other techniques. At present, the technique is limited to gray level images and satisfies only one objective at a time. As a future direction of research, this work can be extended to the color and multi-objective domains.
References
[1] D. L. Davies and D. W. Bouldin. A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1:224–227, 1979.
[2] S. Dey, S. Bhattacharyya, V. Snasel, A. Dey, and S. Sarkar. PSO and DE based novel quantum inspired automatic clustering techniques. In Proceedings of the Third International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), Kolkata, India, pp. 285–290, 2017.
[3] I. Pilowsky, S. Levine, and D. M. Boulton. The classification of depression by numerical taxonomy. The British Journal of Psychiatry, 115(525):937–945, 1969.
[4] S. Dey, S. Bhattacharyya, and U. Maulik. Quantum inspired automatic clustering for multilevel image thresholding. In Proceedings of the International Conference on Computational Intelligence and Communication Networks (ICCICN 2014), RCCIIT, Kolkata, India, pp. 242–246, 2014.
[5] D. McMahon. Quantum Computing Explained. John Wiley, Hoboken, New Jersey, 2008.
[6] S. Dey, S. Bhattacharyya, and U. Maulik. Quantum inspired metaheuristic algorithms for multilevel thresholding for true colour images. In Proceedings of IEEE Indicon 2013, Mumbai, India, pp. 1–6, 2013.
[7] F. Glover and G. A. Kochenberger. Handbook of Metaheuristics. Kluwer Academic, 2003.
[8] J. Kennedy and R. Eberhart. Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks (ICNN95), Perth, Australia, vol. 4, pp. 1942–1948, 1995.
[9] J. Holland. Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press, 1975.
[10] F. Glover and M. Laguna. Tabu Search. Kluwer, Boston, MA, 1997.
[11] F. Glover. Tabu search, part I. ORSA Journal on Computing, 1:190–206, 1989.
[12] F. Glover. Tabu search, part II. ORSA Journal on Computing, 2:4–32, 1990.
[13] C. Blum and A. Roli. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Computing Surveys, 35(3):268–308, 2003.
[14] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264–323, 1999.
[15] D. Deutsch. Quantum theory, the Church–Turing principle, and the universal quantum computer. Proc. Roy. Soc. Lond. A, 400(1818):97–117, 1985.
[16] P. Benioff. Quantum mechanical models of Turing machines that dissipate no energy. Physical Review Letters, 48(23):1581–1585, 1982.
[17] M. Pour-El and I. Richards. The wave equation with computable initial data such that its unique solution is not computable. Advances in Mathematics, 39:215–239, 1981.
[18] R. Feynman. Simulating physics with computers. International Journal of Theoretical Physics, 21:467–488, 1982.
[19] D. Deutsch and R. Jozsa. Rapid solution of problems by quantum computation. Proc. Roy. Soc. Lond. A, 439:553–558, 1992.
[20] D. R. Simon. On the power of quantum computation. In Proceedings of the 35th Annual IEEE Symposium on Foundations of Computer Science, pp. 116–123, 1994.
[21] P. Shor. Algorithms for quantum computation: Discrete logarithms and factoring. In Proceedings of the 35th Annual IEEE Symposium on Foundations of Computer Science, pp. 124–134, 1994.
[22] S. Dey, I. Saha, S. Bhattacharyya, and U. Maulik. Multi-level thresholding using quantum inspired meta-heuristics. Knowledge-Based Systems, 67:373–400, 2014.
[23] S. Dey, S. Bhattacharyya, and U. Maulik. Quantum behaved swarm intelligent techniques for image analysis: A detailed survey. In S. Bhattacharyya and P. Dutta, eds., Handbook of Research on Swarm Intelligence in Engineering, pp. 1–39. IGI Global, Hershey, USA, 2015.
[24] S. Dey, S. Bhattacharyya, and U. Maulik. Optimum gray level image thresholding using a quantum inspired genetic algorithm. In S. Bhattacharyya, P. Banerjee, D. Majumdar, and P. Dutta, eds., Handbook of Research on Advanced Hybrid Intelligent Techniques and Applications, chapter 12, pp. 349–377. IGI Global, Hershey, USA, 2015.
[25] S. Dey, S. Bhattacharyya, and U. Maulik. Quantum inspired genetic algorithm and particle swarm optimization using chaotic map model based interference for gray level image thresholding. Swarm and Evolutionary Computation, 15:38–57, 2014.
[26] S. Dey, S. Bhattacharyya, and U. Maulik. Quantum inspired automatic clustering for multi-level image thresholding. In Proceedings of the International Conference on Computational Intelligence and Communication Networks (ICCICN 2014), RCCIIT, Kolkata, India, pp. 247–251, 2014.
[27] S. Dey, S. Bhattacharyya, and U. Maulik. Efficient quantum inspired meta-heuristics for multi-level true colour image thresholding. Applied Soft Computing, 56:472–513, 2017.
[28] S. Dey, S. Bhattacharyya, and U. Maulik. New quantum inspired meta-heuristic techniques for multi-level colour image thresholding. Applied Soft Computing, 46:677–702, 2016.
[29] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220:671–680, 1983.
[30] U. Maulik and S. Bandyopadhyay. Performance evaluation of some clustering algorithms and validity indices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:1650–1654, 2002.
[31] A. García and W. Flores. Automatic clustering using nature-inspired metaheuristics: A survey. Applied Soft Computing, 41:192–213, 2016.
[32] V. R. Patel and R. G. Mehta. Performance analysis of MK-means clustering algorithm with normalization approach. In World Congress on Information and Communication Technologies, pp. 974–979, 2011.
[33] S. Das, A. Abraham, and A. Konar. Automatic clustering using an improved differential evolution algorithm. IEEE Transactions on Systems, Man, and Cybernetics, 38:218–237, 2008.
[34] C. Otto, D. Wang, and A. K. Jain. Clustering millions of faces by identity. IEEE Transactions on Pattern Analysis & Machine Intelligence, 40(2):289–303, 2018.
[35] S. Kapoor, I. Zeya, C. Singhal, and S. J. Nanda. A grey wolf optimizer based automatic clustering algorithm for satellite image segmentation. Procedia Computer Science, 115:415–422, 2017.
Mithun Roy, Indrajit Pan*, and Siddhartha Bhattacharyya
8 Intelligent greedy model for influence maximization in multimedia data networks
Abstract: Research on influence propagation and influence maximization in various multimedia data networks and social networks is currently gaining immense attention. This work introduces two new heuristic metrics, in the form of diffusion propagation and diffusion strength. The diffusion propagation capacity of the connecting edges of a network is measured on the basis of diffusion strength. The diffusion strength and diffusion propagation are further utilized to design a greedy method for seed selection within a k-budgeting scheme, where the k-value is derived from the cardinality of the vertex set. This approach makes the seed selection process more intelligent and practical. The sorted seeds are then used to maximize influence diffusion among the non-seed nodes of the network. The proposed algorithm was assessed on standard benchmark data sets and the experimental findings were compared with some other state-of-the-art methods. The comparative analysis has revealed better computational time and higher influence spread with this method.
Keywords: Diffusion propagation, influence maximization, linear threshold, social network, sub-modular function
8.1 Introduction
Online social networks (OSNs) have earned notable importance since their inception over a decade ago. Many people participate in different online social networks such as Facebook, MySpace, Flickr, Twitter and LinkedIn. In 2011 the number of active Facebook users was recorded at around 800 million. The lives of netizens are largely influenced by these social web portals, and these influences impact people at different levels and in multiple ways. The ever growing popularity of different online social network portals provides new avenues for wide-scale viral marketing. Viral marketing is often targeted as part of a company's business policy for advertising its products, and it is very cost effective. Initially it targets just a few people. This set of users is introduced to the new product; in return, the company expects that these people will start influencing their friends, or friends of friends, on the social network to adopt the product. Thus the product is publicized among a large population through the word-of-mouth effect via online social corridors. Here the basic
Mithun Roy, Siliguri Institute of Technology, Siliguri, Darjeeling, India *Corresponding author: Indrajit Pan, Siddhartha Bhattacharyya, RCC Institute of Information Technology, Kolkata, India https://doi.org/10.1515/9783110552072-008
complexity lies in selecting the first group of users who will use the product and who will publicize it to a large community. This primary set of users is known as a seed set, and publicity through the seed users is known as influence spread. This optimization problem practically figures out a tiny set of influential nodes, subject to a budget, in an efficient way so as to maximize the spread of influence. This tiny set is known as the seed set and the budget factor determines the cardinality of the seed set. The influence maximization problem figures out an effective mechanism for maximum influence spreading among all collaborating peers associated with the members of the seed set. Influence maximization is a stochastic optimization problem [1]. A strong motivation for cultivating information and influence diffusion models and mechanisms is viral marketing [2]. Research on different information diffusion models began in the middle of the twentieth century, and subsequently different models were proposed for information diffusion. Linear threshold (LT) and independent cascade (IC) are two widely used models for information diffusion and influence spread. Threshold models were first proposed in [3]. Kempe et al. [1] first proposed a linear threshold model in their work. The linear threshold model suggests that every arc (u, v) ∈ E is assigned an influence weight w(u, v) ∈ [0, 1] (also written as w_uv), which denotes the importance of u in influencing v. Each node v within the network has a threshold (θ_v) in the interval [0, 1]. It represents the total weight that must be exerted upon v by its active neighbors in order to make v active. Each currently inactive node v becomes active if and only if the total weight of its active neighbors is at least θ_v [4–6]:

∑_{u ∈ N_in(v)} w(u, v) ≥ θ_v.    (8.1)
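The activation rule of equation (8.1) can be written compactly in code. The sketch below is a generic illustration of the linear threshold check, with assumed data structures (an edge-weight dictionary and a per-node threshold map); it is not part of the proposed method itself.

def is_activated(v, active_nodes, weights, theta):
    # weights maps edges (u, v) to w(u, v); theta maps nodes to theta_v
    incoming = sum(w for (u, target), w in weights.items()
                   if target == v and u in active_nodes)
    return incoming >= theta[v]   # node v becomes active when its threshold is reached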
The independent cascade model was conceptualized on the basis of various studies on marketing strategies by Goldenberg et al. [7]. It is the simplest cascading model, and it models information flow on the basis of node activation (influence) probability. Apart from independent cascade there is another concept, the epidemic model, based on the studies of Kermack et al. [8]. The current work is based on the linear threshold model. Initially the η-closure of each vertex is computed, which comprises the vertex itself and all other vertices having incoming edges to it. Based on this η-closure, the diffusion strength (ξ) of each vertex is derived with respect to all other vertices. These diffusion strength parameters are used to prepare a diffusion adjacency matrix (A^ξ). The diffusion propagation (μ) metric for each edge is also measured on the basis of diffusion strength. Finally, on the basis of these diffusion dynamics, a new greedy algorithm is proposed to select an effective seed set (S) and measure the influence spread of the seed set (σ(S)). Existing works in [1, 5, 9] typically assumed that the diffusion models are monotone; hence, σ() is a monotone and sub-modular function of S. σ(S) represents the average effective number of active (influenced) vertices after some trials. If the number of total influenced vertices in the i-th trial is represented by N_vi, then after n trials σ(S) can be
represented as

σ(S) = (∑_{i ∈ 1...n} N_vi) / n.    (8.2)

The proposed method was simulated on eight different real world data sets and has recorded an improvement in influence spread of 3%–48%. Experimental results were compared with some of the best results reported so far, from Kempe et al. [1] and Chen [4]. This article is organized in the following manner. Section 8.2 contains a review of related research, and the proposed method is elaborated on in Section 8.3. Section 8.4 contains the experimental results of the proposed method, and finally the conclusion of the work, with the scope of future research, is discussed in Section 8.5.
8.2 Literature survey
Evaluation of the strength of a node or an edge has been a fundamental task in this domain, but most of the previous works treated it as a classification parameter: based on this measurement the components were classified as strong or weak. There are three different situations that can prompt a user to perform an operation:
1. influence by family members and friends;
2. some external incident that has affected the user;
3. the user is very active by nature and performs the task without being influenced by anyone.
Domingos and Richardson [10, 11] first proposed influence maximization as a computational optimization problem and identified its impact on viral marketing. Kempe et al. [1] focused on two different diffusion models, known as the independent cascade (IC) model and the linear threshold (LT) model. Under both the independent cascade (IC) model and the linear threshold (LT) model, the influence maximization problem is NP-complete in nature. Kempe et al. [1] used a Monte Carlo simulation of the information diffusion process to estimate the influence spread σ(S): they used a seed set S to simulate the randomized diffusion process R times, in each iteration took a count of the number of active nodes, and then calculated the average of these counts over the R runs to produce σ(S). Minoux [9] proposed an accelerated greedy algorithm for general sub-modular set functions. A well-known optimization technique called lazy evaluation was applied for a significant reduction in the number of evaluations without modifying the outcome of the greedy algorithm. Leskovec et al. [12] empirically demonstrated that lazy evaluation can speed up influence maximization related network optimization problems by up to 700 times. Lazy evaluation is a general technique that works for all monotone and sub-modular functions. Recently, Goyal et al. [13] proposed the CELF++ algorithm to further enhance the lazy evaluation technique for influence maximization problems under the IC and LT
models. The basic idea of this method is that the marginal gain of an iteration cannot be superseded by a subsequent iteration. A stochastic optimization method comprising neighborhood estimation, variance reduction, a Bayesian filter and stochastic gradient search is proposed in [14]. This work performs influence maximization over a Markovian graph. Initially, from a fixed node set, it derives a connectivity graph in the form of a Markov chain. A probability distribution measure is taken over these fixed sets of nodes and a cascading model is adopted for the processing. A greedy inspired heuristic method is proposed in [15]. This method reduces the time complexity of the implementation, and it is further hybridized with the k-means clustering technique for influence maximization through identification of an effective seed set. The work in [16] proposes an influence maximization technique in multiple networks. The authors address seed node distribution issues through a propagation resistance model. This propagation resistance model is used to design a heuristic evaluation function which considers more than one network together and makes decisions regarding the locations of the seed nodes. The basic idea behind this work is to distribute the positions of the seed nodes efficiently so that the maximum number of nodes can be covered within a limited time. Results of this method are further optimized by adopting the propagation resistance quotient within a hill-climbing method. A credit distribution model as a sub-modular operational unit under the concept of a general knapsack constraint is proposed in [17]. It models a probabilistic node streaming algorithm for deriving a measure of the influence quotient σ(). Finally the method is revised to approach optimality, and experiments are conducted with large data sets. An optimized seed set selection technique through a global heuristic search algorithm is proposed in [18]. The work in [19] presents a graph based influence maximization approach using a non-submodular model under a generic threshold; basically it offers an efficient seed selection method.
8.3 Proposed method
The proposed method considers a graph G as input, represented by (V, E), where V is the vertex set and E is the edge set. This work introduces three new concepts in the form of the η-closure of a node, the diffusion strength (ξ) and the diffusion propagation metric (μ). These terms are defined below.
Definition 8.1 (η-closure). The η-closure of a node u, η(u), is defined as the set of all nodes having outgoing edges towards u (N_in(u)) along with u itself. Equation (8.3) represents η(u):
η(u) = N_in(u) ∪ {u}.    (8.3)
Definition 8.2 (Diffusion strength). The diffusion strength of a node u along v is symbolized as ξ_v(u). It is defined as the cardinality of the intersection between η(u) and η(v); that is, the diffusion strength of u for v counts all common incoming nodes of u and v. Diffusion strength is formally given in equation (8.4):
ξ_v(u) = |η(u) ∩ η(v)|.    (8.4)
Definition 8.3 (Adjacency of diffusion strength). The mutual diffusion strengths of the nodes of a graph can be represented in a two-dimensional matrix structure. This matrix is called the adjacency of diffusion strength (A^ξ). The entries of A^ξ are as shown in equation (8.5):
A^ξ_uv = ξ_v(u) for all v ≠ u, and 0 otherwise.    (8.5)
Definition 8.4 (Diffusion propagation). Diffusion propagation is a metric for the edge between two nodes whose diffusion strengths are known. The diffusion propagation of an edge between nodes u and v is represented as μ(u, v). It is the ratio between the diffusion strength of u towards v and the sum of the diffusion strengths, towards v, of all other connected nodes of v that are not in common with u:
μ(u, v) = ξ_v(u) / ∑ ξ_v(n),  ∀n ∈ η(v) \ η(u).    (8.6)
Example 8.1. The concepts of η-closure of a node, diffusion strength (ξ ) and diffusion propagation metric (μ) are further illustrated with the help of a small network diagram as shown in Figure 8.1.
Fig. 8.1: Demo Network Diagram
The η-closures are calculated using equation (8.3) as follows:
η(1) = {1, 2, 3, 5},  η(2) = {1, 2, 3},  η(3) = {1, 2, 3, 4},  η(4) = {3, 4, 5},  η(5) = {1, 4, 5}.
The diffusion strengths of some nodes are calculated using equation (8.4) as follows:
ξ_2(1) = |η(1) ∩ η(2)| = |{1, 2, 3}| = 3
ξ_3(1) = |η(1) ∩ η(3)| = |{1, 2, 3}| = 3
ξ_4(1) = |η(1) ∩ η(4)| = |{3, 5}| = 2
ξ_5(1) = |η(1) ∩ η(5)| = |{1, 5}| = 2
Similarly, we can calculate the diffusion strength (ξ) of the remaining nodes. The diffusion strength of each node u along v can be represented in the following matrix form, as in equation (8.5):
        [ 0  3  3  2  2 ]
        [ 3  0  3  1  1 ]
A^ξ_uv =[ 3  3  0  2  2 ]
        [ 2  1  2  0  2 ]
        [ 2  1  2  2  0 ]
In the following, some calculations of diffusion propagation using equation (8.6) are shown:
μ(1, 2) = ξ_1(2) / ξ_1(5) = 3/2 = 1.5                       [since η(1) \ η(2) = {5}]
μ(1, 5) = ξ_1(5) / (ξ_1(2) + ξ_1(3)) = 2/(3 + 3) = 0.33     [since η(1) \ η(5) = {2, 3}]
μ(1, 3) = ξ_1(3) / ξ_1(5) = 3/2 = 1.5                       [since η(1) \ η(3) = {5}]
Similarly, we can also calculate the diffusion propagation (μ) for the remaining edges.
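The hand calculations above can be reproduced with a few lines of Python. The sketch below starts from the η-closures listed in the example (the edge list of Figure 8.1 is not reproduced here) and follows the convention used in the worked example, in which the denominator of μ(u, v) sums the strengths over η(u) \ η(v).

# eta-closures of the demo network of Figure 8.1, as listed above
eta = {1: {1, 2, 3, 5}, 2: {1, 2, 3}, 3: {1, 2, 3, 4}, 4: {3, 4, 5}, 5: {1, 4, 5}}

def xi(v, u):
    # diffusion strength of u along v, equation (8.4)
    return len(eta[u] & eta[v])

def mu(u, v):
    # diffusion propagation of the edge (u, v), following the worked example
    rest = eta[u] - eta[v]
    return xi(u, v) / sum(xi(u, n) for n in rest)

print(xi(2, 1), xi(4, 1))                       # 3, 2 -- matches the hand calculations
print(round(mu(1, 2), 2), round(mu(1, 5), 2))   # 1.5, 0.33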
8.3.1 Diffusion dynamics based greedy algorithm The proposed greedy approach works in two phases. In the first phase it refines the seed set of the network based on diffusion strength. In the second phase it performs
influence propagation. Optimal influence propagation is achieved through the effective selection of the seed set. This work models an efficient seed set selection technique based on diffusion strength. Each row of the adjacency matrix of diffusion strength (A^ξ) contains the diffusion strength of a particular node for every other node of the network. In the seed selection process, a row-wise summation (δ(u)) is done on A^ξ by equation (8.7), and from there the nodes having high values are selected for the seed set, up to the desired number of elements.

δ(u) = ∑_{i=1}^{n} A^ξ_ui.    (8.7)
Cardinality (number of elements) of the seed set is decided upon the cardinality of the node set (vertex set) of the network. In this work, the cardinality of the seed set is fixed as half the cardinality of the node set. Accordingly, if k is the cardinality of the seed set (S), then equation (8.8) gives k as

k = |V| / 2.    (8.8)
A greedy approach is used to select members of S from V on the basis of descending order of δ(u), where u ∈ V. Thus the seed set selection method is accomplished and this seed set is used in the first trial of the influence diffusion method. In Algorithm 8.1, lines 1 to 7 represent the seed set selection mechanism through a greedy approach on the basis of diffusion strength. On completion of the seed set selection, the diffusion propagation (μ()) is computed for each edge of E following equation (8.6). This diffusion propagation is converted into a normalized diffusion propagation (μ_norm()) by equation (8.9).

μ_norm(u, v) = μ(u, v) / OutDegree(u).    (8.9)

Algorithm 8.1: Greedy Algorithm on Diffusion Dynamics
Data: G(V, E) = Graph with vertex set V and edge set E.
Result: I, influence spread through seed set S.
1: Seed set, S ← ∅ ;
2: Cardinality of S, k = |V|/2 ;
3: d[] ← δ(u) | ∀u ∈ V ;
4: Sort d[] in descending order ;
5: while (|S| ≤ k) do
6:    S = S ∪ {d[i]} ;
7:    i = i + 1 ;
8: S_0 = S ;
9: u ← θ_u | ∀u ∈ V ;
10: Target set, T ← ∅ ;
11: for (i = 1 to n) do
12:    S_i = S_{i−1} ;
13:    w(u) ← 0 | ∀u ∈ V ;
14:    while (S_i ≠ ∅) do
15:       Select a node n from S ;
16:       for (all edges ((n, u) ∈ E) and (u ∉ S_i)) do
17:          w(u) = w(u) + μ_norm(n, u) ;
18:          if (w(u) ≥ θ_u) then T = T ∪ {u} ;
19:       S_i = S_i ∪ T ;
20: I = σ(S) = (∑_{i=1}^{n} |S_i|) / n ;
21: return(I) ;
The second phase of the method is described in lines 8 to 21 of Algorithm 8.1. Here a random threshold (θ) between 0 and 1 is applied to each vertex, and each vertex is assigned an influence weight (w). Initially, w is set to zero and later it is updated by μ_norm() according to the chosen seed vertex. Following the concept of the linear threshold model, when the w() of a vertex exceeds the θ of that vertex, then, in line with equation (8.1), that vertex is called targeted/influenced. A Monte Carlo simulation concept has been applied here, which performs n similar trials and takes the average result over those trials to derive the influence spread σ(). A work-flow example of the proposed Algorithm 8.1 on the demo network diagram of Figure 8.1 is shown in Table 8.1, in which eight different trials are explained in detail. In each trial, a set of threshold values (θ_v) is randomly generated between 0 and 1 for all five nodes of the network. The seed set (S) cardinality is calculated by equation (8.8), and simultaneously δ(u) is computed for all participating nodes using equation (8.7). After that, all nodes are sorted in descending order of δ(u) and the greedy seed selection method is used to find the initial seed set S. In the next phase, a node u from the seed set S is chosen to influence another node v of the same graph, where v is not a member of the seed set; this node v is called the target node. The normalized diffusion propagation weight μ_norm(u, v) (through equation (8.9)) is considered and added to the weight set [W] against v. If W_v equals or exceeds θ_v then that attempt is called successful and v is inducted into the seed set S. This process continues for all possible pairs of combinations between nodes of the seed set and the remaining nodes of the graph. The cardinality of the seed set is measured at the end of each trial. Accordingly, eight different trials have been performed and the cardinality values of the final seed sets are summed and averaged over the number of trials to determine the average influence spread σ(S). This averaging follows the Monte Carlo simulation principle. The final schematic diagrams in the table represent the status of the network at the end of each trial. In
these diagrams, green circles represent the seed set, blue circles stand for targeted nodes and white circles represent inactive nodes.
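Putting the two phases together, the following Python sketch outlines the flavour of Algorithm 8.1: greedy seed selection by the row sums δ(u) of equation (8.7) with the budget of equation (8.8), followed by threshold-based spreading using the normalized diffusion propagation of equation (8.9) as the edge weight, averaged over a number of trials as in equation (8.2). It is a simplified illustration, not the authors' exact implementation; the edge weights μ_norm are assumed to be precomputed and passed in.

import random

def influence_spread(nodes, edges, trials=200):
    # edges maps directed pairs (u, v) to the precomputed weight mu_norm(u, v)
    out_neighbours = {u: [v for (s, v) in edges if s == u] for u in nodes}
    eta = {u: {u} | {s for (s, v) in edges if v == u} for u in nodes}
    delta = {u: sum(len(eta[u] & eta[v]) for v in nodes if v != u) for u in nodes}
    k = len(nodes) // 2                                    # budget, equation (8.8)
    seeds = set(sorted(nodes, key=delta.get, reverse=True)[:k])

    total = 0
    for _ in range(trials):
        theta = {u: random.random() for u in nodes}        # random thresholds per trial
        active, weight = set(seeds), {u: 0.0 for u in nodes}
        frontier = list(active)
        while frontier:
            u = frontier.pop()
            for v in out_neighbours[u]:
                if v in active:
                    continue
                weight[v] += edges[(u, v)]
                if weight[v] >= theta[v]:                  # activation rule of equation (8.1)
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / trials                                  # sigma(S), equation (8.2)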
8.4 Experiments
A server running a 2.33 GHz Quad-Core Intel Xeon E5410 with 16 GB of memory has been used for the simulation. The proposed method has been simulated on the different networks 200 times, using Monte Carlo simulation, in order to obtain its influence spread; the average influence spread was calculated over the 200 trials. The different settings for the experiment include:
1. a Uniform Weight (UW) assigned to an arc (u, v) as p(u, v) = 1/d_in(v), where d_in is the in-degree of the node v [1];
2. a Random Weight (RW) between [0, 1] assigned to all arcs [4];
3. a Diffusion Propagation weight (DW) assigned after the necessary computation through equations (8.6) and (8.9).
Eight real world data sets have been considered in this research for simulation. These data sets are discussed below.
1. Zachary’s Karate Club (ZKC) is a social network representing friendships among 34 members of a karate club at a US university in the year 1970 [4].
2. Les Miserables (LM) is a network of characters representing their co-appearance in the novel Les Miserables [6].
3. Word Adjacency (WA) represents an adjacency network of common adjectives and nouns in Charles Dickens’s novel David Copperfield [20].
4. American College Football Club (AFC) represents Division IA colleges in a network of teams that played each other during the regular American football season of Fall 2000 [21].
5. Dolphin Social Network (DSN) is a social network of frequent associations between 62 dolphins in a community living off Doubtful Sound in New Zealand [22].
6. Books About US Politics (PB) represents a network of books about US politics published around the 2004 presidential election and sold by the online bookseller Amazon.com. Edges between books represent frequent co-purchasing by the same buyers. This network was compiled by V. Krebs and is available on his website, though it is formally unpublished.
7. Neural Network (NN) represents the C. Elegans neural network [23].
8. Power Grid (PG) represents the topology of the Western States Power Grid of the United States [24].
Details of these data sets are given in Table 8.2.
Tab. 8.1: Work-flow example of Algorithm 8.1 on Figure 8.1 with 8 trials

Trial 1, threshold [θ_v] = [0.07, 0.01, 0.1, 0.16, 0.17]:
  S = [3, 1]        u = 3   v = 2   W = [0.0, 0.5, 0.0, 0.0, 0.0]     Success
  S = [3, 1, 2]     u = 3   v = 4   W = [0.0, 0.5, 0.5, 0.11, 0.0]    Fail
  S = [3, 1, 2]     u = 1   v = 5   W = [0.0, 0.5, 0.0, 0.11, 0.11]   Fail

Trial 2, threshold [θ_v] = [0.19, 0.01, 0.09, 0.02, 0.04]:
  S = [3, 1]        u = 3   v = 2   W = [0.0, 0.5, 0.0, 0.0, 0.0]     Success
  S = [3, 1, 2]     u = 3   v = 4   W = [0.0, 0.5, 0.5, 0.11, 0.0]    Fail
  S = [3, 1, 2]     u = 1   v = 5   W = [0.0, 0.5, 0.0, 0.11, 0.11]   Success
  S = [3, 1, 2, 5]  u = 5   v = 4   W = [0.0, 0.5, 0.0, 0.61, 0.11]   Success

Trials 3–8 follow exactly the same sequence of attempts and outcomes as trial 2, with the randomly generated thresholds
  Trial 3: [θ_v] = [0.06, 0.09, 0.11, 0.13, 0.04]     Trial 4: [θ_v] = [0.05, 0.09, 0.16, 0.18, 0.09]
  Trial 5: [θ_v] = [0.16, 0.19, 0.04, 0.0, 0.17]      Trial 6: [θ_v] = [0.09, 0.09, 0.02, 0.09, 0.15]
  Trial 7: [θ_v] = [0.04, 0.0, 0.15, 0.01, 0.11]      Trial 8: [θ_v] = [0.07, 0.18, 0.2, 0.1, 0.1]

(The final schematic of each trial, showing the seed, targeted and inactive nodes, is omitted here.)

Influence spread (I) after the above 8 trials will be
I = σ(S) = (3 + 5 + 5 + 5 + 5 + 5 + 5 + 5) / 8 = 38 / 8 = 4.75
Tab. 8.2: Dataset Information

Dataset                            |V|     |E|     Average Degree
Zachary's Karate Club              34      78      4.588
Les Miserables                     77      254     6.5974
Word Adjacency                     112     425     7.5893
American College Football Club    115     613     10.6609
Dolphin Social Network             62      159     5.129
Books About US Politics            105     441     8.4
Neural Network                     297     2148    14.4646
Power Grid                         4941    6594    2.6691

Tab. 8.3: Result Set using different edge weights

Dataset                            DW        RW      UW
Zachary's Karate Club              29.29     27.9    28.62
Les Miserables                     51.91     41.7    43.71
Word Adjacency                     103.9     70.2    65.39
American College Football Club     61.325    35.4    34.31
Dolphin Social Network             48.54     30.2    33.23
Books About US Politics            55.98     50.3    54.92
Neural Network                     264.22    129     121.7
Power Grid                         1240.1    135     133.2
The proposed method was then deployed on these eight benchmark data sets under the operational settings (UW) [1], (RW) [4] and (DW). The RW, UW and DW settings were executed separately, and the observed results are recorded in Table 8.3. The simulation shows that the proposed algorithm achieves a remarkable improvement on the larger networks such as the Neural Network and the Power Grid, where influence propagation records more than 100% better coverage than the earlier methods. A formal comparison of σ() after 200 trials of all three techniques is shown graphically in Figure 8.2; the proposed method records 3% to 48% better efficiency in terms of influence spread over the methods of [1] and [4]. Figure 8.3 shows that the proposed method also achieves a faster computational time than the two other methods of [1] and [4]. The time axis (y-axis) of Figure 8.3 uses a logarithmic scale to keep the graph readable across the widely varying sizes of the eight real-world data sets.
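A short sketch of how such a comparison can be plotted with a logarithmic time axis is shown below; matplotlib and the argument names are assumptions of this illustration, and the actual measured times are those reported in Figure 8.3.

```python
import matplotlib.pyplot as plt

def plot_time_comparison(datasets, times_uw, times_rw, times_dw):
    # Plot per-data-set computation times for the UW, RW and DW settings
    # on a logarithmic y-axis so small and very large networks stay visible.
    fig, ax = plt.subplots()
    ax.plot(datasets, times_uw, marker="o", label="UW [1]")
    ax.plot(datasets, times_rw, marker="s", label="RW [4]")
    ax.plot(datasets, times_dw, marker="^", label="DW [Proposed]")
    ax.set_yscale("log")
    ax.set_xlabel("benchmark data set")
    ax.set_ylabel("computation time (log scale)")
    ax.legend()
    fig.tight_layout()
    return fig
```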
Fig. 8.2: Comparison of σ() after 200 trials on (UW) [1], (RW) [4] and DW [Proposed]
Fig. 8.3: Comparison of computational time consumed by (UW) [1], (RW) [4] and DW [Proposed] for the different benchmark data sets
8.5 Conclusion
Multimedia data networks are commonly represented by graphs for the computational analysis of influence propagation capability. The present work has introduced two new intelligent metrics, diffusion strength and diffusion propagation. These metrics serve as a measured heuristic for the influencing power of the different vertices and are used within the greedy seed-selection procedure; the diffusion propagation of each node is built from its diffusion strength towards its connecting nodes. The proposed intelligent greedy method was simulated on eight different benchmark data sets to test the efficacy of the algorithm. Experimental results show a very high influence propagation in comparison with other well-known methods across all eight benchmark networks. The inclusion of a realistic heuristic measure has also enhanced the visibility of the method, so that it emerges as a fast influencing technique, and it has been observed that vertices with high influencing capability consistently maintain high precision. The proposed method has further recorded a very low computational time across the different benchmark data sets in comparison with earlier methods; this reduced time requirement is another key feature of the proposed technique. In the future, several improvements to this method can be explored. One direction concerns the randomized threshold (θ_v) assignments between (0, 1) on each node: realistic measures derived from the status of the network could be adopted to set these thresholds for the participating nodes. The seed-selection mechanism can also be made more agile. The linear threshold model has been adopted in this work and, as a matter of convention, it uses a Monte Carlo simulation; a large number of trials is needed, after which the average performance is derived to find a solution, which is time consuming. An alternative, more robust mechanism could be adopted to reduce the overhead of these repeated simulations.
References
[1] D. Kempe, J. M. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In Proc. 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 137–146, 2003.
[2] A. Goyal. Learning influence probabilities in social networks. In Proc. of Third ACM International Conference on Web Search and Data Mining, pp. 241–250, 2010.
[3] M. S. Granovetter. Threshold models of collective behavior. The American Journal of Sociology, 83(6):1420–1443, 1978.
[4] W. Chen. Efficient influence maximization in social networks. In Proc. of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 198–208, 2009.
[5] W. Chen. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proc. of the 16th International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 1029–1038, 2010.
[6] W. Chen, L. V. S. Lakshmanan, and C. Castillo. Information and influence propagation in social networks. Morgan and Claypool, 2013.
[7] J. Goldenberg, B. Libai, and E. Muller. Using complex systems analysis to advance marketing theory development. Academy of Marketing Science Review, 2001.
[8] M. Kermac. Contributions to the mathematical theory of epidemics. Royal Society of Edinburgh, Section A, Mathematics, 115, 1972.
[9] M. Minoux. Accelerated greedy algorithms for maximizing submodular set functions. In Proc. 8th IFIP Conf. on Optimization Techniques, pp. 234–243, 1978.
[10] M. R. Domingos. Mining the network value of customers. In Proc. of 7th International Conference on Knowledge Discovery and Data Mining, pp. 57–66, 2001.
[11] M. R. Domingos. Mining knowledge-sharing sites for viral marketing. In Proc. of 8th International Conference on Knowledge Discovery and Data Mining, pp. 61–70, 2002.
[12] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. M. VanBriesen, and N. S. Glance. Cost-effective outbreak detection in networks. In Proc. 13th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 420–429, 2007.
[13] A. Goyal, W. Lu, and L. V. S. Lakshmanan. CELF++: Optimizing the greedy algorithm for influence maximization in social networks. In Proc. of the 20th ACM International Conference Companion on World Wide Web (WWW), pp. 47–48, 2011.
[14] B. Nettasinghe and V. Krishnamurthy. Influence maximization over Markovian graphs: A stochastic optimization approach. IEEE Transactions on Signal and Information Processing over Networks, 2018.
[15] G. Zhang, S. Li, J. Wang, P. Liu, Y. Chen, and Y. Luo. New influence maximization algorithm research in big graph. In Proc. of 2017 14th Web Information Systems and Applications Conference (WISA), 2017.
[16] S. Das. Seed node distribution for influence maximization in multiple online social networks. In Proc. of 2017 IEEE 15th International Conference on Dependable, Autonomic and Secure Computing, 15th International Conference on Pervasive Intelligence and Computing, 3rd International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), 2017.
[17] Q. Yu, H. Li, Y. Liao, and S. Cui. Fast budgeted influence maximization over multi-action event logs. IEEE Access, 6:14367–14378, 2018.
[18] J. D. Nunez-Gonzalez, B. Ayerdi, M. Grana, and M. Wozniak. A new heuristic for influence maximization in social networks. Logic Journal of the IGPL, 6, 2016.
[19] L. Ma, G. Cao, and L. Kaplan. Contributions to the mathematical theory of epidemics. In Proc. of 2017 IEEE International Conference on Big Data, 2017.
[20] M. E. J. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(036104), 2006.
[21] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. In Proc. National Academy of Sciences, USA, 99, pp. 7821–7826, 2002.
[22] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations – can geographic isolation explain this unique trait? Behavioral Ecology and Sociobiology, 54:396–405, 2003.
[23] S. Wasserman and K. Faust. Social network analysis. Cambridge University Press, Cambridge, 1994.
[24] M. E. J. Newman. Mixing patterns in networks. Physical Review E, 67(026126), 2003.
Index adaptive histogram equalization 7 adaptive threshold 66–71, 76, 89 adjacency matrix 109, 110 artificial intelligence 98, 102 assessment 96, 99, 109, 111, 115 automated diagram drawing 120 Boltzmann annealing 151 bulk synchronous parallel 31 candidate overlapping node screening 37 characteristic function 125 cluster validity indices 152 clustering 145 color histogram equalization 6 color image segmentation 130 color medical image enhancement 1 color space 1 computational intelligence 97 contrast 2 coronary angiography 43 data clustering 123 DB index 152, 154 difficulty level 97–99, 104, 105, 119 diffusion propagation 167, 168, 170–173, 175, 180 diffusion strength 168, 170–173, 180 Dr. Geo 99 dynamic community 20, 21, 27, 38 entropy 14, 44 entropy filtering 140 evolutionary computation 47 feature extraction 65, 66, 69, 72, 74, 76, 89, 123, 135, 138 feature selection 135 figure database 97, 99, 105, 108, 110, 113, 119 filter bank 46–48 functional intelligence 98 fuzzy 3 fuzzy c-means 38 fuzzy logic based enhancement technique 8 fuzzy rough set 124 fuzzy set 123–125 https://doi.org/10.1515/9783110552072-009
fuzzy soft rough k-means clustering 139, 141 fuzzy-PSO 38 Gaussian distribution 47 Gaussian matched filters 43–45, 53, 54, 59 genetic algorithm 38 GeoGebra 95, 99 geometric property 109 geometric transformation 46 geometrical correctness 97, 98, 101, 102 GeometryNet 98, 102, 105, 106, 115, 119 global threshold 66, 68, 76, 89 GPU 36 granulation operation 128 granule 128 graph-based representation 44, 45, 53, 54, 56, 58–60 gray levels 2 gray-level thresholding 123 HE stain 3 HSV color space 4 I index 152–154 image classification 125, 128 image enhancement 1, 137 image processing 101, 102, 119, 123 image retrieval 128 image segmentation 123, 124, 129, 133 image variance 66, 92 incremental identification 27 independent cascade 168, 169 InfoMap 102 information granulation 128 intelligent algorithms 19, 20, 38 intuitionistic fuzzy (IF) set 9, 123, 124, 126 intuitionistic fuzzy rough set 130 K-nearest neighbor (KNN) 74 knowledge base 102, 105, 119 LAB color space 4 label propagation algorithm 26 lesson mode 96–98, 103, 105 LIM-G 102 line simplification 44, 51, 53, 56, 59 linear scale space theory 131
linear threshold 168, 169, 174 link partitioning of overlapping communities 37 LinkSHRINK 35 lower approximation 125 machine understanding 107 Markov chain Monte Carlo 37 match score 97, 98, 111 Mathematics Tutor 101 medical image processing 1 medical images 45, 47 membership function 125 Mindspark 101, 115, 119 modularity 20, 24, 25, 27, 28, 31, 35, 37 Monte Carlo simulation 174, 180 multimedia data 19 natural language processing 102, 106, 107, 119 neighbourhood vector propagation algorithm 36, 37 neural network 124 NP hard 20, 25, 38 objective function 49–51, 53 ontology 102, 105 overlapping community 19–21, 28, 30, 32–38 parser 106 particle swarm optimization 38 pen-down 109 pen-up 109 personalized page rank 34 physical property 109 pixel classification 123 primary school level 95 PSNR 14 quantum computing 148 quantum gates 150 quantum orthogonality 149 qubit 149, 150 quick reduct algorithm 138 Ramer–Douglas–Peucker algorithm 43–45, 51, 59, 60 remote direct memory access 36
ROC curve 45, 48, 49, 54, 59 rough fuzzy k-means algorithm 124 rough fuzzy set 124 rough set 123, 124, 126 rough set model 124 roughness index 132 school level geometry 95, 97–99, 102, 105, 107, 120 semantic 106 sensitivity 49 signal to noise ratio 137 similarity measures 134 simulated annealing 150 Sketchometry 100 SNR 137 soft set 123, 124, 126 spatial image classification 124 static community 20, 21, 38 stochastic variation bayes 37 syntactic 106 test mode 96, 97, 102–105, 115, 119 text-to-drawing 105–107, 119 texture classification 123 threshold 104 true-positive-fraction 49 type-2 fuzzy set 9 uncertainty based models 123 univariate marginal distribution algorithm 43–45 univariate marginal distribution algorithm for continuous domains 50, 51 unsupervised framework 45, 53, 54, 57–59 upper approximation 125 vessel enhancement 43, 44 visual perceptual disease 98 word problem 98, 102, 104, 107 X-ray coronary angiograms 45, 53–56, 58–60