English Pages 193 [194] Year 2019
Surekha Borra, Nilanjan Dey, Siddhartha Bhattacharyya, Mohamed Salim Bouhlel (Eds.) Intelligent Decision Support Systems
De Gruyter Frontiers in Computational Intelligence
Edited by Siddhartha Bhattacharyya
Volume 4
Already published in the series
Volume 3: Big Data Security. S. Gupta, I. Banerjee, S. Bhattacharyya (Eds.). ISBN 978-3-11-060588-4, e-ISBN (PDF) 978-3-11-060605-8, e-ISBN (EPUB) 978-3-11-060596-9
Volume 2: Intelligent Multimedia Data Analysis. S. Bhattacharyya, I. Pan, A. Das, S. Gupta (Eds.). ISBN 978-3-11-055031-3, e-ISBN (PDF) 978-3-11-055207-2, e-ISBN (EPUB) 978-3-11-055033-7
Volume 1: Machine Learning for Big Data Analysis. S. Bhattacharyya, H. Bhaumik, A. Mukherjee, S. De (Eds.). ISBN 978-3-11-055032-0, e-ISBN (PDF) 978-3-11-055143-3, e-ISBN (EPUB) 978-3-11-055077-1
Intelligent Decision Support Systems Applications in Signal Processing Edited by Surekha Borra, Nilanjan Dey, Siddhartha Bhattacharyya, Mohamed Salim Bouhlel
Editors
Surekha Borra, Kammavari Sangha Institute of Technology, Department of ECE, Kanakapura Main Road, 560109 Bengaluru, India
Nilanjan Dey, Techno India College of Technology, Department of Information Technology, New Town, 700156 Kolkata, India
Siddhartha Bhattacharyya, RCC Institute of Information Technology, Canal South Road, Beliaghata, Kolkata 700 015, India
Mohamed Salim Bouhlel, Institut Supérieur de Biotechnologie de Sfax, Sfax University, Route Sokra km 4 – BP 1175, 3038 Sfax, Tunisia
ISBN 978-3-11-061868-6 e-ISBN (PDF) 978-3-11-062110-5 e-ISBN (EPUB) 978-3-11-061871-6 ISSN 2512-8868 Library of Congress Control Number: 2019945100 Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2019 Walter de Gruyter GmbH, Berlin/Boston Typesetting: Integra Software Services Pvt. Ltd. Printing and binding: CPI books GmbH, Leck Cover image: shulz/E+/getty images www.degruyter.com
Preface
https://doi.org/10.1515/9783110621105-202

Big data and the Internet of Things (IoT) play a vital role in the prediction systems used in biological and medical applications, particularly in resolving issues related to disease biology at different scales. Modeling and integrating medical big data with IoT helps in building effective prediction systems for the automatic recommendation of diagnosis and treatment. Machine-learning techniques and related algorithms, methods, tools, standards, and infrastructures are used for modeling the prediction systems and managing the data. The main concerns are privacy, security, the establishment of standards, scalability, availability, user-friendliness, and the continuous improvement of tools and technologies. This book addresses the issues and challenges related to the development, implementation, and applications of automatic and intelligent prediction and decision support systems in healthcare. Also included are applications of intelligent prediction algorithms in the development of speaker and gesture recognition systems and software.

Feature selection is an important step in everyday data mining. Because it keeps a subset of the original features, it maintains the interpretability of the final models. This is especially important for researchers and professionals developing intelligent systems for biomedicine. In Chapter 1, Alan Jovic provides an in-depth overview of the various feature selection approaches, with a special focus on biomedical signal classification, including filters, wrappers, embedded methods, and various hybrid approaches.

Intelligent decision support systems help dermatologists detect and classify skin lesions. In Chapter 2, Sabri et al. present an overview of image-processing algorithms for image segmentation, feature engineering, and classification, with a special focus on the ABCD (asymmetry, border, color, and diameter) rule.

To analyze MRI images efficiently and locate the tumor position accurately, Ishita et al. in Chapter 3 use a support vector machine, contrast limited adaptive histogram equalization (CLAHE), an adaptively regularized kernel fuzzy C-means (ARKFCM) clustering algorithm, a threshold algorithm, and morphological operations. The proposed method decreases the variance of the prediction error and increases flexibility in detecting the nature of the tumor, which can assist medical practitioners.

Chapter 4 analyzes various machine-learning algorithms that aid decision-making in distinguishing healthy subjects from coronary heart disease (CHD) patients. It compares them with respect to accuracy, execution time, and sensitivity, and shows that the genetic algorithm-based decision tree approach predicts CHD patients with high accuracy.

In Chapter 5, various retinal features such as nerve lines, the optic cup, the optic disk, and the cup-to-disk ratio are used to detect glaucoma from retinal images. Image-processing operations, fuzzy C-means clustering-based segmentation, and an ellipse-fitting-based scheme are employed, detecting glaucoma with an accuracy of about 97.27%.

Chapter 6 presents a speech separation system capable of extracting the desired speakers from corrupted observations, thereby facilitating human–computer interaction via voice. The chapter focuses on practical applications of single-channel speech separation, particularly subspace decomposition-based, phase-aware, and phase-independent approaches.

Chapter 7 illustrates possible forms of human–machine interaction, the implementation of gesture control with a camera and a Kinect sensor, and the use of a database for real-time hand gesture recognition.

This book is useful to researchers, practitioners, professionals, and engineers in the field of biomedical systems engineering, and students may consult it for advanced material. We would like to express our gratitude to the authors for their contributions and to the reviewers for their diligence in reviewing the chapters. Special thanks go to our publisher, De Gruyter. As editors, we hope this book will stimulate further research in developing algorithms and optimization approaches related to intelligent decision support systems.

Editors
Surekha Borra, K.S. Institute of Technology, Bangalore, Karnataka, India
Nilanjan Dey, Techno India College of Technology, Kolkata, India
Siddhartha Bhattacharyya, RCC Institute of Information Technology, Kolkata, India
Med Salim Bouhlel, Head of the Research Lab: SETIT (Sfax University), Tunisia
Contents
Preface V
List of Contributors IX
Alan Jovic
1 Feature selection in biomedical signal classification process and current software implementations 1
My Abdelouahed Sabri, Youssef Filali, Assia Ennouni, Ali Yahyaouy and Abdellah Aarab
2 An overview of skin lesion segmentation, features engineering, and classification 31
Banerjee Ishita, P. Madhumathy and N. Kavitha
3 Brain tumor image segmentation and classification using SVM, CLAHE, and ARKFCM 53
Reddi Sivaranjani, Vankamamidi S. Naresh and Nistala V.E.S. Murthy
4 Coronary Heart Disease prediction using genetic algorithm based decision tree 71
Vipul C. Rajyaguru, Chandresh H. Vithalani and Rohit M. Thanki
5 Intelligent approach for retinal disease identification 99
Belhedi Wiem, Ben Messaoud Mohamed Anouar and Bouzid Aïcha
6 Speech separation for interactive voice systems 131
Varnita Verma, Anshuman Rajput, Piyush Chauhan, Harshit Rathore, Piyush Goyal and Mukul Kumar Gupta
7 Machine vision for human–machine interaction using hand gesture recognition 155
Index 183
List of Contributors
https://doi.org/10.1515/9783110621105-204
Alan Jovic, University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia
N. Kavitha, Dayananda Sagar Academy of Technology and Management, Bangalore, India
My Abdelouahed Sabri, LIIAN, Department of Computer Science, University Sidi Mohamed Ben Abdellah, Fez, Morocco
Reddi Sivaranjani, Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology & Science, Visakhapatnam, India
Youssef Filali, LIIAN, Department of Computer Science, University Sidi Mohamed Ben Abdellah, Fez, Morocco
Assia Ennouni, LIIAN, Department of Computer Science, University Sidi Mohamed Ben Abdellah, Fez, Morocco
Ali Yahyaouy, LIIAN, Department of Computer Science, University Sidi Mohamed Ben Abdellah, Fez, Morocco
Abdellah Aarab, LESSI, Department of Physics, University Sidi Mohamed Ben Abdellah, Fez, Morocco
P. Madhumathy, Dayananda Sagar Academy of Technology and Management, Bangalore, India
Banerjee Ishita, Dayananda Sagar Academy of Technology and Management, Bangalore, India
Vankamamidi S. Naresh, Department of Computer Science and Engineering, Sri Vasavi Engineering College, Tadepalligudem, India
Nistala V. E. S. Murthy, Department of Mathematics, Andhra University, Visakhapatnam, India
Vipul C. Rajyaguru, Gujarat Technological University, Ahmedabad, Gujarat, India
Chandresh H. Vithalani, Gujarat Technological University, Gujarat, India
Rohit M. Thanki, Faculty of Technology & Engineering, C. U. Shah University, Gujarat, India
Belhedi Wiem, University of Tunis El Manar, National School of Engineers of Tunis, Department of Electrical Engineering, Tunis, Tunisia
Ben Messaoud Mohamed Anouar, University of Tunis El Manar, National School of Engineers of Tunis, Department of Electrical Engineering, Tunis, Tunisia
Piyush Chauhan, Department of Analytics, University of Petroleum and Energy Studies, Dehradun, India
Bouzid Aïcha, University of Tunis El Manar, National School of Engineers of Tunis, Department of Electrical Engineering, Tunis, Tunisia
Harshit Rathore, Department of Electrical and Electronics Engineering, University of Petroleum and Energy Studies, Dehradun, India
Varnita Verma, Department of Electrical and Electronics Engineering, University of Petroleum and Energy Studies, Dehradun, India
Piyush Goyal, Department of Electrical and Electronics Engineering, University of Petroleum and Energy Studies, Dehradun, India
Anshuman Rajput, Department of Mechanical Engineering, University of Petroleum and Energy Studies, Dehradun, India
Mukul Kumar Gupta, Department of Electrical and Electronics Engineering, University of Petroleum and Energy Studies, Dehradun, India
Alan Jovic
1 Feature selection in biomedical signal classification process and current software implementations

Abstract: Feature selection is an important step in everyday data mining. Its aim is to reduce the number of potentially irrelevant expert features describing a dataset to a smaller number of important ones. Unlike feature reduction and transformation techniques, feature selection keeps a subset of the original features, thus maintaining the interpretability of the final models. This is especially important for researchers and medical professionals in the field of biomedicine. The aim of this chapter is to provide an in-depth overview of the various feature selection approaches that are applicable to biomedical signal classification, including filters, wrappers, embedded methods, and various hybrid approaches. In addition, recently developed methods based on sequential feature selection and on data filtering from streams are considered. Feature selection implementations in current software solutions are described, and feature selection is compared with deep learning approaches. The feature selection approach used in our own web-based biomedical signal analysis platform, MULTISAB (multiple time series analysis in biomedicine), is presented.

Keywords: feature selection, biomedical signal classification, biomedical software, deep learning, deep neural network
https://doi.org/10.1515/9783110621105-001

1.1 Introduction
The abundance of data available today poses an increasing problem for current data modeling methods. Some large datasets do not even fit in the main memory of typical or even advanced hardware configurations, and such datasets are an obstacle to discovering relevant new information. To improve the analysis process and ease the efficient analysis of large datasets, feature selection and dimensionality reduction methods were developed in parallel with improvements in data modeling techniques (e.g., classification, clustering, and association rules). The aim of feature selection is to reduce the number of potentially irrelevant or redundant expert features describing a dataset to a smaller number of important ones. This makes the modeling algorithms feasible to use and as effective as possible [1]. Feature selection thus reduces the dataset size, which accelerates model construction and makes the data tractable for machine-learning modeling methods. Feature selection can also improve the accuracy of the modeling methods, because they no longer need to deal with many irrelevant features.

The main difference between feature selection and dimensionality reduction (also called feature transformation or feature extraction [2]) is that feature selection keeps a subset of the original features, thus maintaining the interpretability of the final models. Although some researchers conflate the terms feature selection and dimensionality reduction [3], there is a clear distinction concerning interpretability. Maintaining interpretability is especially important in biomedicine. There, medical professionals usually require an explanation of the machine-reasoning process to understand how decision support software reached a proposed decision [4]. Therefore, in this work, we do not consider feature transformation/dimensionality reduction methods, although they are commonly used to improve model accuracy.

Feature selection may be applied in a variety of scenarios, depending on the research goal:
1) in classification problems, where the task is to differentiate between two or more categories;
2) in clustering problems, where the exact categories are unknown, but some features may nevertheless be irrelevant for describing coherent sample clusters;
3) in descriptive problems, where one only wants to discover which features (out of many) are relevant for modeling the problem, but classification and/or clustering are not immediately needed; and
4) in streaming information problems, where one needs to decide quickly whether a given feature is important enough to warrant further storage and processing.

By far the most common application in biomedicine is the use of feature selection in biomedical signal classification. Here, the goal is to remove all or most of the informationally weak (unimportant, irrelevant) features. This enables more efficient and accurate decisions about the classes of the biomedical signals under consideration.
This usually involves discerning between several disorders or states of the organism measured by the signal (e.g., classification of arrhythmic heartbeats, or detection or prediction of epilepsy). Semi-supervised learning [5] is a subtype of classification in which learning starts from a small set of labeled examples and then progresses iteratively over a (usually large) unlabeled set. Although semi-supervised methods are occasionally used in signal reconstruction, their applications in biomedical signal classification have thus far been limited, so they are not discussed here. In addition, although clustering and descriptive problems are useful in general, there are few prominent applications of feature selection in these contexts in biomedical signal processing. Thus, feature selection in semi-supervised learning, clustering, and descriptive models is considered out of scope for this chapter. Streaming information problems usually arise within the scope of classification and may be considered relevant for current biomedical engineering research. Hence, we include a description of online streaming feature selection in this work.
Feature selection is an integral part of the signal analysis process. This chapter provides a detailed survey of feature selection methods and their applications in biomedical signal analysis. It covers their connection with data mining and machine learning, mostly in classification but also in other areas of artificial intelligence applied to biomedicine. The primary aim of this chapter is to provide an in-depth overview of the various feature selection approaches that are applicable to biomedical signal classification, including filters, wrappers, embedded methods, and various hybrid approaches [6]. Both the theoretical and practical aspects of the techniques are covered. Special interest is given to comparing the methods already applied in different biomedical signal domains, for example, electrocardiogram (ECG), heart rate variability (HRV), electroencephalogram (EEG), electromyogram (EMG), and electrooculogram (EOG). Feature selection forms part of the analytical process used in decision support systems, so we also highlight the software solutions that enable feature selection. These include the approach used in our own web-based biomedical signal analysis platform, MULTISAB (multiple time series analysis in biomedicine), which serves as a case study in this work.

Feature selection for biomedical image classification is an interesting and well-explored topic in the current literature [7–9]. Feature selection techniques may be used there much as in signal classification. However, the other approaches used for two-dimensional (2D) image classification differ from those used for one-dimensional signal classification. Therefore, we do not delve into biomedical image classification techniques in this work.
The chapter also compares traditional feature selection methods with newly developed methods based on sequential feature selection [10] and on online filtering of data from streams [11]. Owing to their effectiveness, these newer methods may be used in biomedical big data processing. Deep learning methods such as convolutional neural networks, recurrent neural networks, and deep belief networks may be used to extract nonlinear features directly from raw biomedical signal samples, without the prerequisite of expert-based feature extraction. They can therefore be considered a threat to the traditional understanding of the data analysis process, effectively eliminating the need for feature selection methods altogether [12, 13]. Hence, this chapter also provides an overview and discussion of the effectiveness of such approaches in biomedical signal analysis through several recent research developments in this field.

The chapter is organized as follows. In Section 1.2, various feature selection methods are organized and presented. Section 1.3 covers related work on the use of feature selection approaches in biomedical signal classification. Application software that enables feature selection is described in Section 1.4, with a subsection devoted to our own web application software that also uses feature selection. Section 1.5 presents deep learning techniques used for biomedical signal classification. Finally, in Section 1.6, we discuss the use of feature selection and deep learning approaches. We conclude the chapter with a future perspective on using feature selection in biomedical signal classification.
1.2 Feature selection methods
1.2.1 Terminology and problem definition
Let $|N|$ be the number of samples (feature vectors) in the sample set $N = \{x_1, x_2, \ldots, x_{|N|}\}$. Let $|M|$ be the number of features (or variables) in the feature set $M = \{f_1, f_2, \ldots, f_{|M|}\}$ that describe the samples. The goal of feature selection is to obtain a subset $S \subseteq M$ that satisfies two conditions. First, its classification accuracy with a suitable classifier, $\mathrm{acc}(S)$, must be equal to or higher than the accuracy of $M$ with the same classifier, i.e., $\mathrm{acc}(S) \geq \mathrm{acc}(M)$. Second, $|S|$ must be the smallest subset size that retains this accuracy. With respect to their importance in a feature set, features are usually categorized into four classes [14]: strongly relevant, weakly relevant but not redundant, relevant but redundant, and irrelevant.

The feature selection problem is a multiobjective one, as both accuracy and the number of features need to be optimized. It is also highly complex, and the complexity stems from two sources. The first is the size of the feature set: the number of possible subsets that need to be evaluated (i.e., the power set) is $2^{|M|}$. The second is the possible interactions between the features. The size of the feature set is problematic because exhaustive evaluation of all possible subsets is feasible only for small $|M|$. Hundreds or thousands of features render the exhaustive approach infeasible. This has led to a multitude of heuristic search approaches that offer no real guarantee of returning the optimal solution. The interactions between features are problematic because some strongly relevant features may prove to be redundant when considered in a larger feature set. Conversely, other features that are only weakly relevant on their own may prove to be strongly relevant when combined with others in a larger feature set.
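To make the definition concrete, the following Python sketch performs an exhaustive search. It enumerates subsets in order of increasing size and returns the first one whose accuracy reaches $\mathrm{acc}(M)$, which is by construction a smallest such subset. The function names and the accuracy function `toy_acc` are hypothetical; `toy_acc` stands in for a trained classifier evaluated on held-out data. Because up to $2^{|M|}$ subsets may be scored, this approach is only practical for very small feature sets.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence


def smallest_sufficient_subset(
    features: Sequence[str],
    acc: Callable[[FrozenSet[str]], float],
) -> FrozenSet[str]:
    """Exhaustive search for a smallest S ⊆ M with acc(S) >= acc(M)."""
    full = frozenset(features)
    target = acc(full)
    # Enumerate subsets by increasing size, so the first hit is a smallest one.
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            candidate = frozenset(subset)
            if acc(candidate) >= target:
                return candidate
    return full  # only reached when the feature set is empty


def toy_acc(subset: FrozenSet[str]) -> float:
    """Hypothetical accuracy: f1 and f3 are informative, f2 and f4 are irrelevant."""
    score = 0.5
    if "f1" in subset:
        score += 0.2
    if "f3" in subset:
        score += 0.2
    if {"f1", "f3"} <= subset:
        score += 0.05  # interaction bonus when both features are present
    return score


if __name__ == "__main__":
    m = ["f1", "f2", "f3", "f4"]
    print(sorted(smallest_sufficient_subset(m, toy_acc)))  # ['f1', 'f3']
    print([2 ** k for k in (10, 20, 30)])  # [1024, 1048576, 1073741824] subsets
```

The second print shows why exhaustive search breaks down. With only 30 features, there are already more than a billion candidate subsets to evaluate.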
Thus, when performing feature selection, researchers usually aim to remove all the irrelevant features first and then eliminate most of the redundant features, in the hope of finding the optimal (or a near-optimal) subset. Feature selection approaches may be categorized into four distinct groups: filters, wrappers, embedded methods, and hybrid methods. The categorization is based on the type of method used to evaluate features for elimination. The whole feature selection process is depicted in Figure 1.1. The process is mostly independent of the group of feature selection methods used. Researchers' choice of feature selection approach is governed primarily by three parameters: the size of the feature set ($|M|$), the size of the dataset ($|N|$), and the time at their disposal. Specifically, large feature sets require fast algorithms to remove the majority of irrelevant features. For that purpose, filter methods are normally used. Wrappers are normally used on somewhat smaller feature sets. Here, the choice of the wrapper method depends on the size of the dataset and the classifier used. Classifiers that do not scale linearly with $|M|$ and $|N|$ (e.g., C4.5 or nonlinear kernel-based support vector machines) should be avoided as wrapper classifiers, as their computational burden may be too high. The same is true for embedded methods. Hybrid methods should be used when researchers have enough time to explore the feature set space in detail but cannot use exhaustive search or a wrapper approach on the full feature set.

[Figure 1.1: Feature selection process. Flowchart: initial feature set M → search method → candidate feature set S_i → evaluation method → evaluation results → "Results satisfactory?" No: repeat search method; Yes: feature set S.]
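The loop in Figure 1.1 can be sketched in Python as follows. In this sketch, random subset search (one of the heuristic searches listed later in Table 1.2) plays the role of the search method, and an arbitrary scoring function plays the role of the evaluation method. The function name, the stopping predicate `satisfactory`, and the iteration cap are hypothetical choices made for illustration. In practice, `evaluate` would be a multivariate filter measure or a wrapper's classification accuracy.

```python
import random
from typing import Callable, FrozenSet, Sequence, Tuple


def feature_selection_loop(
    features: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
    satisfactory: Callable[[float], bool],
    max_iter: int = 1000,
    seed: int = 0,
) -> Tuple[FrozenSet[str], float]:
    """Search -> evaluate -> 'results satisfactory?' loop with random subset search."""
    rng = random.Random(seed)
    best_subset: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    for _ in range(max_iter):
        # Search method: propose a random non-empty candidate subset S_i of M.
        size = rng.randint(1, len(features))
        candidate = frozenset(rng.sample(list(features), size))
        # Evaluation method: any subset score (filter measure or wrapper accuracy).
        score = evaluate(candidate)
        better = score > best_score
        tie_but_smaller = score == best_score and len(candidate) < len(best_subset)
        if better or tie_but_smaller:
            best_subset, best_score = candidate, score
        # Results satisfactory? Stop; otherwise repeat the search.
        if satisfactory(best_score):
            break
    return best_subset, best_score
```

The iteration cap guarantees termination even when the satisfaction criterion is never met. In that case, the best subset seen so far is returned.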
1.2.2 Filter-based feature selection methods
Filter methods (or filters) are the simplest. They establish the value of a feature or a feature subset without using a machine learning classifier as the evaluation method. Instead, they use a measure to compare a feature or a feature subset with the target feature (i.e., the class). The measure may be categorized as statistical, distance based, consistency based, or information based. For filters, the search method usually consists of ranking the individual features by the given measure. However, some filter methods also evaluate feature subsets according to a joint criterion with respect to the target feature, in addition to evaluating individual features. Examples are correlation-based subset selection [15] and minimum-redundancy-maximum-relevance (mRmR) [16]. These methods are known as multivariate filters, whereas filters that measure the quality of a single feature are called univariate filters. When multivariate filters are used, a search method must be selected that efficiently traverses the feature set space and finds the best (or a near-best) solution. Table 1.1 gives an overview of the filters used as feature selection methods in classification problems. The most commonly used and efficient search methods are presented in Table 1.2.
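As an illustration of a univariate informational filter, the sketch below ranks features by information gain, $IG(C;F) = H(C) - H(C \mid F)$. It also computes symmetrical uncertainty, $SU = 2\,IG / (H(F) + H(C))$, which normalizes the gain to $[0, 1]$. It assumes that the features are already discrete. Continuous signal features such as HRV measures would first need discretization, which is not shown here. All function names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(C; F) = H(C) - H(C | F) for a discrete feature F and class C."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [c for f, c in zip(feature, labels) if f == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def symmetrical_uncertainty(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """SU = 2 * IG / (H(F) + H(C)), normalized to [0, 1]."""
    denom = entropy(feature) + entropy(labels)
    return 0.0 if denom == 0 else 2.0 * information_gain(feature, labels) / denom


def rank_features(data: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable]) -> List[Tuple[str, float]]:
    """Univariate filter: rank features by information gain, best first."""
    scores = [(name, information_gain(col, labels)) for name, col in data.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Each feature is scored independently of the others, which is why univariate filters are cheap. It is also why they cannot detect redundancy between features.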
Table 1.1: Filters used for feature selection in classification problems.
Filter measure | Type | Reference
Information gain (also gain ratio, feature-class mutual information) | Univariate, informational | []
Symmetrical uncertainty | Univariate, informational | []
χ² (chi-square) | Univariate, statistical | []
t-test | Univariate, statistical | []
Inconsistency | Multivariate, consistency | []
Fisher score | Univariate, statistical | []
Relief family of measures | Univariate, distance | []
Spectral feature selection | Univariate, distance | []
Minimum redundancy, maximum relevance (mRmR) | Multivariate, informational | []
Mutual information MIFS-ND | Multivariate, informational | []
Joint mutual information maximization | Multivariate, informational | []
Correlation-based feature selection (CFS) | Multivariate, statistical | []
Fast correlation-based filter (FCBF) | Multivariate, statistical | []
Graph clustering and ACO | Multivariate, distance | []
Rough set theory approaches | Multivariate, consistency | []
Fuzzy set theory approaches | Multivariate, consistency | []
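The Fisher score from Table 1.1 is another inexpensive univariate statistical filter. The sketch below uses one common form: $F(f) = \sum_k n_k(\mu_k - \mu)^2 / \sum_k n_k \sigma_k^2$. Here, $n_k$, $\mu_k$, and $\sigma_k^2$ are the size, mean, and (population) variance of feature $f$ within class $k$, and $\mu$ is the overall mean of $f$. This particular normalization is an assumption, since variants exist in the literature, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance
from typing import Dict, Hashable, List, Sequence


def fisher_score(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Between-class scatter of class means divided by within-class scatter."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for v, c in zip(values, labels):
        groups[c].append(v)
    overall_mean = fmean(values)
    between = sum(len(g) * (fmean(g) - overall_mean) ** 2 for g in groups.values())
    within = sum(len(g) * pvariance(g) for g in groups.values())
    if within == 0:
        # Perfectly compact classes: infinitely separable unless the means coincide.
        return float("inf") if between > 0 else 0.0
    return between / within
```

A feature whose class means are far apart relative to the spread inside each class receives a high score. A feature with identical class means scores zero.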
Table 1.2: Search methods commonly used for traversing feature set space.
Search type | Methods | Reference
Optimality-based search | Exhaustive search, branch-and-bound | []
Sequential search | Greedy forward selection or backward elimination, floating forward or backward search, best-first search, beam search (and beam stack search), race search | []
Evolutionary computation and other heuristic searches | Random subset, simulated annealing, scatter search, genetic algorithm, genetic programming, ant colony optimization, particle swarm optimization, artificial bee colony, differential evolution, gravitational search algorithm | [–]
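Greedy forward selection, the first entry in the "Sequential search" row, can be sketched as follows. Starting from the empty set, the search repeatedly adds the single feature whose inclusion gives the highest subset score. It stops when no addition improves the score by more than `min_gain`. The function name and the `min_gain` parameter are hypothetical. `score` may be a multivariate filter measure or a wrapper's classification accuracy. Backward elimination is the mirror image, starting from the full set and removing one feature at a time.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 0.0,
) -> Tuple[List[str], float]:
    """Greedy sequential forward selection: add the best feature while it helps."""
    selected: List[str] = []
    best = float("-inf")
    remaining = list(features)
    while remaining:
        # Score every one-feature extension of the current subset.
        trial = {f: score(frozenset(selected + [f])) for f in remaining}
        candidate = max(trial, key=trial.get)
        if trial[candidate] - best <= min_gain:
            break  # no extension improves the score enough: stop
        selected.append(candidate)
        remaining.remove(candidate)
        best = trial[candidate]
    return selected, best
```

Each step costs one evaluation per remaining feature. The whole search therefore needs at most $|M|(|M|+1)/2$ evaluations instead of $2^{|M|}$, at the price of possibly missing feature interactions.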
The main benefit of using filters for feature selection is their low computational complexity, which enables their use on large datasets containing hundreds or thousands of features. They are especially effective for the fast elimination of many irrelevant features in problems with huge dimensionality (e.g., in bioinformatics). However, filter methods may not be suitable for finding optimal feature subsets, as they are rarely effective enough to replace actual classifiers in modeling the dataset. In a study conducted on many datasets, Drotár et al. [33] concluded that univariate feature selection methods are more stable than multivariate methods (i.e., the set of selected features changes less with statistical variation in the input dataset). Informational (entropy) measures were the most stable. Multivariate methods (such as mRmR) are typically designed to minimize redundancy. On high-dimensional datasets with many related variables, they therefore behave less stably than univariate methods, but they also achieve higher accuracy. For evolutionary computation approaches to search, rough set theory approaches may be used to guide the optimization process [34]. However, this approach has only been shown to be feasible for datasets with a rather small number of features (up to 100) [32]. When using filter-based feature selection, it is advisable to explore several methods before determining the best one [33]. This is especially true for univariate methods, which are mostly computationally undemanding. However, some filter methods, such as the Relief family [21] and spectral feature selection [22], exhibit quadratic or worse time complexity with respect to the dataset size $|N|$. Such methods should be avoided for large datasets.
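Stability, in the sense used above, can be quantified in several ways. One simple option, assumed here and not necessarily the measure used in [33], is the mean pairwise Jaccard index between feature subsets selected on different resamples (e.g., bootstrap samples) of the training data. The function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Sequence


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two selected feature subsets."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def selection_stability(subsets: Sequence[FrozenSet[str]]) -> float:
    """Mean pairwise Jaccard index over subsets selected on different resamples."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        return 1.0  # a single run is trivially consistent with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value of 1.0 means that the same features were selected on every resample. Values near 0 mean that the selected subset changes almost completely with small variations in the data.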
1.2.3 Wrapper-based feature selection approaches
Wrapper methods approach the feature selection problem by using a classifier to evaluate each given feature subset. The classification accuracy of a feature subset can be established using a variety of classifiers, for example, decision trees, support vector machines, naïve Bayes, K-nearest neighbors, and linear discriminant analysis. Using wrappers on high-dimensional datasets is very resource demanding. Only the simplest classification algorithms keep it computationally feasible [35]. Wrapper approaches use the same search methods to traverse the feature space as the multivariate filter methods do (see Table 1.2). Since constructing a classifier for each subset takes time, wrappers are much slower than filter methods, but they also tend to be more accurate. The classifier used to evaluate each subset is usually not used to construct the final model on the eventually selected feature set. In the final modeling phase, more complex classifiers tend to give more accurate results than the simpler classifiers used for evaluation in each step of the feature set search.

The main limitation on the use of wrapper methods is the evaluation procedure, because a new classification model needs to be constructed for each iteration of the search. During the search, a given feature subset is usually evaluated by applying the classifier with 10-fold cross-validation on the training dataset. Since this approach is computationally demanding, Liu et al. [36] proposed a new statistical measure called the LW-index. It was shown to be able to replace the cross-validation used with the classifiers and obtain similar accuracies with an approximately tenfold reduction in execution time.
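The following sketch shows what a single wrapper evaluation involves. A candidate subset (given as column indices) is scored by k-fold cross-validation accuracy, with k = 10 by default as described above. A nearest-centroid classifier is used because it is among the simplest and cheapest. That classifier choice, the interleaved fold assignment (which assumes the samples were shuffled beforehand), and the function names are assumptions made for illustration. This sketch is not the LW-index of [36].

```python
from math import dist
from statistics import fmean
from typing import Collection, Dict, Hashable, List, Sequence


def nearest_centroid_predict(
    train_X: Sequence[Sequence[float]], train_y: Sequence[Hashable], x: Sequence[float]
) -> Hashable:
    """Predict the class whose training-set centroid is closest (Euclidean) to x."""
    centroids: Dict[Hashable, List[float]] = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return min(centroids, key=lambda label: dist(centroids[label], x))


def wrapper_cv_accuracy(
    X: Sequence[Sequence[float]],
    y: Sequence[Hashable],
    subset: Collection[int],
    k: int = 10,
) -> float:
    """Wrapper evaluation of a feature subset (column indices) by k-fold CV accuracy."""
    cols = sorted(subset)
    Xs = [[row[j] for j in cols] for row in X]  # keep only the candidate features
    n = len(y)
    correct = 0
    for fold in range(k):
        # Interleaved folds; assumes the samples were shuffled beforehand.
        test_idx = set(range(fold, n, k))
        train_X = [Xs[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            correct += nearest_centroid_predict(train_X, train_y, Xs[i]) == y[i]
    return correct / n
```

Every call trains k classifiers. This is why a wrapper search that makes thousands of such calls quickly becomes expensive.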
1.2.4 Embedded feature selection
Instead of using a classifier to evaluate various feature subsets, embedded feature selection relies on the internal capabilities of the machine-learning algorithm itself to find a good subset of features. In terms of both model accuracy and computational cost, these methods are considered to fall between filter and wrapper methods [37]. Various machine-learning methods can be used to select features [38]. Perhaps the two most popular ones, owing to their efficiency and accuracy, are random forests and linear support vector machines [39, 40]. Embedded methods may later be used as classifiers for the selected feature subset, or another classifier may be used on the selected subset instead. Feature selection using embedded methods has its advantages, but too much dependence on the underlying classifier may introduce bias in discovering important features. Mostly due to this limitation, embedded methods have not been used much in practice, despite considerable efforts in their introduction [38].
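As a sketch of embedded selection with a linear support vector machine, the code below trains the SVM with Pegasos-style stochastic subgradient descent (no bias term, labels in {-1, +1}). Feature importance is then read directly from the magnitudes of the learned weights $|w_j|$. The training procedure, hyperparameter values, and function names are assumptions for illustration. Comparing $|w_j|$ is only meaningful when the features are on comparable scales, for example after standardization.

```python
import random
from typing import List, Sequence


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 50,
    seed: int = 0,
) -> List[float]:
    """Linear SVM (no bias) trained with Pegasos-style stochastic subgradient descent.

    Labels must be -1 or +1; features should be on comparable scales.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 regularization shrinkage
            if margin < 1.0:  # hinge loss is active: move toward the correct side
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def rank_by_weight(w: Sequence[float]) -> List[int]:
    """Embedded selection: rank feature indices by |w_j|, largest first."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
```

The selection falls out of the model's own training, with no separate search over subsets. This is also the source of the bias mentioned above: features the linear model cannot exploit receive small weights even if another classifier could use them.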
1 Feature selection in biomedical signal classification process
1.2.5 Hybrid feature selection approaches
The hybrid feature selection approach combines filter and wrapper methods [41]. Usually, a filter method is first used for fast removal of many irrelevant features. This can be achieved by ranking the individual features according to the value of a filter measure and then either eliminating all the features that do not reach a selected threshold value of the filter measure or eliminating a fixed percentage of the lowest-ranked features from the dataset. Thereafter, a wrapper method may be used more efficiently, as the reduced dimensionality of the dataset after the first step makes the search for the best feature subset computationally more feasible. The downside of such an approach is that the filter method may have already removed some weakly relevant features, depriving the wrapper method of them and leading to a suboptimal solution. This can be avoided by eliminating in the first step only those features that are truly irrelevant. Compared to filter methods, hybrid methods lead to more accurate models, while compared to wrapper methods, they find the best subset faster. Many hybrid approaches have been proposed in the literature in recent years, most of them involving some type of evolutionary search heuristic, for example: fuzzy random forest ensemble [42], conditional mutual information and ant colony optimization [43], rough sets and particle swarm optimization [44, 45], gravitational search coupled with a support vector machine wrapper [46], mutual information maximization with an adaptive genetic algorithm [47], and so on. An appropriate hybrid solution for feature selection may therefore work efficiently in practice. However, the main limitation of hybrid methods is that they work only as well as the selected filter and wrapper combination. If the filter measure used is too restrictive, some relevant features may be lost, and the wrapper method may not reach the optimal model. On the other hand, if the filter is too loose, the wrapper will need much more time to find the best solution. Also, if the wrapper search is not thorough enough, there is no guarantee that the best subset is found. Therefore, the choice of hybrid method is not theoretically well founded. Instead, the choice is experimental, dependent on the characteristics of the underlying dataset(s) and on the availability of modeling tools, and usually involves a compromise between the most accurate and the least time-consuming approach. Although the use of evolutionary computation and other heuristic approaches in hybrid feature selection appears attractive, great care must be taken in experimental evaluation before claims of method superiority can be made.
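The two-step hybrid scheme described above (a fast filter that discards a fixed percentage of the lowest-ranked features, followed by a wrapper search on what remains) can be sketched as follows. This is a minimal illustration under stated assumptions: the filter measure is the absolute Pearson correlation with a binary (0/1) class label, the wrapper is greedy sequential forward selection, and the wrapped classifier is a leave-one-out nearest-centroid classifier. None of these choices is prescribed by the text, and the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def abs_correlation(col: Sequence[float], y: Sequence[float]) -> float:
    """Filter measure: |Pearson r| between one feature and a 0/1 label."""
    sx, sy = pstdev(col), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(col), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(col, y))
    return abs(cov / (sx * sy))


def loo_nearest_centroid_accuracy(X, y, features: List[int]) -> float:
    """Wrapper evaluation: leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            if rows:
                centroids[label] = [mean(r[f] for r in rows) for f in features]

        def dist(c):
            return sum((X[i][f] - c[m]) ** 2 for m, f in enumerate(features))

        pred = min(centroids, key=lambda lab: dist(centroids[lab]))
        correct += pred == y[i]
    return correct / len(X)


def hybrid_select(X, y, keep_fraction: float = 0.5) -> Tuple[List[int], float]:
    d = len(X[0])
    # Step 1 (filter): rank features, keep only the top fraction
    scores = [abs_correlation([row[j] for row in X], y) for j in range(d)]
    ranked = sorted(range(d), key=lambda j: scores[j], reverse=True)
    candidates = ranked[:max(1, int(round(keep_fraction * d)))]
    # Step 2 (wrapper): sequential forward selection on the reduced candidate set
    selected: List[int] = []
    best_acc = 0.0
    improved = True
    while improved and candidates:
        improved = False
        trial = {f: loo_nearest_centroid_accuracy(X, y, selected + [f]) for f in candidates}
        f_best = max(trial, key=trial.get)
        if trial[f_best] > best_acc:
            best_acc = trial[f_best]
            selected.append(f_best)
            candidates.remove(f_best)
            improved = True
    return selected, best_acc
```

Setting `keep_fraction` too low reproduces the risk discussed above: a weakly relevant feature dropped by the filter can never be recovered by the wrapper.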
Alan Jovic
1.2.6 Sequential and online streaming feature selection
Recent research in feature selection deals with the possibility that not all features may be available, or that the target concept can be learned more efficiently by sequentially improving the model using the available features. The introduction of sequential feature selection makes the classification decision process amenable to reinforcement learning [10]. Sequential feature selection learns which features are the most informative at each timestep of the decision process, choosing the next feature depending on the already selected features and the internal belief of the classifier. In this way, data consumption is minimized. Note that this approach differs from wrapper methods, which add one feature at a time based not on the state of the classifier but on the classification accuracy; this requires learning the model on the whole dataset, which maximizes data consumption and increases computation time. Ghayab et al. [48] used sequential feature selection for highly accurate classification of epilepsy from EEG records, combining random sampling of the data, sequential feature selection, and a least squares support vector machine classifier.
In contrast to sequential feature selection, online streaming feature selection [11] assumes that features may arrive as newly measured variables from time to time, and that classification models need to handle such newly arrived features efficiently. This differs from the more common situation where features are fixed, and only newly measured feature vectors (samples) need to be considered by a classifier. Yu et al. [11] proposed an online feature selection step, capable of selecting and maintaining a pool of effective features, as well as a separate offline step, in which the emerging pattern classifiers [49] are updated based on the newly preselected features. An improvement of online feature selection that handles datasets with extremely high dimensionality and features arriving in groups was later proposed by Yu et al. [50]. Their group-SAOLA (scalable and accurate online approach) algorithm efficiently filters out redundant features using pairwise correlations in an online setting. Lastly, an extension of online feature selection that efficiently handles imbalanced datasets was proposed by Zhou et al. [51]. They proposed a new algorithm for online feature selection based on dependency in the K-nearest neighbors classifier. The algorithm uses information about the nearest neighbors to select relevant features, which can lead to higher separability between the majority class and the minority class. This approach is especially significant in biomedicine, since many biomedical datasets suffer from significantly skewed class distributions, where minority classes have far fewer samples than the majority class (e.g., rare disease patients vs healthy subjects). While limitations on the particular use of online streaming feature selection algorithms may not be evident from the current literature, their use may only be justified if the problem at hand, that is, the dataset, has the appropriate characteristics (i.e., not all features are available at the moment when the analysis starts). Otherwise, their time complexity may be too high compared to the offline methods.
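The core loop of online streaming feature selection (features arrive one at a time, each is kept or discarded immediately, and the maintained pool is pruned of redundancy using pairwise comparisons) can be illustrated with the simplified sketch below. It is loosely inspired by the pairwise redundancy check described for group-SAOLA, but it is not that algorithm: it assumes absolute Pearson correlation for both relevance and redundancy, and the thresholds `relevance_min` and `redundancy_max` are hypothetical parameters chosen for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


def abs_corr(a: List[float], b: List[float]) -> float:
    """|Pearson r| between two equally long sequences (0.0 if one is constant)."""
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    ma, mb = mean(a), mean(b)
    return abs(mean((x - ma) * (z - mb) for x, z in zip(a, b)) / (sa * sb))


def online_select(stream: Iterable[Tuple[str, List[float]]], y: List[float],
                  relevance_min: float = 0.3, redundancy_max: float = 0.9) -> List[str]:
    """Process features as they arrive; return the names kept in the pool."""
    pool: Dict[str, Tuple[List[float], float]] = {}
    for name, values in stream:
        rel = abs_corr(values, y)
        if rel < relevance_min:
            continue  # irrelevant: discard immediately, never revisited
        discard_new = False
        for kept_name, (kept_values, kept_rel) in list(pool.items()):
            if abs_corr(values, kept_values) >= redundancy_max:
                if kept_rel >= rel:
                    discard_new = True  # an equally or more relevant twin already exists
                    break
                del pool[kept_name]  # the newcomer is more relevant: drop the old twin
        if not discard_new:
            pool[name] = (values, rel)
    return list(pool)
```

Each decision uses only the new feature and the current pool, which is what allows such methods to run without ever seeing the full feature set.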
1.3 Feature selection in biomedical signal classification
1.3.1 Biomedical signal classification process overview
Biomedical signal classification is a process that usually consists of several steps (Figure 1.2). In the usual scenario, raw signals such as ECG, EEG, EMG, skin conductance, blood pressure, and/or many other biological signals are first segmented and preprocessed. Segmenting involves defining the length of the signal segment that will be used in the analysis, as well as the number of analysis repetitions and the overlap between segments (e.g., a single segment only, or the whole record with 50% overlap between segments). Segment lengths may range from a second or less (e.g., for the detection of heartbeats) up to 24 hours or even more (e.g., for the detection of congestive heart failure from long-term Holter-based HRV series). The choice of segment length depends on the application as well as on the requirements of the feature extraction or modeling methods. For example, the Hilbert–Huang transform is a well-known method for preprocessing and feature extraction that may be used for the analysis of various biomedical signals [52, 53]. The Hilbert–Huang transform is highly computationally demanding, more so for longer segments. Some other preprocessing and feature extraction methods for biomedical signals are less computationally demanding and may use longer segments, for example, most peak detection algorithms in ECG [54], linear time and frequency domain features for HRV analysis [55], and so on.
Figure 1.2: Commonly used biomedical signal classification process.
Preprocessing of raw signal segments involves signal filtering, characteristic morphology detection (e.g., ECG peaks and segments), and signal transformation techniques. Filtering is entirely signal dependent and is used to remove various types of noise affecting the biomedical signal. Signal transformation techniques are used to transform the original, raw signal into a form more suitable for characteristic feature extraction. They need to be distinguished from feature transformation techniques: while signal transformation techniques prepare the raw data for feature extraction, feature transformations modify the already calculated features. Signal transformation methods are based on the time domain (e.g., principal component analysis, factor analysis), the frequency domain (e.g., discrete Fourier transform, discrete cosine transform), and the time–frequency domain (e.g., various discrete and continuous wavelet transforms (CWTs), Hilbert–Huang transform). The feature extraction step follows the preprocessing step. Feature extraction involves numeric quantification of segments using various mathematical formulations, both those devised by domain experts for a particular biomedical signal and general time series analysis methods. After feature extraction is performed, each segment is described by a set of feature values. A non-exhaustive list of features that can be extracted from biomedical signals, based on a literature survey, is given in Table 1.3.
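The segmentation choices described above (segment length and overlap between consecutive segments) can be sketched as below. This is a minimal illustration assuming a uniformly sampled, single-channel signal held in a Python list; the function name `segment_signal` and its parameters are hypothetical, and trailing samples that do not fill a whole segment are simply dropped, which is only one of several reasonable conventions.

```python
from typing import List, Sequence


def segment_signal(signal: Sequence[float], fs: float, seg_seconds: float,
                   overlap: float = 0.0) -> List[List[float]]:
    """Split a 1-D signal into fixed-length segments with fractional overlap.

    fs: sampling frequency in Hz; seg_seconds: segment length in seconds;
    overlap: fraction of a segment shared with the next one, 0 <= overlap < 1
    (e.g., 0.5 for the 50% overlap mentioned in the text).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    seg_len = int(round(seg_seconds * fs))
    if seg_len < 1:
        raise ValueError("segment must contain at least one sample")
    step = max(1, int(round(seg_len * (1.0 - overlap))))
    # start indices of every segment that fits completely inside the signal
    return [list(signal[start:start + seg_len])
            for start in range(0, len(signal) - seg_len + 1, step)]
```

For instance, a 10-sample signal at 2 Hz cut into 2 s segments yields two segments without overlap and four segments with 50% overlap.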
Most of the features used for biomedical signal analysis reported in the literature are statistical, time-domain, or frequency-domain based. Lately, however, many features from the nonlinear domain have been explored in various studies. The current conclusion is that nonlinear features may contribute to classification efficiency when combined with other time and frequency domain features, but are rarely strong enough to discern between the classes on their own [82, 84]. Most of the nonlinear features may be applicable to a variety of biomedical signals; however, many of these methods have so far been applied to only a single type of biomedical signal for classification purposes.

Table 1.3: An overview of commonly used features for biomedical signals classification.
Category | Feature | Description | Signal type
Statistical and geometric time domain features | AVNN | Average of all NN intervals | HRV
Statistical and geometric time domain features | SDNN | Standard deviation of all NN intervals | HRV
Statistical and geometric time domain features | RMSSD | The square root of the mean of the sum of the squares of differences between adjacent NN intervals | HRV
Statistical and geometric time domain features | pNN50 | Percentage of the number of pairs of adjacent NN intervals differing by more than 50 ms | HRV
Statistical and geometric time domain features | HRV_TI | Total number of all NN intervals divided by the height of the histogram of all NN intervals measured on a discrete scale with bins of 1/128 s | HRV
Statistical and geometric time domain features | TINN | Baseline width of the minimum square difference triangular interpolation of the highest peak of the histogram of all NN intervals | HRV
Statistical and geometric time domain features | Mean | Mean of the EEG signal | EEG
Statistical and geometric time domain features | StandardDeviation | Standard deviation of the EEG signal | EEG
Statistical and geometric time domain features | MeanFirstDiff | Mean of the first differences of the EEG signal (including normalized version of the feature) | EEG
Statistical and geometric time domain features | MeanSecondDiff | Mean of the second differences of the EEG signal (including normalized version of the feature) | EEG
Statistical and geometric time domain features | HjorthParameters | Hjorth activity, mobility, and complexity; higher-order statistics | EEG
Frequency domainᵃ | LF | Spectral power in LF range (0.04 ≤ f ≤ 0.15 Hz) | HRV
Frequency domainᵃ | HF | Spectral power in HF range (0.15 ≤ f ≤ 0.4 Hz) | HRV
Frequency domainᵃ | LF/HF | Ratio LF/HF | HRV
Frequency domainᵃ | SpectEn | Spectral Shannon's entropy | HRV
Frequency domainᵃ | DeltaBand | Spectral power in delta band | EEG
Frequency domainᵃ | ThetaBand | Spectral power in theta band | EEG
Frequency domainᵃ | AlphaBand | Spectral power in alpha band | EEG
Frequency domainᵃ | BetaBand | Spectral power in beta band | EEG
Frequency domainᵃ | GammaBand | Spectral power in gamma band | EEG
Time–frequency | HaarWavSD | Standard deviation of Haar's wavelet | HRV
Time–frequency | RenyiWignerVille | Renyi's entropy of Wigner–Ville transform | ECG
Time–frequency | HOSWPD | Higher-order statistics of wavelet packet decomposition | ECG
Time–frequency | RenyiCWT | Renyi's entropy of continuous wavelet transform | ECG, EEG
Time–frequency | IMFInstFreqHHT | Instantaneous frequency of first four IMFs of Hilbert–Huang transform | EEG
Time–frequency | CTMHHT | Central tendency measure of first three IMFs in z-plane of Hilbert–Huang transform | VAG
Time–frequency | SLOPEHHT | Slope of the regression line of the power spectrum median frequency of Hilbert–Huang transform | EMG
Nonlinear, phase space | SD1/SD2 | Poincaré plot standard deviations ratio | HRV
Nonlinear, phase space | CTM | Central tendency measure | HRV
Nonlinear, phase space | REC, LMean, DET, RecShEn, Lam | Recurrence plot features | HRV
Nonlinear, phase space | CREC, CLMean, CDET, CRecShEn, CLam | Cross-recurrence plot features, multivariate | EEG
Nonlinear, phase space | Mean phase coherence | Phase synchronization mean phase coherence, multivariate | EEG
Nonlinear, phase space | Phase lag index | Asymmetry of the distribution of phase differences between two signals | EEG
Nonlinear, phase space | Synchronization likelihood | Synchronization likelihood, multivariate | EEG
Nonlinear, fractal | DFA α1, DFA α2 | Detrended fluctuation analysis short- and long-term complexity | HRV
Nonlinear, fractal | α | Power-law 1/f^α exponent of the frequency spectrum | HRV
Nonlinear, fractal | SFI | Spatial filling index | HRV
Nonlinear, entropy | ApEn | Approximate entropy features, also maximum approximate entropy feature | HRV
Nonlinear, entropy | SampEn | Sample entropy features, also maximum sample entropy feature | HRV
Nonlinear, entropy | FuzzyApEn | Fuzzy approximate entropy features, also maximum fuzzy approximate entropy feature | HRV
Nonlinear, entropy | MultiSampEn | Multiscale sample entropy | HRV
Nonlinear, entropy | RenyiEn | Renyi's entropy | HRV
Nonlinear, entropy | CCE | Corrected conditional Shannon's entropy | HRV
Nonlinear, entropy | TransEn | Transfer entropy | EEG
Nonlinear, other | LZComp | Lempel–Ziv complexity | HRV
Nonlinear, other | MultiAIMean, MultiAIStDev | Multiscale asymmetry index, mean, and standard deviation | HRV
Nonlinear, other | Allan factor | Nonlinear Allan factor | HRV
Nonlinear, other | Mutual information | Mutual information for bivariate analysis | EEG
Nonlinear, other | HRT | Heart rate turbulence onset and slope | HRV
Symbolic dynamics | AlphEn | Alphabet entropy average, maximum, and variance; rate of occurrence for all letters; alphabet entropy (expanded Carnap's entropy) averages for all letters | HRV
Symbolic dynamics | PermEn | Permutation entropy and weighted permutation entropy | EEG
Symbolic dynamics | SymbTransEn | Symbolic transfer entropy for two signals | EEG
ᵃ Frequency domain features may be calculated using a variety of methods for power spectrum density estimation, for example, Fourier transform, Burg autoregressive method, Lomb–Scargle periodogram, and so on.

Although all the steps of the biomedical signal classification process are optional except for classification model construction, the vast majority of published works currently use segmentation and preprocessing as well as feature extraction steps. The only notable exception to this practice is research that employs deep learning models, especially various deep neural network architectures, to construct accurate biomedical signal classification models [87]. Here, the models are usually built on the segmented raw time series, with little preprocessing involved. We consider this topic in more detail in Section 1.5. Regarding the classification methods, the prevailing machine-learning algorithms for biomedical signal classification are various types of artificial neural networks, support vector machines, K-nearest neighbors, decision trees, and decision tree ensembles such as random forest [52, 88–90]. Since the topic of this chapter is feature selection, we do not go into the details of the well-known classification methods applied to biomedical signal classification.
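As an illustration of the feature extraction step, the four simplest time-domain HRV features from Table 1.3 (AVNN, SDNN, RMSSD, and pNN50) can be computed directly from a list of NN intervals in milliseconds. This is a sketch, not a reference implementation. It assumes artifact-free NN intervals, uses the sample standard deviation (n − 1) for SDNN, and divides the count of successive differences above 50 ms by the number of differences; other tools may use the population standard deviation or a different denominator for pNN50. The function name is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def hrv_time_domain(nn_ms: Sequence[float]) -> Dict[str, float]:
    """Basic statistical HRV features from NN intervals given in milliseconds."""
    if len(nn_ms) < 3:
        raise ValueError("need at least three NN intervals")
    # differences between adjacent NN intervals
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    return {
        "AVNN": mean(nn_ms),
        "SDNN": stdev(nn_ms),  # sample SD (n - 1); some tools use the population SD
        "RMSSD": math.sqrt(mean(d * d for d in diffs)),
        # percentage of successive differences larger than 50 ms
        "pNN50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }
```

For NN intervals of 800, 810, 790, 860, and 800 ms, AVNN is 812 ms, the successive differences are 10, −20, 70, and −60 ms, and two of the four differences exceed 50 ms, so pNN50 is 50%.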
1.3.2 Feature selection as a part of the biomedical signal analysis process
The feature selection step may be used in biomedical signal classification to improve the classification accuracy of the chosen classification model as well as to establish more easily comprehensible models. While feature selection has been extensively used in the analysis of biomedical datasets in general [45], its use in the biomedical signal analysis domain specifically has been rather limited. This may be because the feature sets considered in signal analysis are not as large as in other biomedical domains, and because machine-learning approaches have been considered less often, while other approaches, relying more on expert knowledge and signal processing, dominate the field [91]. Here, we provide some exemplary research that used feature selection in the biomedical signal analysis process and achieved significant improvements in classification accuracy and model size. The reader should note that the overview is not exhaustive and is only meant to highlight the significance of feature selection in this field. Pecchia et al. [92] used simple feature selection based on exhaustive search to find the best subset of features for the most accurate classification between healthy subjects and congestive heart failure patients, using short-term HRV signals. The approach was feasible because only nine features were considered, which led to 512 possible feature subsets to evaluate. In a paper later published by the same research group [93], a set of 13 features led to 8,192 possible feature subsets that needed to be evaluated using exhaustive search to distinguish between different classes of congestive heart failure patients. This again proved to be feasible. Jovic and Jovic [84] used the symmetrical uncertainty filter measure to rank relevant alphabet entropy features for arrhythmia classification from short-term HRV records, and obtained sensitivity and specificity improvements when combining the top 10 out of 138 alphabet entropy features with other commonly used feature combinations in the field. Their work showed that nonlinear features can improve model accuracy beyond what linear features alone achieve. Isler et al. [94] used a simple t-test filter to remove irrelevant features in a multistage classification setting to classify congestive heart failure based on short-term HRV. The t-test reduced the feature set by more than 50% at a 5% statistical significance threshold, and also improved the accuracy compared to the whole feature set. Kutlu and Kuntalp [62] used a genetic algorithm as the search method for K-nearest neighbor-based wrapper feature selection in a multistage automatic arrhythmia recognition and classification system based on ECG records. They improved the accuracy from 81.31% when all the features were used to 93.59% after the selection procedure was completed. Houssein et al. [89] used a wrapper approach based on a twin support vector machine and particle swarm optimization coupled with a gravitational search algorithm for highly accurate classification of heartbeats from ECG records, improving the accuracy with feature selection from 85.87% to 99.44%. While their method appears to be highly resource demanding compared to not considering feature selection at all, the reported improvements in accuracy are highly significant. Mechmeche et al. [95] proposed a two-stage feature selection method for a more automated diagnosis of epilepsy. Both stages used wrappers with simple classifiers. In the first stage, each feature was evaluated and ranked according to the accuracy obtained by a linear discriminant analysis classifier.
In the second stage, a predetermined subset of the highly ranked features was further evaluated using sequential backward elimination and a Mahalanobis distance classifier. The method achieved a perfect score of 100% accuracy in the detection of epileptic episodes. Zoubek et al. [96] proposed a wrapper approach based on forward feature selection and backward elimination that achieved the best results (around 85%) in combination with a multilayer perceptron neural network for the classification of sleep/wake stages (five classes). The approach used various features from the EEG, EMG, and EOG to model the sleep/wake stages efficiently. Alonso-Atienza et al. [97] used a hybrid feature selection approach, consisting of a combined criterion based on three filter methods (correlation, mRMR, and Fisher score) and a backward-elimination-based wrapper with an SVM classifier, to detect life-threatening arrhythmias from ECG records. The method reduced the feature set from 13 to 9 features, which was shown to reduce the balanced error rate for support vector machine parametrization in detecting shockable arrhythmias. The approach was found to be unsatisfactory for ventricular fibrillation detection, though, as all 13 features appeared to be at least weakly relevant for that purpose.
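The subset counts reported for Pecchia et al. [92, 93] follow from the fact that n features have 2^n subsets: 2^9 = 512 and 2^13 = 8,192, including the empty subset, so a search over non-empty subsets evaluates 511 and 8,191 candidates, respectively. The sketch below shows such an exhaustive search. The scoring function is a caller-supplied placeholder (in practice, for example, a cross-validated classifier accuracy); the function name and the toy score used in the usage note are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence, Tuple


def exhaustive_search(features: Sequence[str],
                      score: Callable[[FrozenSet[str]], float]
                      ) -> Tuple[FrozenSet[str], float, int]:
    """Evaluate every non-empty feature subset and return the best one.

    Returns (best_subset, best_score, number_of_subsets_evaluated).
    The cost grows as 2**len(features) - 1, so this is only practical
    for small feature sets such as the 9 and 13 features mentioned above.
    """
    best_subset: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    evaluated = 0
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            evaluated += 1
            s = score(frozenset(subset))
            if s > best_score:
                best_subset, best_score = frozenset(subset), s
    return best_subset, best_score, evaluated
```

With a toy score that rewards features "a" and "c" and penalizes subset size, `exhaustive_search(["a", "b", "c"], ...)` evaluates 7 subsets and returns {"a", "c"}; with nine features it evaluates 511.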
1.4 Feature selection software
1.4.1 Available software implementations
Feature selection procedures require appropriate software implementations to automate the analysis process. Although many research papers cover feature selection, details on the software and packages used in this research are seldom disclosed. Some researchers rely on their own software implementations and do not make them public. Others use freely available software implementations, but do not mention the details of the software used. Currently, many software solutions exist for performing feature selection as a part of the classification process, although such solutions rarely provide specialized support for biomedical signal classification or for feature selection specifically. Available software implementations for some of the feature selection methods are listed in Table 1.4. The software packages listed in Table 1.4 are not used exclusively in biomedical signal classification. Since some of them implement many methods, we do not provide a detailed list of the implemented feature selection packages and methods; the reader is instead referred to the corresponding literature.
Table 1.4: A list of commonly used software solutions supporting feature selection methods for biomedical signal classification.
Software solution | License | Language | Supported methods
Matlab | Commercial | Matlab | Many methods, e.g., FEAST toolbox, PRTools+
scikit-learn | Free | Python | Many methods
R | Free | R | Many methods
Weka | Free and commercial | Java | Many methods
RapidMiner | Commercial | Java | Sequential forward selection, sequential backward elimination, genetic algorithm search
Feature Selection Toolbox (FST) | Free | C++ | FST version supports many methods
Mlxtend | Free | Python | Exhaustive search, forward selection, forward floating selection, backward elimination, backward floating elimination
1.4.2 MULTISAB platform feature selection implementation
We are currently developing an online web platform for automated analysis and diagnosis of disorders from various types of biomedical signals, called the MULTISAB (multiple time series analysis in biomedicine) platform [105]. This platform contains, within its web-based dataflow model [106], a model construction step that has the option to perform feature selection on the previously extracted features from the biomedical signals (Figure 1.3). The platform supports many features for the analysis of ECG, EEG, and HRV. The details regarding the available feature implementations may be found in Jovic et al. [105].
Figure 1.3: Feature selection step as a part of the complete MULTISAB web platform's biomedical signal analysis process (steps shown: analysis type selection, scenario selection, input data selection, records inspection, records preprocessing, feature extraction, model construction (including feature selection), and reporting).
Currently, the platform supports only filter-based feature selection. Wrapper, hybrid, and other more complex approaches are excluded because they would degrade platform performance: online users expect operations within the web platform to execute quickly, without prolonged waiting times. Therefore, the symmetrical uncertainty, χ², and ReliefF filter algorithms (Table 1.1) are currently implemented. Since the platform is in development and under considerable testing, we plan to add other filter methods as well as different strategies for selecting appropriate features. The currently implemented feature selection strategies enable: (1) the selection of a percentage of the highest ranked features from a single filter method, and (2) the selection of a percentage of the highest ranked features from two or more filter methods, where the features selected by all the methods are pooled cumulatively, with each feature included only once. Other strategies might include the selection of a fixed number of features from all the methods, where the highest ranked features are selected sequentially from each method until the percentage threshold is reached, and the selection of only those features that satisfy the predetermined thresholds for each feature selection method involved.
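The following sketch illustrates a symmetrical uncertainty filter and the second selection strategy described above (pooling the top-ranked percentage of features from several filter rankings, with each feature counted once). It is not the MULTISAB implementation: it assumes discretized feature values, uses the standard definition SU(X, Y) = 2·I(X; Y)/(H(X) + H(Y)) with entropies in bits, rounds the per-method count up, and leaves the χ² and ReliefF rankings to be supplied by the caller. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(values: Sequence) -> float:
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x: Sequence, y: Sequence) -> float:
    """SU = 2 * I(X;Y) / (H(X) + H(Y)), ranging from 0 (independent) to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))  # I = H(X) + H(Y) - H(X,Y)
    return 2.0 * mutual_info / (hx + hy)


def rank_by_su(features: Dict[str, Sequence], y: Sequence) -> List[str]:
    """Feature names ordered from highest to lowest SU with the class label."""
    return sorted(features, key=lambda f: symmetrical_uncertainty(features[f], y),
                  reverse=True)


def top_percentage_union(rankings: List[List[str]], percentage: float) -> List[str]:
    """Strategy (2): pool the top `percentage` % of each ranking, no duplicates."""
    selected: List[str] = []
    for ranking in rankings:
        k = max(1, math.ceil(len(ranking) * percentage / 100.0))
        for name in ranking[:k]:
            if name not in selected:
                selected.append(name)
    return selected
```

Strategy (1) is the special case of passing a single ranking to `top_percentage_union`.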
1.5 Deep learning methods for biomedical signal classification
Deep learning models, such as the deep convolutional artificial neural network (CNN), which is most often used for classification problems, may be trained both on feature vectors and on raw signal values. However, the most common approach is to use a conveniently preprocessed raw signal. This approach avoids the domain-based feature extraction step altogether, as well as the feature selection step. Here, we provide an overview of how deep learning models have been applied in biomedical signal classification. Probably the first use of CNN in biomedical signal classification was by Kiranyaz et al. [107], with an application in ECG heartbeat classification. They used an R peak detection algorithm and centered the analyzed raw ECG segments on the R peak, with 128 samples altogether taken as input to the CNN. Their promising results in heartbeat detection opened an avenue for several later studies, including the one by Al Rahhal et al. [108]. They used CNN to classify heartbeats into five common types, as determined by the Association for the Advancement of Medical Instrumentation standard. Their approach consisted of filtering the raw ECG to remove baseline wandering, detecting R peaks in the ECG, transforming a given number of samples before and after the R peak using CWT, resizing the resulting images to fit a pretrained CNN, and training the network on the images to achieve model generalization. The results showed superior performance compared to previous approaches. Deep learning has also been used in EEG analysis, including brain–computer interface applications [109], as well as for discerning sleep stages [110]. Tang et al. [109] used CNN to classify between left- and right-hand movements based on short-term analysis of EEG signals. The signals were filtered to remove noise, and the input to the network consisted of 28 channels × 60 samples.
The results presented were better than those of support vector machines trained on features from frequency domain analysis. Sors et al. [110] used raw single-channel EEG to train a CNN to differentiate between four sleep stages and the awake state based on 30 s segments sampled at 125 Hz. Their deep neural network consisted of 12 convolutional layers, one fully connected layer, and one output layer. They obtained state-of-the-art results with an accuracy of 87%. Wang et al. [13] used another type of neural network, called the cycle deep belief network, on the same problem of sleep stage and awake state classification. However, in their case, heterogeneous multivariate time series consisting of one EEG signal, two EOG signals, and one EMG signal were used for assessment. After noise removal, they divided the signals into 1 s segments of 64 input samples each, corresponding to a signal downsampled to 64 Hz. The results proved to be better than those of a simple recurrent neural network, a deep belief network, and a K-nearest neighbors classifier.
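The input windows used in these studies follow directly from segment length and sampling rate: a 30 s segment at 125 Hz gives 30 × 125 = 3,750 samples, and a 1 s segment at 64 Hz gives 64 samples. For heartbeat classification in the style of Kiranyaz et al. [107], fixed-width raw ECG windows are centered on detected R peaks. The sketch below shows only that windowing step; it assumes R peak sample indices are already available from some detector, skips beats too close to the record edges, and uses a hypothetical function name. It does not reproduce any network from the cited studies.

```python
from typing import List, Sequence


def beat_windows(ecg: Sequence[float], r_peaks: Sequence[int],
                 width: int = 128) -> List[List[float]]:
    """Cut fixed-width raw ECG windows centered on R peak indices.

    For an even width, the window spans width // 2 samples before the peak
    and the remaining samples from the peak onward, so the R peak sits at
    index width // 2 of each window.
    """
    half = width // 2
    windows = []
    for r in r_peaks:
        start, stop = r - half, r - half + width
        if start < 0 or stop > len(ecg):
            continue  # skip beats too close to the record edges
        windows.append(list(ecg[start:stop]))
    return windows
```

Each returned window has the same length, so the windows can be stacked directly into the fixed-size input that a CNN expects.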
A recent study published in the journal Nature Medicine [111] reported cardiologist-level accuracy in detecting arrhythmia from ambulatory ECGs using a deep convolutional neural network. The main contribution of this work was the use of a vast database of 91,232 ECG recordings from 53,549 patients to construct the model, with 328 patients in the test dataset. Records of 30 s duration, sampled at 200 Hz, were not preprocessed prior to the use of CNN. The mean F1 score of 0.837 exceeded that of an average cardiologist in discerning 12 types of arrhythmias. The main drawback of this study appears to be the black-box nature of the CNN model, which does not reveal why a diagnosis was reached. While this may limit its applicability in practice, the achieved accuracy is formidable and pushes the limits of the potential use of artificial intelligence methods in biomedicine.
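A mean F1 score of the kind reported above summarizes a multiclass classifier by computing F1 = 2·TP/(2·TP + FP + FN) separately for each class and then averaging over classes. The sketch below assumes an unweighted (macro) average over all classes that appear in either the true or predicted labels; the exact averaging used in [111] is not detailed in this chapter, and the function names and class labels are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 = 2TP / (2TP + FP + FN) for every class seen in the labels."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def mean_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores (macro averaging)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

Because every class contributes equally, a macro-averaged F1 is sensitive to performance on rare rhythm classes, which a plain accuracy figure can hide.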
1.6 Discussion and conclusion
Evidently, deep learning approaches offer great benefits and improvements in biomedical signal classification results compared to classical approaches. It remains to be seen whether CNN and other types of deep architectures can replace expert feature extraction and feature selection in all applications of biomedical signal classification. Currently, very few studies have directly compared a deep learning approach with a sufficiently detailed expert feature-based approach in biomedical signal classification. As shown in this chapter, there are also plenty of current studies that consider feature extraction and various feature selection approaches. Nevertheless, the potential to simply train an “off-the-shelf” deep neural network architecture and achieve state-of-the-art classification results is attractive and certainly within near reach for some applications. Future studies would need to shed more light on the conditions in which deep learning architectures would be the best approach to follow when classifying biomedical signals. In addition, future work would need to directly compare deep learning and detailed expert feature-based approaches in the classification of biomedical signals to establish which methods lead to better results. The topic of optimal segment length for discerning between different organism states and disorders needs to be addressed in future studies, both from the perspective of feature extraction and feature selection and from the deep learning perspective. The analysis of longer segments, which may be necessary to efficiently detect or predict certain disorders (e.g., epilepsy, congestive heart failure), is certainly a limiting factor for the use of deep learning approaches, since the amount of data needed to learn the models may currently be too large to handle efficiently.
As this issue is domain specific, future work should focus in more detail on the optimality of segmenting approaches for various conditions and various biomedical signals.
Currently, there are many medical papers that consider individual features for discerning between different organism states and disorders based on information from biomedical signals; for example, see the HRV analysis guidelines [55]. The main problems with considering individual features instead of feature combinations and the use of feature selection are: (1) decreased accuracy compared to feature combinations, and (2) limited applicability beyond the currently investigated problem. The second problem is more pronounced for domain-specific features and less so for general time series features. There are two benefits of using the single-feature approach: (1) it is clearly understood by medical professionals, and (2) it avoids the use of computationally expensive machine learning methods. The approach followed by many medical researchers, isolating the single feature that works best for their problem, is probably suboptimal. Instead, low-complexity and easily interpretable machine learning models, such as decision trees or classification rules, may be used to achieve higher accuracy, while ensuring clear interpretation by limiting the size of the trees or rules. This approach is followed by biomedical engineering and computer science experts, but rarely by medical professionals [66, 92]. As we have shown earlier, there is also a large potential for combinations of nonlinear features to be applied in various biomedical signal domains, which may lead to significant improvements in the field. Feature selection methods, as described in this work, may thus present the best bridge between mostly uninterpretable, high-dimensional machine learning models and the single-feature approach used by medical professionals. Finally, interpretability is one of the main reasons to employ feature extraction and feature selection, combined with an appropriate interpretable classifier (e.g., a decision tree).
While there have been many attempts to interpret the decisions made by decision tree ensembles, neural networks, and deep neural networks [112], no clear solution for such interpretation is currently available. Since medical professionals need to make informed decisions, a black-box-type answer from a deep learning model may not be satisfactory when an appropriate diagnosis needs to be made. Where model explanation is necessary, and where the best possible interpretable model is sought, feature extraction and feature selection still offer many advantages compared to deep learning approaches. Since interpretability is very significant in biomedicine, future work would need to (1) include both black-box and interpretable models, and (2) compare the difference in accuracy between the two approaches and discuss whether the difference is significant enough to merit the use of black-box models from a clinical perspective.
Acknowledgments: This work has been fully supported by the Croatian Science Foundation under the project number UIP-2014-09-6889.
1 Feature selection in biomedical signal classification process
References
[1] Tang J, Alelyani S, Liu H. Feature Selection for Classification: A Review. In: Aggarwal C, ed. Data Classification: Algorithms and Applications, Boca Raton, FL, USA, CRC Press, 2014.
[2] Hira ZM, Gillies DF. A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data. Adv Bioinformatics 2015, 2015, 198363. DOI:10.1155/2015/198363
[3] Xuan G, Zhu X, Chai P, Zhang Z, Shi YQ, Fu D. Feature Selection based on the Bhattacharyya Distance. In Proc. 18th Int. Conf. on Pattern Recognition (ICPR'06), 2006, vol. 3, 1232–5. DOI:10.1109/ICPR.2006.558
[4] Price NW. Big data and black-box medical algorithms. Science Translational Medicine 2018, 10(471), eaao5333. DOI:10.1126/scitranslmed.aao5333
[5] Zemmal N, Azizi N, Dey N, Sellami M. Adaptive Semi Supervised Support Vector Machine Semi Supervised Learning with Features Cooperation for Breast Cancer Classification. Journal of Medical Imaging and Health Informatics 2016, 6(1), 53–62. DOI:10.1166/jmihi.2016.1591
[6] Jovic A, Brkic K, Bogunovic N. A review of feature selection methods with applications. In Proc. 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2015), 2015, 1200–5. DOI:10.1109/MIPRO.2015.7160458
[7] Dey N, Ashour A. Classification and Clustering in Biomedical Signal Processing. Hershey, PA, USA, IGI Global, 2016.
[8] Li J, Fong S, Liu L-S, Dey N, Ashour AS, Moraru L. Dual feature selection and rebalancing strategy using metaheuristic optimization algorithms in X-ray image datasets. Multimed. Tools Appl. 2019. DOI:10.1007/s11042-019-7354-5
[9] Zemmal N, Azizi N, Sellami M, Zenakhra D, Cheriguene S, Dey N, Ashour AS. Robust feature selection algorithm based on transductive SVM wrapper and genetic algorithm: application on computer-aided glaucoma classification. International Journal of Intelligent Systems Technologies and Applications 2018, 17(3), 310–46. DOI:10.1504/IJISTA.2018.094018
[10] Rückstieß T, Osendorfer C, van der Smagt P. Sequential Feature Selection for Classification. In Proc. Australasian Joint Conference on Artificial Intelligence (AI 2011), 2011, Advances in Artificial Intelligence, LNCS, vol. 7106, 132–41.
[11] Yu K, Ding W, Simovici DA, Wang H, Pei J, Wu X. Classification with streaming features: An emerging pattern mining approach. ACM Trans. Knowl. Discov. Data 2015, 9(4), 30. DOI:10.1145/2700409
[12] Ameri A, Akhaee MA, Scheme E, Englehart K. Real-time, simultaneous myoelectric control using a convolutional neural network. PLoS ONE 2018, 13(9), e0203835. DOI:10.1371/journal.pone.0203835
[13] Wang S, Hua G, Hao G, Xie C. A Cycle Deep Belief Network Model for Multivariate Time Series Classification. Mathematical Problems in Engineering 2017, 2017, 9549323. DOI:10.1155/2017/9549323
[14] Yu L, Liu H. Efficient Feature Selection via Analysis of Relevance and Redundancy. J. Mach. Learn. Res. 2004, 5, 1205–24.
[15] Hall MA. Correlation-based Feature Subset Selection for Machine Learning. Doctoral Thesis, University of Waikato, Hamilton, New Zealand, 1998.
[16] Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27(8), 1226–38.
[17] Witten IH, Frank E, Hall M, Pal CJ. Data mining: Practical machine learning tools and techniques. 4th ed. San Francisco, CA, USA, Morgan Kaufmann, 2016.
Alan Jovic
[18] Yu L, Liu H. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. In Proc. 20th Int. Conf. on Machine Learning (ICML-2003), 2003, Washington DC, USA, AAAI Press, 856–63.
[19] Liu H, Setiono R. A Probabilistic Approach to Feature Selection – A Filter Solution. In Proc. 13th International Conference on Machine Learning (ICML-1996), 1996, Bari, Italy, Morgan Kaufmann, 319–27.
[20] Duda RO, Hart PE, Stork DG. Pattern classification. Hoboken, NJ, USA, Wiley-Interscience, 2012.
[21] Urbanowicz RJ, Meeker M, La Cava W, Olson RS, Moore JH. Relief-based feature selection: Introduction and review. Journal of Biomedical Informatics 2018, 85, 189–203.
[22] Zhao Z, Liu H. Spectral feature selection for supervised and unsupervised learning. In Proc. 24th Int. Conf. on Machine Learning (ICML '07), 2007, 1151–7.
[23] Hoque N, Bhattacharyya DK, Kalita JK. MIFS-ND: A mutual information-based feature selection method. Expert Syst. Appl. 2014, 41(14), 6371–85. DOI:10.1016/j.eswa.2014.04.019
[24] Bennasar M, Hicks Y, Setchi R. Feature selection using Joint Mutual Information Maximisation. Expert Syst. Appl. 2015, 42(22), 8520–32. DOI:10.1016/j.eswa.2015.07.007
[25] Moradi P, Rostami M. Integration of graph clustering with ant colony optimization for feature selection. Knowledge-Based Systems 2015, 84, 144–61.
[26] Thangavel K, Pethalakshmi A. Dimensionality reduction based on rough set theory: A review. Appl. Soft Comput. 2009, 9(1), 1–12. DOI:10.1016/j.asoc.2008.05.006
[27] Chakraborty B, Chakraborty G. Fuzzy consistency measure with particle swarm optimization for feature selection. In Proc. 2013 IEEE Int. Conf. on Systems, Man, and Cybernetics, 2013, 4311–5. DOI:10.1109/SMC.2013.735
[28] Narendra PM, Fukunaga K. A Branch and Bound Algorithm for Feature Subset Selection. IEEE Trans. Computers 1977, C-26(9), 917–22.
[29] Molina LC, Belanche L, Nebot A. Feature Selection Algorithms: A Survey and Experimental Evaluation. In Proc. 2002 IEEE Int. Conf. on Data Mining (ICDM 2002), 2002, 9–12.
[30] García López FC, García Torres M, Moreno Pérez JA, Moreno Vega JM. Scatter Search for the Feature Selection Problem. In: Conejo R, Urretavizcaya M, Pérez-de-la-Cruz JL, eds. Current Topics in Artificial Intelligence. Springer, Berlin, Heidelberg, LNCS, vol. 3040, 2003, 517–25.
[31] Esmat R, Hossein N. Feature subset selection using improved binary gravitational search algorithm. Journal of Intelligent & Fuzzy Systems 2014, 26(3), 1211–21. DOI:10.3233/IFS30807
[32] Xue B, Zhang M, Browne WN, Yao X. A Survey on Evolutionary Computation Approaches to Feature Selection. IEEE Trans. Evolut. Comput. 2016, 20(4), 606–26. DOI:10.1109/TEVC.2015.2504420
[33] Drotár P, Gazda J, Smékal Z. An experimental comparison of feature selection methods on two-class biomedical datasets. Comput. Biol. Med. 2015, 66(C), 1–10. DOI:10.1016/j.compbiomed.2015.08.010
[34] Bae C, Yeh WC, Chung YY, Liu SL. Feature selection with intelligent dynamic swarm and rough set. Expert Syst. Appl. 2010, 37(10), 7026–32. DOI:10.1016/j.eswa.2010.03.016
[35] Khushaba RN, Al-Ani A, Al-Jumaily A. Feature subset selection using differential evolution and a statistical repair mechanism. Expert Syst. Appl. 2011, 38(9), 11515–26. DOI:10.1016/j.eswa.2011.03.028
[36] Liu C, Wang W, Zhao Q, Shen X, Konan M. A new feature selection method based on a validity index of feature subset. Patt. Recog. Lett. 2017, 92, 1–8. DOI:10.1016/j.patrec.2017.03.018
[37] Saeys Y, Abeel T, Van de Peer Y. Robust Feature Selection Using Ensemble Feature Selection Techniques. In: Daelemans W, Goethals B, Morik K, eds. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008, Springer, Berlin, Heidelberg, LNCS, vol. 5212, 2008, 313–25.
[38] Lal TN, Chapelle O, Weston J, Elisseeff A. Embedded methods. In: Guyon I, Gunn S, Nikravesh M, Zadeh LA, eds. Feature Extraction: Foundations and Applications. Springer, Berlin, Heidelberg, Studies in Fuzziness and Soft Computing, vol. 207, 2006, 137–65.
[39] Breiman L. Random Forests. Mach. Learn. 2001, 45(1), 5–32.
[40] Guyon I, Weston J, Barnhill S, Vapnik V. Gene Selection for Cancer Classification using Support Vector Machines. Mach. Learn. 2002, 46(1–3), 389–422.
[41] Hsu HH, Hsieh CW, Lu MD. Hybrid feature selection by combining filters and wrappers. Expert Syst. Appl. 2011, 38(7), 8144–50. DOI:10.1016/j.eswa.2010.12.156
[42] Cadenas JM, Garrido MC, Martínez R. Feature subset selection Filter–Wrapper based on low quality data. Expert Syst. Appl. 2013, 40, 6241–52. DOI:10.1016/j.eswa.2013.05.051
[43] Ali SI, Shahzad W. A Feature Subset Selection Method based on Conditional Mutual Information and Ant Colony Optimization. Int. J. Comput. Appl. 2012, 60(11), 5–10. DOI:10.5120/9734-3389
[44] Wang X, Yang J, Teng X, Xia W, Jensen R. Feature selection based on rough sets and particle swarm optimization. Patt. Recogn. Lett. 2007, 28(4), 459–71. DOI:10.1016/j.patrec.2006.09.003
[45] Inbarani HH, Azar AT, Jothi G. Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis. Comput. Methods Programs Biomed. 2014, 113(1), 175–85. DOI:10.1016/j.cmpb.2013.10.007
[46] Sarafrazi S, Nezamabadi-pour H. Facing the classification of binary problems with a GSA-SVM hybrid system. Mathem. Comp. Modelling 2013, 57(1–2), 270–8. DOI:10.1016/j.mcm.2011.06.048
[47] Lu H, Chen J, Yan K, Qun J, Yu X, Zhigang G. A hybrid feature selection algorithm for gene expression data classification. Neurocomputing 2017, 256, 56–62. DOI:10.1016/j.neucom.2016.07.080
[48] Ghayab HRA, Li Y, Abdulla S, Diykh M, Wan X. Classification of epileptic EEG signals based on simple random sampling and sequential feature selection. Brain Inf. 2016, 3, 85. DOI:10.1007/s40708-016-0039-1
[49] Dong G, Li J. Efficient mining of emerging patterns: Discovering trends and differences. In Proc. 5th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining. ACM, 1999, 43–52.
[50] Yu K, Wu X, Ding W, Pei J. Scalable and Accurate Online Feature Selection for Big Data. ACM Trans. Knowl. Disc. Data 2016, 11(2), 16. DOI:10.1145/2976744
[51] Zhou P, Hu X, Li P, Wu X. Online feature selection for high-dimensional class-imbalanced data. Knowledge-Based Systems 2017, 136, 187–99. DOI:10.1016/j.knosys.2017.09.006
[52] Husain SJ, Rao K. An artificial neural network model for classification of epileptic seizures using Huang–Hilbert transform. Int. J. Soft. Comput. 2014, 5(3), 23. DOI:10.5121/ijsc.2014.5301
[53] Ostojic S, Peharec S, Srhoj-Egekher V, Cifrek M. Differentiating patients with radiculopathy from chronic low back pain patients by single surface EMG parameter. Automatika 2018, 59(3–4), 400–7. DOI:10.1080/00051144.2018.1553669
[54] Friganovic K, Kukolja D, Jovic A, Cifrek M, Krstacic G. Optimizing the detection of characteristic waves in ECG based on processing methods combinations. IEEE Access 2018, 6, 50609–26. DOI:10.1109/ACCESS.2018.2869943
[55] Sassi R, Cerutti S, Lombardi F, et al. Advances in heart rate variability signal analysis: joint position statement by the e-Cardiology ESC Working Group and the European Heart Rhythm Association co-endorsed by the Asia Pacific Heart Rhythm Society. Europace 2015, 17, 1341–53. DOI:10.1093/europace/euv015
[56] Wang XW, Nie D, Lu BL. EEG-Based Emotion Recognition Using Frequency Domain Extracted Features and Support Vector Machines. In Proc. Int. Conf. on Neural Information Processing (ICONIP 2011). Springer, Berlin, Heidelberg, LNCS, vol. 7062, 2011, 734–43.
[57] Hamida STB, Ahmed B, Penzel T. A novel insomnia identification method based on Hjorth parameters. In Proc. IEEE Int. Symp. on Signal Processing and Information Technology (ISSPIT 2015), 2015, 548–52. DOI:10.1109/ISSPIT.2015.7394397
[58] Asl BM, Setarehdan SK, Mohebbi M. Support vector machine-based arrhythmia classification using reduced features of heart rate variability signal. Artif. Intell. Med. 2008, 44(1), 51–64. DOI:10.1016/j.artmed.2008.04.007
[59] Stam CJ. Nonlinear dynamical analysis of EEG and MEG: Review of an emerging field. Clinical Neurophysiology 2005, 116, 2266–301. DOI:10.1016/j.clinph.2005.06.011
[60] Teich MC. Multiresolution Wavelet Analysis of Heart Rate Variability for Heart-Failure and Heart-Transplant Patients. In Proc. 20th Ann. Int. Conf. IEEE/EMBS, 1998, 1136–41. DOI:10.1109/IEMBS.1998.747071
[61] Faust O, Acharya RU, Krishnan SM, Min LC. Analysis of Cardiac Signals Using Spatial Filling Index and Time-Frequency Domain. Biomed Eng Online 2004, 3, 30. DOI:10.1186/1475-925X-3-30
[62] Kutlu Y, Kuntalp D. A multi-stage automatic arrhythmia recognition and classification system. Comput. Biol. Med. 2011, 41(1), 37–45. DOI:10.1016/j.compbiomed.2010.11.003
[63] Fraiwan L, Lweesy K, Khasawneh N, Wenz H, Dickhaus H. Automated sleep stage identification system based on time–frequency analysis of a single EEG channel and random forest classifier. Comput. Methods Programs Biomed. 2012, 108(1), 10–19. DOI:10.1016/j.cmpb.2011.11.005
[64] Oweis RJ, Abdulhay EW. Seizure classification in EEG signals utilizing Hilbert-Huang transform. Biomed. Eng. Online 2011, 10, 38. DOI:10.1186/1475-925X-10-38
[65] Nalband S, Valliappan CA, Prince RGAA, Agrawal A. Feature extraction and classification of knee joint disorders using Hilbert Huang transform. In Proc. 14th Int. Conf. on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON 2017), 266–9. DOI:10.1109/ECTICon.2017.8096224
[66] Jovic A, Bogunovic N. Electrocardiogram Analysis Using a Combination of Statistical, Geometric, and Nonlinear Heart Rate Variability Features. Artif. Intell. Med. 2011, 51(3), 175–86. DOI:10.1016/j.artmed.2010.09.005
[67] Dos Santos L, Barroso JJ, Macau EEN, De Godoy MF. Assessment of heart rate variability by application of central tendency measure. Med. Biol. Eng. Comput. 2015, 53(11), 1231–7. DOI:10.1007/s11517-015-1390-8
[68] Acharya RU, Joseph KP, Kannathal N, Lim CM, Suri JS. Heart Rate Variability: A Review. Med Biol Eng Comput 2006, 44(12), 1031–51. DOI:10.1007/s11517-006-0119-0
[69] Zbilut JP, Giuliani A, Webber Jr CL. Detecting deterministic signals in exceptionally noisy environments using cross-recurrence quantification. Phys Lett A 1998, 246, 122–8.
[70] Mormann F, Lehnertz K, David P, Elger CE. Mean phase coherence as a measure for phase synchronization and its application to the EEG of epilepsy patients. Physica D 2000, 144, 358–69.
[71] Stam CJ, Nolte G, Daffertshofer A. Phase lag index: Assessment of functional connectivity from multi channel EEG and MEG with diminished bias from common sources. Hum. Brain Mapp. 2007, 28(11), 1178–93. DOI:10.1002/hbm.20346
[72] Stam CJ, van Dijk BW. Synchronization likelihood: an unbiased measure of generalized synchronization in multivariate data sets. Physica D 2002, 163, 236–41.
[73] Porta A, Baselli G, Liberati D, Montano N, Cogliati C, Gnecchi-Ruscone T, Cerutti S. Measuring regularity by means of a corrected conditional entropy in sympathetic outflow. Biol. Cybern. 1998, 78(1), 71–78. DOI:10.1007/s004220050414
[74] Wessel N, Malberg H, Bauernschmitt R, Kurths J. Nonlinear methods of cardiovascular physics and their clinical applicability. Int. J. Bifurcation Chaos 2007, 17(10), 3325. DOI:10.1142/S0218127407019093
[75] Pincus SM, Goldberger AL. Physiological time-series analysis: what does regularity quantify? Am. J. Physiol. Heart. Circ. Physiol. 1994, 266, H1643–56.
[76] Richman JS, Moorman JR. Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart. Circ. Physiol. 2000, 278, H2039–49.
[77] Xie HB, Guo YJ, Zheng YP. Fuzzy approximate entropy analysis of chaotic and natural complex systems: detecting muscle fatigue using electromyography signals. Ann. Biomed. Eng. 2010, 38(4), 1483–96. DOI:10.1007/s10439-010-9933-5
[78] Ho YL, Lin C, Lin YH, Lo MT. The prognostic value of non-linear analysis of heart rate variability in patients with congestive heart failure – a pilot study of multiscale entropy. PLoS One 2011, 6(4), e18699. DOI:10.1371/journal.pone.0018699
[79] Vicente R, Wibral M, Lindner M, Pipa G. Transfer entropy – a model-free measure of effective connectivity for the neurosciences. J. Comput. Neurosci. 2011, 30, 45. DOI:10.1007/s10827-010-0262-3
[80] Zhang XS, Zhu YS, Thakor NV, Wang ZZ. Detecting ventricular tachycardia and fibrillation by complexity measure. IEEE Trans. Biomed. Eng. 1999, 46(5), 548–55. DOI:10.1109/10.759055
[81] Costa M, Goldberger AL, Peng CK. Broken Asymmetry of the Human Heartbeat: Loss of Time Irreversibility in Aging and Disease. Phys. Rev. Lett. 2005, 95, 198102. DOI:10.1103/PhysRevLett.95.198102
[82] Jovic A, Bogunovic N. Evaluating and Comparing Performance of Feature Combinations of Heart Rate Variability Measures for Cardiac Rhythm Classification. Biomed. Signal Process. Control 2012, 7(3), 245–55. DOI:10.1016/j.bspc.2011.10.001
[83] Pereda E, Quiroga RQ, Bhattacharya J. Nonlinear multivariate analysis of neurophysiological signals. Prog. Neurobiol. 2005, 77(1–2), 1–37. DOI:10.1016/j.pneurobio.2005.10.003
[84] Jovic A, Jovic F. Classification of cardiac arrhythmias based on alphabet entropy of heart rate variability time series. Biomed. Signal Process. Control 2017, 31, 217–30. DOI:10.1016/j.bspc.2016.08.010
[85] Staniek M, Lehnertz K. Symbolic Transfer Entropy. Phys. Rev. Lett. 2008, 100, 158101. DOI:10.1103/PhysRevLett.100.158101
[86] Deng B, Cai L, Li S, Wang R, Yu H, Chen Y, Wang J. Multivariate multi-scale weighted permutation entropy analysis of EEG complexity for Alzheimer's disease. Cogn. Neurodyn. 2017, 11(3), 217–31. DOI:10.1007/s11571-016-9418-9
[87] Acharya RU, Fujita H, Lih OS, Adam M, Tan JH, Chua CK. Automated detection of coronary artery disease using different durations of ECG segments with convolutional neural network. Knowledge-Based Systems 2017, 132, 62–71. DOI:10.1016/j.knosys.2017.06.003
[88] Orhan U, Hekim M, Ozer M. EEG signals classification using the K-means clustering and a multilayer perceptron neural network model. Expert Syst. Appl. 2011, 38(10), 13475–81. DOI:10.1016/j.eswa.2011.04.149
[89] Houssein EH, Ewees AA, ElAziz MA. Improving Twin Support Vector Machine Based on Hybrid Swarm Optimizer for Heartbeat Classification. Pattern Recognit. Image Anal. 2018, 28(2), 243–53. DOI:10.1134/S1054661818020037
[90] Fallet S, Lemay M, Renevey P, Leupi C, Pruvot E, Vesin JM. Can one detect atrial fibrillation using a wrist-type photoplethysmographic device? Med. Biol. Eng. Comput. 2019, 57(2), 477–87. DOI:10.1007/s11517-018-1886-0
[91] Phinyomark A, Phukpattaranont P, Limsakul C. Feature reduction and selection for EMG signal classification. Expert Syst. Appl. 2012, 39(8), 7420–31. DOI:10.1016/j.eswa.2012.01.102
[92] Pecchia L, Melillo P, Sansone M, Bracale M. Discrimination Power of Short Term Heart Rate Variability Measures for CHF Assessment. IEEE Trans. Inf. Technol. Biomed. 2011, 15(1), 40–6. DOI:10.1109/TITB.2010.2091647
[93] Melillo P, De Luca N, Bracale M, Pecchia L. Classification Tree for Risk Assessment in Patients Suffering from Congestive Heart Failure via Long-Term Heart Rate Variability. IEEE J-BHI 2013, 17(3), 727–33. DOI:10.1109/JBHI.2013.2244902
[94] Isler Y, Narin A, Ozer M, Perc M. Multi-stage classification of congestive heart failure based on short-term heart rate variability. Chaos, Solitons & Fractals 2019, 118, 145–51. DOI:10.1016/j.chaos.2018.11.020
[95] Mechmeche S, Salah RB, Ellouze N. Two-Stage Feature Selection Algorithm Based on Supervised Classification Approach for Automated Epilepsy Diagnosis. J. Bioengineer. & Biomedical Sci. 2016, 6, 183. DOI:10.4172/2155-9538.1000183
[96] Zoubek L, Charbonnier S, Lesecq S, Buguet A, Chapotot F. Feature selection for sleep/wake stages classification using data driven methods. Biomed. Signal Process. Control 2007, 2(3), 171–9. DOI:10.1016/j.bspc.2007.05.005
[97] Alonso-Atienza F, Morgado E, Fernandez-Martinez L, Garcia-Alberola A, Rojo-Alvarez JL. Detection of Life-Threatening Arrhythmias Using Feature Selection and Support Vector Machines. IEEE Trans. Biomed. Eng. 2014, 61(3), 832–40. DOI:10.1109/TBME.2013.2290800
[98] Brown G, Pocock A, Zhao M, Lujan M. Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection. J. Mach. Learn. Res. 2012, 13, 27–66.
[99] van der Heijden F, Duin RPW, de Ridder D, Tax DMJ. Classification, parameter estimation and state estimation, an engineering approach using Matlab. Hoboken, NJ, USA, John Wiley & Sons, 2004.
[100] Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–30.
[101] The R project for statistical computing. The R Foundation, 2019. (Accessed March 14, 2019, at http://www.r-project.org/)
[102] RapidMiner. RapidMiner, Inc., 2019. (Accessed March 14, 2019, at https://rapidminer.com/)
[103] Somol P, Vácha P, Mikeš S, Hora J, Pudil P, Žid P. Introduction to Feature Selection Toolbox 3 – The C++ Library for Subset Search, Data Modeling and Classification. UTIA Tech. Report No. 2287, 2010, 1–12.
[104] Raschka S. MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack. J. Open Source Software 2018, 3, 638.
[105] Jovic A, Kukolja D, Friganovic K, Jozic K, Cifrek M. MULTISAB: A Web Platform for Analysis of Multivariate Heterogeneous Biomedical Time-Series. In Proc. Int. Conf. IUPESM 2018, Prague, Czech Republic, Springer Nature Singapore, IFMBE Proceedings Volume 68/1, 2018, 311–5.
[106] Jovic A, Jozic K, Kukolja D, Friganovic K, Cifrek M. Challenges in Designing Software Architectures for Web Based Biomedical Signal Analysis. In: Hassanien AE, Dey N, Surekha B, eds. Medical Big Data and Internet of Medical Things: Advances, Challenges and Applications, Boca Raton, FL, USA, CRC Press Taylor & Francis Group, 2018, 81–112.
[107] Kiranyaz S, Ince T, Hamila R, Gabbouj M. Convolutional Neural Networks for patient-specific ECG classification. In Proc. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC 2015), 2015, 2608–11. DOI:10.1109/EMBC.2015.7318926
[108] Al Rahhal MM, Bazi Y, Al Zuair M, Othman E, BenJdira B. Convolutional Neural Networks for Electrocardiogram Classification. J. Med. Biol. Eng. 2018, 38(6), 1014–25. DOI:10.1007/s40846-018-0389-7
[109] Tang Z, Li C, Sun S. Single-trial EEG classification of motor imagery using deep convolutional neural networks. Optik 2017, 130, 11–8. DOI:10.1016/j.ijleo.2016.10.117
[110] Sors A, Bonnet S, Mirek S, Vercueil L, Payen JF. A convolutional neural network for sleep stage scoring from raw single-channel EEG. Biomed. Signal Process. Control 2018, 42, 107–14. DOI:10.1016/j.bspc.2017.12.001
[111] Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine 2019, 25, 65–9.
[112] Montavon G, Samek W, Müller KR. Methods for interpreting and understanding deep neural networks. Digital Signal Processing 2018, 73, 1–15. DOI:10.1016/j.dsp.2017.10.011
My Abdelouahed Sabri, Youssef Filali, Assia Ennouni, Ali Yahyaouy and Abdellah Aarab
2 An overview of skin lesion segmentation, features engineering, and classification
Abstract: Melanoma is the most dangerous form of skin cancer. It is curable if diagnosed and treated at an early stage. Computer-aided diagnosis is widely used for detecting and diagnosing skin lesions. This chapter presents an overview of the image-processing algorithms applied in all the steps involved in skin lesion classification, namely image segmentation, feature engineering, and classification. A preprocessing step is generally applied to improve the segmentation result. In the feature extraction phase, the features are those used by dermatologists in their clinical diagnosis; they are based on the lesion texture and on the well-known ABCD (asymmetry, border, color, and diameter) rule. One hundred and thirteen features are initially extracted. A feature selection procedure is then presented, with a comparison of several feature selection algorithms, to select only the relevant features. The selected features are used for classification, and the classifiers most used in the literature are compared.
Keywords: melanoma, CAD, PDE multi-scale decomposition, segmentation, features engineering, classification
https://doi.org/10.1515/9783110621105-002
2.1 Introduction
Skin cancer is very dangerous to humans and is among the deadliest cancers in the world [1]. It can be treated and cured only if diagnosed at an early stage [1]. In general, dermatologists classify skin lesions into two main categories: melanoma (malignant lesion) and nonmelanoma (treated as benign), where melanoma is the most dangerous and very difficult to cure. The nonmelanoma category includes several types, such as actinic keratosis, atypical moles, and basal cell carcinoma. Early diagnosis is highly desirable and allows complete healing of the lesion. This diagnosis can be made clinically by visually observing the lesion: nonmelanoma lesions show a more highly organized structure than melanoma lesions. If not treated in time, nonmelanoma lesions can develop into melanoma. Lesions can be imaged in macroscopic form, captured by cameras, or in dermoscopic form, captured by a dermatoscope [2]. Because of their visual nature, various image-processing approaches have been used to help dermatologists detect and classify skin lesions. Macroscopic images are commonly used in the computational analysis of skin lesions. However, they are captured under variable illumination conditions and at different distances, and may have low resolution. Dealing with small images is a challenge in computer-aided diagnosis (CAD), especially in the segmentation step. Another problem for both types of images is the presence of artifacts such as hair, reflections, and skin lines, which can make the analysis more difficult.
A number of works in the literature [3, 4] propose using image processing for skin lesion detection and analysis. The whole process can be divided into three principal stages: lesion identification based on image segmentation, feature extraction, and lesion classification. Segmentation and feature extraction are the key steps and significantly influence the classification results. Segmentation extracts the lesion from the image under analysis. Skin images generally contain many types of noise, so their segmentation is a very difficult task that requires some preprocessing. Several approaches based on color space transformation, contrast enhancement, and artifact removal have been suggested to improve the segmentation step [5–7]. Feature extraction is usually based on the rules adopted by dermatologists in their routine clinical diagnosis. They use the ABCD rule as a visual classification criterion based on the lesion asymmetry, border, color, and diameter [8–10]. An asymmetric lesion is considered suggestive of malignancy. The border criterion measures the lesion's shape; a malignant lesion has an irregular border. For the color criterion, a melanoma lesion contains nonuniform colors. The diameter criterion reflects the size of the lesion; in the case of a melanoma lesion, the diameter is greater than or equal to 6 mm. Texture analysis helps discriminate benign from malignant lesions by assessing the roughness of their structure [10].
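As a rough illustration of how the A and D criteria of the ABCD rule can be turned into numbers, the sketch below computes a simple asymmetry index and an equivalent diameter from a binary lesion mask. It is a simplification under stated assumptions, not the chapter's feature set: asymmetry is measured by mirroring the mask across the image-aligned axes through its centroid (practical systems usually use the lesion's principal axes), and mm_per_pixel is a hypothetical calibration factor that would have to come from the imaging device. The border and color criteria are not covered.

```python
import math


def _lesion_pixels(mask):
    # Collect (row, col) coordinates of lesion pixels (nonzero entries).
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def asymmetry_index(mask):
    """Average fraction of lesion pixels with no mirror partner when the mask is
    reflected across the vertical and horizontal axes through its centroid.
    0.0 means symmetric with respect to both axes."""
    pts = _lesion_pixels(mask)
    if not pts:
        raise ValueError("empty mask")
    occupied = set(pts)
    cr = sum(r for r, _ in pts) / len(pts)  # centroid row
    cc = sum(c for _, c in pts) / len(pts)  # centroid column

    def unmatched_fraction(reflect):
        return sum(reflect(p) not in occupied for p in pts) / len(pts)

    vertical = unmatched_fraction(lambda p: (p[0], round(2 * cc - p[1])))
    horizontal = unmatched_fraction(lambda p: (round(2 * cr - p[0]), p[1]))
    return (vertical + horizontal) / 2


def equivalent_diameter_mm(mask, mm_per_pixel):
    """Diameter of the circle with the same area as the lesion, in mm."""
    area_px = len(_lesion_pixels(mask))
    return 2.0 * math.sqrt(area_px / math.pi) * mm_per_pixel


# Usage on a toy 6x6 mask containing a centered 4x4 square "lesion".
square = [[1 if 1 <= r <= 4 and 1 <= c <= 4 else 0 for c in range(6)] for r in range(6)]
print(asymmetry_index(square))  # 0.0
diameter = equivalent_diameter_mm(square, mm_per_pixel=0.5)  # hypothetical calibration
print(diameter >= 6.0)  # compare against the 6 mm diameter criterion
```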
The last step is lesion classification, which uses the extracted features to predict the type of the pigmented skin lesion. The purpose of this chapter is to present an overview of the image-processing algorithms used at all stages, from preprocessing to classification and diagnosis of dermoscopic skin images. It starts with a presentation of the approaches applied for image segmentation, describing the various problems encountered and providing an effective solution to improve the segmentation result. Subsequently, the possible features are detailed, followed by a selection procedure that keeps only the most relevant ones. Finally, a comparative study of different classification methods is presented to identify the best one to adopt for a particular case. The remainder of the chapter is organized as follows: Section 2.2 reviews skin lesion segmentation, feature extraction and engineering, and classification approaches. The results of this overview and the comparative studies are then presented, followed by the conclusion.
2.2 Overview of related works
To help dermatologists, CAD systems are used for earlier skin cancer diagnosis. Such a system generally consists of three major steps: (i) segmentation, (ii) feature extraction, and (iii) classification. Segmentation is considered the key to successful classification: it directly influences the identification of the lesion area, and therefore the quality of the extracted features and the classification result.
2.2.1 Preprocessing
The presence of textures in skin lesion images is a major problem that affects the segmentation results. A preprocessing step is therefore generally performed to improve the accuracy of the segmentation. A median filter is applied to smooth the image and remove artifacts while preserving the border and keeping pertinent information about the lesion [11, 12]. Anisotropic diffusion filters are also used to smooth the skin lesion and remove artifacts. Morphological filters help remove noise and may also be applied to improve segmentation and to obtain a region with a more regular border [13]. For the classification of cutaneous lesions, the problem with this type of preprocessing is that it alters the edge of the lesion, so a very sensitive region that could provide discriminative features for the classification is lost. Several works have proposed a multiscale decomposition based on partial differential equations (PDEs), whose purpose is to divide an image into two components: the first contains only the textures and the second only the shapes of the objects [14]. Segmenting only the object component is thus expected to be more reliable. The results obtained in the experimental section show the benefit of this intermediate operation for identifying the region of interest. Several PDE-based decomposition models have been proposed for image analysis [15]. In our work, we are only concerned with the Aujol model, which consists of minimizing the functional:

$$F_{\lambda}(u, v) = J(u) + \frac{1}{2\lambda}\,\| f - u - v \|^{2} \tag{2.1}$$
where u is the object component, v is the texture component, f is the original image, λ is a regularization parameter, and J(u) is the total variation of u. The function v can be seen as an oscillating component, while u is taken in the space of functions of bounded variation. Figure 2.1 presents an example of the PDE decomposition of an image randomly selected from the dataset. The image is divided into two components: the first contains all the textures, artifacts, and noise, while the second contains only the lesion shape. The segmentation is performed only on the object component, which ensures a good identification of the region of interest (ROI), that is, the lesion.

Figure 2.1: The multiscale decomposition results.
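Implementing the full Aujol model is beyond a short example, so the sketch below illustrates the same cartoon-plus-texture idea with a simpler, related model that we swap in for illustration: the ROF (total variation) decomposition f = u + v, where u minimizes J(u) + ||f − u||²/(2λ) and v is the residual, computed with Chambolle's dual projection algorithm. To keep it short it works on a 1D signal rather than a 2D image; the toy signal and the values of lam, tau, and n_iter are assumptions.

```python
def grad(u):
    # Forward differences with a zero last entry (Neumann boundary condition).
    return [u[i + 1] - u[i] for i in range(len(u) - 1)] + [0.0]


def div(p):
    # Discrete divergence, the negative adjoint of grad: p_i - p_{i-1}, with p_{-1} = 0.
    return [p[i] - (p[i - 1] if i > 0 else 0.0) for i in range(len(p))]


def rof_decompose(f, lam, tau=0.25, n_iter=300):
    """Split a 1D signal f into u (piecewise-smooth 'object' part) and v = f - u
    (oscillating 'texture' part) by minimizing TV(u) + ||f - u||^2 / (2 * lam)
    with Chambolle's projection algorithm; tau <= 1/4 keeps the 1D iteration stable."""
    p = [0.0] * len(f)
    for _ in range(n_iter):
        g = grad([d - fi / lam for d, fi in zip(div(p), f)])
        p = [(pi + tau * gi) / (1.0 + tau * abs(gi)) for pi, gi in zip(p, g)]
    u = [fi - lam * d for fi, d in zip(f, div(p))]
    v = [fi - ui for fi, ui in zip(f, u)]
    return u, v


def total_variation(x):
    return sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))


# Toy signal: a step edge (a "lesion border") plus a small alternating texture.
f = [(0.0 if i < 20 else 1.0) + 0.1 * (-1) ** i for i in range(40)]
u, v = rof_decompose(f, lam=0.5)
print(round(total_variation(f), 2), round(total_variation(u), 2))
```

The object component u keeps the step while most of the alternating texture moves into v, which is why segmentation on u is less disturbed by texture. Larger lam values push more structure into v.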
2.2.2 Segmentation
Segmentation is an indispensable step in any image-processing approach. It is a low-level step that usually precedes the measurement phase. It consists of dividing an image into a set of regions (areas) to distinguish the objects of interest from the background or from other structures (lesion and skin in our case). These regions must satisfy the following conditions:
– The elements (pixels) of each region should be homogeneous according to the specified criteria.
– Their union must reconstruct the whole image without loss.
– The regions must not overlap: their pairwise intersections are empty, so each pixel of the image belongs to one and only one region.
There are many segmentation approaches, which can be divided into two broad categories. The first type comprises region-based approaches, which detect similarities in images to group pixels into suitable regions. These methods include region growing and split-and-merge algorithms. The second type comprises edge-based approaches, which detect discontinuities in the image to identify the contours delimiting its regions.
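The three conditions above can be written compactly. In this formulation, which follows the classical textbook definition of segmentation, I denotes the image domain, R_1 to R_n the regions, and P a homogeneity predicate; these symbols are introduced here for illustration and do not appear in the original text. Textbook versions often also require each region to be connected and adjacent regions to fail the predicate when merged.

$$
\bigcup_{i=1}^{n} R_i = I, \qquad R_i \cap R_j = \emptyset \ \text{ for } i \neq j, \qquad P(R_i) = \text{TRUE} \ \text{ for } i = 1, \dots, n
$$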
2.2.2.1 Region-based segmentation approaches
– Thresholding: The most basic segmentation approaches are based on thresholding, which has been used in several works [16]. The histogram has been widely used to compute the threshold(s) that separate the regions of interest from the background. Otsu proposed an algorithm that automatically computes the threshold used to segment an image into two regions [17]. When textures are present and the ROI is much smaller than the background, thresholding is ineffective.
– Region growing approaches: These start from a set of initially chosen seed pixels and examine their neighborhoods. Neighboring pixels are added to a region if they satisfy some homogeneity conditions, and the process is repeated over all the pixels of the image to group homogeneous pixels into appropriate regions. These algorithms are very effective in the case of high variations of illumination and color but remain ineffective for low-contrast images, as in lesion segmentation where the background is human skin. Split and merge and region fusion are examples of segmentation approaches based on region growing [18, 19].
– Clustering: These approaches, derived from data analysis, group neighboring pixels with similar properties into the same region. Examples of clustering methods used in image segmentation are K-means, fuzzy C-means (FCM), and expectation maximization (EM) [18, 20, 35].
2 An overview of skin lesion segmentation, features engineering, and classification
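Two of the region-based approaches above can be sketched in a few lines of NumPy. The first is Otsu's threshold, which maximizes the between-class variance of the gray-level histogram [17]. The second is a plain K-means on pixel intensities. This is a simplified illustration that assumes a single grayscale channel scaled to [0, 1]. The helper names are hypothetical, and practical lesion pipelines usually cluster color vectors rather than scalar intensities.

```python
import numpy as np


def otsu_threshold(gray, n_bins=256):
    """Otsu's threshold for a grayscale image with values in [0, 1]."""
    hist, edges = np.histogram(gray, bins=n_bins, range=(0.0, 1.0))
    p = hist / hist.sum()                      # normalized histogram
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                          # weight of the class below each cut
    m0 = np.cumsum(p * centers)                # first moment of that class
    mt = m0[-1]                                # global mean
    w1 = 1.0 - w0
    between = np.zeros_like(w0)
    ok = (w0 > 0) & (w1 > 1e-12)
    between[ok] = (mt * w0[ok] - m0[ok]) ** 2 / (w0[ok] * w1[ok])
    return edges[np.argmax(between) + 1]       # upper edge of the best cut bin


def kmeans_1d(values, k=2, n_iter=100):
    """Plain K-means on scalar values (e.g., pixel intensities)."""
    x = np.asarray(values, dtype=float).ravel()
    centers = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread-out initial centers
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

Lesions are usually darker than the surrounding skin. The lesion mask from Otsu's threshold would therefore be taken as `gray <= t`, and for K-means it would be the cluster with the lower center (after reshaping `labels` to the image shape).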
2.2.2.2 Edge-based approaches
These approaches detect discontinuities along the edges of objects to isolate them from their neighbors. An edge discontinuity can be located by a maximum of the first derivative of the image intensity, by a zero crossing of its second derivative, or by optimizing an energy.
– First-order derivative: These operators detect a discontinuity in intensity between adjacent pixels. Examples include the Sobel and Prewitt operators. The Canny detector also relies on the first derivative, combined with Gaussian smoothing, non-maximum suppression, and hysteresis thresholding.
– Second-order derivative: The edge is identified by the zero crossings of the second derivative of the image intensity. The Laplacian of Gaussian is an example of this type of operator used for contour detection.
– Energy based: These approaches segment images by optimizing an energy. Active contour models are the most widely used example of this type [21].
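The first- and second-order operators described above can be illustrated with a small NumPy sketch. It computes the Sobel gradient magnitude and the zero crossings of the Laplacian applied after a 3 × 3 Gaussian smoothing. That smoothing is a crude stand-in for a full Laplacian of Gaussian with a chosen σ. The kernel sizes and helper names are assumptions for illustration.

```python
import numpy as np


def filter3x3(img, kernel):
    """Correlate img with a 3x3 kernel, replicating border pixels (same-size output)."""
    img = np.asarray(img, dtype=float)
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + h, dj:dj + w]
    return out


SOBEL_X = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T
GAUSS_3x3 = np.outer([1.0, 2.0, 1.0], [1.0, 2.0, 1.0]) / 16.0
LAPLACIAN = np.array([[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]])


def sobel_magnitude(img):
    """First-order operator: gradient magnitude from the two Sobel responses."""
    return np.hypot(filter3x3(img, SOBEL_X), filter3x3(img, SOBEL_Y))


def log_zero_crossings(img, eps=1e-12):
    """Second-order operator: sign changes of the Laplacian of a smoothed image."""
    lap = filter3x3(filter3x3(img, GAUSS_3x3), LAPLACIAN)
    zc = np.zeros(lap.shape, dtype=bool)
    zc[:, :-1] |= (lap[:, :-1] * lap[:, 1:]) < -eps   # horizontal neighbours
    zc[:-1, :] |= (lap[:-1, :] * lap[1:, :]) < -eps   # vertical neighbours
    return zc
```

On a vertical step edge, `sobel_magnitude` peaks on the two columns adjacent to the step. `log_zero_crossings` marks a single column, which is why second-order operators give thin contours.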
2.2.3 Features extraction
The features used in the literature are mostly based on the ABCD rule used by dermatologists. In this chapter, we use a large set of features from the literature (113 features in total), divided into three main categories: textural, color, and asymmetry features [29].
2.2.3.1 Textural features
The gray level co-occurrence matrix, local binary patterns, histogram of oriented gradients, and Gabor filters are the most commonly used textural features.
– Gray level co-occurrence matrix (GLCM): This matrix characterizes the spatial distribution of gray levels in a limited area by counting how often pairs of gray levels occur at a given offset. Haralick [22] proposed 14 features derived from it that describe the statistical texture properties of images, including energy, contrast, correlation, homogeneity, and entropy [32].
– Gabor filter: This is usually applied in edge detection, fingerprint recognition, and texture analysis [23, 24]. Gabor filters split an image into components corresponding to different orientations and scales, and they provide optimal joint localization in the spatial and frequency domains. In our study, we use three scales (4, 8, and 12) and eight orientations (0°, 23°, 45°, 68°, 90°, 113°, 135°, and 158°, i.e., multiples of 22.5° rounded to whole degrees).
– Histogram of oriented gradients (HOG): This is a descriptor used in object detection and feature extraction that counts occurrences of gradient orientations. The image is first divided into small connected regions (cells), and a histogram of gradient directions is computed for the pixels in each cell. The HOG descriptor is obtained by concatenating these histograms.
– Local binary patterns (LBP): This is a visual descriptor used for classification in computer vision. The image is first divided into cells (e.g., 16 × 16 pixels), and each pixel in a cell is compared with its eight neighbors: the corresponding bit is 0 if the pixel value is greater than the neighbor's value and 1 otherwise. The eight bits form a code for the pixel, and the histogram of these codes is computed and normalized over the cell.
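The GLCM measures and the basic LBP code described above can be sketched as follows. This NumPy sketch assumes a grayscale image in [0, 1], quantized to 8 gray levels and using a single pixel offset for the GLCM. It computes only four of Haralick's measures, and it applies the LBP to the whole image rather than per 16 × 16 cell. All function names are hypothetical.

```python
import numpy as np


def glcm(img, levels=8, offset=(0, 1)):
    """Symmetric, normalized GLCM of an image with values in [0, 1].

    Only non-negative (row, col) offsets are handled in this sketch."""
    q = np.minimum((np.asarray(img, dtype=float) * levels).astype(int), levels - 1)
    di, dj = offset
    h, w = q.shape
    a = q[:h - di, :w - dj]          # reference pixels
    b = q[di:, dj:]                  # neighbours at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1.0)
    m = m + m.T                      # count each pair in both directions
    return m / m.sum()


def glcm_features(p):
    """A few of Haralick's measures computed from a normalized GLCM p."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    return {
        "energy": float(np.sum(p ** 2)),
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "homogeneity_idm": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "entropy": float(-np.sum(nz * np.log2(nz))),
    }


def lbp_histogram(img):
    """Basic 8-neighbour LBP: bit = 1 when neighbour >= centre, else 0."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=int)
    for bit, (di, dj) in enumerate(shifts):
        neighbour = img[1 + di:h - 1 + di, 1 + dj:w - 1 + dj]
        codes |= (neighbour >= centre).astype(int) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return codes, hist / hist.sum()
```

Per-cell LBP features would be obtained by calling `lbp_histogram` on each cell and concatenating the normalized histograms.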
2.2.3.2 Color features
According to dermatology studies, skin lesions containing several of the six colors white, red, light brown, dark brown, blue-gray, and black are considered suspicious for melanoma [30]. The color features used in this work are based on the R, G, and B components. Each color feature is the proportion of lesion pixels whose normalized red, green, and blue values all fall within the ranges associated with that color, for example for black, light brown, and dark brown:
$$C = \frac{1}{N_T}\sum_{p \in \text{lesion}} \mathbf{1}\left[NR_p \in I^{C}_{R} \wedge NG_p \in I^{C}_{G} \wedge NB_p \in I^{C}_{B}\right], \qquad C \in \{\text{black}, \text{light brown}, \text{dark brown}\}$$
where N_T is the number of pixels in the lesion; NR_p, NG_p, and NB_p are the normalized red, green, and blue values of lesion pixel p; 1[·] equals 1 when its condition holds and 0 otherwise; and I^C_R, I^C_G, and I^C_B are the value ranges that define color C on each channel.
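The color proportions defined above can be computed as shown below. The ranges in `HYPOTHETICAL_RANGES` are illustrative placeholders, not the thresholds used in this chapter. The minimum fraction used to count a color as present is also a placeholder; it produces a "number of colors" feature such as the one listed in Table 2.3. The function names are hypothetical.

```python
import numpy as np

# Hypothetical (lo, hi) ranges on normalized R, G, B values, for illustration only.
HYPOTHETICAL_RANGES = {
    "black":       ((0.00, 0.24), (0.00, 0.24), (0.00, 0.24)),
    "dark brown":  ((0.24, 0.60), (0.00, 0.35), (0.00, 0.25)),
    "light brown": ((0.60, 0.90), (0.35, 0.65), (0.25, 0.50)),
    "white":       ((0.80, 1.01), (0.80, 1.01), (0.80, 1.01)),
}


def color_fractions(rgb, mask, ranges=HYPOTHETICAL_RANGES):
    """Fraction of lesion pixels whose R, G, and B values all lie in each color's ranges."""
    rgb = np.asarray(rgb, dtype=float)
    pixels = rgb[np.asarray(mask, dtype=bool)]      # (N_T, 3) lesion pixels
    n_t = len(pixels)
    feats = {}
    for name, bounds in ranges.items():
        inside = np.ones(n_t, dtype=bool)
        for ch, (lo, hi) in enumerate(bounds):
            inside &= (pixels[:, ch] >= lo) & (pixels[:, ch] < hi)
        feats[name] = inside.sum() / n_t if n_t else 0.0
    return feats


def number_of_colors(feats, min_fraction=0.05):
    """Count the colors covering at least min_fraction of the lesion (hypothetical cutoff)."""
    return sum(1 for v in feats.values() if v >= min_fraction)
```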
2.2.3.3 Asymmetry feature
The asymmetry of a skin lesion is considered the most important feature used in the classification, since an asymmetric lesion is a strong indicator of melanoma. In this chapter, asymmetry is evaluated by computing the degree of overlap between the two halves of the lesion obtained by folding it along an axis of symmetry.
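The following is a minimal sketch of this overlap measure. It assumes a binary lesion mask and uses the vertical and horizontal axes through the lesion centroid as the symmetry axes. That is a common simplification; aligning the mask with its principal axes first would be more robust. The score is 1 minus the average fraction of lesion pixels whose mirror image also falls inside the lesion. The function names are hypothetical.

```python
import numpy as np


def _mirror_overlap(mask, axis):
    """Fraction of lesion pixels whose mirror image (about a centroid line) is also lesion."""
    rows, cols = np.nonzero(mask)
    h, w = mask.shape
    if axis == "vertical":       # mirror left/right about the centroid column
        r, c = rows, np.rint(2 * cols.mean() - cols).astype(int)
    else:                        # mirror top/bottom about the centroid row
        r, c = np.rint(2 * rows.mean() - rows).astype(int), cols
    inside = (r >= 0) & (r < h) & (c >= 0) & (c < w)
    hits = np.zeros(rows.size, dtype=bool)
    hits[inside] = mask[r[inside], c[inside]]
    return hits.mean()


def asymmetry_score(mask):
    """0 for a lesion symmetric about both centroid axes, larger when asymmetric."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("empty lesion mask")
    overlap = (_mirror_overlap(mask, "vertical") + _mirror_overlap(mask, "horizontal")) / 2
    return 1.0 - overlap
```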
2.2.4 Features engineering
Features engineering is key to a successful classification system. It first consists in extracting information that effectively describes the ROI (113 features are used here). A normalization step is then required to bring all feature values to the same scale and improve the classification result. Finally, we present some algorithms used to select only the features needed to correctly classify a lesion as melanoma or nonmelanoma.
2.2.4.1 Features normalization
The features used in the classification generally have different ranges of values, and they should be normalized to obtain good classification accuracy. Many normalization approaches can be used, but the Z-score remains the simplest one in such a case [5]. The Z-score transformation rescales each feature (each column of the feature matrix X) to zero mean and unit standard deviation; the resulting values are centered on 0 and are not confined to [0, 1]:
$$Z = \frac{X - \mu}{\sigma} \qquad (2.8)$$
where μ and σ are the mean and the standard deviation of the feature values.
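Eq. (2.8) is applied column by column to the samples × features matrix. In the sketch below (hypothetical helper names), μ and σ are estimated once, for example on the training set, and then reused for any other data, so that training and test features are scaled identically. Constant features are left at 0 rather than divided by zero.

```python
import numpy as np


def zscore_fit(X, eps=1e-12):
    """Per-feature mean and standard deviation of a samples x features matrix."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma < eps, 1.0, sigma)   # constant features: avoid dividing by 0
    return mu, sigma


def zscore_apply(X, mu, sigma):
    """Eq. (2.8): Z = (X - mu) / sigma, applied column by column."""
    return (np.asarray(X, dtype=float) - mu) / sigma
```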
2.2.4.2 Features selection
Our study uses 113 features (textural, color, and asymmetry features). However, not all of these features are significant, and not all of them have a great influence on the classification accuracy. The main idea of features selection is to keep the smallest number of features, choosing only those that precisely represent the skin lesion diagnosis. This step is essential, first to eliminate inappropriate features and use only the relevant ones, and then to accelerate and improve the classification. This part focuses on the most commonly used features selection algorithms [31, 33].
– ReliefF: The main idea of this algorithm is to compute a quality score for each attribute according to how well it distinguishes between neighboring data samples of different classes [25].
– Correlation-based feature selection (CFS): This evaluates subsets of features, favoring features that are highly correlated with the output class while having low correlation with one another [26].
– Recursive feature elimination (RFE): This algorithm starts with all the features and iteratively removes those that contribute least to discriminating the predicted class. RFE can be seen as an optimization process that searches for the best feature subset [27].
– χ2 method: This approach uses the χ2 statistic to test whether a feature is important, that is, whether it depends on the class. The selected features are those with the highest χ2 scores.
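The idea behind Relief can be sketched for the two-class case. Each feature's weight grows when the feature differs between a sample and its nearest neighbor of the other class (nearest miss). It shrinks when the feature differs between the sample and its nearest neighbor of the same class (nearest hit). This is the basic algorithm of Kira and Rendell [25], not the multi-neighbor ReliefF variant. The Manhattan distance, the min-max scaling, and the function name are assumptions of this sketch.

```python
import numpy as np


def relief_weights(X, y):
    """Basic two-class Relief (Kira and Rendell) using every sample once.

    Features are min-max scaled so that per-feature differences lie in [0, 1]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    n, d = Xs.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)            # Manhattan distance to all samples
        dist[i] = np.inf                                 # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(same, np.inf, dist))   # nearest other-class sample
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / n
    return w
```

Features would then be ranked by weight, and the top-ranked ones kept.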
2.2.5 Classification
Classification is the last step: it decides whether each skin lesion image is melanoma or nonmelanoma based on its extracted, normalized, and selected features. Many skin cancer classification approaches have been proposed in the literature [5, 34]. This chapter compares the most widely used and best-performing classifiers for skin lesion classification: decision trees, logistic regression, K-nearest neighbors (KNN), and support vector machines (SVM). Decision trees, SVM, and KNN are used in different forms:
– linear, quadratic, and Gaussian kernels for the SVM;
– fine, medium, cubic, coarse, cosine, and weighted distances for KNN;
– simple, medium, and complex models for decision trees.
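The KNN variants differ mainly in the number of neighbors, the distance metric, and the vote weighting. This sketch assumes the presets follow the naming of MATLAB's Classification Learner, where the "cubic" KNN uses 10 neighbors and the Minkowski distance with exponent 3. Under that assumption, a majority-vote KNN with a Minkowski distance of order p can be sketched as follows (the function name is hypothetical).

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=10, p=3):
    """Majority-vote KNN with Minkowski distance of order p (p=3: 'cubic')."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    diffs = np.abs(X_test[:, None, :] - X_train[None, :, :])
    dist = (diffs ** p).sum(axis=2) ** (1.0 / p)      # (n_test, n_train) distances
    nearest = np.argsort(dist, axis=1)[:, :k]         # indices of the k closest samples
    preds = []
    for row in y_train[nearest]:
        labels, counts = np.unique(row, return_counts=True)
        preds.append(labels[np.argmax(counts)])       # ties go to the smaller label
    return np.array(preds)
```

Setting `p=2` gives the Euclidean distance used by the other presets, while `k` controls how "fine" or "coarse" the decision boundary is.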
2.3 Experimental results
In this part, we implement and evaluate each step of skin lesion classification. We start by evaluating the best-known segmentation algorithms to show their weakness without preprocessing. Features engineering is then conducted to retain only the most relevant of the 113 initially extracted features. Finally, a comparison is conducted between the best-known classifiers from the literature.
2.3.1 Dataset
To evaluate and validate every part of the skin lesion classification, we use the image database of the ISIC skin lesion analysis challenge [28]. This database contains 2,150 images, each labeled in one of three classes and accompanied by a ground-truth (GT) segmentation:
– melanoma: malignant tumor;
– nevus: benign tumor;
– seborrheic keratosis: benign tumor.
In this study, the database is split into two classes: melanoma and nonmelanoma (combining the nevus and seborrheic keratosis classes). It is divided into a training dataset containing 374 melanoma images and 1,626 nonmelanoma images, and a validation dataset of 150 images. Figure 2.2 presents eight images randomly selected from the ISIC dataset: four melanoma and four nonmelanoma.
Figure 2.2: Eight images randomly selected from the used dataset.
2.3.2 Lesion segmentation
Each image of the ISIC dataset is segmented with each of the well-known algorithms: Otsu, split and merge, active contour, FCM, K-means, and EM. The ground truth is used to evaluate these algorithms, with sensitivity and specificity as evaluation metrics. Segmentation is performed both with and without preprocessing.
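For segmentation, sensitivity and specificity are computed pixel by pixel, treating lesion pixels in the ground truth as positives. The following is a minimal sketch. It assumes boolean masks of equal shape that each contain both lesion and skin pixels, and the function name is hypothetical.

```python
import numpy as np


def pixel_sensitivity_specificity(pred_mask, gt_mask):
    """Pixel-wise sensitivity and specificity of a predicted lesion mask."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    tp = np.sum(pred & gt)      # lesion pixels correctly detected
    tn = np.sum(~pred & ~gt)    # skin pixels correctly left out
    fp = np.sum(pred & ~gt)     # skin pixels wrongly labeled as lesion
    fn = np.sum(~pred & gt)     # lesion pixels missed
    return tp / (tp + fn), tn / (tn + fp)
```

Averaging these two values over all images gives dataset-level figures of the kind reported in Tables 2.1 and 2.2.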
2.3.2.1 Without preprocessing
Figure 2.3 presents the segmentation results without any preprocessing. The lesion is clearly not segmented correctly, which will affect the extracted features and hence the classification accuracy. This is due to the presence of textures, artifacts, reflections, and skin lines that disturb the segmentation. The sensitivity and specificity values presented in Table 2.1 show that segmentation without preprocessing fails to correctly detect the lesion, which reduces the classification rate.
2.3.2.2 With preprocessing
To improve the segmentation and increase the accuracy of lesion identification, a preprocessing step is performed, here a multiscale decomposition based on PDEs. The main idea is to divide the initial image into two separate components: the first contains only the textures and the second the shape of the objects. To make segmentation, and therefore lesion identification, more reliable, only the object component is segmented. Figure 2.4 presents the results of segmentation by the six algorithms applied to the object component only. The sensitivity and specificity listed in Table 2.2 show the effect of the PDE preprocessing on segmentation. Compared with the results without preprocessing, the multiscale decomposition significantly improves segmentation, especially with the K-means algorithm.
2.3.3 Features engineering
Once the segmentation is correctly done, the next step is to extract the features, normalize them, and then perform features selection to keep only the relevant ones for classification.
2.3.3.1 Features extraction
As discussed in Section 2.2.3, 113 features are initially used. They are categorized into three groups: textural, color, and asymmetry features. Ninety-five features were extracted by applying Gabor filters to the texture component [5], and 10 other features were extracted from the texture component [5]. Seven color features were extracted from the lesion in the original image, as discussed in Section 2.2.3.2, and one asymmetry feature was computed.
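A Gabor filter bank with the three scales and eight orientations of Section 2.2.3.1 can be sketched as shown below. The sketch assumes that "scale" means the wavelength of the sinusoid in pixels. It sets the envelope width to 0.56 times the wavelength (roughly a one-octave bandwidth) with an aspect ratio of 0.5, and it filters by circular FFT convolution. It returns the mean and variance of the real response for each filter (48 values), which need not coincide with the 95 Gabor features used in the chapter. All names are hypothetical.

```python
import numpy as np

SCALES = (4, 8, 12)                                   # wavelengths in pixels (assumed)
ORIENTATIONS = (0, 23, 45, 68, 90, 113, 135, 158)     # degrees, as listed in the text


def gabor_kernel(wavelength, theta_deg, sigma_ratio=0.56, gamma=0.5):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid."""
    theta = np.deg2rad(theta_deg)
    sigma = sigma_ratio * wavelength
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(1j * 2 * np.pi * x_r / wavelength)


def filter_fft(img, kernel):
    """Circular 2-D convolution via the FFT; the kernel must fit inside the image."""
    h, w = img.shape
    kh, kw = kernel.shape
    padded = np.zeros((h, w), dtype=complex)
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # centre at (0, 0)
    return np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(padded))


def gabor_features(img):
    """Mean and variance of the real response for every scale/orientation pair."""
    img = np.asarray(img, dtype=float)
    feats = {}
    for s in SCALES:
        for o in ORIENTATIONS:
            real = filter_fft(img, gabor_kernel(s, o)).real
            feats[f"mean_real_s{s}_o{o}"] = float(real.mean())
            feats[f"var_real_s{s}_o{o}"] = float(real.var())
    return feats
```

Keys such as `mean_real_s12_o68` correspond to descriptors like "mean of Gabor real part with scale = 12 and orientation = 68°" in Table 2.3.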
Figure 2.3: Segmentation results using the well-known algorithms from literature (panels: original image, GT, Otsu, split and merge, active contour, FCM, K-means, EM).
Table 2.1: Average sensitivity and specificity of the segmentation over the whole dataset (without preprocessing).
Measure | Otsu | Split and merge | Active contour | FCM | K-means | EM
Sensitivity | . | . | . | . | . | .
Specificity | . | . | . | . | . | .
Figure 2.4: Segmentation results of the object component after decomposition by the PDE (panels: original image, GT, Otsu, split and merge, active contour, FCM, K-means, EM).
Table 2.2: Average sensitivity and specificity of the segmentation over the whole dataset using preprocessing.
Measure | Otsu | Split and merge | Active contour | FCM | K-means | EM
Sensitivity | . | . | . | . | . | .
Specificity | . | . | . | . | . | .
2.3.3.2 Features normalization
All the features are combined into a 2,150 × 113 matrix in which each row contains the feature values of one image of the dataset. Since the feature values are not in the same range, each feature is normalized with the Z-score, which gives it zero mean and unit standard deviation.
2.3.3.3 Features selection
The initial number of features used in our study was 113. Not all of these features are significant, and some do not influence the classification accuracy. Features selection is therefore important to eliminate inappropriate features and keep only the relevant ones, which accelerates and improves classification. Table 2.3 lists the features selected by each of the features selection approaches described in Section 2.2.4.2 (χ2, RFE, ReliefF, and CFS), together with the classification accuracy obtained with the corresponding features using the logistic regression classifier. We conclude that only the five features selected by ReliefF are relevant: number of dark brown color in the lesion, number of white color in the lesion, mean of the Gabor real part with scale = 12 and orientation = 68°, mean of the Gabor real part with scale = 8 and orientation = 158°, and root mean square. These five features are used in the classification instead of the 113 initial features, which improves the classification accuracy and reduces the computation time.
2.3.4 Classification
The last step uses the features selected in the previous section to classify each image of the dataset as melanoma or nonmelanoma. The dataset is divided into two parts: training and test. The training dataset, containing 374 melanoma images and 1,626 nonmelanoma images, is used to train the classifier and fit the model. The test dataset, containing 150 images, is used to evaluate the classifier accuracy.
Table 2.3: Features selection results and their classification accuracy.
FS approach | Number of features | Features selected | Classification accuracy (%)
χ2 | Three | 1. Number of colors in the lesion; 2. Inverse difference moment (IDM); 3. Mean of Gabor real part with scale = 12 and orientation = 68° | .
RFE | Four | 1. Variance of Gabor real part with scale = 8 and orientation = 135°; 2. Mean of Gabor real part with scale = 12 and orientation = 68°; 3. Number of colors in the lesion; 4. Contrast | .
ReliefF | Five | 1. Number of dark brown color in lesion; 2. Number of white color in lesion; 3. Mean of Gabor real part with scale = 12 and orientation = 68°; 4. Mean of Gabor real part with scale = 8 and orientation = 158°; 5. Root mean square (RMS) | .
CFS | Four | 1. Mean of Gabor real part with scale = 12 and orientation = 68°; 2. Mean of Gabor real part with scale = 8 and orientation = 135°; 3. Number of colors in the lesion; 4. Contrast | .
Several classifiers have been proposed in the literature [5]. The aim here is to compare the most widely used and best-performing ones in skin lesion classification: decision trees, logistic regression, KNN, and support vector machines (SVM). Sensitivity (true positive rate), specificity (true negative rate), and accuracy are used to evaluate and compare the classifiers:
$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (2.9)$$
$$\text{Specificity} = \frac{TN}{FP + TN} \qquad (2.10)$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2.11)$$
where TP (true positives) is the number of melanoma cases correctly classified, TN (true negatives) is the number of nonmelanoma cases correctly classified, FP (false positives) is the number of nonmelanoma cases classified as melanoma, and FN (false negatives) is the number of melanoma cases classified as nonmelanoma. Table 2.4 presents the classification sensitivity, specificity, and accuracy of the different classifiers with and without preprocessing, and it clearly shows the benefit of the multiscale decomposition for classification. Figures 2.5–2.7 compare the classifiers in terms of accuracy, specificity, and sensitivity, respectively:
– Sensitivity measures the true positive rate, that is, the percentage of melanoma cases correctly classified. The SVM with quadratic kernel is the best, correctly classifying 95.79% of the melanoma cases in the dataset.
– Specificity measures the true negative rate, that is, the percentage of nonmelanoma cases correctly classified. The KNN with cubic distance, with a specificity of 91.90%, is the best nonmelanoma classifier.
– Accuracy measures the classification rate of both melanoma and nonmelanoma cases. With 91.33%, the SVM with quadratic kernel is the best of the 10 studied classifiers.
Sensitivity and specificity give the TP and TN rates. For a more detailed view, Table 2.5 presents the confusion matrix of each of the best classifiers. It gives the number of melanoma cases correctly classified as melanoma, melanoma cases misclassified as nonmelanoma, nonmelanoma cases misclassified as melanoma, and nonmelanoma cases correctly classified as nonmelanoma.
The SVM with quadratic kernel is the best, with a classification accuracy of 91.33% (only 13 of the 150 lesions misclassified). It is followed by the KNN with cubic distance, with 87.33% (19 misclassified lesions), and finally the simple decision trees, with 84.00% (24 misclassified lesions). From this study, we conclude that the SVM with quadratic kernel is the best classifier for skin lesion classification, and that its performance is much better when multiscale preprocessing is used. To compare the proposed approach with recent approaches from the literature, Table 2.6 presents its classification accuracy alongside those of the approaches proposed by Dalila et al. [34] and Waheed, applied to the same database. The results in Table 2.6 show that our approach gives the best result, outperforming the other two approaches with a classification accuracy of 91.33%.
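As a consistency check, Eq. (2.11) applied to the misclassification counts reported above (150 test lesions) gives the stated accuracies:

```latex
\begin{aligned}
\text{SVM (quadratic):}\quad & \frac{150 - 13}{150} = \frac{137}{150} \approx 91.33\,\% \\
\text{KNN (cubic):}\quad & \frac{150 - 19}{150} = \frac{131}{150} \approx 87.33\,\% \\
\text{Decision trees (simple):}\quad & \frac{150 - 24}{150} = \frac{126}{150} = 84.00\,\%
\end{aligned}
```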
Table 2.4: Sensitivity, specificity, and accuracy measures with and without preprocessing.
Classifier | Without preprocessing: sensitivity / specificity / accuracy | With preprocessing: sensitivity / specificity / accuracy
Decision trees (DT), simple | . / . / . | . / . / .
Decision trees (DT), medium | . / . / . | . / . / .
Decision trees (DT), complex | . / . / . | . / . / .
Logistic regression | . / . / . | . / . / .
Support vector machine (SVM), linear | . / . / . | . / . / .
Support vector machine (SVM), quadratic | . / . / . | . / . / .
Support vector machine (SVM), Gaussian | . / . / . | . / . / .
K-nearest neighbors (KNN), fine | . / . / . | . / . / .
K-nearest neighbors (KNN), medium | . / . / . | . / . / .
K-nearest neighbors (KNN), cubic | . / . / . | . / . / .
Figure 2.5: Comparison of the classification accuracy (%) of the 10 classifiers with preprocessing.
Figure 2.6: Comparison of the classification specificity (%) of the 10 classifiers.
Figure 2.7: Comparison of the classification sensitivity (%) of the 10 classifiers.
Table 2.5: Confusion matrix of the best classifiers (SVM quadratic, KNN cubic, and simple decision trees; actual versus predicted class, melanoma and nonmelanoma).
Table 2.6: Comparison of the proposed approach with recent approaches from literature.
Approach | Classification accuracy (%)
Dalila () | .
Waheed () | .
Proposed approach | .
2.4 Discussion and conclusion
Skin cancer, particularly melanoma, is one of the most dangerous cancers in the world, but it can be cured if diagnosed at an early stage. Skin lesions can be classified into two main categories: melanoma (malignant lesion) and nonmelanoma (benign lesion). Because of their visual aspect, image processing has been used to help dermatologists detect and classify skin lesions. The process can be divided into three main stages: lesion identification based on image segmentation, features extraction, and lesion classification. Segmentation results directly impact the extracted features, which in turn influence the classification results. Input images contain artifacts such as hair, reflections, and skin lines that make segmentation more difficult. Multiscale preprocessing is used to improve segmentation accuracy and enhance the quality of the extracted features; it decomposes skin lesion images into two components that separately contain the textures and the object shapes. K-means applied to the object component gives the best segmentation result. One hundred and thirteen features of three types are initially extracted. These comprise 95 features obtained by applying Gabor filters to the texture component (three scales and eight orientations), 10 features extracted from the texture component, 7 color features from the lesion in the original image, and 1 asymmetry feature. Not all of the 113 features are significant, and not all of them influence the classification accuracy. Features selection is therefore important to keep only the relevant features and to accelerate and improve classification. Four features selection approaches were tested, and the ReliefF algorithm shows that only five features are relevant and can be used instead of the 113 extracted features in the classification. A comparison of the most widely used and best-performing skin lesion classifiers (decision trees, logistic regression, KNN, and SVM) was conducted to identify the best one for skin lesion classification. The ISIC challenge dataset, which contains 2,150 images, is used to evaluate the classification rate, with sensitivity, specificity, and accuracy as evaluation measures. The SVM with quadratic kernel, with a classification accuracy of 91.33%, is the best compared with the logistic regression, decision trees, and KNN classifiers. The classification accuracy varies from one classifier to another but remains insufficient. The features used are the key to enhancing the classification rate in such a system. Therefore, in our future work, we will concentrate on searching for relevant features that can effectively represent skin lesions.
References
[1] Higgins II, H. W. and Leffell, D., "Melanoma at its most curable," The Skin Cancer Foundation Journal, 2017.
[2] Fernández, H. C. and López, O., "An Intelligent System for the Diagnosis of Skin Cancer on Digital Images taken with Dermoscopy," Acta Polytechnica Hungarica, vol. 14, no. 3, pp. 169–185, 2017.
[3] Korotkov, K. and Garcia, R., "Computerized analysis of pigmented skin lesions: A review," Artif. Intell. Med., vol. 56, no. 2, pp. 69–90, 2012.
[4] Smith, L. and MacNeil, S., "State of the art in non-invasive imaging of cutaneous melanoma," Ski. Res. Technol., vol. 17, no. 3, pp. 257–269, 2011.
[5] Filali, Y., Ennouni, A., Sabri, M. A., and Aarab, A., "A study of lesion skin segmentation, features selection and classification approaches," International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, pp. 1–4, 2–4 April 2018.
[6] Filali, Y., Ennouni, A., Sabri, M. A., and Aarab, A., "Multiscale approach for skin lesion analysis and classification," International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Fez, Morocco, 22–24 May 2017.
[7] Filali, Y., Sabri, M. A., and Aarab, A., "An improved approach for skin lesion analysis based on multiscale decomposition," 2017 International Conference on Electrical and Information Technologies (ICEIT), 15–18 Nov. 2017.
[8] Leguizam, D. N., "Computerized Diagnosis of Melanocytic Lesions Based on the ABCD Method," XLI Latin American Computing Conference (CLEI), 2015.
[9] Vasconcelos, M. J. M. and Rosado, L., "A New Risk Assessment Methodology for Dermoscopic Skin Lesion Images based on the independent analysis of each ABCD rule criterion," IEEE Instrumentation and Measurement, 2015.
[10] Skin Cancer Foundation, https://www.skincancer.org/
[11] Norton, K., Iyatomi, H., Celebi, M. E., Schaefer, G., Tanaka, M., and Ogawa, K., "Development of a Novel Border Detection Method for Melanocytic and Non-Melanocytic Dermoscopy Images," Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 5403–5406, 2010.
[12] Abbas, Q. et al., "A perceptually oriented method for contrast enhancement and segmentation of dermoscopy images," Skin Research and Technology, vol. 19, no. 1, pp. e490–e497, 2013.
[13] Celebi, M. E., Aslandogan, Y. A., Stoecker, W. V., Iyatomi, H., Oka, H., and Chen, X., "Unsupervised border detection in dermoscopy images," Skin Research and Technology, vol. 13, no. 4, pp. 454–462, 2007.
[14] Rudin, L. I., Osher, S., and Fatemi, E., "Nonlinear total variation based noise removal algorithms," Phys. D Nonlinear Phenom., vol. 60, no. 1–4, pp. 259–268, 1992.
[15] Aujol, J. F., Aubert, G., Blanc-Féraud, L., and Chambolle, A., "Image decomposition into a bounded variation component and an oscillating component," J. Math. Imaging Vis., vol. 22, no. 1, pp. 71–88, 2005.
[16] Garnavi, R., Aldeen, M., Celebi, M. E., Varigos, G., Finch, S., Bhuiyan, A., and Dolianitis, C., "Automatic segmentation of dermoscopy images using histogram thresholding on optimal color channels," Computerized Med. Imaging Graph., vol. 35, pp. 105–115, 2011.
[17] Otsu, N., "A threshold selection method from gray-level histograms," IEEE Trans. Sys., Man., Cyber., vol. 9, pp. 62–66, 1979.
[18] Mohamed, A. I., Ali, M. M., Nusrat, K., Rahebi, J., and Sayiner, A., "Melanoma Skin Cancer Segmentation with Image Region Growing Based on Fuzzy Clustering Mean," International Journal of Engineering Innovations and Research, vol. 6, no. 2, pp. 91–95, 2017.
[19] Chopra and Dandu, B. R., "Image Segmentation Using Active Contour Model," Int. J. Comput. Eng. Res., vol. 2, no. 3, pp. 819–822, 2012.
[20] Sabri, M. A., Ennouni, A., and Aarab, A., "Automatic estimation of clusters number for Kmeans," 2016 4th IEEE International Colloquium on Information Science and Technology (CiSt), pp. 450–454, 24–26 Oct. 2016, DOI: 10.1109/CIST.2016.7805089, Electronic ISSN: 2327–1884.
[21] Garnavi, R., Aldeen, M., Celebi, M. E., Varigos, G., Finch, S., Bhuiyan, A., and Dolianitis, C., "Automatic segmentation of dermoscopy images using histogram thresholding on optimal color channels," Computerized Med. Imaging Graph., vol. 35, pp. 105–115, 2011.
[22] Haralick, R. M., Shanmugam, K., and Dinstein, I., "Textural Features for Image Classification," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-3, no. 6, pp. 610–621, Nov. 1973, DOI: 10.1109/TSMC.1973.4309314.
[23] Rahman, U. T. a., "Gabor filters and gray level co occurrence matrices in texture classification," Citeseer, 2007.
[24] Ruiz, L., Fdez-Sarría, A., and Recio, J., "Texture feature extraction for classification of remote sensing data using wavelet decomposition: a comparative study," Int. Arch. Photogramm. Remote Sens., vol. XXXV, no. 1, pp. 1682–1750, 2004.
[25] Kira, K. and Rendell, L., "The Feature Selection Problem: Traditional Methods and a New Algorithm," AAAI-92 Proceedings, 1992.
[26] Michalak, K. and Kwasnicka, H., "Correlation based feature selection method," International Journal of Bio-Inspired Computation, vol. 2, no. 5, pp. 319–332, October 2010.
[27] Guyon, I., Weston, J., Barnhill, S., and Vapnik, V., "Gene Selection for Cancer Classification Using Support Vector Machines," Machine Learning, vol. 46, no. 1–3, pp. 389–422, 2002, DOI: 10.1023/A:1012487302797.
[28] Gutman, D., Codella, N. C. F., Celebi, E., Helba, B., Marchetti, M., Mishra, N., and Halpern, A., "Skin Lesion Analysis toward Melanoma Detection: A Challenge at the International Symposium on Biomedical Imaging (ISBI) 2016, hosted by the International Skin Imaging Collaboration (ISIC)," eprint arXiv:1605.01397, 2016.
[29] Barata, C. F., Celebi, E. M., and Marques, J., "A Survey of Feature Extraction in Dermoscopy Image Analysis of Skin Cancer," IEEE J. Biomed. Heal. Informatics, vol. PP, no. 8, p. 1, 2018.
[30] Nezhadian, F. K. and Rashidi, S., "Melanoma skin cancer detection using color and new texture features," 2017 Artif. Intell. Signal Process. Conf., pp. 1–5, 2017.
[31] Chandrashekar, G. and Sahin, F., "A survey on feature selection methods," Comput. Electr. Eng., vol. 40, no. 1, pp. 16–28, 2014.
[32] Mohanaiah, P., Sathyanarayana, P., and Gurukumar, L., "Image Texture Feature Extraction Using GLCM Approach," Int. J. Sci. Res. Publ., vol. 3, no. 5, pp. 1–5, 2013.
[33] Oliveira, R. B., "Features selection for the classification of skin lesions from images," 6th European Conference on Computational Fluid Dynamics (ECFD VI), Barcelona, Spain, July 20–25, 2014.
[34] Dreiseitl, S., Ohno-Machado, L., Kittler, H., Vinterbo, S., Billhardt, H., and Binder, M., "A comparison of machine learning methods for the diagnosis of pigmented skin lesions," J. Biomed. Inform., vol. 34, no. 1, pp. 28–36, 2001.
[35] Mathew, S. and Sathyakala, D., "Segmentation of skin lesions and classification by neural network," International Journal of Advanced Research in Electronics and Communication Engineering (IJARECE), vol. 4, no. 2, February 2015.
Banerjee Ishita, P. Madhumathy and N. Kavitha
3 Brain tumor image segmentation and classification using SVM, CLAHE, and ARKFCM
Abstract: With the modern lifestyle and environmental changes, many life-threatening diseases are disrupting people's normal lives. One of these diseases is brain tumor, which, if not detected and treated in time, may even cause death. Early detection is the key to treating any disease in time. A tumor is an abnormal or malignant growth of tissue in an organ of the human body that contributes nothing to its physiological function. Magnetic resonance imaging (MRI) is one of the most widely used scanning techniques; it uses magnetic fields and radio waves to create a detailed image of the bones and tissues of the targeted area so that any abnormal growth can be detected. To classify MRI images efficiently and to locate the tumor accurately, a support vector machine (SVM) is used. Image classification depends heavily on image quality in terms of contrast, illumination, blurring, and so on. Therefore, contrast limited adaptive histogram equalization (CLAHE) is used to improve the image contrast, together with pixel adjustment techniques for further quality enhancement. The tumor is segmented using the adaptively regularized kernel fuzzy C-means (ARKFCM) clustering algorithm, a threshold algorithm, and morphological operations, all of which rely on the gray-level intensity of the image. The segmentation yields information about the abnormal tissue growth in terms of physical parameters such as the area, perimeter, and overall size of the tumor. These parameters are used to decide whether the tumor condition is normal or malignant. High accuracy in segmentation and classification is required to separate normal tumors from malignant ones reliably. The proposed method decreases the variance of the prediction error and increases the flexibility in determining the nature of the tumor, thereby assisting medical practitioners. Most brain tumor patients worldwide do not come from affluent backgrounds; the proposed technique for detecting and classifying brain tumor cells proves to be affordable as well as efficient.
Keywords: image segmentation, image classification, support vector machine, MRI images.
https://doi.org/10.1515/9783110621105-003
3.1 Introduction
The brain is the most important and delicate part of the central nervous system. Each cell of the human body performs specific functions; when cells lose this control and divide in an uncontrolled manner, a tumor may form [1]. Like any other tumor in the body, a brain tumor is an abnormal growth of tissue, in this case inside the brain, and it can be either a primary or a metastatic brain tumor [2]. A primary brain tumor originates inside the brain itself, whereas a metastatic tumor originates in another part of the body and spreads to the brain. Tumors are either malignant (cancerous) or benign (noncancerous). Malignant tumors are more life-threatening than benign ones, but even benign tumors are often harmful because of their location in a delicate part of the nervous system. The severity of the condition is judged by the tumor grade [3, 4]. Grade I tumors are usually harmless unless their position seriously obstructs other functions of the body. Grade II tumors are slightly abnormal and curable. Grade III tumors are malignant, and grade IV is the most severe type of malignant tumor. Detecting the tumor at an early stage therefore makes timely treatment possible and, at the initial grades, often a complete cure. Despite technological improvements in medicine, determining the stage of a tumor remains a difficult, time-consuming, and critical task [5]. A patient with symptoms of a brain tumor undergoes magnetic resonance imaging (MRI) and, if required, a biopsy [6]. Determining the tumor grade from MRI scan reports and biopsies with traditional procedures can take months, and the accuracy is often limited [7].
Computer-aided techniques such as machine learning, deep learning, and hybrid intelligent techniques combining neural networks, genetic algorithms, and so on give better accuracy in grade detection [8–12]. Computer-aided diagnosis (CAD) is therefore used to identify and classify brain tumors [13, 14]. MRI is the most important tool for implementing CAD efficiently, because it images soft tissue. From the MRI images, the growth of the tumor (its size and shape) and its degree of abnormality are determined. Image enhancement is the first requirement for reading or interpreting the images obtained from MRI. Enhancement techniques improve the perception of information, or the interpretability of the image, by modifying its attributes without destroying its information content. The proposed system uses image enhancement and pixel adjustment techniques to preserve the image characteristics and increase the contrast. Two types of image enhancement are popular: spatial-domain methods manipulate the pixel values directly, whereas frequency-domain methods transform the image with the Fourier transform, apply the enhancement in the frequency domain, and return to the original domain with the inverse Fourier transform.
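To make the spatial-domain idea concrete, the following is a minimal sketch of three classic point transformations that this chapter mentions later (image negative, log transformation, and power-law transformation). It assumes an 8-bit grayscale image stored as a nested Python list; the scaling constant of the log transform and the gamma value in the example are illustrative choices, not parameters from this chapter.

```python
import math

L = 256  # gray levels in an 8-bit image (assumed input format)


def negative(img):
    """Image negative: s = (L - 1) - r."""
    return [[(L - 1) - r for r in row] for row in img]


def log_transform(img):
    """Log transform s = c * log(1 + r); c maps the brightest level to L - 1."""
    c = (L - 1) / math.log(L)
    return [[round(c * math.log1p(r)) for r in row] for row in img]


def power_law(img, gamma):
    """Power-law (gamma) transform s = (L - 1) * (r / (L - 1)) ** gamma."""
    return [[round((L - 1) * (r / (L - 1)) ** gamma) for r in row] for row in img]


img = [[0, 64], [128, 255]]   # tiny illustrative image
print(negative(img))          # [[255, 191], [127, 0]]
print(log_transform(img))     # expands dark levels, compresses bright ones
print(power_law(img, 0.5))    # gamma < 1 brightens dark regions
```

Each function maps every pixel independently of its neighbors, which is what distinguishes these point operations from the window-based enhancement used later in the proposed system.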
A digitizer converts the image into a form that a digital computer can process. Image processing proceeds through acquisition, storage, preprocessing, segmentation, and so on, and finally yields a representation of the image that serves the specific application. The image sensor and digitizer digitize the image to be analyzed. Preprocessing enhances the image quality, removes noise, and isolates different regions, after which the image is passed to the subsequent processing stages. Segmentation then partitions the image into objects and outputs either the pixels of each region or its boundary. Representation makes this raw pixel data suitable for the subsequent steps, and description extracts the features of the image. Recognition attaches a label to each object according to the application's requirements, and interpretation assigns meaning to the recognized objects. The knowledge base governs the communication between the modules and guides the operation of each module. The exact steps depend entirely on the application. Likewise, the choice of image enhancement technique depends on the specific task: power-law transformation, log transformation, and the image negative are some techniques for adjusting image contrast, but all of them are application specific. Contrast limited adaptive histogram equalization (CLAHE) is widely used in biomedical image processing. It splits the image into regions and equalizes each one, which enhances the image contrast. Image segmentation then partitions the image into smaller parts, either by grouping pixels with similar attribute values or according to the application [15].
Here, the grayscale image is converted into a binary (bit) pattern, which is easier to interpret. The adaptively regularized kernel fuzzy C-means (ARKFCM) algorithm clusters the pixel values, and the result is binarized. Features such as area and perimeter are extracted from the segmented image and used to classify the image [16].
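The contrast-limiting step that distinguishes CLAHE from plain histogram equalization can be sketched for a single region (tile) as follows. This is a simplified illustration under stated assumptions: the clip limit is expressed as a multiple (`clip_factor`, a hypothetical parameter name) of the average bin height, the clipped excess is redistributed uniformly, and the splitting into tiles and the bilinear interpolation between neighboring tiles that full CLAHE performs are omitted.

```python
def clipped_equalize(tile, levels=256, clip_factor=2.0):
    """Contrast-limited histogram equalization of one tile (list of rows of ints)."""
    pixels = [p for row in tile for p in row]
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Clip every bin at clip_factor times the mean bin height (assumed rule).
    limit = max(1, int(clip_factor * n / levels))
    excess = sum(max(0, h - limit) for h in hist)
    hist = [min(h, limit) for h in hist]
    # Redistribute the clipped excess uniformly, so the bins again sum to n.
    share, rest = divmod(excess, levels)
    hist = [h + share + (1 if i < rest else 0) for i, h in enumerate(hist)]
    # Cumulative distribution gives the lookup table onto [0, levels - 1].
    lut, cum = [], 0
    for h in hist:
        cum += h
        lut.append(round((levels - 1) * cum / n))
    return [[lut[p] for p in row] for row in tile]


print(clipped_equalize([[50, 52], [54, 200]]))
```

Clipping bounds the slope of the mapping: for a perfectly flat 16 × 16 tile of value 100, this sketch outputs 103 everywhere, whereas the same cumulative mapping without clipping would push the tile to 255.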
3.2 Intelligent prediction and decision support systems
If the illumination of the image is uneven, the focus is incorrect, or the contrast is poor, information that is important for analyzing the tumor is lost. This may lead to an incorrect diagnosis, because segmentation and CAD require good pixel contrast. These problems of contrast, illumination, and focus therefore need to be overcome. The objectives of the proposed work are (i) to enhance the MRI images with CLAHE preprocessing and to adjust the pixel intensity; (ii) to use the ARKFCM clustering algorithm for image segmentation; (iii) to retain only the tumor region and discard unwanted objects from the image; and (iv) to predict and decide from the image whether the tumor is benign or malignant.
3.3 Literature survey
Ahirwar addressed the issues of medical image segmentation for classifying regions of brain MRI [17]. In that work, statistical test data of MRI images are collected and used to determine the category of brain tumor with different algorithms. Intensity information or edges are used to extract information from the connected regions of the image. The key drawback of this method is the manual interaction needed to find the seed points. Different techniques for classifying and segmenting images are investigated and applied, and several segmentation methods exist for visualizing brain tissue. A brain image mainly consists of three regions: white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). When an abnormal tumor is present in the image, a fourth region appears. A self-organizing map and neuro-fuzzy schemes are applied to extract information from the MRI image by segmentation and characterization; these techniques are useful for studying the WM, GM, CSF, and tumor regions of the brain image [18]. The axial views are also classified and compared against Keith's database. The sensitivity of the method depends on several relevant parameters. The axial view is obtained with the scheme described in [18], which classifies the regions of brain tumor images and stores the results in tables; these results are later compared with results available on the web. A confusion matrix is built with the predicted classes as columns and the actual classes as rows. Keith's database contains a total of 37 malignant cases, for which this scheme finds 29 true-positive images (images containing a tumor region) and eight true-negative images (images with no tumor present). Twelve images in Keith's database are normal axial-view images.
The latter scheme, however, yields 11 false-negative cases and one false-positive case. Richika et al. used a support vector machine (SVM), K-means, and principal component analysis (PCA) to detect the class of the brain tumor [19]. PCA is combined with K-means over the pixel values of the MRI image to define the class of the tumor. This method is time-consuming, and its results lack accuracy to some extent. Here too, the image is the key to finding the volume of the brain tumor through clustering- and segmentation-based approaches, and the tumor volume plays a major role in determining the class or stage of malignancy. The problems of interoperator variance and partial volume effects are overcome to achieve good results for extraction, clustering, and segmentation. The tumor region is modeled as an ellipse, and the mean volume of the area in the image is found. The reported result is 96% of the tumor growth. This novel segmentation and classification scheme is intended to diagnose the tumor more accurately. Using the energy, correlation, contrast, and homogeneity of the image gives more accurate feature extraction, and several adaptive models for brain tumor detection are designed. Such characteristic features are generally used to locate and measure large tumors; detecting small tumors requires many critical features to be examined closely.
Narkbuakaew et al. proposed modified K-means clustering algorithms, combined with morphological operations, to segment the image [20]. K-means clustering applied to CT and brain MRI images yields the required information when combined with proper thresholding. The value of K is limited to three if any gray-level intensity exceeds that. Two approaches are proposed to segment multiple organs in scanned images. First, K-means clustering is modified into a hierarchical scheme built from the cluster indexes and the correlation between several tissue types. Second, the multiorgan images are segmented with simple techniques: the tissue types are studied, a few morphological operations are performed, and the vital information about the anatomical structure is obtained. Four-dimensional CT (4DCT) images of the liver are used, and several clustering indexes are created, such as five-region-based and four-tissue-based clustering indexes. K-means or fuzzy C-means alone loses some regions, which this scheme improves upon. Multiorgan segmentation is performed on organs such as the liver, spleen, and kidneys, and the resulting regions were compared manually by radiologists, giving an accuracy level of 87.4%.
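For reference, plain K-means on gray levels, the baseline that [20] modifies, can be sketched as below with K = 3, matching the value mentioned above. This is standard Lloyd iteration on one-dimensional intensities; the evenly spaced initial centers are an assumption made for determinism, and this is not the hierarchical, modified variant of [20].

```python
def kmeans_1d(values, k=3, iters=100):
    """Lloyd's K-means on scalar gray levels; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic initialization (assumption): evenly spaced over the range.
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centers[j])
        if new == centers:
            break
        centers = new
    return centers, labels


centers, labels = kmeans_1d([10, 12, 11, 100, 105, 98, 200, 210, 205])
print(centers, labels)
```

Because each pixel gets exactly one label, plain K-means produces hard region boundaries; the fuzzy C-means family discussed later replaces these labels with graded memberships.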
The researchers' aim was to introduce a modified K-means clustering that improves the accuracy of the clustering-based segmentation. The modified approach was applied to 4DCT liver images and proved more effective than K-means or fuzzy C-means in finding the clustering indexes for several tissue types. Because K-means and fuzzy C-means depend on a random assignment of cluster indexes for different merged tissues, regions inside the ribcage, or regions with different intensities in overlapping areas, cannot be distinguished. The modified approach adds a region-correction method that overcomes these drawbacks. The target regions are segmented with simple template-based regions according to the shape and location of the tumor. The liver regions are studied, and the automatically segmented regions are compared thoroughly with manual segmentation to judge the accuracy of the method. Applied to 3DCT images, the method gave an accuracy of 87.4%. Its efficiency fails in the presence of highly convex regions in the image, and likewise for highly concave regions.
Lugina et al. proposed a similar approach for brain tumor detection and classification in MRI images using region growing, a fuzzy symmetric measure, and an artificial neural network with backpropagation (ANN-BP) [21]. The paper modifies existing algorithms to find the seed points for region growing: a converging-square method combined with split and merge selects the seed points automatically. It also studies how the region-growing threshold value affects the segmentation results. The brain tumor image is classified as normal or malignant based on finite state machine (FSM) threshold values, and this classification is performed with ANN-BP. Diagnosing tumors by processing digital MRI images is an emerging field in medicine. Radiologists study the uncontrolled cell division of brain tumors with CAD, which involves two major processes: image segmentation and feature extraction from the segmented image. Region growing is a commonly used segmentation method based on pixel values. The fuzzy symmetric measure or first- and second-order statistics are used to extract features for classification, and the fuzzy symmetric measure is used to assess the symmetry of the brain tumor image. ANN-BP is used to solve the nonlinear, complex classification problem. The drawback of the process is that the optimal parameter values cannot be clearly determined, and no practical procedure is given to find them.
Deng addressed image enhancement with an unsharp masking algorithm [22]. The work focuses on increasing image sharpness and contrast with sharpness enhancement tools. Unsharp masking is one such tool; it increases the sharpness of an image so that maximum information can be extracted from it. The proposed algorithm uses an exploratory data model to address a few issues that arise when processing an image for data extraction.
The model and residual components are processed separately to increase the contrast and, finally, to enhance the sharpness of the image. An edge-preserving filter is used to minimize the halo effect. Tangent operations are used to handle values that would otherwise fall out of range, and the log-ratio operation is used for the same purpose. The log-ratio operation establishes a connection between general linear systems and the Bregman divergence. Geometrical analysis proved crucial for developing the system, and the tangent operation likewise relates to the Bregman divergence, giving clear insight into the geometry of the system. Experimental results show that the algorithm increases the contrast and sharpness of the image, and only two parameters need to be adjusted in practical use. This work can help many image-processing applications, including medical imaging.
Selvakumar et al. proposed a method that calculates the area of the brain tumor using K-means clustering and fuzzy C-means algorithms [23]; similar work has also been carried out in [24]. The researchers classify tumors as primary or secondary according to their level of malignancy. A primary tumor remains at its place of origin, whereas a secondary tumor spreads to other places in the body and grows at its own rate. The cerebrospinal fluid is strongly affected by the tumor, which results in several complications including strokes. Timely detection is the most important factor: the earlier the tumor is detected, the higher the probability of successful treatment and cure. The procedure highlights a few factors in detecting malignant tumors. A three-dimensional image of the brain is acquired and analyzed with a 3D analyzer to detect the tumor mass, so that the shape of the tumor can be determined more accurately. Initially, the brain image carries information only as an arrangement of pixel values in rows and columns. Several image-processing techniques process this pixel data to extract the features, or the set of parameters, required to obtain information from the image. One such approach treats the image as a 2D signal and applies signal-processing operations to it. Each element of a digital image has a particular value. Image processing improves picture quality and, because raw pixel values are difficult to understand, makes the image more suitable for machine interpretation and, ultimately, for human understanding. In an 8-bit grayscale image, brightness ranges from 0 (black) to 255 (white); an image can thus be viewed as a grid of dots, each with an associated brightness. The radiologist's job is to read the information in such images to diagnose the disease correctly. Because this manual reading depends only on the radiologist's experience and knowledge, it may not always be accurate; it depends on when and how the image is analyzed and on the radiologist's expertise and experience in pattern recognition.
With advances in technology, CAD systems therefore help radiologists confirm their interpretation of the information in the image. Research on such CAD systems requires careful and sensible use of machine learning and pattern recognition techniques.
Harati et al. proposed a fully automatic segmentation method for brain MRI images using an improved fuzzy connectedness (FC) algorithm [25]. Because the size and position of a brain tumor are among the most important information for assessing its stage of malignancy, a fully automated and highly accurate segmentation method is always in demand. In that work, a fully automated procedure for specifying and segmenting the tumor region is proposed and implemented with the improved FC algorithm, which finds the seed points automatically. The algorithm does not classify tumor types by their pixel values. The method is reported to segment low-contrast images more accurately than the other traditional methods discussed in that work. The tumor size, position, area, and so on are estimated automatically, which helps medical practitioners plan further treatment [26, 27].
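Both region growing [21] and fuzzy connectedness [25] start from seed points. A minimal sketch of basic seeded region growing follows. It assumes 4-connectivity and a fixed intensity tolerance relative to the seed value; both are illustrative choices, and the cited works use their own seed selection and growth criteria.

```python
from collections import deque


def region_grow(img, seed, tol):
    """Grow a region from seed=(row, col): add 4-connected pixels whose
    intensity differs from the seed intensity by at most tol (assumed rule)."""
    rows, cols = len(img), len(img[0])
    r0, c0 = seed
    ref = img[r0][c0]
    mask = [[False] * cols for _ in range(rows)]
    mask[r0][c0] = True
    queue = deque([seed])
    while queue:  # breadth-first expansion of the region
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]
                    and abs(img[nr][nc] - ref) <= tol):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


img = [[10, 10, 80], [10, 90, 85], [12, 88, 200]]
mask = region_grow(img, (1, 1), 15)
print(sum(map(sum, mask)))  # number of pixels in the grown region
```

The tolerance plays the role of the region-growing threshold whose effect on the segmentation is studied in [21]: a larger value lets the region leak into neighboring tissue, a smaller one stops it early.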
Shally et al. calculated the brain tumor volume from MRI images using some of the techniques mentioned earlier [28]. Sagittal, axial, and coronal views of the tumor image are studied thoroughly to estimate the tumor volume. A brain tumor can appear at any age and at any stage of a person's life, and it causes different complications in different people. It can have any shape and size and be in any location, so MRI images show varying intensities. Because tumors differ in diffuse growth and tissue composition, their MRI appearance also varies considerably. In this work, the MRI image is visualized in 2D and segmented with a spectral clustering algorithm to obtain an accurate area. The convex hull of the segmented region is then marked, and the tumor area is calculated from this convex hull image. Once the area is known, the tumor volume is determined with the frustum method. Among the several segmentation methods available for finding the exact tumor volume, the most suitable one is used to identify the affected area [29]. A benign tumor usually grows in a controlled way and most of the time does not interfere with normal brain functions. For brain malignancies, however, image assessment plays a major role in planning further treatment [30, 31]. Among the many imaging methods, MRI is preferred because it relies on water molecule density to image soft tissue. The information in the images is then extracted by efficient methods.
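The area-then-volume procedure described for [28] can be sketched as follows: take the convex hull of the tumor pixel coordinates in each slice (Andrew's monotone chain), compute its area with the shoelace formula, and combine the areas of consecutive slices with the frustum formula V = h(A1 + A2 + sqrt(A1 · A2))/3. Treating each pair of adjacent slices as a frustum of height h equal to the slice spacing is our assumption about how the method is applied; the coordinates and spacing in the example are made up, and areas are in pixel units.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula for a simple polygon."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def frustum_volume(areas, spacing):
    """Sum of frustum volumes between consecutive slice areas (assumed usage)."""
    return sum(spacing * (a1 + a2 + math.sqrt(a1 * a2)) / 3.0
               for a1, a2 in zip(areas, areas[1:]))


slice1 = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]   # hypothetical tumor pixels
slice2 = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
areas = [polygon_area(convex_hull(s)) for s in (slice1, slice2)]
print(areas, frustum_volume(areas, spacing=3.0))
```

Multiplying by the physical pixel area and slice thickness would convert these pixel-unit results into physical units.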
3.4 Limitations
The modified K-means clustering method fails in the presence of highly convex regions in the image, and likewise for highly concave regions. The main drawback of the ANN algorithm is that the optimal parameter values cannot be clearly determined, and no practical procedure is given for finding them. Halo effects reduce the accuracy of the algorithm. Calculating the tumor area is a very critical task because the tumor has no regular shape or size; the accuracy of the volume calculation therefore depends entirely on the precision of the area calculation. The discussion shows that no technique is perfect; for optimal results, the limitations of each method must be minimized.
3.5 Proposed methodology
Several shortcomings are found in the existing systems. Noise in the image could lead to misinterpretation of the MR image. The contrast of the output images could be enhanced further. The selection of features and classification strategies is difficult. Segmentation results are less accurate, which leads to incorrect classification. To overcome these difficulties, a new method is proposed that keeps the image natural, with pleasing perceptual quality and without distortion, noise, or overenhancement. In the proposed system, illumination and contrast are adjusted with CLAHE to improve the clarity of the image, and the ARKFCM clustering algorithm is used for segmentation. The segmentation is completed with a threshold algorithm based on the gray-level intensity of the brain image. Morphological operations separate the tumor from the rest of the image. Features related to the area and perimeter of the tumor are then extracted from this image. Classification then determines whether the image is normal or contains a tumor, and a tumor is further analyzed for its grade. The proposed system aims to classify normal and brain tumor images with high accuracy. The block diagram of the proposed system is shown in Figure 3.1.
Figure 3.1: Block diagram of proposed system. (Blocks: input MRI image; RGB to grayscale conversion; pixel intensity adjustment; CLAHE-based image enhancement; ARKFCM clustering algorithm, thresholding; morphological operations; region/shape feature extraction; classification using SVM; outputs normal/diseased and benign/malignant.)
SVM is used to detect the tumor region in the MRI image from the features extracted from the segmented image. A feature subset selection method is used for the whole model, and pattern classification assigns the objects to classes. SVM, the classification technique used in the proposed work, was developed by Cortes and Vapnik at the AT&T laboratories. In its basic form, an SVM distinguishes between two classes. The inputs to the SVM classifier are the training feature set, the training group (class labels), and the testing features. The SVM separates data points of different classes with a decision boundary, a hyperplane, and it handles both linear and nonlinear classification problems. Two classes of data are used to train and test the proposed method: normal and tumorous brain images. The hyperplane classifies these data, and the result is either a normal image or a tumor image. An ANN then classifies whether the tumor is benign or malignant. ANN-BP is a supervised learning method: the backpropagation algorithm reduces the probability of classification error by propagating the error value back through the network and adjusting the neuron weights accordingly, which allows the ANN to solve the classification problem. ANN-BP takes as input the first- and second-order statistical features extracted from the segmented image and decides whether the tumor is malignant. In short, SVM classifies images as tumor or nontumor, and ANN further classifies the stage. CLAHE, ARKFCM, thresholding, morphological operations, region/shape-based feature extraction, SVM, and ANN are the segmentation and classification techniques used in the proposed system, as described below.

a. RGB to gray conversion: The input is the MRI brain image. Since it is an RGB image with three channels, it is converted into a single-channel grayscale image. Syntax: b = rgb2gray(a); where a is the RGB input image and b is the grayscale output image. The conversion forms a weighted sum of the R, G, and B components:

$$ \text{Gray} = 0.2989\,R + 0.5870\,G + 0.1140\,B \tag{3.1} $$

b. Image enhancement: Adaptive histogram equalization techniques adjust the pixel intensities to enhance the contrast of the image. The CLAHE technique makes the salient features of the image more prominent and readable. The image is first split into several disjoint regions, and local histogram equalization is applied to each region separately. Bilinear interpolation is then used to reduce artifacts at the region boundaries. The local window used for the point transformation is unaffected by intensity variations between the center and the edges of the image. The point transformation is a local distribution function of the mean intensity of the window and aims to span the whole intensity range of the image. Let $W$ be a window of $N \times N$ pixels centered on pixel $p(i, j)$. Filtering produces a subimage of $N \times N$ pixels according to

$$ p_n = 255\,\frac{\phi_w(p) - \phi_w(\mathrm{Min})}{\phi_w(\mathrm{Max}) - \phi_w(\mathrm{Min})} \tag{3.2} $$

where

$$ \phi_w(p) = \left[1 + \exp\!\left(\frac{\mu_w - p}{\sigma_w}\right)\right]^{-1} \tag{3.3} $$

Max and Min are the maximum and minimum intensities of the entire image, and $\mu_w$ and $\sigma_w$ are the mean and standard deviation of the pixels in the window:

$$ \mu_w = \frac{1}{N^2}\sum_{(i,j)\in W} p(i,j) \tag{3.4} $$

$$ \sigma_w = \sqrt{\frac{1}{N^2}\sum_{(i,j)\in W}\bigl(p(i,j) - \mu_w\bigr)^2} \tag{3.5} $$
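Steps a. and b. can be sketched together as below. The grayscale conversion uses the weights of (3.1), mirroring what the text says rgb2gray computes; the enhancement evaluates (3.2) to (3.5) with an N × N window (N = 3 here, an illustrative choice) centered on each pixel. Clamping window indices at the image border, applying the current window's φ_w to the global Min and Max, and the small epsilon that guards against a zero σ_w are our assumptions. This sketch covers only the sigmoid mapping of (3.2)-(3.5), not the full tile-based CLAHE procedure.

```python
import math


def rgb_to_gray(rgb):
    """Equation (3.1) applied to a nested list of (R, G, B) tuples."""
    return [[0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row]
            for row in rgb]


def _sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def phi_w(p, mu, sigma):
    """Equation (3.3): [1 + exp((mu - p) / sigma)]^-1."""
    return _sigmoid((p - mu) / sigma)


def sigmoid_enhance(img, n=3, eps=1e-6):
    """Equations (3.2)-(3.5) with an n x n window centred on each pixel."""
    rows, cols = len(img), len(img[0])
    lo = min(min(row) for row in img)   # Min of the entire image
    hi = max(max(row) for row in img)   # Max of the entire image
    half = n // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Window pixels; indices clamped at the image border (assumption).
            win = [img[min(max(i + di, 0), rows - 1)][min(max(j + dj, 0), cols - 1)]
                   for di in range(-half, half + 1) for dj in range(-half, half + 1)]
            mu = sum(win) / len(win)                                       # (3.4)
            sigma = math.sqrt(sum((p - mu) ** 2 for p in win) / len(win))  # (3.5)
            sigma = max(sigma, eps)  # flat windows would divide by zero
            den = phi_w(hi, mu, sigma) - phi_w(lo, mu, sigma)
            if den == 0:
                out[i][j] = float(img[i][j])  # flat image: leave pixel unchanged
            else:
                num = phi_w(img[i][j], mu, sigma) - phi_w(lo, mu, sigma)
                out[i][j] = 255.0 * (num / den)                            # (3.2)
    return out


gray = rgb_to_gray([[(255, 255, 255), (0, 0, 0)]])
enhanced = sigmoid_enhance([[10, 10, 10], [10, 200, 10], [10, 10, 10]])
print(gray, enhanced[1][1])
```

Because φ_w is monotonic, every output lies in [0, 255], with the global minimum mapped to 0 and the global maximum to 255, while the local mean and spread decide how strongly mid-range values are stretched.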
After adaptive histogram equalization, the dark areas of the original image are brightened, so the image appears brighter than the original. Similarly, portions of the image that were overilluminated are dimmed to match the contrast of the whole image.

c. Image segmentation: The gray image is converted to binary, with the tumor area shown in white and the remaining area in black. The ARKFCM clustering algorithm is used for the segmentation. In the standard FCM algorithm, each pixel receives a membership value for every cluster of the image. Consider an image $I$ with grayscale pixels $x_i$ $(i = 1, 2, \ldots, N)$, $X = \{x_1, x_2, \ldots, x_N\} \subset \mathbb{R}^k$, and cluster centers $V = \{v_1, v_2, \ldots, v_c\}$ with $2 \le c < N$. FCM minimizes

$$ J_{\mathrm{FCM}} = \sum_{i=1}^{N}\sum_{j=1}^{c} u_{ij}^{m}\,\lVert x_i - v_j\rVert^2 \tag{3.6} $$

where $u_{ij}$ is the membership of pixel $i$ in cluster $j$, $m > 1$ is the fuzzy weighting exponent, and $\lVert x_i - v_j\rVert^2$ is the squared grayscale Euclidean distance between pixel $i$ and center $v_j$. The memberships satisfy, for all $i \in [1, N]$ and $j \in [1, c]$,

$$ \sum_{j=1}^{c} u_{ij} = 1, \qquad u_{ij} \in [0, 1], \qquad 0 \le \sum_{i=1}^{N} u_{ij} \le N \tag{3.7} $$

The membership function and the positions of the cluster centers are given by

$$ u_{ij} = \frac{1}{\sum_{k=1}^{c}\left(\lVert x_i - v_j\rVert^2 / \lVert x_i - v_k\rVert^2\right)^{1/(m-1)}} \tag{3.8} $$

$$ v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\,x_i}{\sum_{i=1}^{N} u_{ij}^{m}} \tag{3.9} $$
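The alternating updates (3.8) and (3.9) can be sketched on a flattened list of gray levels as follows. The fuzzifier m = 2, the evenly spaced initial centers, the iteration budget and tolerance, and the small epsilon that avoids division by zero when a pixel coincides with a center are illustrative assumptions.

```python
def fcm(x, c=3, m=2.0, iters=100, tol=1e-6, eps=1e-12):
    """Standard fuzzy C-means on scalar gray levels x.
    Returns (centers, memberships) with memberships[i][j] = u_ij."""
    lo, hi = min(x), max(x)
    v = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]  # assumed init
    u = []
    for _ in range(iters):
        # Membership update, equation (3.8), using squared distances.
        u = []
        for xi in x:
            d = [(xi - vj) ** 2 + eps for vj in v]
            u.append([1.0 / sum((d[j] / d[k]) ** (1.0 / (m - 1)) for k in range(c))
                      for j in range(c)])
        # Center update, equation (3.9).
        new_v = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(x))]
            new_v.append(sum(wi * xi for wi, xi in zip(w, x)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if shift < tol:
            break
    return v, u


centers, memberships = fcm([10, 12, 11, 100, 105, 98, 200, 210, 205])
print([round(cj, 2) for cj in centers])
```

Unlike the hard labels of K-means, every row of memberships sums to one, so a pixel lying between two tissue intensities contributes partially to both centers.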
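The parameter α controls the contribution of contextual (neighbourhood) information. Its value depends on the noise level, so it must be calculated adaptively. For this purpose, a local variation coefficient (LVC) is estimated:

$$\mathrm{LVC}_i = \frac{\sum_{k\in N_i}(x_k - \bar{x}_i)^2}{N_R\,(\bar{x}_i)^2} \qquad (3.10)$$

where x_k is the grayscale of any pixel in the local window N_i around pixel i, N_R is the cardinality of N_i, and x̄_i is the mean grayscale of N_i. The LVC values are passed through an exponential function to derive the weights within the local window:

$$\zeta_i = \exp\Bigl(\sum_{k\in N_i,\,k\ne i}\mathrm{LVC}_k\Bigr) \qquad (3.11)$$

$$\omega_i = \frac{\zeta_i}{\sum_{k\in N_i}\zeta_k} \qquad (3.12)$$

The final weight assigned to each pixel is

$$\varphi_i = \begin{cases} 2 + \omega_i, & \bar{x}_i < x_i \\ 2 - \omega_i, & \bar{x}_i > x_i \\ 0, & \bar{x}_i = x_i \end{cases} \qquad (3.13)$$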
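The adaptive weights of Eqs. (3.10) to (3.13) can be computed for a whole image with NumPy. This sketch assumes a square (2r + 1) × (2r + 1) window N_i that includes the centre pixel, and replicates border pixels (edge padding) at the image boundary; both are our assumptions, as are the helper names `_windows` and `arkfcm_weights`.

```python
import numpy as np

def _windows(a, r):
    """Stack the (2r+1) x (2r+1) neighbourhood of every pixel: shape (H, W, (2r+1)**2)."""
    h, w = a.shape
    pad = np.pad(a, r, mode="edge")                   # border handling is our assumption
    return np.stack([pad[dy:dy + h, dx:dx + w]
                     for dy in range(2 * r + 1) for dx in range(2 * r + 1)], axis=-1)

def arkfcm_weights(img, r=1):
    """Return (LVC, phi, xbar) following Eqs. (3.10)-(3.13)."""
    x = np.asarray(img, dtype=float)
    neigh = _windows(x, r)
    n_r = neigh.shape[-1]                              # N_R, cardinality of N_i
    xbar = neigh.mean(axis=-1)                         # mean grayscale of N_i
    lvc = ((neigh - xbar[..., None]) ** 2).sum(axis=-1) / (n_r * xbar ** 2 + 1e-12)  # Eq. (3.10)
    zeta = np.exp(_windows(lvc, r).sum(axis=-1) - lvc)   # Eq. (3.11): drop the k == i term
    omega = zeta / _windows(zeta, r).sum(axis=-1)        # Eq. (3.12)
    phi = np.where(xbar < x, 2 + omega,
                   np.where(xbar > x, 2 - omega, 0.0))   # Eq. (3.13)
    return lvc, phi, xbar

img = np.full((5, 5), 10)
img[2, 2] = 100                                      # one bright pixel in a flat region
lvc, phi, xbar = arkfcm_weights(img)
```

In a perfectly flat region x̄_i = x_i, so φ_i = 0 and no regularization is applied; the bright pixel gets φ > 2 and its direct neighbours get 1 ≤ φ < 2.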
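The weight φi scales the regularization term of each pixel according to its LVC. When φi = 0, the regularization term vanishes and the algorithm reduces to standard (kernel-based) FCM. The resulting adaptively regularized kernel-based FCM framework is called ARKFCM. Its objective function is

$$J_{ARKFCM} = 2\left[\sum_{i=1}^{N}\sum_{j=1}^{c} u_{ij}^{m}\bigl(1 - K(x_i, v_j)\bigr) + \sum_{i=1}^{N}\sum_{j=1}^{c} \varphi_i\, u_{ij}^{m}\bigl(1 - K(\bar{x}_i, v_j)\bigr)\right] \qquad (3.14)$$

where K(·, ·) is the kernel function.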
Minimizing J_ARKFCM(U, V) leads to the following update conditions:

$$u_{ij} = \frac{\bigl[(1 - K(x_i, v_j)) + \varphi_i\,(1 - K(\bar{x}_i, v_j))\bigr]^{-1/(m-1)}}{\sum_{k=1}^{c}\bigl[(1 - K(x_i, v_k)) + \varphi_i\,(1 - K(\bar{x}_i, v_k))\bigr]^{-1/(m-1)}} \qquad (3.15)$$
3 Brain tumor image segmentation and classification using SVM, CLAHE, and ARKFCM
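$$v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\bigl(K(x_i, v_j)\,x_i + \varphi_i\,K(\bar{x}_i, v_j)\,\bar{x}_i\bigr)}{\sum_{i=1}^{N} u_{ij}^{m}\bigl(K(x_i, v_j) + \varphi_i\,K(\bar{x}_i, v_j)\bigr)} \qquad (3.16)$$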
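The ARKFCM updates of Eqs. (3.15) and (3.16) are sketched below. This section does not specify the kernel, so a Gaussian kernel K(a, b) = exp(−(a − b)²/σ²) is assumed, with σ set to the standard deviation of the data; the filtered values x̄ and the weights φ are supplied by the caller (for example, from a mean filter and Eq. (3.13)). The names `gaussian_kernel` and `arkfcm` are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Assumed kernel K(a, b) = exp(-(a - b)^2 / sigma^2); not specified in this section."""
    return np.exp(-((a - b) ** 2) / sigma ** 2)

def arkfcm(x, xbar, phi, c=2, m=2.0, sigma=None, n_iter=100, tol=1e-6):
    """ARKFCM membership (Eq. 3.15) and centre (Eq. 3.16) updates on flattened arrays."""
    x, xbar, phi = (np.asarray(a, dtype=float).ravel() for a in (x, xbar, phi))
    if sigma is None:
        sigma = x.std() + 1e-12                      # bandwidth heuristic (our assumption)
    v = np.linspace(x.min(), x.max(), c)             # initial centres (our assumption)
    for _ in range(n_iter):
        kx = gaussian_kernel(x[:, None], v[None, :], sigma)      # K(x_i, v_j)
        kb = gaussian_kernel(xbar[:, None], v[None, :], sigma)   # K(xbar_i, v_j)
        dist = (1 - kx) + phi[:, None] * (1 - kb) + 1e-12
        w = dist ** (-1.0 / (m - 1))
        u = w / w.sum(axis=1, keepdims=True)                     # Eq. (3.15)
        um = u ** m
        num = (um * (kx * x[:, None] + phi[:, None] * kb * xbar[:, None])).sum(axis=0)
        den = (um * (kx + phi[:, None] * kb)).sum(axis=0)
        v_new = num / den                                        # Eq. (3.16)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    return u, v

# phi = 0 switches off the regularisation term (plain kernel FCM)
pixels = [10, 11, 12, 50, 51, 52]
u, v = arkfcm(pixels, xbar=pixels, phi=np.zeros(6))
```

Setting `phi` to zeros is a convenient way to check the sketch against the unregularized case before plugging in the weights from Eq. (3.13).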
When x̄_i denotes the grayscale of the average-filtered (median-filtered) image, the algorithm is called ARKFCM1 (ARKFCM2). When x̄_i is replaced by the weighted image ξ_i, the algorithm becomes ARKFCMw.

For binary conversion, each pixel is assigned 1 or 0 by comparison with the mean value of all pixels: if the pixel value is greater than the mean, it is assigned 1; otherwise, it is assigned 0. Image binarization thus converts an image with 256 gray levels into a black-and-white image in which the tumor region appears white and the rest appears black, which is a simple way to segment the tumor. When the threshold is instead chosen with Otsu's algorithm, the pixels are likewise classified as white or black (1 or 0) for the binary (bit-pattern) conversion. Morphological operations such as connected-component removal are then used to remove unwanted objects and isolate the tumor in the MR image.

d. Feature extraction
Region property-based features (area, major and minor axis lengths, perimeter, etc.) are extracted from the segmented image for further classification.

e. Classification
SVM is a two-class classifier. Its inputs are the training feature set, the training group labels, and the testing features. The SVM separates the data points of the two classes with a hyperplane that acts as the decision boundary, and it handles both linear and nonlinear classification problems. Here, the hyperplane classifies each image as either normal or tumorous. An artificial neural network (ANN) then classifies the tumor as benign or malignant. The ANN is trained with the backpropagation algorithm (ANN-BP), a supervised learning method: the classification error is propagated backward through the network and used to adjust the neuron weights, which reduces the error in subsequent iterations. In this way, the ANN learns to relate the input features to the correct class.

The inputs to the ANN-BP are the first- and second-order statistical features extracted from the segmented image, from which the network decides whether the tumor is benign or malignant.
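A minimal sketch of the binarization and region-feature steps described above: Otsu's threshold, keeping only the largest 4-connected object as a simple form of connected-component removal, and measuring area and perimeter (counted here as boundary pixels, one of several possible perimeter definitions). It uses only NumPy and the standard library; the function names and the toy image are hypothetical, and this is not the authors' implementation.

```python
import numpy as np
from collections import deque

def otsu_threshold(img):
    """Otsu's threshold for an 8-bit image: maximise the between-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                       # probability of class 0 (gray levels <= t)
    mu = np.cumsum(p * np.arange(256))         # cumulative mean up to t
    mu_t = mu[-1]                              # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b2))            # foreground: pixels > t

def largest_component(mask):
    """Keep only the largest 4-connected foreground object (removes small objects)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    best_label, best_size, current = 0, 0, 0
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or labels[sy, sx]:
                continue
            current += 1
            labels[sy, sx] = current
            queue, size = deque([(sy, sx)]), 0
            while queue:                       # breadth-first flood fill
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = current
                        queue.append((ny, nx))
            if size > best_size:
                best_label, best_size = current, size
    return labels == best_label if best_size else np.zeros((h, w), dtype=bool)

def region_features(region):
    """Area and a pixel-count perimeter (foreground pixels with a background 4-neighbour)."""
    padded = np.pad(region, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return {"area": int(region.sum()), "perimeter": int((region & ~interior).sum())}

# Toy 8-bit image: a 3 x 3 bright "tumor" plus one isolated bright noise pixel
img = np.full((8, 8), 20, dtype=np.uint8)
img[3:6, 2:5] = 200
img[0, 7] = 200
t = otsu_threshold(img)
tumor = largest_component(img > t)
features = region_features(tumor)
```

The isolated noise pixel passes the threshold but is discarded by `largest_component`, so the features describe only the 3 × 3 object.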
3.6 Results and discussion
MATLAB is a mathematical tool used to analyze many kinds of data in research areas such as communications, signal processing, image processing, data modeling and analysis, and control systems.
Figure 3.2: (a) Input brain MRI image 1; (b) CLAHE-enhanced brain tumor image; (c) threshold-based segmented image; (d) cluster-segmented image; and (e) region of interest of brain MRI.
Figure 3.3: (a) Input brain MRI image 2; (b) CLAHE-enhanced brain tumor image; (c) threshold-based segmented image; (d) cluster-segmented image; and (e) region of interest of brain MRI.
A dataset of 90 brain MR images is considered: 40 benign, 20 malignant, and 30 normal. The ARKFCM algorithm is used for segmentation, and the results are shown in Figures 3.2 and 3.3. The CLAHE method improves image clarity by adjusting illumination and contrast. After ARKFCM segmentation, statistical and region property-based features are extracted from each MR image. The statistical features include entropy, contrast, homogeneity, and correlation. The SVM uses these features to classify each image as a normal or tumorous brain; two kinds of data, normal and abnormal images, are used for this classification. The proposed method shows better results than the existing methods in terms of sensitivity, specificity, accuracy, and bit error rate (BER). The ANN then classifies the tumor as benign or malignant: Figure 3.2(e) shows a benign tumor and Figure 3.3(e) a malignant tumor. The proposed method is compared with the existing methods in Table 3.1.

Table 3.1: Data analysis of the proposed algorithm compared with existing algorithms.
Algorithm | Sensitivity (%) | Specificity (%) | Accuracy (%) | BER
ANN | | | |
K-means | | | |
TKFCM | | | |
ARKFCM | | | |
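The percentage metrics in Table 3.1 follow from the counts of true and false positives and negatives. The sketch below computes sensitivity, specificity, and accuracy in percent; the labels are made-up toy values, not the chapter's data, and BER is not computed here. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) in percent for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = 100 * tp / (tp + fn)      # true positive rate
    specificity = 100 * tn / (tn + fp)      # true negative rate
    accuracy = 100 * (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy

# Toy labels (1 = tumor, 0 = normal); not the chapter's data
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec, acc = classification_metrics(y_true, y_pred)
```

Here one tumor image is missed and one normal image is flagged, giving 75% sensitivity, about 83.3% specificity, and 80% accuracy.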
3.7 Conclusion and future scope
Image segmentation and classification methods based on SVM, CLAHE, and ANN are proposed. Morphological operations help remove unwanted objects from the image. The segmented tumor images are used to extract region property-based features such as tumor area and perimeter, and in the classification process these feature values help determine the stage of the tumor. The proposed method determines the tumor stage from the image with high accuracy. In future work, the method can be applied to larger datasets and to other parts of the brain with optimized computation, and additional features can be added to further increase the accuracy. Along the same lines, more adaptive models for feature extraction and classification will be developed.
References [1] [2] [3]
[4] [5] [6]
[7]
[8] [9]
Wen PY, Kesari S., “Malignant gliomas in adults”, N Engl J Med, 359(5), pp. 492–507, 2008. Fox, Michael D. and Raichle, Marcus E, “Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging”, Nature Publishing Group, vol. 8, 2007. Louis DN, Perry A, Reifenberger G, Von Deimling A, Figarella-Branger D, Cavenee WK, et al., “The 2016 World Health Organization classification of tumors of the central nervous system: a summary”, Acta Neuropathol, 131 (6), pp. 803–820, 2016. Mohan G, Subashini MM, “MRI based medical image analysis: survey on brain tumor grade classification,” Biomed Signal Process Control, 39, pp. 139–61, 2018. Kelly PJ., “Gliomas: survival, origin and early detection”, Surg Neurol Int, 2010. Chaplot, S., Patnaik, L.M., Jagannathan, N.R. “Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network”, Biomed. Signal Process Control, 1, pp. 86–92, 2006. M. Maitra, A. Chatterjee, “A Slantlet transform based intelligent system for magnetic resonance brain image classification”, Biomed. Signal Process Control, 1, pp. 299–306, 2011. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. “A survey on deep learning in medical image analysis”, Med Image Anal, 2017. Zhang, Y., Wu, L., Wang, S. “Magnetic resonance brain image classification by an improve artificial bee colony algorithm”, Progress Electromagnetic Resolution, 116, pp. 65–79, 2011.
68
Banerjee Ishita, P. Madhumathy and N. Kavitha
[10] Rasti R, Teshnehlab M, Phung SL., “Breast cancer diagnosis in DCE-MRI using mixture ensemble of convolutional neural networks”, Pattern Recogn, 72, pp. 381–90, 2017. [11] Pan Y, Huang W, Lin Z, Zhu W, Zhou J, Wong J, et al., “Brain tumor grading based on neural networks and convolutional neural networks”, Engineering in Medicine and Biology Society (EMBC), 37th Annual International Conference of the IEEE. pp. 699–702, 2015. [12] Pereira S, Pinto A, Alves V, Silva CA., “Brain tumor segmentation using convolutional neural networks in MRI images”, IEEE Trans Med Imaging, 35(5), pp. 1240–51, 2016. [13] Saritha M, Joseph KP, Mathew AT., “Classification of MRI brain images using combined wavelet entropy based spider web plots and probabilistic neural network”, Pattern Recogn Lett, 34(16), pp. 2151–2156, 2013. [14] He K, Zhang X, Ren S, Sun J., “Delving deep into rectifiers: surpassing human-level performance on imagenet classification”, Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034, 2015. [15] El-Dahshan E-SA, Hosny T, Salem A-BM., “Hybrid intelligent techniques for MRI brain images classification”, Digit Signal Process, 20(2), pp. 433–41, 2010. [16] Zhang Y, Dong Z, Wu L, Wang S., “A hybrid method for MRI brain image classification”, Expert Syst Appl, 38 (8), pp. 10049–53, 2011. [17] Ahirwar, A., “Study of Techniques used for Medical Image Segmentation and Computation of Statistical Test for Region Classification of Brain MRI”, International Journal on Information Technology and Computer Science, pp. 44–53, 2013. [18] Joshi, J. et al., “Feature Extraction and Texture Classification in MRI” In Special Issue of IJCCT, pp. 130–136, 2010. [19] Richika et al., “A Novel Approach for Brain Tumor Detection Using Support Vector Machine, K-Means and PCA Algorithm” International Journal of Computer Science and Mobile Computing, vol. 4 (8), pp. 457–74, August– 2015. 
[20] Narkbuakaew, W., Nagahashi, H., Aoki, K., and Kubota, Y., “Integration of Modified K-Means Clustering and Morphological Operations for Multi-Organ Segmentation in CT Liver-Images,” Recent Advances in Biomedical & Chemical Engineering and Materials Science, pp. 34–39, March, 2014. [21] Lugina, M., Retno, N. D., and Rita, R., “Brain Tumor Detection and Classification in Magnetic Resonance Imaging (MRI) using Region Growing, Fuzzy Symmetric Measure, and Artificial Neural Network Back propagation”, International Journal on ICT, vol. 1, pp. 20–28, December, 2015. [22] Deng, G., “A Generalized Unsharp Masking Algorithm”, IEEE Transactions on Image Processing, vol. 20, no. 5, pp. 1249–1261, May, 2011. [23] Selvakumar, J., Lakshmi, A., and Arivoli, T., “Brain tumor segmentation and its area calculation in brain MR images using K-mean clustering and fuzzy C-mean algorithm,” International Conference on Advances in Engineering, Science and Management (ICAESM), pp. 186–190, 30–31 March, 2012. [24] Ahmmed, R. and Hossain, M. F., “Tumor stages detection in brain MRI image using Temper based K-means and Fuzzy C-means Clustering Algorithm”, Proceeding of 11th Global Engineering, Science and Technology Conference, pp. 1–10, 18–19 December, BIAM Foundation, Dhaka, 2015. [25] Harati, V., Khayati, R., and Farzan, A. R., “Fully Automated Tumor Segmentation Based on Improved Fuzzy Connectedness Algorithm In Brain MR Images,” Article in Computers in Biology and Medicine, vol. 41, pp. 483–492, April, 2011. [26] Kannana, S. R., Ramathilagam, S., Devi, R., and Hines, E., “Strong Fuzzy C-Means I Medical Image Data Analysis,” Article in The Journal of Systems and Software, vol. 85, pp. 2425–2438, December, 2011.
3 Brain tumor image segmentation and classification using SVM, CLAHE, and ARKFCM
69
[27] Szilagyi, L., Benyo, Z., Szilagyi, S. M., and Adam, H. S., “MR brain image segmentation using an enhanced fuzzy C-means algorithm,” in Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 724–726, Cancun, Mexico, September 2003. [28] Shally, H. R., and Chitharanjan, K., “Tumor volume calculation of brain from MRI slices”, International Journal of Computer Science & Engineering Technology (IJCSET), vol. 4, no. 8, pp. 1126–1132, 2013. [29] Li, C., Xu, C., Anderson, A., and Gore, J. C., “MRI tissue classification and bias field estimation based on coherent local intensity clustering: a unified energy minimization framework,” in Information Processing in Medical Imaging: 21st International Conference, IPMI2009, Williamsburg, VA, USA, July 5–10, 2009. Proceedings, vol. 5636 of Lecture Notes in Computer Science, pp. 288–299, Springer, Berlin, Germany, 2009. [30] Paul JS, Plassard AJ, Landman BA, Fabbri D. Deep learning for brain tumor classification. Proc of SPIE, pp. 1013710–1, 2016. [31] Pedano N, Flanders AE, Scarpace L, Mikkelsen T, Eschbacher JM, Hermes B, et al. Radiology data from the cancer genome atlas low grade glioma [TCGA-LGG] collection. Cancer Imaging Archive 2016.
Reddi Sivaranjani, Vankamamidi S. Naresh and Nistala V.E.S. Murthy
4 Coronary Heart Disease prediction using genetic algorithm based decision tree
Abstract: Heart disease prediction is a pressing issue for people of all ages, since factors such as work pressure, stress, and food habits can disturb heart function. Automated classification of heart disease can add value for doctors. This chapter aims to support doctors in classifying healthy and coronary heart disease (CHD) patients using a decision tree modified with a genetic algorithm. Performance is analyzed for a data-mining approach (probability rule-based classification) and five machine-learning algorithms: K-nearest neighbor (KNN), artificial neural network (ANN), support vector machine (SVM), decision tree, and the proposed decision tree modified using a genetic algorithm. The analysis considers accuracy, execution time, and sensitivity. The results show that the genetic-algorithm-based decision tree predicts CHD patients more accurately than the other existing algorithms.
Keywords: coronary heart disease, genetic algorithm, machine learning, decision tree
4.1 Introduction
Today, machine learning helps researchers trace hidden patterns and structures in databases, which are then used to build predictive models. Healthcare is one of the dominant application areas, where machine-learning algorithms have helped in disease analysis and prediction. The medical industry generates huge amounts of complex data: patient diagnostic data, hospital resources, electronic patient records, details about doctors, diagnostic devices, and so on. This voluminous data is a key source for research in data analysis, knowledge extraction, and decision making. Coronary heart disease (CHD) [1] is currently an alarming research topic: according to published statistics [2–9], 23% of deaths in the USA in 2008 were due to CHD. Also, according to the CDC (Centers for Disease Control and Prevention), around 735,000 American citizens are affected by CHD. This motivated us to do research in this area.

The heart is a muscular organ about the size of a human fist. It collects blood, pumps it to the lungs, and sends the oxygen-rich blood to the entire body via the arteries. The heart muscle itself receives blood through the vascular system called the coronary circulation, which starts at the aorta, from which the left and right coronary arteries branch. These coronary arteries [10] divide into smaller arteries that supply oxygen-rich blood to the entire heart muscle. The right coronary artery delivers blood to the right side of the heart, which pumps blood to the lungs. The left coronary artery branches into two parts, the left anterior descending artery and the circumflex artery, which supply the left side of the heart, the side that pumps blood to the rest of the body. In general, CHD starts with small damage to the intima (the inner lining of the artery), on which fatty plaque is deposited at the site of injury. This deposit consists of cholesterol and other cellular waste products. As the deposit on the damaged area grows, it can narrow the artery and sometimes block blood flow, which leads to a heart attack. Signs of a heart attack [11] include chest pain or mild discomfort, coughing, dizziness, shortness of breath, a gray face, nausea and vomiting, insomnia, sweating, and clammy skin. Tests that can help diagnose CHD include the electrocardiogram, Holter monitoring, coronary catheterization, CT scan, nuclear ventriculography, and blood tests. Controlling blood cholesterol levels helps reduce the risk of CHD; this can be achieved through physical activity, reduced alcohol intake, avoiding tobacco, and a healthy diet, and people with diabetes must follow their doctors' recommendations.
https://doi.org/10.1515/9783110621105-004
Objectives of the chapter
– Understand and analyze the statistics, causes, and symptoms of CHD.
– Elaborate on the features that cause CHD.
– Design and compare CHD prediction using support vector machine (SVM), KNN, artificial neural network (ANN), probability classifier, decision tree, and a decision tree algorithm modified with a genetic approach.
– Analyze the performance of the above algorithms for CHD prediction.
This chapter addresses CHD prediction using various machine-learning algorithms. Section 4.2 surveys the literature on CHD prediction; Section 4.3 describes the features of the CHD dataset; Section 4.4 summarizes the existing machine-learning algorithms; Section 4.5 gives a detailed theoretical explanation of the proposed algorithm; Section 4.6 presents the performance and result analysis; and Section 4.7 gives the conclusion and future work.
4.2 Literature survey
The growth rate of CHD is the main motivation for researchers in this field, who aim to mine useful patient data from medical diagnosis databases. Data mining is a research area that offers many algorithms for finding patterns [12] in patient data and extracting knowledge to support better diagnosis and patient care. It includes many automatic pattern recognition techniques that can perform much better than traditional statistical methods. Robert [13] developed a logistic regression model to predict ischemic heart disease and attained an accuracy of 77%. Cheung [14] derived a prediction technique based on the C4.5 method and a Naïve Bayes classifier to predict heart and blood vessel diseases, achieving an accuracy of approximately 81% for both. Later, Rajkumar and Reena [15] presented a comparative analysis of Naïve Bayes and K-nearest neighbor (K-NN) for CHD prediction and achieved an accuracy of 53%. Sitar-Taut [16] developed a decision tree-based CHD prediction method using the popular data mining tool WEKA. Kemal [17] proposed random forest-based classification and achieved an accuracy of 97%. Prathiban [18] presented a neural network and fuzzy logic approach to heart disease investigation. Ordonez et al. [28] adopted the C4.5 decision tree and various rule-based algorithms for CHD prediction. Nihat and Onur [19] used K-means for clustering and SVM for CHD prediction. In the literature, authors have used various techniques, such as SVM, neural networks, Naïve Bayes, and decision trees, to predict CHD, with body mass index (BMI) and patients' personal details as features. In these studies, decision trees obtained higher accuracy than the other algorithms. A decision tree generates tree-structured conditional rules that help classify patients with and without CHD. Its limitation is that the tree is constructed with a greedy approach: greedy algorithms usually run fast but do not guarantee an optimal decision tree. Constructing an optimal decision tree is an NP-complete problem, so no known algorithm solves it efficiently for realistic problem sizes. To obtain a better, closer-to-optimal decision tree, a genetic algorithm is applied to the results obtained with the decision tree algorithm.
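To make the greedy-versus-genetic argument concrete, here is a generic, hypothetical sketch of a genetic algorithm that evolves fixed-depth (depth-2) decision trees with training accuracy as the fitness. It is not the chapter's method (Section 4.5 is not part of this excerpt): the encoding (three (feature, threshold) splits), majority-labelled leaves, elitism, uniform crossover, mutation rate, toy data, and all function names are illustrative assumptions.

```python
import random

def leaf_index(genome, row):
    """Route a sample through a depth-2 tree encoded as [(f0, t0), (f1, t1), (f2, t2)]."""
    f, t = genome[0]                          # root split
    node = 1 if row[f] <= t else 2            # left or right child
    f, t = genome[node]
    return 2 * (node - 1) + (0 if row[f] <= t else 1)   # leaf id in 0..3

def tree_accuracy(genome, X, y):
    """Label each leaf by majority vote, then return training accuracy (the GA fitness)."""
    default = max(set(y), key=y.count)
    votes = [{} for _ in range(4)]
    for row, label in zip(X, y):
        leaf = votes[leaf_index(genome, row)]
        leaf[label] = leaf.get(label, 0) + 1
    labels = [max(v, key=v.get) if v else default for v in votes]
    hits = sum(labels[leaf_index(genome, row)] == label for row, label in zip(X, y))
    return hits / len(y)

def candidate_splits(X):
    """All (feature, midpoint) pairs between consecutive distinct values of each feature."""
    splits = []
    for f in range(len(X[0])):
        vals = sorted({row[f] for row in X})
        splits += [(f, (a + b) / 2) for a, b in zip(vals, vals[1:])]
    return splits

def evolve_tree(X, y, pop_size=30, generations=40, p_mut=0.3, seed=0):
    rng = random.Random(seed)
    splits = candidate_splits(X)

    def fitness(g):
        return tree_accuracy(g, X, y)

    pop = [[rng.choice(splits) for _ in range(3)] for _ in range(pop_size)]
    history = []                               # best fitness per generation
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        elite = ranked[: pop_size // 5]        # elitism: top 20 % survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(ranked[: pop_size // 2], 2)       # parents from the better half
            child = [rng.choice(pair) for pair in zip(a, b)]     # uniform crossover per node
            if rng.random() < p_mut:
                child[rng.randrange(3)] = rng.choice(splits)     # mutate one split
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best), history

# Hypothetical toy data: [age, ldl] -> 1 = CHD, 0 = no CHD
X = [[25, 2.0], [30, 3.1], [45, 5.5], [50, 6.0], [35, 4.8], [60, 7.2], [28, 2.5], [55, 3.0]]
y = [0, 0, 1, 1, 0, 1, 0, 1]
best, acc, history = evolve_tree(X, y)
```

Unlike greedy induction, which fixes the root split before looking at the children, the GA scores whole trees, so a root split that looks weak on its own can survive if it leads to a better tree overall. Elitism guarantees that the best fitness never decreases from one generation to the next, although, like any heuristic, the GA does not guarantee the optimal tree.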
4.3 CHD dataset
The South African heart disease dataset is used; it describes a retrospective sample of males in a high-risk heart-disease region of the Western Cape, South Africa. Each high-risk patient was monitored, and the following patient attributes were obtained [20]. Table 4.1 lists the features associated with CHD, and each feature is described as follows:
1. Systolic blood pressure (sbp): Systolic pressure is the maximum arterial pressure, reached when the heart contracts, specifically when the left ventricle contracts. The period of contraction is called systole, and the pressure recorded during it is the systolic pressure; sbp is the first number of a blood pressure reading. For example, in the reading 130/90, sbp is 130, meaning 130 mm Hg.
Table 4.1: CHD features information.
Sl. no. | Feature notation | Feature meaning | Type | Min value | Max value
1 | sbp | Systolic blood pressure | Integer | |
2 | Tobacco | Tobacco consumption (kg) | Real | |
3 | ldl | Low-density lipoprotein (LDL) cholesterol | Real | |
4 | Adiposity | Adiposity | Real | |
5 | famhist | Family history | Boolean | Absent | Present
6 | Type A | Type A behavior | Integer | |
7 | Obesity | Obesity | Real | |
8 | Alcohol | Current alcohol consumption | Real | |
9 | Age | Age at onset of condition | Integer | |
10 | chd | CHD response | Integer | |
2. Tobacco (tobacco): The amount of tobacco consumed by the patient, in kilograms.
3. Low-density lipoprotein cholesterol (ldl): Cholesterol travels through the blood bound to proteins; these complexes are called lipoproteins and fall into two classes, LDL and HDL (high-density lipoprotein). This chapter considers ldl for CHD prediction; its normal and abnormal readings are given in Table 4.2. LDL cholesterol can build up in layers (called plaque) on the walls of blood vessels, narrowing the blood flow between the heart and other organs and causing angina or cardiac arrest.
4. Adiposity: This feature estimates the body fat percentage through the Body Adiposity Index (BAI):

$$\mathrm{BAI} = \frac{100\,HC}{H\sqrt{H}} - 18 \qquad (4.1)$$

where HC is the hip circumference in meters and H is the height in meters.
5. Family history of heart disease (famhist): Whether the patient's family members have had CHD.
6. Type A behavior (type A): A personality hypothesis divides human personalities into two categories, type A and type B. Personalities characterized by competitiveness, ambition, impatience, strong time management, and aggressiveness fall under type A. Those with relaxed, less neurotic, and less frantic characteristics are labeled type B. Research reveals that people with type A personalities are more likely to develop CHD. Type A behavior is assessed in two ways: the Structured Interview (SI) and the Jenkins Activity Survey (JAS). In the SI assessment, an interviewer asks questions to measure the person's emotions. In the JAS, patients answer questions in three domains: impatience, job involvement, and competitiveness.
7. Obesity: This is estimated through the body mass index (BMI):

$$\mathrm{BMI} = \frac{W}{H^2}\ \ (\mathrm{kg/m^2}) \qquad (4.2)$$

where W and H are the weight and height, respectively.
8. Alcohol: Ethanol is the main active ingredient of most alcoholic beverages, including beer, wine, and spirits. A higher ethanol intake raises HDL levels, which carry cholesterol through the blood; this may lead to a heart attack. The amount of ethanol in the body is commonly estimated as the Blood Alcohol Content (BAC):

$$\mathrm{BAC} = \left(\frac{MW \cdot SD \cdot 1.2}{BW \cdot W} - MR \cdot DP\right)\times 10 \qquad (4.3)$$

where MW = 0.806 (80.6% of blood is water); SD is the number of standard drinks; BW is the body water constant (males: 0.58, females: 0.49); W is the body weight (kg); MR is the metabolism constant (males: 0.015, females: 0.017); and DP is the drinking period (hours).
9. Age: Ages from 15 to 64 years are considered.
10. CHD: Two values (1 and 2) indicate whether a patient has CHD: 1 means that the patient does not have CHD, and 2 means that the patient has CHD.
Table 4.2 lists the reading ranges of the individual features and their severity.
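Equations (4.1) to (4.3) translate directly into code. The sketch below uses the constants listed in the text; the function names and the example patient values are hypothetical, and no unit is asserted for the BAC result beyond the × 10 scaling of Eq. (4.3).

```python
import math

def body_adiposity_index(hip_m, height_m):
    """Eq. (4.1): BAI = 100 * HC / (H * sqrt(H)) - 18, with HC and H in metres."""
    return 100 * hip_m / (height_m * math.sqrt(height_m)) - 18

def body_mass_index(weight_kg, height_m):
    """Eq. (4.2): BMI = W / H^2 in kg/m^2."""
    return weight_kg / height_m ** 2

def blood_alcohol_content(standard_drinks, weight_kg, hours, male=True):
    """Eq. (4.3) with the constants given in the text."""
    mw = 0.806                          # MW: 80.6 % of blood is water
    bw = 0.58 if male else 0.49         # BW: body water constant
    mr = 0.015 if male else 0.017       # MR: metabolism constant
    return (mw * standard_drinks * 1.2 / (bw * weight_kg) - mr * hours) * 10

# Hypothetical patient: 1.75 m, 70 kg, hip circumference 0.95 m, 3 drinks over 2 hours
bai = body_adiposity_index(0.95, 1.75)
bmi = body_mass_index(70, 1.75)
bac = blood_alcohol_content(3, 70, 2)
```

Since 100·HC with HC in meters is the hip circumference in centimeters, Eq. (4.1) matches the usual form of the BAI (hip in cm divided by height in m to the power 1.5, minus 18).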
4.4 Machine-learning classifiers
Once the features that support effective CHD prediction have been selected, an appropriate machine-learning algorithm must be designed to separate healthy people from CHD patients. In this chapter, we experiment with several existing algorithms, including linear SVM, K-NN, ANN, and decision tree. SVM is a supervised machine-learning algorithm [21–25] used for linear and nonlinear data classification. It represents each data item as a point in n-dimensional space, where each feature value is one coordinate. It then classifies the data into two classes by estimating a hyperplane that divides the space into two partitions (classes). The SVM computes a linear hyperplane using
Table 4.2: CHD features range description. SI. no
Feature
Reading range
Meaning
.
sbp ( mmHg)
(< ) ( –< ) ( –< ) ( –< ) (≥)
Very low normal Low normal High normal High Incredibly high
.
ldl (mg/dL)
> – – – < <
Extremely high High Marginal high Near to ideal Ideal for people at risk Ideal for people at very high risk
.
Obesity
< . .−