English Pages 293 Year 2020
Advances in Intelligent Systems and Computing 1124
Shruti Jain Sudip Paul Editors
Recent Trends in Image and Signal Processing in Computer Vision
Advances in Intelligent Systems and Computing Volume 1124
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. ** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **
More information about this series at http://www.springer.com/series/11156
Editors Shruti Jain Department of Electronics and Communication Engineering Jaypee University of Information Technology Waknaghat, Himachal Pradesh, India
Sudip Paul Department of Biomedical Engineering North-Eastern Hill University Shillong, Meghalaya, India
ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-981-15-2739-5 ISBN 978-981-15-2740-1 (eBook) https://doi.org/10.1007/978-981-15-2740-1 © Springer Nature Singapore Pte Ltd. 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This book is intended for undergraduate, graduate, and postgraduate students, researchers, academicians, policymakers, government officials, technocrats, and industry research professionals who work in academic and industrial research to improve the lifespan of the general public, in the area of recent advances and upcoming technologies that apply computational intelligence to signal processing, computing, imaging science, artificial intelligence, and their applications. As the book covers recent trends in research issues and applications, its contents will benefit professors, research scholars, researchers, and engineers. It will support researchers who work on image and signal processing problems in computer vision, work that will help ambient intelligence gain societal acceptance. Computer vision is an emerging area of fundamental and applied research that exploits a number of advanced information processing technologies, chiefly neural networks, fuzzy logic, machine learning, and evolutionary computation. The book “Recent Trends in Image and Signal Processing in Computer Vision” encompasses the branches of artificial intelligence and machine learning that are based on computation at some level, such as artificial neural networks, evolutionary algorithms, fuzzy systems, and automatic medical identification systems. It presents the latest research on diverse topics in intelligent technologies with the goal of advancing knowledge and applications in this rapidly evolving field.
Recent research has shown a strong relation between acetone concentration and the ketone level in the human body, which can be used in the prediction of diabetes. There are several ways to diagnose a diabetic patient, and most conventional methods are invasive. The chapter “Microcontroller-Based Detection of Diabetes and Ketosis State Using Breath Sensors” gives a glimpse of a non-invasive method for the prediction and diagnosis of diabetes. It takes human breath as the input to a sensitive sensor, estimates acetone concentration and ketone levels, and outputs a prediction of the blood glucose level. As a result, it can provide a fast, real-time facility that makes treatment more comfortable and supports a healthier life for patients.
Managing traffic and maintaining efficient, effective vehicle information is not an easy task in populous cities and countries. It is one of the biggest challenges the whole world faces today, and it calls for a novel approach that can cope with increasing traffic density in peak hours and with a number of vehicles that grows day by day. One possible solution that addresses this problem to a certain extent is described in the chapter “An Effective Graph-Cut Segmentation Approach for License Plate Detection”, which implements a segmentation algorithm and compares it with other methods currently in use.
Electromyogram (EMG) signals are useful in the diagnosis and prevention of several life-threatening diseases. EMG is one of the basic tests performed in the diagnosis of Amyotrophic Lateral Sclerosis (ALS), a progressive neuromuscular motor neuron disease. Gradual degradation of neuronal activity affects the electrical potential of motor points, leading to decreased mobility of body parts. Destruction of nerve cells in the brain and spinal cord reduces control over voluntary muscles. ALS can be detected in several ways by analyzing parametric changes in EMG signals. One novel method is described in the chapter “Iterative Filtering-Based Automated Method for Detection of Normal and ALS EMG Signals”, which can provide a better diagnostic result.
Solar and other renewable energies are becoming more and more important because fossil fuels are limited and their reserves are decreasing day by day. These resources are used in the production of electricity and other utilities. New technologies always face challenges in their early stages, and solar energy generation and transmission likewise faces high production cost, low efficiency, and several environmental issues. Among these issues, solar monitoring plays a major role in power generation in remote areas. An effective monitoring technology and a novel monitoring system for solar power generation and transmission can alleviate the power-cut problem to a certain extent in remote areas and hilly terrain.
Registration of thermal and visible light images can improve the early detection of disease. Visible image registration can support various forms of medical and industrial image processing. Parameters that are crucial in processing, detection, and prediction include contrast variation, number of pixels, and texture. To cope with all these parameters, a precisely calibrated automatic system can give better results with high accuracy. The chapter “An Automatic Thermal and Visible Image Registration Using a Calibration Rig” covers this aspect in detail.
Heavy Duty Gas Turbines (HDGT), which ensure clean and efficient electrical power generation in grid-connected operation, experience load disturbances on a regular basis. A Proportional plus Integral plus Derivative (PID) controller has been introduced to simple cycle gas turbines rated from 18.2 MW to 106.7 MW. In addition, a fuzzy gain scheduled PID controller has been proposed, and the dynamic behavior of both is analyzed. The simulation results in terms of time-domain parameters
and error criteria reveal that the fuzzy gain scheduled PID controller yields a better response during the dynamic and steady-state periods.
A smartphone application for colorimetric quantification of biomolecular samples has good potential for use in biomedical imaging and analysis. Android and iOS, being the most popular and versatile operating systems (OS), have been used to develop the software application. The developed app captures images of the sample through the smartphone camera, then analyzes them and displays the concentration of the given biomolecule with good accuracy.
An assessment of the effects of type 2 diabetes mellitus is incomplete without Doppler ultrasonography, as it indicates the fluid pressure on the walls of the major blood vessels. Doppler USG is a non-invasive imaging technique that detects vessel blockages and blood clots in the arteries. Doppler echocardiography uses high-frequency sound waves to create an image of the heart, while Doppler technology allows the speed and direction of blood flow to be determined by utilizing the Doppler effect.
The quality of an individual’s life depends on advances in the application of engineering to healthcare technologies, which are of major significance in economic, scientific, and societal terms. Health economies worldwide face the challenge of building an effective and economical system that fully meets the financial, clinical, and ancillary needs of health care. Because of this critical challenge, health care is sometimes unsatisfactory and unsustainable and needs more attention. Over the last few decades, life expectancy has increased significantly, so elderly people are a major area of concern. They often need assistance due to difficulties in mobility, dementia, or other health problems. In such cases, an autonomous, modern healthcare support system is helpful.
Biomaterials can transform the future of surgery: they can enhance tissue regeneration, minimize immune responses, and inhibit infection. The promise of materials that can promote tissue regeneration throughout the whole body has not yet become a solid reality. With the emergence of nanotechnology, tissue engineering has made a great deal of progress in recent years. Nano-textured surface features make tissue regeneration on a grand scale achievable. Biomaterial development strategies can be broadly classified into two approaches: the first alters the chemistry, and the second alters the physical properties of the implant, for example by generating nanometer-scale surface features and thereby varying the surface roughness. Consequently, biomaterials are selectively tailored by varying chemical as well as physical factors to optimize favorable cellular interactions. This chapter focuses on understanding the essence of nanotechnology in tissue engineering applications.
Road traffic management is of importance to traffic engineers and road users. Several attempts at managing traffic and traffic congestion have fallen short of their goals for lack of suitable techniques for allocating time to intersecting traffic routes. A suitable model for analyzing road network congestion, together with the concept of the busy hour from telecommunication theory, can meet this need. To this end, an algorithm is developed in
Microsoft Studio Developer Platform with Fortran. The numerical results show that the telecommunications approach can be used to indicate the state of congestion on a traffic route.
Cardiovascular Diseases (CVDs) are now one of the main causes of death, owing to poor lifestyle management. People living in rural and remote places lack access to medical experts, especially in cardiology, which leads to poor healthcare services. A novel diagnostic system for real-time predictive analysis based on ECG data and related health parameters and symptoms can provide instant suggestions to cardiologists across the globe. Tele-cardiology may offer a better solution by providing timely diagnosis and medication to rural populations, and hence can save lives.
As the medical field moves toward the digital world, the security of medical data has become a concern. Medical data such as patient information and medical history are stored as digital images. Medical images are regarded as important and sensitive data in medical informatics systems. To transfer medical images among physicians over an insecure network, a secure encryption algorithm is necessary. Steganography protects files such as images, videos, or text messages by concealing their information from unauthorized users, using encryption and masking of data and embedding them into a different image or text file. Medical records often contain diagnostic images, videos of research and specially authorized experiments, physical examination results, and other important visual details that may be required for research.
Breast cancer is one of the most life-threatening diseases in women. It arises from the uncontrolled growth of cells in the breast. The damaged area is known as a lesion, which is classified as benign or malignant. Breast lesions can be classified using a ratio texture feature obtained from the texture features calculated inside the lesion (IAI) and the texture features calculated on the upper side of the lesion (UAI). Statistical texture features such as EDGE, SFM, NGTDM, FOS, GLCM, GLRLM, and GLDS are calculated. An SVM classifier is used to classify the lesions on the basis of the ratio texture feature.
Software-Defined Radio (SDR) is a radio in which numerous physical layer functions are performed in software. SDR system design is a complex process. The chapter studies Bit Error Rate (BER) performance for different signals, i.e., image, video, and random signals, describes the essential concepts for developing an SDR-based transceiver model for the QPSK and BPSK modulation schemes, and analyzes the performance of the system in MATLAB. The performance of the coherent receiver is evaluated at a certain delay for the QPSK modulation scheme, and for multiple transmissions BER is evaluated at an instantaneous delay.
The shift of farmers from agriculture to other fields is alarming. The major factors affecting farming, and the initiatives taken by central and state governments from time to time in this context, should be taken into consideration. Based on several factors and dimensions, innovative ways to manage agri-preneurship should be applied. Besides this, other areas such as farm tourism must also be explored.
An attempt has to be made to motivate youth and women in farming families to support this initiative.
In today’s world, the Internet is an emerging technology with exponential user growth. A major concern is the increase in toxic online content posted by people of different backgrounds. With the expansion of deep learning, much research has turned toward using deep neural networks in many disciplines. Even for Natural Language Processing (NLP) tasks, deep networks, specifically the Recurrent Neural Network (RNN) and its variants, are now being preferred over traditional shallow networks.
The Wigner–Ville Distribution (WVD) gives a very high-resolution time-frequency distribution, but its usefulness is diminished by the presence of cross-terms. Suppressing cross-terms in the WVD is crucial to obtain the actual energy distribution in the Time-Frequency (TF) plane. Variational mode decomposition is applied to decompose a multi-component signal into its mono-components, and inter cross-terms are suppressed because the mono-components are separated. Thereafter, segmentation is applied in the time domain to remove the intra cross-terms that arise from nonlinearity in frequency modulation. The WVD of each resulting component is then computed. Finally, all the WVDs are added to obtain the complete time-frequency representation.
Waknaghat, India Shillong, India
Shruti Jain Sudip Paul
Acknowledgements
First, we would like to extend our gratitude to all the chapter authors for their sincere and timely support in making this book a grand success. We are equally thankful to all the executive board members of Springer Nature for their kind approval and for granting us permission to serve as Editors of this book. We would like to extend our sincere thanks to Mr. Aninda Bose, Senior Editor-Hardsciences, Springer Nature; Raashmi Ramasubramanian (Ms.), Production Coordinator (Books); and Silky Abhay Sinha (Ms.), Project Coordinator, Books Production, Springer Nature, for their valuable suggestions and encouragement throughout the project. It is with immense pleasure that we express our thankfulness to our colleagues for their support, love, and motivation in all our efforts during this project. We are grateful to all the reviewers for their timely reviews and consent, which helped us a lot to improve the quality of the book. There are many others whom we may have inadvertently left out, and we sincerely thank all of them for their help.
Contents
Microcontroller-Based Detection of Diabetes and Ketosis State Using Breath Sensors
  C. H. Renumadhavi, N. Jamuna, A. N. Chandana, Vinay Balamurali, Praveen Kumar Gupta, Ryna Shireen Sheriff, Anushree Vinayak Lokur, Rhutu Kallur and R. Sindhu
An Effective Graph-Cut Segmentation Approach for License Plate Detection
  Ayodeji Olalekan Salau
Iterative Filtering-Based Automated Method for Detection of Normal and ALS EMG Signals
  Richa Singh and Ram Bilas Pachori
A Study of Remote Monitoring Methods for Solar Energy System
  Gurcharan Singh and Amit Kumar Manocha
An Automatic Thermal and Visible Image Registration Using a Calibration Rig
  Lalit Maurya, Prasant Mahapatra, Deepak Chawla and Sanjeev Verma
Simple Cycle Gas Turbine Dynamic Analysis Using Fuzzy Gain Scheduled PID Controller
  Mohamed Iqbal Mohamed Mustafa
An Image-Based Android Application for Colorimetric Sensing of Biomolecules
  Sibasish Dutta
Doppler Ultrasonography in Evaluation of Severe Type 2 Diabetes Mellitus: A Case Study
  Saurav Bharadwaj and Sudip Paul
Advancements of Healthcare Technologies: Paradigm Towards Smart Healthcare Systems
  Swati Sikdar and Sayanti Guha
Artificial Intelligence Applications in Nanosized Biomaterial Development
  Rhutu Kallur, Praveen Kumar Gupta, R. Sindhu, Ryna Shireen Sheriff and R. Reshma
A Probabilistic Approach to Time Allocation for Intersecting Traffic Routes
  Ayodeji Olalekan Salau and Thomas Kokumo Yesufu
A Study of Telecardiology-Based Methods for Detection of Cardiovascular Diseases
  Nisha Raheja and Amit Kumar Manoacha
An Update on Medical Data Steganography and Encryption
  Sindhu Rajendran, Varsha Kulkarni, Surabhi Chaudhari and Praveen Kumar Gupta
Texture Ratio Vector Technique for the Classification of Breast Lesions Using SVM
  Shruti Jain and Jitendra Virmani
Error Control Coding for Software Defined Radios Using Soft Computing
  Nikhil Marriwala
Factor-Based Data Mining Techniques in Determining Agri-preneurial Skills
  Monika Gupta Vashisht and Vishal B. Soni
Detection of Hate Speech and Offensive Language in Twitter Data Using LSTM Model
  Akanksha Bisht, Annapurna Singh, H. S. Bhadauria, Jitendra Virmani and Kriti
Enhanced Time–Frequency Representation Based on Variational Mode Decomposition and Wigner–Ville Distribution
  Rishi Raj Sharma, Preeti Meena and Ram Bilas Pachori
About the Editors
Dr. Shruti Jain is an Associate Professor in the Department of Electronics and Communication Engineering at the Jaypee University of Information Technology, Waknaghat, H.P., India, and received her Ph.D. in Biomedical Image Processing. She has around 15 years of teaching experience. Her research interests are Image and Signal Processing, Soft Computing, Bio-inspired Computing, and Computer-Aided Design of FPGA and VLSI circuits. She has published more than 10 book chapters, 60 papers in reputed journals, and 40 papers in international conferences. She has also published five books. She is a senior member of IEEE, a life member and Editor in Chief of the Biomedical Engineering Society of India, and a member of IAENG. She has completed one externally funded project and has one in the pipeline. She has guided one Ph.D. student and now has six registered students. She is a member of the editorial boards of many reputed journals. She is also a reviewer for many journals and a member of the TPC of different conferences. She received the Nation Builder Award in 2018-19.
Dr. Sudip Paul has been an Assistant Professor at the Department of Biomedical Engineering, School of Technology, North-Eastern Hill University (NEHU), Shillong, India, since 2012. He holds B.Tech. and M.Tech. degrees in Biomedical Engineering, and a Ph.D. from the Indian Institute of Technology (Banaras Hindu University), Varanasi, with a specialization in electrophysiology and brain signal analysis. He was selected as a Post-Doc Fellow under the Biotechnology Overseas Associateship for Scientists Working in the North Eastern States of India (2017–2018), supported by the Department of Biotechnology, Government of India. Dr. Sudip has published more than 90 international journal and conference papers and has also filed four patents. Recently, he has completed three book projects and is currently serving as editor for a further two. Dr.
Sudip is a member of various societies and professional bodies, including the APSN, ISN, IBRO, SNCI, SfN, and IEEE. He received first prize in the
Sushruta Innovation Award 2011, sponsored by the Department of Science and Technology, Government of India, and numerous other awards, including the World Federation of Neurology (WFN) travelling fellowship, Young Investigator Award, and IBRO and ISN Travel Awards. Dr. Sudip has also served as an editorial board member for a variety of international journals, and has presented his research in the USA, Greece, France, South Africa, and Australia.
Microcontroller-Based Detection of Diabetes and Ketosis State Using Breath Sensors C. H. Renumadhavi, N. Jamuna, A. N. Chandana, Vinay Balamurali, Praveen Kumar Gupta, Ryna Shireen Sheriff, Anushree Vinayak Lokur, Rhutu Kallur and R. Sindhu
Abstract Researchers have demonstrated that breath acetone is an effective biomarker of type 2 diabetes, which is a common form of diabetes. The conventional way of detecting glucose levels is an invasive technique that involves pricking the finger and collecting blood samples. This is not only painful and dependent on drawing blood but also time-consuming and expensive. Therefore, there has been great demand in the commercial market for non-invasive techniques of blood glucose determination. Researchers have been attempting to develop a number of non-invasive techniques in which diabetes is detected by methods applied outside the body, without puncturing the skin or taking a blood sample.
Keywords Biomarker · Type 2 diabetes · Acetone · Non-invasive · Glucose determination
1 Introduction
1.1 Diabetes Mellitus
Diabetes mellitus results from excessive blood sugar levels over a prolonged period, caused by metabolic complications [1]. Three types of diabetes mellitus are known to exist: type 1, type 2, and gestational diabetes.
C. H. Renumadhavi (B) · N. Jamuna · A. N. Chandana · V. Balamurali Department of Electronics and Instrumentation Engineering, R V College of Engineering, Bengaluru 560059, India e-mail: [email protected] P. K. Gupta · R. S. Sheriff · A. V. Lokur Department of Biotechnology, R V College of Engineering, Bengaluru 560059, India R. Kallur · R. Sindhu Department of Electronics and Communication Engineering, R V College of Engineering, Bengaluru 560059, India © Springer Nature Singapore Pte Ltd. 2020 S. Jain and S. Paul (eds.), Recent Trends in Image and Signal Processing in Computer Vision, Advances in Intelligent Systems and Computing 1124, https://doi.org/10.1007/978-981-15-2740-1_1
– Type 1 diabetes mellitus is caused by the beta cells of the pancreas losing their function, which results in lower insulin output than the body requires for metabolism. It is also known as juvenile diabetes [2].
– Type 2 diabetes mellitus occurs when cells fail to respond to insulin appropriately [3].
– Gestational diabetes occurs in pregnant women without a prior history of diabetes. The affected woman has high blood sugar throughout her pregnancy [4].
Diabetes can be treated and prevented with a proper diet, sufficient exercise, avoidance of tobacco, and maintenance of body weight [5]. Type 1 diabetes mellitus is controlled through insulin injections. Type 2 diabetes mellitus is usually controlled with medication, with or without insulin [6]. Gestational diabetes usually disappears after childbirth [7].
1.2 Ketosis and Metabolism
Normally, the body obtains energy from blood glucose through glycolysis, in which glucose is converted to pyruvate [8]. If the glucose level in the body is insufficient for glycolysis to take place, the body adopts alternative strategies to release energy. In these cases, the body starts breaking down fats, stored as triglycerides, to form glucose. Ketones are formed as a byproduct of this process [9]. The breakdown of fats in the liver produces ketones such as acetone. This occurs after a prolonged period of fasting, such as dieting or overnight fasting. At this point, glucagon and epinephrine levels are usually within permissible limits, but insulin levels drop. This combination causes fats to be broken down in the liver into ketone units [10].
1.2.1 Energy Production via Metabolism
Energy-producing metabolism occurs in the mitochondria, but fatty acids carry a negative charge and hence cannot pass through biological membranes. To solve this, coenzyme A binds to the fatty acid to produce acyl-CoA, which then crosses the membranes to reach the mitochondria [11]. β-oxidation then takes place: the acyl-CoA molecule loses two carbon atoms as they are cleaved off, forming acetyl-CoA [12]. On entering the citric acid cycle (also known as the TCA cycle), acetyl-CoA undergoes aldol condensation with oxaloacetate to form citric acid, and energy is produced [11] (Fig. 1).
Fig. 1 TCA cycle flowchart [13]
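The two-carbon cleavage described above lends itself to a small piece of arithmetic. The following Python sketch is an illustration only and is not part of the described system. It assumes a saturated fatty acid with an even number of carbon atoms, and the function name is hypothetical. Under that assumption, a chain of n carbons yields n/2 acetyl-CoA molecules after n/2 − 1 rounds of β-oxidation, because the final round splits a four-carbon chain into two acetyl-CoA units.

```python
def beta_oxidation_yield(carbon_count):
    """Return (rounds, acetyl_coa) for a saturated, even-chain fatty acid.

    Assumption for illustration: odd-chain and unsaturated fatty acids,
    which need extra steps, are not handled.
    """
    if carbon_count < 4 or carbon_count % 2 != 0:
        raise ValueError("expected an even carbon count of at least 4")
    acetyl_coa = carbon_count // 2   # each acetyl-CoA carries two carbons
    rounds = acetyl_coa - 1          # the last round releases two units at once
    return rounds, acetyl_coa
```

For palmitic acid (16 carbons), `beta_oxidation_yield(16)` gives seven rounds and eight acetyl-CoA molecules.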
1.2.2 Ketogenesis
In this process, two acetyl-CoA molecules condense with the help of thiolase to produce acetoacetyl-CoA. HMG-CoA synthase then combines acetoacetyl-CoA with an additional acetyl-CoA to form hydroxy-β-methylglutaryl-CoA (HMG-CoA). HMG-CoA lyase converts hydroxy-β-methylglutaryl-CoA into acetoacetate, a ketone. Acetoacetate can then break down spontaneously to form acetone and carbon dioxide. When the blood glucose level of the body drops, these ketone bodies are transported from the liver to act as an energy supply [14] (Fig. 2).
1.3 Ketosis and Ketoacidosis
Ketosis occurs when the serum concentration of ketone bodies exceeds 0.5 mM while insulin and blood glucose levels remain stable [16]. β-hydroxybutyrate and acetoacetate are released for the production of energy [9]. Glucagon and insulin help regulate the levels of ketones in the body [10]. There are two kinds of ketoacidosis: alcoholic and diabetic.
C. H. Renumadhavi et al.
Fig. 2 Ketogenesis [15]
1.3.1 Diabetic Ketoacidosis
Diabetic ketoacidosis occurs in individuals who lack insulin. Under normal conditions, ketosis takes place without acidifying the blood. But when ketosis persists over a longer period, as in prolonged fasting, the ketones make the blood acidic, since they have a low pKa. In the initial stage, this change in blood pH is buffered by the bicarbonate buffering system. Eventually, this system is exhausted and the body adopts other methods to try to control the acidosis [17]. One of these methods is Kussmaul respiration [18], which can be simply defined as extreme hyperventilation that reduces the carbon dioxide content of the blood as a consequence of the increased depth of breathing [19]. Other symptoms include a drop in alertness and, in the worst case, coma [20] (Fig. 3).
Fig. 3 Diabetic ketoacidosis (DKA) [21]
1.4 Breath Acetone: Importance
Volatile organic compounds (VOCs) are present in large numbers in humans. Acetone is one of these VOCs, and it is known for its characteristic odour of decaying apples [22]. There are two major points to keep in mind when dealing with the effects and behaviour of acetone in the human body:
(a) Acetone forms covalent bonds with peptides and other macromolecules [23].
(b) Acetone passes freely through barriers in the body, such as the barrier between the blood and the brain, owing to its miscibility with lipids [24].
Regardless of how diabetes mellitus develops, there are two main metabolic changes: intense lipolysis and a rise in blood sugar level [25]. Breath acetone concentrations are quite low in healthy people and elevated in diabetic patients [26]. Diagnostic methods are becoming increasingly non-invasive as technology progresses.
2 Design and Implementation
Figure 4 outlines the methodology used in the project. Data from the MQ-135 gas sensor is collected and passed to the controller board. This provides the system with the data from which the acetone concentration is calculated. The Arduino code performs all of these calculations. The monitor then displays the calculated output. The collected data is analysed by determining the subject state over a wide range of acetone concentrations.
2.1 Arduino
Arduino is a popular platform for designing and building electronics or instrumentation projects. Arduino consists not only of a programmable physical circuit board (microcontroller) but also of an IDE (Integrated Development Environment) that runs on a computer and is used to write code and upload it to the board. The Arduino platform has become quite popular with people just starting out with electronics, and for good reason. Unlike most previous programmable circuit boards, the Arduino does not need a separate piece of hardware (called a programmer) in order to load new code onto the board; a USB cable is sufficient. Additionally, the Arduino IDE uses a simplified version of C++, making it easier to learn to program. Finally, Arduino provides a standard form factor that breaks out the functions of the microcontroller into a more accessible package (Fig. 5).
Fig. 4 Block diagram to implement Arduino board
Fig. 5 Arduino chip
Fig. 6 MQ-135 gas sensor
2.2 MQ-135
The MQ-135 is a gas sensor whose sensitive material is SnO2, which has low conductivity in clean air. When the target gas is present, the sensor's conductivity rises as the gas concentration rises. The MQ-135 has high sensitivity to ammonia, acetone, smoke, and other harmful gases. It is low in cost and suitable for a range of applications (Fig. 6).
2.3 Software Implementation
The code is written in Embedded C in the Arduino IDE, checked for errors, and debugged accordingly. Once the code compiles without errors, it is ready to be uploaded to the Arduino board via a connecting cable. The connections are shown in the circuit diagram discussed in Sect. 2.4. The sensor is connected to the Arduino board. Once these connections are made, the code is uploaded to the board and the program is run. The output window then displays the acetone levels.
Fig. 7 Circuit connections of the Arduino controller and MQ135 sensor
2.4 Circuit Diagram of the Module
The Arduino and the MQ135 are connected as shown in Fig. 7, and the sensor is read using the built-in Arduino functions. The sensor has a load resistance of 10 kΩ and is supplied with a source voltage of 5 V. The sensor output is read through the A0 analog input pin. GND0, GND1, and GND2 are all grounded, as is the ground pin of the gas sensor (Fig. 7).
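The step from the raw A0 reading to a sensor resistance can be sketched as follows. This is a minimal illustration in Python rather than the authors' Arduino code, and it assumes the usual voltage-divider wiring for MQ-series sensors, in which the sensor resistance Rs is in series with the 10 kΩ load resistor and the output voltage is taken across the load. The function name `adc_to_rs` and the scaling by 1023 are assumptions made for the example.

```python
def adc_to_rs(adc_value, v_supply=5.0, r_load_ohm=10_000.0, adc_max=1023):
    """Convert a 10-bit ADC reading (A0) into the sensor resistance Rs in ohms.

    Assumes a voltage divider: Rs in series with the load resistor,
    with the measured voltage taken across the load resistor.
    """
    if not 0 < adc_value <= adc_max:
        raise ValueError("ADC reading must be in the range 1..adc_max")
    v_out = adc_value * v_supply / adc_max           # voltage across the load
    return r_load_ohm * (v_supply - v_out) / v_out   # divider solved for Rs


# A reading of one third of full scale means Rs is twice the load resistance.
rs_example = adc_to_rs(341)
```

Dividing Rs by the clean-air resistance Ro gives the Rs/Ro ratio used in the calculation of Sect. 2.5.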
2.5 MQ135 Calculation
The MQ-135 sensor provides an analog voltage, which the Arduino maps onto 1024 integer values (0 to 1023). Because the sensitive material of the MQ-135 is SnO2, its resistance changes when it is exposed to the target gas, and this change is used to calculate the concentration of the respective gas in ppm (Fig. 8). The graph in Fig. 8 shows, for each gas, a curve that is approximately a straight line on logarithmic axes. The x-axis represents the gas concentration from 10 to 1000 ppm, and the y-axis represents the ratio of the measured sensor resistance to the zero (reference) resistance, Rs/Ro, from 0.1 to 10. Coordinate values for each gas were extracted from the graph with the help of WebPlotDigitizer. For two points (x1, y1) and (x2, y2) on such a curve, the laws of logarithms give
Fig. 8 Sensitivity chart of MQ-135 sensor (Rs/Ro vs. ppm)
$$m = \frac{\log(y_2 / y_1)}{\log(x_2 / x_1)}$$
Then we can calculate the ppm of gas from
$$f(x) = y = \frac{y_1}{x_1^{m}} \, x^{m}$$
Since the sensor measures the Rs/Ro ratio, which is y in the above equation, we need the final expression in terms of x. Solving the above equation for x, that is, $x = x_1 (y / y_1)^{1/m}$, gives the acetone concentration in ppm.
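The two-point calculation above can be sketched in Python as follows. The two curve points used here, (10 ppm, 2.0) and (1000 ppm, 0.5), are hypothetical values chosen only to make the arithmetic easy to follow; they are not read from the MQ-135 sensitivity chart, and the function names are likewise illustrative.

```python
import math


def fit_power_law(x1, y1, x2, y2):
    """Return (a, m) such that y = a * x**m passes through both points."""
    m = math.log10(y2 / y1) / math.log10(x2 / x1)   # slope on log-log axes
    a = y1 / x1 ** m                                # scale factor y1 / x1^m
    return a, m


def ppm_from_ratio(ratio, a, m):
    """Invert y = a * x**m to recover the concentration x from y = Rs/Ro."""
    return (ratio / a) ** (1.0 / m)


# Hypothetical points (ppm, Rs/Ro), not taken from the datasheet.
a, m = fit_power_law(10.0, 2.0, 1000.0, 0.5)
ppm_at_ratio_one = ppm_from_ratio(1.0, a, m)
```

With these hypothetical points the slope is negative, so a smaller Rs/Ro ratio maps to a higher concentration, which matches the falling curves of the sensitivity chart.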
2.6 Algorithm Flow During Run-Time
See Fig. 9.
Fig. 9 Run-time algorithm flow
3 Results and Analysis
3.1 Ensuring the Working of Sensor
Initially, the sensor was tested for any fault in design or wiring to ensure optimum results. We tested several products that contain acetone by exposing them to air near the sensor. For experimental purposes, we considered nail polish remover, naphthalene balls, surgical spirit, Dettol (as a representative antiseptic liquid), and hand sanitizer (Table 1).
Inference: From Fig. 10, it is observed that the nail polish remover has the highest acetone level of the products tested, whereas the naphthalene balls have the lowest.
Table 1 Level of acetone in different products
Name of the product | Acetone level (ppm)
Nail Polish Remover [27] | 7.99
Hand Sanitizer [28] | 5.9
Surgical Spirit [29] | 5.16
Dettol Antiseptic Liquid [30] | 1.87
Naphthalene Balls [31] | 1.72
Fig. 10 Comparison of the acetone levels in various products
3.2 Normal Ranges of Acetone
The measured acetone levels are compared with the ranges reported for different patient groups. The values are measured in parts per million (ppm), and subjects are classified accordingly.
Inference: The breath acetone of a subject will usually fall within the ranges shown in Table 2. The acetone levels in different patient groups are quite distinct and can therefore be used to distinguish the subjects.
Table 2 Breath acetone ranges [32]
Breath acetone (ppm) | Patient group
14–168 | Children on ketogenic diet
15–68 | Fasting (36 h)
>1.8 | Diabetic
0.2–1.8 | Healthy
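The classification against Table 2 can be sketched as a simple range lookup. The sketch below is an illustration, not the authors' code; the names `RANGES` and `matching_groups` are hypothetical. It returns every group whose range contains the reading, because the ranges in Table 2, taken literally, overlap above 14 ppm.

```python
# (group, lower bound, upper bound, lower bound is exclusive?) from Table 2
RANGES = [
    ("Children on ketogenic diet", 14.0, 168.0, False),
    ("Fasting (36 h)", 15.0, 68.0, False),
    ("Diabetic", 1.8, float("inf"), True),   # Table 2 gives ">1.8"
    ("Healthy", 0.2, 1.8, False),
]


def matching_groups(ppm):
    """Return the names of all Table 2 groups whose range contains ppm."""
    groups = []
    for name, low, high, exclusive_low in RANGES:
        above_low = ppm > low if exclusive_low else ppm >= low
        if above_low and ppm <= high:
            groups.append(name)
    return groups


healthy_example = matching_groups(0.9)     # a reading inside 0.2-1.8 ppm
overlap_example = matching_groups(20.0)    # a reading matched by several rows
```

A reading of 20 ppm matches three rows at once, so a single threshold cannot by itself separate diabetic subjects from subjects in dietary or fasting ketosis.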
3.3 Comparisons of Data Collected
Four subjects belonging to different patient groups were chosen for this study, covering the patient groups defined in Table 2. They were asked to exhale through the mouth near the sensor. The acetone level of each subject was noted and comparisons were drawn, as shown in Table 3.
Inference: From Fig. 11, we observe that the acetone levels are high in diabetic patients, whereas in normal healthy subjects the acetone content is low.
Table 3 Data collected from different subjects (acetone level in ppm)
Condition | Subject 1 | Subject 2 | Subject 3 | Subject 4
Normal | 0.8 | 1.08 | 0.83 | 0.99
Children | 1.12 | 0.98 | 1.01 | 1.09
After glucose consumption | 0.99 | 1.08 | 1.22 | 1.13
After sugar consumption | 1.83 | 1.89 | 1.99 | 2.05
From nose during breath | 1.2 | 1.15 | 1.1 | 1.09
From ear during breath | 0.88 | 0.92 | 0.82 | 0.86
Diabetic patients | 1.82 | 1.98 | 2.21 | 2.02
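As a quick check of the inference drawn from Table 3, the mean of each row can be compared with the 1.8 ppm boundary from Table 2. This is an illustrative sketch only; averaging the four subjects is a choice made for the example, not a step described by the authors.

```python
from statistics import mean

# Readings in ppm, copied from Table 3 (Subject 1 to Subject 4).
TABLE_3 = {
    "Normal": [0.8, 1.08, 0.83, 0.99],
    "Children": [1.12, 0.98, 1.01, 1.09],
    "After glucose consumption": [0.99, 1.08, 1.22, 1.13],
    "After sugar consumption": [1.83, 1.89, 1.99, 2.05],
    "From nose during breath": [1.2, 1.15, 1.1, 1.09],
    "From ear during breath": [0.88, 0.92, 0.82, 0.86],
    "Diabetic patients": [1.82, 1.98, 2.21, 2.02],
}

DIABETIC_BOUNDARY_PPM = 1.8   # upper end of the healthy range in Table 2

row_means = {name: mean(values) for name, values in TABLE_3.items()}
above_boundary = [name for name, value in row_means.items()
                  if value > DIABETIC_BOUNDARY_PPM]
```

Only the "After sugar consumption" and "Diabetic patients" rows have a mean above 1.8 ppm; every other row stays within the healthy range of Table 2.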
Fig. 11 Comparison of acetone levels of the four subjects coming under various patient groups
3.4 Displayed Result
The acetone levels displayed on the monitor (serial terminal) are shown in the following screenshots. A result was noted down once two consecutive values remained stable, that is, constant. These values were measured at a temperature of 31 °C and a humidity level of 53% (Fig. 12).
Inference: The initial values of acetone measured at this temperature and humidity will usually be low. The initial values depend on both the temperature and the humidity. When the temperature is lower and the humidity is higher, the initial values will be comparatively high (Fig. 13).
Inference: The normal values of acetone in a healthy individual range from 0.2 to 1.8 ppm. When the temperature is high and the humidity is low, the value will usually be lower than the values obtained at low temperature and high humidity (Fig. 14).
Inference: As a direct consequence of the consumption of glucose, insulin levels in the body increase. The fat cells therefore stop breaking down stored fat and, as a result, ketone production in the liver stops. Hence, the acetone level comes down to the normal range after consumption of glucose (Fig. 15).
Inference: As the glucose level is low in a subject who is fasting, the insulin level also reduces gradually. As a result, the fat cells break down stored fat, which is then carried to the liver, where it is processed into ketone bodies and released back into the bloodstream. These ketone bodies are then taken up by the muscles and other tissues and used for the body's metabolism. Therefore, it can be concluded that the acetone levels are higher than normal when ketone bodies are being produced.
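The rule of recording a value once two consecutive readings are equal can be sketched as below. The function name and the optional `tolerance` parameter are assumptions added for illustration; the text itself only requires the two values to be constant.

```python
def first_stable_reading(readings, tolerance=0.0):
    """Return the first reading that equals its predecessor within tolerance.

    Returns None when no two consecutive readings are stable.
    """
    for previous, current in zip(readings, readings[1:]):
        if abs(current - previous) <= tolerance:
            return current
    return None


# Hypothetical stream of ppm values as the sensor settles.
settled = first_stable_reading([0.4, 0.7, 0.9, 0.9, 1.0])
```

A non-zero tolerance may be useful in practice because analog readings rarely repeat exactly, but that is a design choice outside what the source describes.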
Fig. 12 Initial values belonging to a healthy subject
Fig. 13 Acetone levels in a healthy subject
Fig. 14 Acetone levels after consumption of glucose
Fig. 15 After fasting for 9 h
4 Conclusion and Future Scope
In recent times, it has come to light that acetone cannot be regarded merely as a waste product of metabolism, as there are several pathways through which acetone can be produced or broken down. Methods have emerged that make the detection of acetone in exhaled breath possible, thereby offering an attractive alternative to the investigation of blood and urine samples [33]. The MQ-135 sensor used for the detection of acetone has a long service life and is low in cost. It requires only a simple drive circuit to operate [34]. The technology used in this project is simple, and the module as a whole is highly portable, which makes it well suited to the detection of diabetes mellitus in remote areas such as villages. The future aim of this device is the detection of diabetes in any patient. Further development would focus on differentiating between patients who are actually diabetic, patients on special diets or undergoing starvation, and patients with inherited diseases or conditions, such as certain tumours of the lungs, that cause acetone production in the body [33].
References
1. World Health Organization, Diabetes action now: an initiative of the World Health Organization and the International Diabetes Federation (2004)
2. Diabetes Prevention Trial-Type 1 Diabetes Study Group, Effects of insulin in relatives of patients with type 1 diabetes mellitus. N. Engl. J. Med. 346(22), 1685–1691 (2002)
3. J. Tuomilehto, J. Lindström, J.G. Eriksson, T.T. Valle, H. Hämäläinen, P. Ilanne-Parikka, S. Keinänen-Kiukaanniemi, M. Laakso, A. Louheranta, M. Rastas, V. Salminen, Prevention of type 2 diabetes mellitus by changes in lifestyle among subjects with impaired glucose tolerance. N. Engl. J. Med. 344(18), 1343–1350 (2001)
4. P.J. Donovan, H.D. McIntyre, Drugs for gestational diabetes. Aust. Prescriber 33(5), (2010)
5. J. Lindström, P. Ilanne-Parikka, M. Peltonen, S. Aunola, J.G. Eriksson, K. Hemiö, H. Hämäläinen, P. Härkönen, S. Keinänen-Kiukaanniemi, M. Laakso, A. Louheranta, Sustained reduction in the incidence of type 2 diabetes by lifestyle intervention: follow-up of the Finnish Diabetes Prevention Study. The Lancet 368(9548), 1673–1679 (2006)
6. S. Bolen, L. Feldman, J. Vassy, L. Wilson, H.C. Yeh, S. Marinopoulos, C. Wiley, E. Selvin, R. Wilson, E.B. Bass, F.L. Brancati, Systematic review: comparative effectiveness and safety of oral medications for type 2 diabetes mellitus. Ann. Intern. Med. 147(6), 386–399 (2007)
7. J.C. Cash, C.A. Glass (eds.), Family practice guidelines (Springer Publishing Company, 2017)
8. A.H. Romano, T. Conway, Evolution of carbohydrate metabolic pathways. Res. Microbiol. 147(6–7), 448–455 (1996)
9. P.C. Champe, R.A. Harvey, D.R. Ferrier, Biochemistry (Lippincott Williams & Wilkins, 2005)
10. D.G. Johnston, A. Pernet, A. McCulloch, G. Blesa-Malpica, J.M. Burrin, K.G. Alberti, Some hormonal influences on glucose and ketone body metabolism in normal human subjects, in Ciba Foundation Symposium, vol. 87 (1982), pp. 168–191
11. L. Stryer, Biochemistry, 4th edn (1995)
12. G.F. Cahill Jr., R.L. Veech, Ketoacids? Good medicine? Trans. Am. Clin. Climatol. Assoc. 114, 149 (2003)
13. J.W. Pelley, Citric acid cycle, electron transport chain, and oxidative phosphorylation. Elsevier's Integrated Review Biochemistry, 2nd edn (WB Saunders, Philadelphia, PA, 2012), pp. 57–65
14. T. Fukao, G. Mitchell, J.O. Sass, T. Hori, K. Orii, Y. Aoyama, Ketone body metabolism and its defects. J. Inherit. Metab. Dis. 37(4), 541–551 (2014)
15. C.R. Barnett, Y.A. Barnett, Ketone Bodies (2003)
16. S.M. Phinney, J. Volek, The Art And Science Of Low Carbohydrate Performance (Beyond Obesity LLC, 2011)
17. A.E. Kitabchi, G.E. Umpierrez, J.M. Miles, J.N. Fisher, Hyperglycemic crises in adult patients with diabetes. Diab. Care 32(7), 1335–1343 (2009)
18. K.C. Bilchick, R.A. Wise, Paradoxical physical findings described by Kussmaul: pulsus paradoxus and Kussmaul's sign. Lancet 359(9321), 1940–1942 (2002)
19. A. Kußmaul, Zur lehre vom diabetes mellitus. Dtsch. Arch. Klin. Med. 14, 1–46 (1874)
20. N.H.S. Diabetes, Joint British Diabetes Societies Inpatient Care Group. The Management of Diabetic Ketoacidosis in Adults (2011). www.diabetologists-abcd.org.uk/JBDS_DKA_Management.pdf. Accessed 7 April 2014
21. S. Misra, N.S. Oliver, Diabetic ketoacidosis in adults. BMJ 351, h5660 (2015)
22. V. Ruzsányi, M.P. Kalapos, C. Schmidl, D. Karall, S. Scholl-Bürgi, M. Baumann, Breath profiles of children on ketogenic therapy. J. Breath Res. 12(3), 036021 (2018)
23. A. Kuksis, A. Ravandi, M. Schneider, Covalent binding of acetone to aminophospholipids in vitro and in vivo. Ann. N. Y. Acad. Sci. 1043(1), 417–439 (2005)
24. S.S. Likhodii, I. Serbanescu, M.A. Cortez, P. Murphy, O.C. Snead III, W.M. Burnham, Anticonvulsant properties of acetone, a brain ketone elevated by the ketogenic diet. Ann. Neurol. 54(2), 219–226 (2003)
25. R. Davies, Studies on the acetone-butanol fermentation: 4. Acetoacetic acid decarboxylase of Cl. acetobutylicum (BY). Biochem. J. 37(2), 230 (1943)
26. V. Saasa, T. Malwela, M. Beukes, M. Mokgotho, C.P. Liu, B. Mwakikunga, Sensing technologies for detection of acetone in human breath for diabetes diagnosis and monitoring. Diagnostics 8(1), 12 (2018)
27. W.H. Hofmann, Vi-Jon Laboratories Inc, Nail polish remover. U.S. Patent 4,824,662 (1989)
28. G. Mansour, D. El-rafey, Ethyl glucuronide, ethyl sulfate and acetone as biomarkers for alcohol based hand sanitizers chronic exposure in health care workers. Ain Shams J. Forensic Med. Clin. Toxicol. 33(2), 80–91 (2019)
29. I.P. Dick, P.G. Blain, F.M. Williams, The percutaneous absorption and skin distribution of lindane in man: I. in vivo studies. Hum. Exp. Toxicol. 16(11), 645–651 (1997)
30. S.B. Azam, Comparative study on the antibacterial activities of four commercially available antiseptics-Dettol, Hexisol, Oralon and Betadine against Staphylococcus aureus, Klebsiella pneumoniae, Bacillus cereus, and Pseudomonas aeruginosa. Doctoral dissertation, BRAC University, 2017
31. G. Shi, G. Xue, C. Li, S. Jin, Layered poly (naphthalene) films prepared by electrochemical polymerization. Polym. Bull. 33(3), 325–329 (1994)
32. G. Neri, A. Bonavita, G. Micali, N. Donato, Design and development of a breath acetone MOS sensor for ketogenic diets control. IEEE Sens. J. 10(1), 131–136 (2009)
33. V. Ruzsányi, M. P. Kalapos, J. Breath Res. 11 024002 (2017)
34. ElProCus: Electronic Projects for Engineering Students. MQ135 Alcohol Sensor Circuit and Its Working (2019). Available at: https://www.elprocus.com/mq-135-alcohol-sensor-circuitand-working/. Accessed 14 Sep 2019
An Effective Graph-Cut Segmentation Approach for License Plate Detection
Ayodeji Olalekan Salau
A. O. Salau (B), Department of Electrical/Electronics and Computer Engineering, Afe Babalola University, Ado-Ekiti, Nigeria. e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2020. S. Jain and S. Paul (eds.), Recent Trends in Image and Signal Processing in Computer Vision, Advances in Intelligent Systems and Computing 1124, https://doi.org/10.1007/978-981-15-2740-1_2
Abstract Despite the successes of license plate detection (LPD) methods in the past decades, only a few methods can effectively detect multi-style license plates (LPs), especially those from different countries. This paper addresses the challenge of LPD by using an automatic graph-cut-based segmentation approach to effectively detect LPs of varying sizes, colors, backgrounds, distances, and orientations. To evaluate the proposed approach, the developed algorithm was tested on 1050 vehicle images. A detection accuracy of 98.67% and an average processing time of 0.1 s were achieved. Experimental results show that the proposed method can detect LPs from both the front and back view of vehicles, as well as LPs with skew orientation. Finally, a comparison of results with existing methods is reported.
Keywords Graph cut · Image · Segmentation · License plate · Detection
1 Introduction
License plate recognition (LPR) systems are used today in many applications such as traffic monitoring and surveillance, traffic law enforcement, access control, automatic toll collection, and criminal pursuit [1]. License plate detection (LPD) is a major step in LPR [2]. Because license plates do not conform to a single standard across countries, their detection and extraction is difficult. This is due to varying plate size, plate color, and different orientations of LPs [3–5]. LPs can be detected by either boundary-based or region-based techniques. The boundary-based techniques use features associated with license plate (LP) edges, borders, and shape to detect LPs, while the region-based techniques use features associated with LP regions, such as plate color and pixel intensity, to detect LPs in an image. Among these detection techniques, graph-based segmentation techniques have gained considerable attention over the years for solving various problems in the field of computer vision [6]. Image segmentation techniques are extensively used in image analysis and processing. Image segmentation has been cited as one of the most important issues and stages in automatic image analysis, image processing, and pattern recognition [7]. By definition, it is the aspect of image processing that deals with dividing, partitioning, or grouping an image into different classes, parts, or regions [8]. A number of authors have applied segmentation techniques for LPD [9] and character extraction [10]. The segmentation techniques used for LPD fall into three categories: region-based techniques, boundary-based techniques, and techniques that use both kinds of features to extract LPs; examples include blob analysis, color matching, and edge detection.
The remainder of this paper is organized as follows. Section 2 presents a review of related works. The proposed method is described in Sect. 3. Section 4 presents the experimental results and compares the performance of the proposed method with existing methods, while the conclusion is presented in Sect. 5.
2 Related Works
A computer on its own has no means to intelligently detect an object in an image without a set of algorithms to work with. For this reason, many new algorithms are being developed for image segmentation. Image segmentation is an important preprocessing step for numerous applications in computer vision such as object recognition, scene analysis, automatic traffic control, and medical imaging [11]. Segmentation techniques use basic features found in the image to segment it. These may be information about the pixels that indicate an edge, boundary or texture information, or color information that is used to create histograms. Techniques for LPD are divided into edge-based and region-based segmentation techniques. The edge-based segmentation techniques deal with edge extraction, while the region-based segmentation techniques determine the object boundary through local spatial characteristics of the image, such as gray level, texture, and other statistical properties of pixels.
Thresholding is the simplest and oldest technique used for segmenting images [12]. A threshold is set for the pixels of the image; pixels whose value exceeds the threshold are labeled as foreground (fg), and the remaining pixels as background (bg). The threshold is often a color value or a pixel intensity value. An example of a segmentation tool that has been incorporated into most graphics packages is the Magic Wand found in Photoshop 7.
Over the past decade, numerous authors have proposed various techniques for LPD. In [13], the Harris corner algorithm was used to detect LPs from captured vehicle images. The authors achieved a detection accuracy of 93.84% in the segmentation phase. Taiwanese LPs were detected in [14], where the authors used a You Only Look Once (YOLO)-darknet deep learning framework. A detection accuracy of 98.22% and a recognition accuracy of 78% were recorded. In [15], a boundary-based contour algorithm which uses the area and aspect ratio of the vehicle to detect its LP was developed. Four templates were used for matching LPs to detect Iranian LPs in [16]; 120 vehicle images of size 1280 × 960 were used for the test. In [17], the wavelet transform was used to detect LPs from cluttered images; 315 images of size 600 × 450 were tested and an accuracy of 92.4% was achieved.
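The thresholding rule described in this section can be sketched in a few lines. The example below works on a plain nested list of gray levels so that it stays self-contained; the image values and the threshold of 128 are hypothetical.

```python
def threshold_image(pixels, threshold):
    """Label each pixel 1 (foreground) if it exceeds the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row]
            for row in pixels]


# Hypothetical 2 x 3 grayscale image with intensities in 0..255.
image = [
    [12, 200, 90],
    [130, 128, 255],
]
mask = threshold_image(image, 128)
```

Each pixel is decided in isolation, which is why thresholding is fast but cannot use the boundary information that graph cut combines with regional information.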
2.1 Graph-Cut Technique
Various methods have been used to obtain the best segmentation of an image. These range from simple thresholding methods to more complex methods such as graph cut. Segmentation methods used in other areas of research such as medicine include the Snake and Livewire techniques, which make use of edge detection methods [18]. In these methods, segmentation of the boundary is carried out by using an energy function which is created from the boundary information and certain constraints. This approach is also adopted by graph cut. Segmentation using graph cut minimizes the energy function to give the best segmentation using the Min-Cut/Max-Flow algorithm [7]. This technique is used to separate the foreground (fg) from the background (bg) in an image by using graph theory. Most segmentation techniques use either contour or edge information to perform segmentation, but graph cut uses both. The graph-cut technique was first used in 2001 by Boykov and Jolly for image segmentation [19]. The method requires user interaction. The user needs to draw a rectangle around the area of interest before segmentation can take place. This brought about the need for a method which could work well in automated processes and which would use minimal user interaction to perform foreground extraction. The technique developed for this purpose was GrabCut [20], which is based on the graph-cut segmentation approach in [19]. In graph cut, the image is treated as a graph in which the pixels are the graph nodes. Each graph node takes one of two labels: 1 (foreground (fg)) or 0 (background (bg)). To separate the fg pixels connected to the source (s) from the bg pixels connected to the sink (t), a cost function is required. This cost function can be computed efficiently using the Max-Flow/Min-Cut algorithm, which segments the image by minimizing the cost function and provides a globally optimal solution. Each edge of the graph is assigned a nonnegative weight denoted as We, and a minimum cut (min-cut) of the graph is a cut which has minimum cost. A min-cut is the boundary between the segmented fg and bg of the image; it is a subset of edges (e) and is denoted as C. The cost of the cut |C| is the sum of the weights on the edges in C, which can be expressed as follows:
$$|C| = \sum_{e \in C} W_e \tag{1}$$
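To make the relation between the cut cost in Eq. (1) and the Max-Flow/Min-Cut computation concrete, the sketch below runs a plain Edmonds-Karp max-flow on a tiny S-T graph with two pixel nodes. The graph, its weights, and the function name are hypothetical, and Edmonds-Karp is used here only because it is short; it is not claimed to be the solver used in this work.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max flow. Returns (flow value, nodes on the source side)."""
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)   # reverse edges
    flow = 0
    while True:
        parent = {source: None}                  # breadth-first search tree
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:                   # no augmenting path is left
            return flow, set(parent)
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:             # smallest residual on the path
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:             # push flow along the path
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical S-T graph: terminals "s" and "t", pixel nodes "p" and "q".
CAPACITY = {"s": {"p": 9, "q": 2}, "p": {"q": 3, "t": 1}, "q": {"p": 3, "t": 8}}
flow_value, source_side = max_flow_min_cut(CAPACITY, "s", "t")

# Eq. (1): sum the weights of the edges that cross the cut.
cut_cost = sum(w for u in source_side
               for v, w in CAPACITY.get(u, {}).items() if v not in source_side)
```

In this example the pixel "p" stays on the source (fg) side and "q" on the sink (bg) side, and the summed weight of the crossing edges equals the maximum flow.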
A min-cut of the S-T graph provides an energy function which can be minimized. The energy function E(L) is given by Eq. (2).
$$E(L) = \alpha R(L) + B(L) \tag{2}$$
R(L) represents the regional information of the segmentation, while B(L) represents the boundary information. α is the relative importance factor between the regional and boundary terms. When α is zero, only the boundary information is used and the regional information is ignored. The regional term R(L) and boundary term B(L) are given by Eqs. (3) and (4), respectively.
$$R(L) = \sum_{p \in P} R_p(l_p) \tag{3}$$
$$B(L) = \sum_{\{p,q\} \in N} B_{p,q} \, \delta(l_p, l_q) \tag{4}$$
Here, following the formulation in [19], P is the set of pixels, N is the set of pairs of neighboring pixels, and δ(l_p, l_q) is 1 when l_p ≠ l_q and 0 otherwise. The remaining terms and their respective equations are shown and defined in [5].
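The roles of the regional term, the boundary term, and α can be illustrated on a four-pixel, one-dimensional "image". Everything specific in this sketch is an assumption made for illustration: the intensities, the fg and bg intensity models, the form of the regional penalty, the Gaussian boundary weight, and the use of brute-force search in place of Max-Flow/Min-Cut.

```python
import itertools
import math

INTENSITIES = [10, 12, 200, 210]    # hypothetical 1-D image
BG_MEAN, FG_MEAN = 11.0, 205.0      # hypothetical intensity models
SIGMA = 30.0                        # hypothetical boundary sensitivity


def regional_term(labels):
    """R(L): penalty for assigning each pixel to a poorly matching model."""
    total = 0.0
    for value, label in zip(INTENSITIES, labels):
        model = FG_MEAN if label == 1 else BG_MEAN
        total += abs(value - model) / 255.0
    return total


def boundary_term(labels):
    """B(L): penalty for each pair of neighbors that receive different labels."""
    total = 0.0
    for i in range(len(labels) - 1):
        if labels[i] != labels[i + 1]:
            diff = INTENSITIES[i] - INTENSITIES[i + 1]
            total += math.exp(-(diff * diff) / (2.0 * SIGMA ** 2))
    return total


def energy(labels, alpha=1.0):
    """E(L) = alpha * R(L) + B(L)."""
    return alpha * regional_term(labels) + boundary_term(labels)


def best_labeling(alpha=1.0):
    """Brute-force minimizer over all labelings (feasible only for tiny inputs)."""
    candidates = itertools.product((0, 1), repeat=len(INTENSITIES))
    return min(candidates, key=lambda labels: energy(labels, alpha))


segmentation = best_labeling()
```

With α = 1 the minimum places the cut at the large intensity jump between the second and third pixels. With α = 0 only the boundary term remains, so a labeling with no label change has zero energy, which shows why the regional term is needed to obtain a non-trivial segmentation.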
2.1.1 Existing Challenge with the Graph-Cut Technique
Graph cut was developed as an interactive segmentation approach by Boykov and Jolly in 2001 [19]. Although the traditional graph-cut technique has been quite successful over the years, it does not perform well in many cases where there are multiple or diffuse edges, or similar objects close to one another. A semi-automatic graph-cut approach was proposed in [21] to address this challenge by introducing shape prior information into graph cut to segment natural and medical images. To the best of our knowledge, graph cut has not previously been used on its own to detect LPs, nor has it been fully automated for LP detection, except in combination with GrabCut [5]. In the literature, graph cut has been used for learning how to segment clothing [22], for face and pose recovery in humans, in medical imagery [23], and in video and natural image segmentation.
In Fig. 1, B and O represent regions of the background and the foreground (object) of the image, as shown in Fig. 1(a), while S and T represent the source node and the sink node in Fig. 1(b) and (c). This type of graph is called an S-T (source-sink) graph, where S is the source (object terminal) and T is the sink (background terminal).
Fig. 1 Pictorial representation of graph-cut theory [26]
The interactive graph-cut segmentation technique can be improved in the following ways [4, 5, 8]:
i. By developing a more powerful, iterative form of the optimization technique.
ii. By using the strength of the iterative algorithm to simplify the user interaction needed to obtain good-quality results.
iii. By developing an improved algorithm for "border matting" to make the process fully automated. This will be useful for the simultaneous estimation of both the alpha matte around the object boundary and the colors of fg pixels.
In this paper, in contrast with the traditional interactive graph-cut segmentation method, we present a fully automatic graph-cut segmentation method for license plate detection (LPD).
3 Proposed Method
In this work, the proposed graph-cut segmentation algorithm is implemented using the Open Source Computer Vision library (OpenCV). The program code was written in the Python programming language, using JetBrains PyCharm Community Edition 2016.3.2.4 as the development environment. An overview of the proposed methodology is shown in Fig. 2. The proposed method comprises the following steps: data acquisition, vehicle image preprocessing, feature extraction, and segmentation using graph cut.
Fig. 2 Overview of the proposed methodology
3.1 Data Acquisition
The data acquired for this work were vehicle images. In total, 1050 vehicle images were acquired for experimentation. These vehicle images were categorized into primary data (dataset 1) and secondary data (dataset 2). The primary data consist of 200 vehicle images of size 640 × 480 captured with a digital camera (Nikon D7000), while the remaining 850 vehicle images were obtained from online computer vision (CV) databases (see [24]). The 850 secondary images are of different sizes. All the acquired images are stored in the system directory ("C:/Python27/vehicleimages").
3.2 Image Preprocessing
In this stage, the images are resized. The differences in image size result from the difference between the resolution of the device used to capture the acquired images (primary data) and that of the images obtained from online databases (secondary data). Since the graph-cut problem is one of finding the cut with minimum cost and minimum energy, we did not need to perform grayscaling, binarization, or noise filtering to detect the LP, except when the results are processed further or when other methods are used (see Figs. 6 and 9).
3.3 Feature Extraction
This stage describes the feature extraction (FE) approach adopted in this work to achieve accurate results. In most object detection and recognition problems, FE is a major step [25]. Apart from boundary features (edges and lines) and region-based features (color and pixel intensity), geometric features such as aspect ratio have been used in LP-related problems. The aspect ratio (Ar) is the ratio of the width to the height of an object (w/h). In the proposed algorithm, we have used an Ar within the range 2 ≤ Ar ≤ 5. This range was chosen after several simulations in which Ar was varied.
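The aspect-ratio test described above can be sketched as a filter over candidate rectangles. The rectangle format (x, y, w, h), the function name, and the sample rectangles are assumptions made for illustration; how the candidate rectangles are obtained is outside this sketch.

```python
def plate_candidates(rectangles, ar_min=2.0, ar_max=5.0):
    """Keep rectangles (x, y, w, h) whose aspect ratio w/h lies in the range."""
    kept = []
    for x, y, w, h in rectangles:
        if h <= 0:
            continue                      # skip degenerate rectangles
        if ar_min <= w / h <= ar_max:
            kept.append((x, y, w, h))
    return kept


# Hypothetical candidate rectangles with aspect ratios 4, 1, 6, and 2.5.
candidates = [(10, 20, 200, 50), (5, 5, 50, 50), (0, 0, 600, 100), (30, 40, 100, 40)]
default_range = plate_candidates(candidates)             # 2 <= Ar <= 5
narrow_range = plate_candidates(candidates, ar_min=3.0)  # 3 <= Ar <= 5
```

Keeping the bounds as parameters makes it straightforward to rerun the filter with a narrower range when the first pass does not yield a correct segmentation.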
3.4 Segmentation Using Graph Cut
The proposed graph-cut-based segmentation algorithm for automatic LP detection is presented in Table 1.
4 Results and Discussion This section presents the experimental results obtained from the simulations carried out using OpenCV. The results show the performance of the proposed method on multi-styled LPs and its ability to detect skew LPs.
Table 1 LP detection algorithm using graph cut
License plate detection algorithm
Requirement: multi-style vehicle images
1. Read vehicle image from the image directory ("C:/Python27/vehicleimages")
2. Import the image (img) to be processed from the directory into OpenCV using cv2.imread
3. Resize img of dataset 1 using cv2.resize(img, (newx, newy))
4. Convert img to a graph
5. Extract geometric features of rectangles using Ar (w/h) in the range 2 ≤ Ar ≤ 5
6. Label rectangle width as x and rectangle height as y
7. Create a trimap (separate foreground, background, and object) with minimum cost (C) and energy function (E(L))
8. Show resulting image
9. License plate accurately segmented (LP detected)
10. If step 9 holds, end (stop program)
11. Else return to step 5 and use Ar in the range 3 ≤ Ar ≤ 5
12. Continue until the classification converges
4.1 Experimental Results The proposed method was tested using images from dataset 1 and dataset 2. The algorithm was implemented in OpenCV (JetBrains PyCharm Community Edition 2016.3.2.4) and was run on an Intel Core i5 2.6 GHz CPU with 8 GB RAM under the Windows 8 operating system. We evaluated the method on multi-style LPs and on LPs of varying orientations. The proposed graph-cut algorithm detects LPs accurately when the aspect ratio (Ar) is in the range 2 ≤ Ar ≤ 5, as shown in Fig. 3, while incorrect results are obtained when Ar < 2 or Ar > 5, as shown in Fig. 4. The overall accuracy of LP detection for the proposed graph-cut algorithm is presented in Table 2. Every vehicle image was segmented, either correctly or incorrectly; none was left unsegmented. Of the 1050 images, 14 were segmented incorrectly; these were mostly images of low picture quality or images from which the algorithm found it hard to extract features. The LPs shown in Fig. 5, which have skew (slant) orientations, were also used in the experiments. Figure 6 shows that the proposed technique achieved accurate segmentation results for vehicle images with skew LPs. For skew LPs, the algorithm achieved accurate results when Ar was in the range 3 ≤ Ar ≤ 5, as presented in Table 3. Furthermore, Fig. 6 shows that the segmented images can be processed further by grayscaling and binarization. An average processing time of about 0.1 s was achieved for LPs with slant orientation, with individual processing times in the range 0.1–0.2 s. In addition, the adaptive aspect ratio was observed to fall in the range 3.0–4.0, as shown in Table 3. In Fig. 7, the front view of some of the acquired vehicle
An Effective Graph-Cut Segmentation …
Fig. 3 Front and back view of vehicle images for LPs of different countries and their graph-cut segmentation results
Fig. 4 Results of incorrect foreground and background segmentation using Ar < 2
Table 2 Experimental results of accuracy test for LP detection of datasets 1 and 2
Category | Number of vehicles | Percentage (%) accuracy of graph-cut segmentation
Total number of vehicle images | 1050 | 100
License plates segmented correctly | 1036 | 98.67
License plates segmented incorrectly | 14 | 1.33
License plates not segmented | 0 | 0
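The percentages in Table 2 follow directly from the image counts. A small sketch of the arithmetic is given below; the variable and function names are illustrative only.

```python
total_images = 1050
incorrect = 14
not_segmented = 0
correct = total_images - incorrect - not_segmented


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


accuracy = percent(correct, total_images)      # correctly segmented plates
error_rate = percent(incorrect, total_images)  # incorrectly segmented plates
```

With these counts, 1036 of the 1050 plates are segmented correctly, which gives the 98.67% and 1.33% figures reported in the table.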
Fig. 5 Acquired slant vehicle images (a), (b), (c), and (d) for LPs of varying size, colors, and orientation
images are shown, and in Fig. 8 we show the results of the segmentation achieved. In Table 4, it was observed that a lower processing time for non-skew LPs (LPs with non-slant orientation) was achieved. The average processing time for non-skew LPs for acquired vehicle images is 0.08 s which falls within the range 0.07–0.08 s. Figure 9 shows results obtained when the proposed graph-cut segmentation method is used for optical character recognition (OCR). These results suggest that the proposed approach will perform effectively for OCR when the pixel color complexity is reduced, noise is removed, and the image is converted to a digital image through binarization. Finally, Table 5 presents a comparison of our proposed method with the
Fig. 6 Results of graph-cut segmentation, grayscaling, and binarization for skew vehicle images (a), (b), (c), and (d)
Table 3 Results of processing time and aspect ratio of LPs with slant (skew) orientation
Vehicle image | LP number | Aspect ratio | Processing time (s)
a | PY-5633 | 3.92307692308 | 0.115488113015
b | BIK-5900 | 3.13333333333 | 0.100472226729
c | YXT-9427 | 3.22222222222 | 0.181018798674
d | YHT-6033 | 3.37368421053 | 0.108196881796
Fig. 7 Acquired front view vehicle images (a), (b), (c), and (d) with LPs of varying size, colors, backgrounds, and distances
Fig. 8 Results of graph-cut segmentation for vehicle images (a), (b), (c), and (d)
Table 4 Results of processing time and aspect ratio of LPs with a non-slant orientation
Vehicle image | LP number | Aspect ratio | Processing time (s)
a | ZG.4898.AC | 3.92307692308 | 0.082021198875
b | HOT 6OY | 3.13333333333 | 0.082072413760
c | 977-K-593 | 3.22222222222 | 0.083390162619
d | EKY.718CL | 3.37368421053 | 0.076521278873
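The average processing times quoted in the text can be checked against the per-image values in Tables 3 and 4. The sketch below simply averages the tabulated times; the variable names are illustrative.

```python
from statistics import mean

# Processing times (s) copied from Table 3 (skew LPs) and Table 4 (non-skew LPs).
skew_times = [0.115488113015, 0.100472226729, 0.181018798674, 0.108196881796]
non_skew_times = [0.082021198875, 0.082072413760, 0.083390162619, 0.076521278873]

skew_avg = mean(skew_times)
non_skew_avg = mean(non_skew_times)
```

The means come out at roughly 0.126 s for skew LPs and 0.081 s for non-skew LPs, in line with the rounded values of 0.1 s and 0.08 s given in the discussion.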
Fig. 9 Results of grayscale conversion, binarization, and character segmentation of license plate numbers
existing state-of-the-art methods. The results show a fast processing time of the proposed graph-cut approach as compared with other techniques, and a comparatively high rate of LP detection.
5 Conclusion This paper presented an automatic graph-cut segmentation approach to effectively detect multi-style license plates (LPs). Experimental results show that the proposed
Table 5 Comparison of the accuracy of detection and processing time of the proposed method with other methods
Author | Method | Accuracy of detection (%) | Processing time (s)
Proposed method | Graph cut | 98.67 | 0.100
[9] | Sliding concentric window (SCW) algorithm | 86.50 | 0.187
[4], [5] | Modified GrabCut algorithm | 99.80 | 0.210
[14] | You only look once (YOLO)-darknet deep learning | 98.22 | 0.800
[27] | 2-level 2D Haar wavelet transform and Wiener-deconvolution vertical edge enhancement | 98.00 | 0.300
[15] | Contour algorithm | 93.00 | Not specified
[16] | Template matching | 93.00 | Not specified
approach is efficient and robust for detecting LPs. In future work, we will explore raising the LP detection accuracy and using a more sophisticated computing device to achieve a lower processing time. In addition, we will explore using the approach extensively for optical character recognition (OCR).
References
1. K.T. Thomas, J. Vaijayanthi, A review of automatic license plate detection using edge detection methods. Int. J. Res. Appl. Sci. Eng. Technol. 2(V), 18–22 (2014)
2. A.M. Al-Ghaili, S. Mashohor, A. Ramli, A. Ismail, Vertical-edge-based car license-plate detection method. IEEE Trans. Veh. Technol. 62(1), 26–38 (2013)
3. A. Roy, D.P. Ghoshal, Number plate recognition for use in different countries using an improved segmentation. IEEE 1–3 (2011)
4. A.O. Salau, Development of a vehicle plate number localization technique using computer vision. Ph.D. Dissertation. Obafemi Awolowo University, Ile-Ife, Nigeria, p. 200 (2018)
5. A.O. Salau, T.K. Yesufu, B.S. Ogundare, Vehicle plate number localization using a modified grabcut algorithm. J. King Saud Univ. Comput. Inf. Sci. (2019). https://doi.org/10.1016/j.jksuci.2019.01.011
6. Z. Fu, L. Wang, Color image segmentation using gaussian mixture model and em algorithm. Springer 346, 61–66 (2012)
7. F. Yi, I. Moon, Image segmentation: a survey of graph-cut methods, in IEEE International Conference on Systems and Informatics (ICSAI) (2012), pp. 1936–1941
8. B. Basavaprasad, R.S. Hegadi, Improved grabcut technique for segmentation of color image. Int. J. Comput. Appl. 5–8 (2014)
9. W.S. Chowdhury, A.R. Khan, J. Uddin, Vehicle license plate detection using image segmentation and morphological image processing, in International Symposium on Signal Processing and Intelligent Recognition Systems (2018), pp. 142–154. https://doi.org/10.1007/978-3-319-67934-1_13
10. V. Franc, V. Hlaváč, License plate character segmentation using hidden markov chains, in Joint Pattern Recognition Symposium (2005), pp. 385–392. https://doi.org/10.1007/11550518_48
11. D. Khattab, H.M. Ebeid, F.M. Tolba, A.S. Hussein, Clustering-based image segmentation using automatic grabcut, in Proceedings of the 10th International Conference on Informatics and Systems (2016), pp. 95–100
12. C. Hung, Y. Chen, Y. Chang, S. Ruan, An efficient thresholding algorithm for license plate recognition based on intelligent block detection, in 4th IEEE Conference on Industrial Electronics and Applications, Xi’an (2009), pp. 236–240. https://doi.org/10.1109/iciea.2009.5138203
13. T. Panchal, H. Patel, A. Panchal, License plate detection using harris corner and character segmentation by integrated approach from an image. Proc. Comput. Sci. 79, 419–425 (2016). https://doi.org/10.1016/j.procs.2016.03.054
14. H. Hendry, C. Chen, Automatic license plate recognition via sliding-window darknet-YOLO deep learning. Image Vis. Comput. 87, 47–56 (2019). https://doi.org/10.1016/j.imavis.2019.04.007
15. A.C. Roy, M.K. Hossen, D. Nag, License plate detection and character recognition system for commercial vehicles based on morphological approach and template matching, in 3rd IEEE International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), Dhaka (2016), pp. 1–6. https://doi.org/10.1109/ceeict.2016.7873098
16. H.V. Dastjerdi, V. Rostami, F. Kheiri, Automatic license plate detection system based on the point weighting and template matching, in 7th IEEE Conference on Information and Knowledge Technology (IKT), Iran (2015), pp. 1–5. https://doi.org/10.1109/ikt.2015.7288783
17. C. Hsieh, Y. Juan, K. Hung, Multiple license plate detection for complex background, in 19th IEEE International Conference on Advanced Information Networking and Applications (AINA’05), Taipei, Taiwan (2005), pp. 389–392. https://doi.org/10.1109/aina.2005.257
18. E.N. Mortensen, W.A. Barrett, Intelligent scissors for image composition, in Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques (1995), pp. 191–198
19. Y. Boykov, M.P. Jolly, Interactive graph-cuts for optimal boundary and region segmentation of objects in N-D images, in International Conference on Computer Vision. I (2001), pp. 105–112
20. C. Rother, V. Kolmogorov, A. Blake, GrabCut interactive foreground extraction using iterated graph-cuts. ACM Trans. Graph. 309–314 (2004)
21. D. Freedman, T. Zhang, Interactive graph cut based segmentation with shape priors, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) (2005), p. 8
22. F. Wang, S. Ferraro, L. Lin, T.S. Mayor, V. Molinaro, M. Ribeiro, Localised boundary air layer and clothing evaporative resistances for individual body segments. J. Ergonomics 55(7), 799–812 (2012). Taylor and Francis
23. Z. Yu, M. Xu, Z. Gao, Biomedical image segmentation via constrained graph-cuts and presegmentation, in IEEE International Conference on EMBC (2011), pp. 5714–5717
24. Online CV Database: http://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.html, www.cvpapers.com/datasets.html, www.medialab.ntua.gr/research/LPRdatabase.html
25. A.O. Salau, S. Jain, Feature extraction: a survey of the types, techniques and applications, in 5th IEEE International Conference on Signal Processing and Communication (ICSC-2019), Noida, India
26. Y. Boykov, O. Veksler, Graphcuts in vision and graphics: theories and applications, in Handbook of mathematical models in computer vision (2005), pp. 100–118
27. S. Yang, J. Jiang, M. Wu, C.C. Ho, Real-time license plate detection system with 2-level 2D Haar Wavelet transform and Wiener-deconvolution vertical edge enhancement, in 9th IEEE International Conference on Information, Communications and Signal Processing, Tainan (2013), pp. 1–5. https://doi.org/10.1109/icics.2013.6782805
Iterative Filtering-Based Automated Method for Detection of Normal and ALS EMG Signals Richa Singh and Ram Bilas Pachori
Abstract Electromyogram (EMG) signals have proved very useful in the identification of neuromuscular diseases (NMDs). In this work, we propose a new method for the classification of normal and abnormal EMG signals to identify amyotrophic lateral sclerosis (ALS). First, all motor unit action potentials (MUAPs) are obtained from the EMG signals. The extracted MUAPs are then decomposed using the iterative filtering (IF) decomposition method to obtain intrinsic mode functions (IMFs). Features, namely Euclidean distance quadratic mutual information (QMI_ED), Cauchy–Schwarz quadratic mutual information (QMI_CS), cross information potential (CIP), and correntropy (COR), are computed for each level of IMFs separately. Statistical analysis of the features is performed with the Kruskal–Wallis statistical test. For classification, the calculated features are given as input to three different classifiers: the JRip rules classifier, the reduced error pruning (REP) tree classifier, and the random forest classifier. The classification results show that the proposed method classifies normal and ALS EMG signals very accurately and performs better than previously existing methods.
1 Introduction Human beings are arguably the most complex life forms on this planet. Millions of tiny components, each with its own identity, work simultaneously in an organized way. The human body is a single structure, but it is composed of millions of smaller structures: cells, tissues, organs, and various systems such as the nervous, skeletal, and muscular systems.
R. Singh · R. B. Pachori (B) Discipline of Electrical Engineering, Indian Institute of Technology Indore, Indore, India e-mail: [email protected] R. Singh e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2020 S. Jain and S. Paul (eds.), Recent Trends in Image and Signal Processing in Computer Vision, Advances in Intelligent Systems and Computing 1124, https://doi.org/10.1007/978-981-15-2740-1_3
The nervous system provides the communication link between the brain and the muscles. Nerves contain cells known as neurons, which deliver messages from the brain to the muscles through the spinal cord. These neurons are known as motor neurons. If, for any reason, these neurons become unhealthy or die, communication between the brain and the muscles breaks down. This condition is known as neuromuscular disease (NMD) [1]. Amyotrophic lateral sclerosis (ALS) is a progressive NMD which results in loss of the ability to move, being wheelchair-bound, swallowing difficulties, weakness of the respiratory muscles and, lastly, death due to respiratory failure [2]. According to statistics, about 1 in 3500 people around the world is expected to have an NMD that presents in childhood or in later life. ALS is the cause of five deaths in every 100,000 people aged 20 or older [3]. So far, no cure is available for ALS. Therefore, identification at the earliest stage and assessment of these diseases are crucial in order to understand their nature and to find possible therapies. NMDs can be identified using electromyogram (EMG) signals. The EMG signal is a biomedical signal that measures the electrical potential produced in a muscle, in the form of voltage and current, during its contraction and relaxation [4]. EMG signals are nonstationary and nonlinear and are influenced by the basic and functional qualities of the muscles. EMG measurements can be evaluated to identify medical defects or the actuation rate, or to study the anatomy of motion of a person or animal. EMG signals basically consist of superimposed motor unit action potentials (MUAPs) from several motor units (MUs) [5]. MUAPs play a major part in the analysis of EMG signals. An MU is defined as one motor neuron together with all the muscle fibers it innervates. When an MU fires, the impulse is transferred from the motor neuron to the muscle; this impulse is known as an action potential. The EMG signal is an extremely helpful diagnostic tool for NMDs like ALS. Visually distinguishing normal from abnormal EMG signals is a difficult and time-consuming activity and requires a very experienced and highly skilled neurophysiologist. This motivates the design of automatic identification methods for NMDs like ALS. To diagnose NMDs, researchers have proposed various methodologies based on the classification of EMG signals. Numerous transforms, decomposition techniques, feature extraction techniques, and classifiers have been used for the classification of EMG signals in the past. A technique based on wavelet neural network (WNN) and feedforward error backpropagation artificial neural network (FEBANN) classifiers has been developed [6]. A method based on the mel frequency cepstrum coefficients (MFCCs) of MUAPs has been proposed for the classification of EMG signals in [7]. Tunable-Q factor wavelet transform (TQWT)-based classification of ALS and healthy EMG signals has been presented in [8]. In another method for the classification of ALS disease, proposed in [9], a convolutional neural network (CNN) is used to classify EMG signals. A pattern classification technique with a two-fold feature extraction technique has also been used for EMG signal classification in [10]. An outlier-based method has been presented in [11]. EMG signal classification has also been done by fuzzy logic
in [12]. Another wavelet transform-based technique, which characterizes EMG signals in terms of singularity, was provided in [13]. A method based on empirical mode decomposition (EMD) for feature extraction was applied to the EMG signal directly [14]. NMDs have been classified in [15] by computing attributes directly from non-overlapping frames of the EMG signals. Another TQWT-based method has been presented for the detection of abnormal EMG signals in [16]. EMG signals have been classified using an eigenvalue decomposition-based time-frequency representation in [17]. In this work, we use the iterative filtering (IF) method for the classification of normal and ALS EMG signals. All MUAPs are obtained from the EMG signals and decomposed using the IF method. Four features, namely Euclidean distance quadratic mutual information (QMI_ED), Cauchy–Schwarz quadratic mutual information (QMI_CS), cross information potential (CIP), and correntropy (COR), are calculated from the intrinsic mode functions (IMFs) obtained from the decomposition. These features are then given to three classifiers, JRip, REP tree, and random forest, to classify normal and ALS EMG signals. The objective of this chapter is to use the iterative filtering method for the decomposition of EMG signals in order to distinguish between normal and abnormal EMG signals and to achieve high classification accuracy while computing fewer parameters.
2 Database Information The database of EMG signals has been taken from EMGLAB, which is publicly available online [18]. The database consists of three groups: a normal group, a group of myopathic patients, and a group of ALS patients. For our study, we have used the normal group data and the ALS group data. The normal group consists of 10 healthy persons aged 21 to 37 years, of whom 4 were women and 6 were men. No subject in the normal control group had any signs or history of NMDs. The ALS group consists of 8 patients, 4 women and 4 men, aged 35 to 67 years. In addition to showing ALS-compatible clinical and electrophysiological symptoms, five of them died within a few years of the onset of the disease, which supports the presence of ALS [19]. The two most frequently examined muscles were used in this database. The data were collected at three insertion depths from five locations in the muscles.
3 Methodology The proposed method for the classification of EMG signals has been represented in Fig. 1.
Fig. 1 Block diagram of proposed method for EMG signal classification
3.1 MUAP Extraction Extracting the MUAPs from the EMG signal involves detecting and identifying the potentials from all MUs. MUAP extraction is done in three stages: segmentation, clustering, and resolution [20]. First, the EMG signal is partitioned into several time intervals, and the intervals containing MUAPs are searched for. These time intervals are known as segments. A segment can contain one MUAP or several superimposed MUAPs. For the segmentation process, a window $w_d$ of length 5.6 ms is slid along the entire EMG signal $z(n)$, and the variance of the signal inside the window is calculated. The variance is given by the following equation [21]:
$$ \mathrm{var}(k) = \frac{1}{w-1}\sum_{j=-q}^{q} z^{2}(k+j) - \left(\frac{1}{w-1}\sum_{j=-q}^{q} z(k+j)\right)^{2} \tag{1} $$
Fig. 2 Plots of extracted MUAPs from (a) a normal EMG signal and (b) an ALS EMG signal (axes: amplitude versus samples)
where var(k) denotes the variance of the signal z(n) at the kth sample, calculated over the sample range from −q to q, and w represents the segment length. A new segment is detected only when the variance calculated inside the window is greater than a detection threshold. In the clustering stage, all similar-looking segments are clustered into groups. False templates and compound segments are resolved in the resolution stage [20]. Figure 2 shows the plots of extracted MUAPs for both normal and ALS EMG signals.
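The variance-based segmentation step can be sketched as follows. This is a minimal illustration, not the EMGLAB implementation: it assumes that the window spans the 2q + 1 samples from k − q to k + q (so w = 2q + 1), and the half-width q, the threshold, and the test signal are hypothetical. The number of samples corresponding to 5.6 ms depends on the sampling rate, which is not stated in this section.

```python
def sliding_variance(z, q):
    """Eq. (1) evaluated at every sample k that has q neighbours on each side."""
    w = 2 * q + 1                      # assumed window length in samples
    var = []
    for k in range(q, len(z) - q):
        window = z[k - q:k + q + 1]
        sum_sq = sum(x * x for x in window)
        sum_x = sum(window)
        var.append(sum_sq / (w - 1) - (sum_x / (w - 1)) ** 2)
    return var


def detect_segments(z, q, threshold):
    """Return (start, end) sample indices where the windowed variance exceeds threshold."""
    var = sliding_variance(z, q)
    segments, start = [], None
    for i, v in enumerate(var):
        k = i + q                      # index in the original signal
        if v > threshold and start is None:
            start = k
        elif v <= threshold and start is not None:
            segments.append((start, k - 1))
            start = None
    if start is not None:              # signal ended while still inside a segment
        segments.append((start, len(z) - q - 1))
    return segments


# Hypothetical signal: a short burst of activity surrounded by a flat baseline.
z = [0] * 20 + [0, 50, -80, 60, -30, 0] + [0] * 20
segments = detect_segments(z, q=2, threshold=100)
```

In practice the detected intervals would then be passed on to the clustering and resolution stages described above.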
3.2 Iterative Filtering Decomposition Method IF is an iterative technique which decomposes a nonlinear and nonstationary signal into a finite number of simple oscillatory components [22]. The components obtained from the IF method are known as IMFs. An IMF is a function that meets the two necessary requirements mentioned in [23]. Unlike the traditional EMD technique, the IF decomposition method remains stable under perturbations [24]. In the sifting process, IF uses a moving average, which is computed by convolving the signal with low-pass filters. The process of acquiring IMFs using IF is explained below. For a given signal $s(t)$, $t \in \mathbb{R}$, let $R$ be the operator that returns the moving average $R(s)$ of $s(t)$. If $h(\tau)$ denotes a double average filter, then the moving average of $s(t)$ can be determined as follows [24]:
$$ R[s(t)] = \int_{-m}^{m} s(t+\tau)\,h(\tau)\,d\tau \tag{2} $$
where the double average filter $h(\tau)$ is given by:
$$ h(\tau) = \frac{m + 1 - |\tau|}{(m+1)^{2}}, \quad \tau \in [-m, m] \tag{3} $$
Let $s_1 = s$ and consider the operator $O_{1,n}(s_n) = s_n - R_{1,n}(s_n) = s_{n+1}$, which captures the fluctuations of $s_n$. The first IMF is $I_1 = \lim_{n\to\infty} O_{1,n}(s_n)$, where $R_{1,n}$ relies on the mask length $m_n$, which defines the filter length at step $n$. To obtain the second IMF $I_2$, the operators $O$ are applied to the remainder signal $s - I_1$. Similarly, the $q$-th IMF is obtained as $I_q = \lim_{n\to\infty} O_{q,n}(r_n)$, with $O_{q,n}(r_n) = r_{n+1}$ and $r_1 = s - I_1 - \ldots - I_{q-1}$. The IF method stops when $r = s - I_1 - \ldots - I_q$, $q \in \mathbb{N}$, turns into a trend signal, i.e., when the remaining signal $r$ has at most one local maximum or minimum. Thus, the signal $s(t)$ can be represented in terms of its decomposition as follows:
$$ s(t) = \sum_{j=1}^{q} I_j(t) + r(t) \tag{4} $$
where $q$ is the number of IMFs obtained from the signal and $r(t)$ represents the remaining trend signal. The IF algorithm has two nested loops: (i) an internal loop and (ii) an external loop. The internal loop derives each single IMF, and the external loop obtains all the IMFs. The mask length $m_n$ is calculated as follows [22]:
$$ m_n = 2\left\lfloor \beta \frac{N}{q} \right\rfloor \tag{5} $$
where $\beta$ is a parameter generally fixed around 1.6, $N$ represents the number of samples in the signal $s_n(t)$, and $q$ here denotes the number of extrema of $s_n(t)$. The operator $\lfloor \cdot \rfloor$ rounds a positive number to the nearest integer closer to zero. The mask length calculated for the first step of the algorithm is also used for the subsequent steps. Using the same mask length at all steps of the inner loop ensures that the IMFs obtained from this method have a proper group of instantaneous frequencies. For this, $O$ and $R$ must not depend on the step number $n$. So, the first IMF can be written as $I_1 = \lim_{n\to\infty} O^{n}(s)$, where $O(s) = s - R(s)$ and $R(s)$ is defined by $R(s)(t) = \int_{-m}^{m} s(t+\tau)\,u(\tau)\,d\tau$, with the mask length $m$ calculated initially for the internal loop, and $u(\tau)$ represents any appropriate filter function. In practice, the algorithm does not take $n = \infty$; it uses a termination criterion for the internal loop, defined as follows:
$$ \alpha = \frac{\lVert I_{1,n} - I_{1,n-1} \rVert_{2}}{\lVert I_{1,n-1} \rVert_{2}} \tag{6} $$
To stop the algorithm, a threshold value of $\alpha$ can be used as a stopping criterion [23], or a maximum number of iterations can be set for the inner loop. Using the IF method, the MUAPs extracted from the EMG signals have been decomposed into IMFs. We have used the publicly available MATLAB code of IF for the decomposition of EMG signals. The IF method has also been used for automatic sleep stage classification of electroencephalogram (EEG) signals in [25]. We have obtained the first six IMFs for both normal and ALS classes of EMG signals. The extracted IMFs for normal and ALS EMG signals are displayed in Figs. 3 and 4, respectively.
Fig. 3 Plots of a normal EMG signal and its first six obtained IMFs (panels: signal and I1 to I6; axes: amplitude versus samples)
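The two nested loops of the IF method can be sketched in plain Python as shown below. This is a simplified illustration and not the MATLAB code used in this work: it discretizes the moving average of Eq. (2) with the double average weights of Eq. (3), clamps indices at the signal borders (an assumption), computes one mask length per IMF from Eq. (5), and stops the inner loop with the criterion of Eq. (6). All function names, default values, and the test signal are hypothetical.

```python
import math


def count_extrema(x):
    """Count strict local maxima and minima of a sequence."""
    return sum(1 for i in range(1, len(x) - 1)
               if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0)


def moving_average(x, m):
    """Discrete form of Eq. (2) with the double average weights of Eq. (3)."""
    n = len(x)
    weights = [(m + 1 - abs(t)) / (m + 1) ** 2 for t in range(-m, m + 1)]
    out = []
    for i in range(n):
        acc = 0.0
        for t, wt in zip(range(-m, m + 1), weights):
            j = min(max(i + t, 0), n - 1)   # assumption: clamp at the borders
            acc += wt * x[j]
        out.append(acc)
    return out


def iterative_filtering(signal, beta=1.6, tol=1e-3, max_inner=100, max_imfs=6):
    """Return (imfs, residual) such that signal = sum(imfs) + residual."""
    residual = list(signal)
    imfs = []
    while len(imfs) < max_imfs:             # external loop: one pass per IMF
        q = count_extrema(residual)
        if q <= 1:                          # residual has become a trend
            break
        m = max(1, 2 * math.floor(beta * len(residual) / q))   # Eq. (5)
        current = list(residual)
        for _ in range(max_inner):          # internal loop: sift one IMF
            avg = moving_average(current, m)
            nxt = [c - a for c, a in zip(current, avg)]
            num = math.sqrt(sum((a - b) ** 2 for a, b in zip(nxt, current)))
            den = math.sqrt(sum(b * b for b in current)) or 1.0
            current = nxt
            if num / den < tol:             # stopping criterion of Eq. (6)
                break
        imfs.append(current)
        residual = [r - c for r, c in zip(residual, current)]
    return imfs, residual


# Hypothetical test signal: a fast oscillation riding on a slow one.
n = 100
signal = [math.sin(2 * math.pi * 12 * i / n) + 0.5 * math.sin(2 * math.pi * 2 * i / n)
          for i in range(n)]
imfs, trend = iterative_filtering(signal, max_imfs=3, max_inner=30)
```

By construction, adding the returned IMFs to the residual reproduces the input signal, which mirrors Eq. (4).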
3.3 Feature Calculation The features are calculated from the decomposed components to distinguish normal and ALS data. In this work, we have studied the QMI_ED, QMI_CS, CIP, and COR features for each level of IMFs separately, where the IMFs are extracted from the MUAP signal.
Fig. 4 Plots of an ALS EMG signal and its first six obtained IMFs (panels: signal and I1 to I6; axes: amplitude versus samples)
• Euclidean distance quadratic mutual information: The mutual information (MI) of two random variables quantifies the amount of information obtained about one random variable through observing the other random variable. When MI is measured with only a simple quadratic form of the probability density functions (PDFs), it is known as quadratic mutual information (QMI). So basically, QMI is a measure of statistical dependency between random variables [26, 27]. The Euclidean distance is a straightforward distance measure for two PDFs and is defined as follows [28]:
$$ D_{ED}(f, z) = \int \big(f(y) - z(y)\big)^{2}\,dy \tag{7} $$
where $D_{ED}$ denotes the Euclidean distance between two PDFs $f(y)$ and $z(y)$. The square of the distance between the joint PDF and the factorized marginal PDFs is known as QMI_ED. QMI_ED for two random variables $Y_1$ and $Y_2$ can be defined as follows [28]:
$$ \mathrm{QMI}_{ED}(Y_1, Y_2) = D_{ED}\big(f_{Y_1 Y_2}(y_1, y_2),\ f_{Y_1}(y_1)\, f_{Y_2}(y_2)\big) \tag{8} $$
where $f_{Y_1 Y_2}(y_1, y_2)$ denotes the joint PDF of $Y_1$ and $Y_2$, and $f_{Y_1}(y_1)$ and $f_{Y_2}(y_2)$ are the marginal PDFs of $Y_1$ and $Y_2$, respectively.
• Cauchy–Schwarz quadratic mutual information: QMI_CS is a variant of QMI which is based on the Cauchy–Schwarz distance between two PDFs, which can be calculated by the following equation [28]:
$$ D_{CS}(f, z) = \log \frac{\left(\int f^{2}(y)\,dy\right)\left(\int z^{2}(y)\,dy\right)}{\left(\int f(y)\,z(y)\,dy\right)^{2}} \tag{9} $$
where $D_{CS}$ denotes the Cauchy–Schwarz distance between two PDFs $f(y)$ and $z(y)$. Based on the Cauchy–Schwarz distance, QMI_CS for two random variables $Y_1$ and $Y_2$ can be defined as follows [28]:
$$ \mathrm{QMI}_{CS}(Y_1, Y_2) = D_{CS}\big(f_{Y_1 Y_2}(y_1, y_2),\ f_{Y_1}(y_1)\, f_{Y_2}(y_2)\big) \tag{10} $$
• Cross information potential: CIP characterizes the similarity between two PDFs [29]. CIP is computed as follows [30]:
$$ \mathrm{CIP}(A, B) = \frac{1}{N^{2}} \sum_{i=1}^{N}\sum_{j=1}^{N} k(a_i - b_j) \tag{11} $$
where $A$ and $B$ denote two data sets of random variables with independent and identically distributed (iid) samples $\{a_1, \ldots, a_N\}$ and $\{b_1, \ldots, b_N\}$, respectively, and $N$ represents the number of samples. $a_i$ is the $i$th sample of data set $A$, $b_j$ is the $j$th sample of data set $B$, and $k(a_i - b_j)$ is a kernel function.
• Correntropy: COR is a measure of the probability of closeness between two random variables in a neighborhood of the joint space, within a specific window controlled by the kernel size [31]. The COR determines the similarity between the signal and the delayed samples of the signal. The COR function for two random variables $a$ and $b$ can be calculated as follows [32]:
$$ \mathrm{COR}(a, b) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(a_i - b_i)^{2}}{2\sigma^{2}}\right) \tag{12} $$
where $\sigma$ denotes the kernel size.
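The CIP and COR definitions in Eqs. (11) and (12) translate almost directly into code. The sketch below is illustrative only and is not the ITL toolbox: it assumes a Gaussian kernel for k(·) in Eq. (11), and the function names and sample values are hypothetical.

```python
import math


def gaussian_kernel(u, sigma=1.0):
    """Gaussian kernel of size sigma evaluated at the difference u."""
    return math.exp(-u * u / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def cross_information_potential(a, b, sigma=1.0):
    """Eq. (11): average kernel value over all sample pairs (a_i, b_j)."""
    if len(a) != len(b):
        raise ValueError("both data sets must contain N samples")
    n = len(a)
    return sum(gaussian_kernel(ai - bj, sigma) for ai in a for bj in b) / n ** 2


def correntropy(a, b, sigma=1.0):
    """Eq. (12): average kernel value over aligned sample pairs (a_i, b_i)."""
    if len(a) != len(b):
        raise ValueError("both signals must contain N samples")
    return sum(gaussian_kernel(ai - bi, sigma) for ai, bi in zip(a, b)) / len(a)


# Hypothetical samples, with a kernel size of 1 as in the text.
reference = [0.0, 1.0, 2.0]
shifted = [x + 3.0 for x in reference]
cor_same = correntropy(reference, reference)
cor_shifted = correntropy(reference, shifted)
cip_same = cross_information_potential(reference, reference)
```

Correntropy is largest when the two signals coincide sample by sample and decreases as they move apart, whereas CIP compares every sample of one set with every sample of the other.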
In this work, we have calculated the above four features of the decomposed IMFs for both classes separately, taking the first sample as a reference from both normal and ALS classes of EMG signals. We have used the ITL toolbox for the calculation of all features, with the kernel size equal to 1.
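The two quadratic mutual information measures of Eqs. (7) to (10) can be approximated on discretized data. The sketch below replaces the PDFs with a normalized two-dimensional histogram, so the integrals become sums over bins; this histogram estimate is an assumption made for illustration and differs from the kernel-based estimators of the ITL toolbox. The function names and the bin count are hypothetical.

```python
import math


def joint_histogram(x, y, bins=4):
    """Normalized 2-D histogram of paired samples (a crude joint-PDF estimate)."""
    def bin_index(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    lox, hix, loy, hiy = min(x), max(x), min(y), max(y)
    joint = [[0.0] * bins for _ in range(bins)]
    for xi, yi in zip(x, y):
        joint[bin_index(xi, lox, hix)][bin_index(yi, loy, hiy)] += 1.0 / len(x)
    return joint


def quadratic_mutual_information(joint):
    """Return (QMI_ED, QMI_CS) for a discrete joint distribution."""
    px = [sum(row) for row in joint]          # marginal of the first variable
    py = [sum(col) for col in zip(*joint)]    # marginal of the second variable
    v_joint = v_marg = v_cross = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            prod = px[i] * py[j]
            v_joint += p * p
            v_marg += prod * prod
            v_cross += p * prod
    qmi_ed = v_joint - 2.0 * v_cross + v_marg            # sum of (p - prod)^2
    qmi_cs = math.log(v_joint * v_marg / v_cross ** 2)   # Cauchy-Schwarz form
    return qmi_ed, qmi_cs


# Hypothetical data: y is fully determined by x, so the variables are dependent.
x = [0.0, 1.0, 2.0, 3.0]
qmi_ed, qmi_cs = quadratic_mutual_information(joint_histogram(x, x, bins=2))
```

Both measures are zero when the joint distribution equals the product of its marginals and grow as the variables become more dependent.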
3.4 Analysis Using Kruskal–Wallis Statistical Test The Kruskal–Wallis statistical (KWS) test [33] is also known as the Kruskal–Wallis H test or one-way analysis of variance (ANOVA) on ranks [34, 35]. It uses the ranks of the data instead of the data values [36, 37] to compare two or more independent samples, which may have equal or different numbers of observations. It is a nonparametric [38, 39] method for testing whether samples come from the same distribution. The KWS test compares the medians of the groups to determine whether they are different. In this work, the KWS test is performed on the features calculated from both normal and ALS classes. The data from both classes are combined into one set, and a rank is given to every data point in the combined set. The KWS test is performed to assess the statistical significance (p < 0.05) of the features.
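The ranking procedure described above can be sketched for the two-class case (normal versus ALS). The code below is a simplified illustration under stated assumptions: it handles exactly two groups, assigns average ranks to tied values but applies no tie correction, and obtains the p-value from the chi-square distribution with one degree of freedom. The function names and feature values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(group_a, group_b):
    """Return (H, p) for two independent samples, without tie correction."""
    pooled = list(group_a) + list(group_b)    # combine both classes into one set
    n = len(pooled)
    ranks = average_ranks(pooled)
    r_a = sum(ranks[:len(group_a)])           # rank sum of the first group
    r_b = sum(ranks[len(group_a):])           # rank sum of the second group
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / len(group_a) + r_b ** 2 / len(group_b))
    h = max(h - 3.0 * (n + 1), 0.0)
    p = math.erfc(math.sqrt(h / 2.0))         # chi-square tail, 1 degree of freedom
    return h, p


# Hypothetical feature values for the two classes.
normal_feature = [0.11, 0.15, 0.12, 0.18, 0.14]
als_feature = [0.31, 0.27, 0.35, 0.29, 0.33]
h_stat, p_value = kruskal_wallis_two_groups(normal_feature, als_feature)
```

For these fully separated samples the p-value falls below 0.05, so the feature would be regarded as statistically significant under the criterion used in this work.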
3.5 Classification Classification refers to the process of predicting the class of given data points. In this work, we have used three classifiers, viz., the JRip rules classifier, the REP tree classifier, and the random forest classifier, to differentiate EMG signals. We have used the 10-fold cross-validation method to classify the data. The classifiers used in this work are discussed briefly below.
• JRip rules classifier: JRip is a popular and fundamental classifier for classification among different classes. JRip implements a propositional rule learner known as “repeated incremental pruning to produce error reduction” (RIPPER) [40]. In the JRip algorithm, the classes are first examined in order of increasing size, and an initial set of rules is created for the class using incremental reduced-error pruning. All the examples of a specific decision in the training data are treated as a class, and a set of rules covering all members of that class is obtained. The algorithm then proceeds to the next class and does the same. This is repeated until all classes have been addressed [41, 42].
• REP tree classifier: This classifier is a fast decision tree learning algorithm. It works by estimating the information gain and entropy while decreasing the error caused by variance [43]. REP tree applies regression tree logic and generates multiple trees in different iterations. Thereafter, it selects the best tree among all generated trees and considers it as representative. The REP tree classifier builds a decision/regression tree using variance and information gain [44].
• Random forest classifier: This classifier is a set of decision tree classifiers, where every classifier is created using a random vector which is sampled independently from the input vector [45, 46]. To classify an input vector, every tree casts a unit vote for the most popular class, and the votes received from the different decision trees are averaged to decide the final class of the test object [47]. It basically merges multiple decision trees to create a wide diversity, which leads to better classification accuracy [48]. The random forest classifier works efficiently even on large data sets.
In this work, the popular machine learning toolbox Waikato Environment for Knowledge Analysis (WEKA) has been used.
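The 10-fold cross-validation scheme mentioned above partitions the data so that every sample is used for testing exactly once. The sketch below shows only the index bookkeeping; it does not reproduce WEKA, which may additionally randomize and stratify the folds, and the function name and sample count are hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    # Spread the remainder over the first folds so that sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical data set of 25 feature vectors split into 10 folds.
folds = list(k_fold_indices(25, k=10))
```

A classifier is trained on each training index set and evaluated on the matching test set, and the results are then pooled over the ten folds.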
4 Results and Discussion
The IF method decomposes a nonlinear and nonstationary signal into components known as IMFs. In the proposed method, the first six IMFs are obtained from the MUAPs extracted from EMG signals. Four features, QMIED, QMICS, CIP, and COR, are then calculated from these IMFs. The KWS test is performed on the features at a significance level of p < 0.05. The p-values obtained for all features are given in Tables 1, 2, 3, 4, and the box plots are displayed in Figs. 5, 6, 7, 8. For all features, the p-values are less than 0.05, and for most features they are zero or very close to zero. This shows that all calculated features are suitable for the classification of EMG signals. The obtained features are given as input to the three classifiers, viz., the JRip rules classifier, the REP tree classifier, and the random forest classifier. Three classification parameters, viz., accuracy (Acc), sensitivity (Sen), and specificity (Spe), have been calculated for all features [49, 50]. Acc is the ability to differentiate the normal class and the ALS class correctly. Sen is the ability to identify the ALS class correctly. Spe is the ability to identify the normal class correctly. Table 5 shows the best classification results obtained when only one feature is used to classify normal and ALS EMG signals. Table 6 shows the best classification results obtained when two features are used. The results show that even with only one feature the maximum Acc is 97.8%, and the highest Spe and Sen are 99.7% and 100%, respectively. With two features, the Acc increases up to 99.9%, and the highest Spe and Sen are both 100%. These classification results show that the proposed method is very effective for classifying EMG signals. Table 7 compares the performance of the proposed method with previously existing methods for classification of EMG signals; the proposed method achieves the highest accuracy among them for the classification of normal and ALS EMG signals.
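The three performance measures defined above follow directly from the counts of a two-class confusion matrix. The sketch below assumes ALS is the positive class and normal the negative class, as the definitions of Sen and Spe imply; the function name and the example counts are hypothetical and are not taken from the chapter's data.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (Acc, Sen, Spe) in percent.

    tp: ALS signals classified as ALS       fn: ALS signals classified as normal
    tn: normal signals classified as normal fp: normal signals classified as ALS
    """
    sen = 100.0 * tp / (tp + fn)                   # ALS class found correctly
    spe = 100.0 * tn / (tn + fp)                   # normal class found correctly
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # both classes together
    return acc, sen, spe

# Hypothetical counts for illustration only.
acc, sen, spe = classification_metrics(tp=95, tn=98, fp=2, fn=5)
```

With these made-up counts the function gives Acc = 96.5%, Sen = 95.0%, and Spe = 98.0%.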
Table 1 p-value for QMIED feature of first six IMFs
Features    p-value
QMIED 1     2.2524 × 10^-109
QMIED 2     6.7073 × 10^-64
QMIED 3     0
QMIED 4     0
QMIED 5     5.2049 × 10^-275
QMIED 6     1.9133 × 10^-231

Table 2 p-value for QMICS feature of first six IMFs
Features    p-value
QMICS 1     8.2216 × 10^-107
QMICS 2     5.3181 × 10^-75
QMICS 3     0
QMICS 4     0
QMICS 5     2.1800 × 10^-276
QMICS 6     2.1512 × 10^-217

Table 3 p-value for CIP feature of first six IMFs
Features    p-value
CIP1        0.0407
CIP2        1.5295 × 10^-283
CIP3        0
CIP4        1.3511 × 10^-290
CIP5        1.1558 × 10^-284
CIP6        0

5 Conclusions and Future Scope
In this chapter, an IF decomposition-based method is proposed for classification of normal and ALS EMG signals. Statistical features, namely QMIED, QMICS, CIP, and COR, have been calculated from the decomposed components. The KWS test is performed on the features at a significance level of p < 0.05. The analysis shows that the p-values of all features are less than 0.05, and for most of the features they are zero or very close to zero. This shows that all calculated features are highly suitable for the classification of normal and ALS EMG signals. For EMG signal classification, the QMIED, QMICS, CIP, and COR features have been given to three different classifiers: the JRip rules classifier, the REP tree classifier, and the random forest classifier. The results show that even with only one feature the maximum Acc is 97.8%. The Acc increases up to 99.9% when two features are used for classification. These performance measures show that the proposed methodology classifies normal and ALS EMG signals very accurately. On the basis of the feature analysis and classification results, it can be concluded that the IF decomposition-based method achieves high accuracy even with a small number of extracted features and performs better than the previously existing methods compared for the classification of normal and ALS EMG signals. This methodology can help in differentiating abnormal and normal EMG signals for diagnosis of subjects. The method proposed in this chapter must be studied on large databases before it is applied in clinical practice. In future, the proposed method can be studied for the analysis and classification of other biomedical signals corresponding to normal and abnormal classes, and it can be useful for identifying other NMDs.
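The feature-significance test referred to above can be sketched for the two-class case (normal versus ALS). The sketch assumes that KWS denotes the Kruskal-Wallis test cited in [34, 36] and uses the usual chi-square approximation, which for two groups has one degree of freedom. The function names and the sample values are hypothetical; this is not the chapter's analysis code.

```python
import math
from collections import Counter

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for two groups, with the standard tie correction."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3.0 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / float(n ** 3 - n)
    if correction == 0.0:                  # all observations identical
        return 0.0, 1.0
    h /= correction
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return h, math.erfc(math.sqrt(h / 2.0))

# Hypothetical feature values for two small groups.
h_stat, p_value = kruskal_wallis_two_groups([1, 2, 3], [4, 5, 6])
```

A feature would be regarded as significant when the returned p-value is below 0.05, which is the criterion stated in the text.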
COR1
7.3062
Features
p-value
× 10−25
0
COR2
Table 4 p-value for COR feature of first six IMFs 1.3189 × 10−318
COR3 6.0976 × 10−277
COR4
2.0751 × 10−280
COR5
4.9407 × 10−324
COR6
Fig. 5 Box plots of QMIED feature for first six obtained IMFs [panels (a) to (f), each comparing the Normal and ALS classes]
Fig. 6 Box plots of QMICS feature for first six obtained IMFs [panels (a) to (f), each comparing the Normal and ALS classes]
Fig. 7 Box plots of CIP feature for first six obtained IMFs [panels (a) to (f), each comparing the Normal and ALS classes]
Fig. 8 Box plots of COR feature for first six obtained IMFs [panels (a) to (f), each comparing the Normal and ALS classes]

Table 5 Classification results by using only one feature
Features    Classifiers      Spe (%)   Sen (%)   Acc (%)
QMIED 3     JRip             99.7      95.3      97.8
QMIED 3     REP tree         99.6      95.3      97.7
QMIED 3     Random forest    97.8      95.9      97.0
QMICS 3     JRip             99.6      95.3      97.7
QMICS 3     REP tree         99.4      95.7      97.8
QMICS 3     Random forest    96.8      96.0      96.4
CIP6        JRip             94.7      100       96.9
CIP6        REP tree         94.7      99.6      96.8
CIP6        Random forest    96.6      95.1      95.9
COR2        JRip             94.2      99.8      96.5
COR2        REP tree         94.0      99.9      96.5
COR2        Random forest    94.5      93.3      94.0
COR6        JRip             94.7      99.6      96.7
COR6        REP tree         94.7      99.9      96.8
COR6        Random forest    96.7      94.2      95.6
Table 6 Classification results by using two features
Features         Classifiers      Spe (%)   Sen (%)   Acc (%)
CIP6 +COR2       JRip             99.9      99.8      99.8
CIP6 +COR2       REP tree         99.7      99.9      99.7
CIP6 +COR2       Random forest    99.9      99.8      99.8
COR2 +COR6       JRip             99.8      99.6      99.7
COR2 +COR6       REP tree         99.6      99.8      99.7
COR2 +COR6       Random forest    99.8      99.7      99.7
CIP3 +CIP6       JRip             99.3      99.9      99.5
CIP3 +CIP6       REP tree         99.3      99.9      99.5
CIP3 +CIP6       Random forest    99.4      99.6      99.4
QMIED 3+COR2     JRip             98.7      97.1      98.0
QMIED 3+COR2     REP tree         98.4      97.1      97.8
QMIED 3+COR2     Random forest    98.7      98.3      98.5
QMIED 3+CIP6     JRip             100       99.9      99.9
QMIED 3+CIP6     REP tree         99.6      100       99.7
QMIED 3+CIP6     Random forest    99.9      100       99.9

Table 7 Performance comparison with other methods
Methods studied                              Classifiers      Acc (%)
Method of Doulah & Fattah (2014) [7]         KNN              92.5
TQWT based method (2018) [8]                 LS-SVM           95
Method of Sengur and Akbulut (2017) [9]      CNN              96.8
Proposed method                              Random forest    99.9
References
1. E.R. Kandel, J.H. Schwartz, Principles of Neural Science (McGraw Hill, Appleton & Lange, 2012)
2. European respiratory journal, ERS publications. https://erj.ersjournals.com/
3. A.E.H. Emery, Population frequencies of inherited neuromuscular diseases: A world survey. Neuromuscul. Disord. 1(1), 19–29 (1991)
4. M.B.I. Raez, M.S. Hussain, F. Mohd. Yasin, Techniques of EMG signal analysis: detection, processing, classification and applications, 8, 11–35 (2006)
5. R.R. Sharma, P. Chandra, R.B. Pachori, Electromyogram signal analysis using eigenvalue decomposition of the Hankel matrix. In: Advances in Intelligent Systems and Computing, vol. 748 (Springer, Singapore, 2019)
6. A. Subasi, M. Yilmaz, H.R. Ozcalik, Classification of EMG signals using wavelet neural network. J. Neurosci. Methods 156(1), 360–367 (2006)
7. A.B.M.S.U. Doulah, S.A. Fattah, Neuromuscular disease classification based on mel frequency cepstrum of motor unit action potential. In: International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT), pp. 1–4 (2014)
8. P.U. Kiran, N. Abhiram, S. Taran, V. Bajaj, TQWT based features for classification of ALS and healthy EMG signals. Am. J. Comput. Sci. Inf. Technol. 6(2), 19 (2018). https://doi.org/10.21767/2349-3917.100019
9. A. Sengur, Y. Akbulut, Y. Guo, V. Bajaj, Classification of amyotrophic lateral sclerosis disease based on convolutional neural network and reinforcement sample learning algorithm. Health Inf. Sci. Syst. 5(1), 9 (2017). https://doi.org/10.1007/s13755-017-0029-6
10. A. Hazarika, L. Dutta, M. Barthakur, M. Bhuyan, Two-fold feature extraction technique for biomedical signals classification. In: International Conference on Inventive Computation Technologies, vol. 2, pp. 1–4 (2016)
11. E. Stalberg, C. Bischoff, B. Falck, Outliers, a way to detect abnormality in quantitative EMG. Muscle Nerve 17, 392–399 (1994)
12. O. Ulkir, G. Gokmen, E. Kaplanoglu, EMG signal classification using fuzzy logic. Balkan J. Electr. Comput. Eng. 5(2), 97–101 (2017)
13. E.W. Abel, H. Meng, A. Forster, D. Holder, Singularity characteristics of needle EMG IP signals. IEEE Trans. Biomed. Eng. 53(2), 219–225 (2006)
14. V.K. Mishra, V. Bajaj, A. Kumar, G.K. Singh, Analysis of ALS and normal EMG signals based on empirical mode decomposition. IET Sci. Meas. Technol. 10(8), 963–971 (2016)
15. N.F. Guler, S. Kocer, Classification of EMG signals using PCA and FFT. J. Med. Syst. 29(3), 241–255 (2005)
16. D. Joshi, A. Tripathi, R. Sharma, R.B. Pachori, Computer aided detection of abnormal EMG signals based on tunable-Q wavelet transform. In: International Conference on Signal Processing and Integrated Networks (2017)
17. R.R. Sharma, M. Kumar, R.B. Pachori, Classification of EMG Signals Using Eigenvalue Decomposition Based Time-Frequency Representation (Biomedical and Clinical Engineering for Healthcare Advancement, IGI Global, 2019)
18. K.C. McGill, Z.C. Lateva, H.R. Marateb, EMGLAB: an interactive EMG decomposition program. J. Neurosci. Methods 149(2), 121–133 (2005)
19. K.C. McGill, Z.C. Lateva, M.E. Johanson, Validation of a computer-aided EMG decomposition method. Proceeding IEEE Eng. Med. Biol. Soc. Conf. 4744–4747 (2004)
20. M. Nikolic, C. Krarup, EMGTools, an adaptive and versatile tool for detailed EMG analysis. IEEE Trans. Biomed. Eng. 58, 2707–2718 (2011)
21. K.C. McGill, Optimal resolution of superimposed action potentials. IEEE Trans. Biomed. Eng. 49, 640–650 (2002)
22. L. Lin, Y. Wang, H. Zhou, Iterative filtering as an alternative algorithm for empirical mode decomposition. Adv. Adapt. Data Anal. 1(4), 543–560 (2009)
23. N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.C. Yen, C.C. Tung, H.H. Liu, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A: Math. Phys. Eng. Sci. 454(1971), 903 (1998)
24. A. Cicone, J. Liu, H. Zhou, Adaptive local iterative filtering for signal decomposition and instantaneous frequency analysis (2014). arXiv:1411.6051
25. R. Sharma, R.B. Pachori, A. Upadhyay, Automatic sleep stages classification based on iterative filtering of electroencephalogram signals. Neural Comput. Appl. (2017)
26. V. Tangkaratt, H. Sasaki, M. Sugiyama, Direct Estimation of the Derivative of Quadratic Mutual Information with Application in Supervised Dimension Reduction (2015). arXiv:1508.01019v1, Accessed 5 Aug 2015
27. J.C. Principe, D. Xu, Q. Zhao, J.W. Fisher, Learning from examples with information theoretic criteria. VLSI Signal Processing 26(1–2), 61–77 (2000)
28. D. Xu, Energy, entropy and information potential for neural computation 31–33 (1999)
29. J.W. Xu, A.R.C. Paiva, I. Park, J.C. Principe, A reproducing kernel Hilbert space framework for information-theoretic learning. IEEE Trans. Signal Process. 56(12), 5891–5902 (2008)
30. H. Tang, H. Li, Information theoretic learning: Renyi's entropy and kernel perspectives. IEEE Comput. Intell. Mag. 6(3), 60–62 (2011)
31. A. Gunduz, J.C. Principe, Correntropy as a novel measure for nonlinearity tests. Signal Process. 89(1), 14–23 (2009)
32. W. Liu, P.P. Pokharel, J.C. Principe, Correntropy: properties and applications in non-Gaussian signal processing. IEEE Trans. Signal Process. 55(11), 5286–5298 (2007)
33. F.J. Rudolf, W.J. William, Statistical Methods (Academic Press, San Diego, CA, USA, 1993)
34. P.E. McKight, J. Najab, Kruskal-Wallis test. Corsini Encycl. Psychol. (2010)
35. T.P. Hettmansperger, Statistical Methods Based on Ranks (Wiley, New York, 1984)
36. E. Ostertagova, O. Ostertag, J. Kovac, Methodology and application of the Kruskal-Wallis test. Appl. Mech. Mater. 611, 115–120 (2014)
37. J.S. Maritz, Distribution-Free Statistical Methods (CRC Press, Chapman and Hall Mathematics Series, 1995)
38. C. Siegel, Castellan, Nonparametric Statistics for the Behavioral Sciences, 2nd edn. (McGraw-Hill, New York, 1988). ISBN 0070573573
39. G.W. Corder, D.I. Foreman, Nonparametric Statistics for Non-Statisticians (Wiley, 2009), pp. 99–105. ISBN 9780470454619
40. W.W. Cohen, Fast effective rule induction. In: 12th International Conference on Machine Learning (1995), pp. 115–123
41. V. Parsania, N.N. Jani, V. Bhalodiya, Applying Naïve Bayes, BayesNet, PART, JRip and OneR Algorithms on Hypothyroid Database for Comparative Analysis, IJDI-ERET, 3 (2014)
42. R. Anil, R.P. Aharwal, D. Meghna, S.P. Saxena, R. Manmohan, J48 and JRIP rules for E-Governance data. Int. J. Comput. Sci. Secur. (IJCSS) 5(2) (2011)
43. I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. (Morgan Kaufmann Series in Data Management Systems, The United States of America, 2005)
44. B. Srinivasan, P. Mekala, Mining social networking data for classification using REPTree. Int. J. Adv. Res. Comput. Sci. Manag. Stud. 2, 155–160 (2014)
45. M. Pal, Random forest classifier for remote sensing classification. Int. J. Remote Sens. 26, 217–222 (2005)
46. L. Breiman, Random Forests-Random Features, Technical Report 567 (University of California, Berkeley, Statistics Department, 1999)
47. T.K. Ho, Random decision forests. In: Proceedings of the 3rd International Conference on Document Analysis and Recognition (Montreal, QC, 1995), pp. 278–282
48. T. Shi, S. Horvath, Unsupervised learning with random forest predictors. J. Comput. Graph. Stat. 15, 118–138 (2006)
49. A. Baratloo, M. Hosseini, A. Negida, G.E. Ashal, Part 1: Simple definition and calculation of accuracy, sensitivity and specificity. Emergency 3(2), 48–49 (2015)
50. W. Zhu, N. Zeng, N. Wang, Sensitivity, Specificity, Accuracy, Associated Confidence Interval and ROC Analysis with Practical SAS Implementations (North-East SAS Users Group, Health Care and Life Sciences, 2010)
A Study of Remote Monitoring Methods for Solar Energy System
Gurcharan Singh and Amit Kumar Manocha
Abstract In this paper, we discuss various techniques and methods of remote monitoring. The aim of this paper is to update readers on the latest techniques of remote monitoring in solar systems and on their advantages and disadvantages. To prevent damage to the photovoltaic cells used in solar energy generation and forecasting, it is essential to continuously monitor the condition of PV panels through mounted sensors. We also discuss how algorithms developed for remote monitoring of solar energy systems can increase their overall efficiency and so help to address a critical issue of the energy sector.
Keywords Photovoltaic (PV) cell · Maximum power point tracking (MPPT) · Supervisory control and data acquisition (SCADA) · Cloud computing · Internet of things (IoT)
G. Singh, Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India
A. K. Manocha, PIT, GTB Garh Moga, Moga, Punjab, India
© Springer Nature Singapore Pte Ltd. 2020. S. Jain and S. Paul (eds.), Recent Trends in Image and Signal Processing in Computer Vision, Advances in Intelligent Systems and Computing 1124, https://doi.org/10.1007/978-981-15-2740-1_4

1 Introduction
The population of the world is increasing continuously: it was 3.2 billion in 1962, reached 7.7 billion in 2019, and is forecast to grow to 10 billion by 2050 [1]. As a result, rising standards of living and the growing demand for energy, water, and food are putting increasing pressure on the environment, while supplies of oil, gas, and coal are forecast to be depleted soon. At the same time, fear of climate change is pushing the energy sector to move away from carbon burning toward environment-friendly green energy options [2]. India currently generates approximately 1,037,185 GW-h of electrical energy per year from fossil fuels, which is very harmful to the environment. The yearly gross generation of energy by fossil fuels in India [3] is given in Table 1. Green energy sources are gradually becoming a vital part of energy production, since fossil fuel reserves are close to exhaustion [2]. Although renewable energy resources are growing in many respects, the problems associated with them, such as their irregular nature and growing capital cost, are the main hurdles to their application. Among the renewable resources, solar energy is considered the most appropriate option, as it is free from pollution and available all over the earth free of cost. Table 2 shows the yearly electricity generation by renewable energy sources in India [3]. Hence, today's need is to enhance the reliability, availability, and power quality of renewable energy schemes in order to overcome the abovementioned issues [4].

Table 1 Yearly electricity generation (GW-h) by fossil fuels in India
Year       Coal       Oil      Gas
2017–18    986,591    386      50,208
2016–17    944,850    262      49,100
2015–16    896,260    406      47,122
2014–15    835,838    1,407    41,075

Table 2 Yearly electricity generation (GW-h) by renewable energy sources in India
Year       Hydro      Solar     Wind      Biomass   Others
2018–19    135,040    39,268    62,036    16,325    425
2017–18    126,134    25,871    52,666    15,252    358
2016–17    122,313    12,086    46,011    14,159    213
2015–16    121,377    7,450     28,604    16,681    269
2014–15    129,244    4,600     28,214    14,944    414
1.1 Solar Energy Conversion into Electrical Energy
A PV cell, shown in Fig. 1, is used to convert solar energy into electrical energy. It is a device that converts light energy directly into electricity by means of the photovoltaic effect. It is a form of photoelectric cell, that is, a device whose characteristics, such as resistance, voltage, or current, vary when it is exposed to light [2]. A number of solar cells are combined to form a module, also known as a PV panel. A single-junction silicon PV cell can produce a maximum voltage of about 0.5 to 0.6 V. A solar cell is described as photovoltaic regardless of whether the source is artificial light or sunlight [4]. Solar cells can also be used as photodetectors, detecting light or other electromagnetic radiation near the visible range or measuring the intensity of light.
Fig. 1 Photovoltaic cell
1.2 Maximum Power Point Tracking (MPPT)
MPPT is an algorithm commonly used with PV systems to maximize power extraction under all conditions. A solar cell exhibits a relationship between temperature and total resistance that makes its output efficiency nonlinear; this behavior can be analyzed using the I–V curve. The purpose of an MPPT system is to sample the output of the PV cells and apply the proper load (resistance) to obtain maximum power under any environmental conditions. An MPPT device is typically integrated into an electric power converter that provides voltage or current conversion, regulation, and filtering for driving various loads, including motors, power grids, or batteries [4]. There are numerous kinds of MPPT techniques, namely incremental conductance, perturb and observe, constant voltage, current sweep, and the temperature technique [4].
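Of the MPPT techniques listed above, perturb and observe is the simplest to sketch: the operating voltage is nudged by a small step, and the direction of the next step is reversed whenever the measured power drops. The example below is a minimal sketch under stated assumptions. The PV model `pv_power` is a toy exponential I–V curve with made-up parameters (short-circuit current, open-circuit voltage, and a shape constant), not a model from the cited works, and the step size and iteration count are arbitrary.

```python
import math

def pv_power(voltage, i_sc=5.0, v_oc=21.6, v_t=1.5):
    """Toy PV module model (assumed): output power in W at a terminal voltage in V."""
    current = i_sc * (1.0 - math.exp((voltage - v_oc) / v_t))
    return voltage * max(current, 0.0)

def perturb_and_observe(power_at, v_start, step=0.1, iterations=300):
    """Track the maximum power point by repeatedly perturbing the voltage."""
    voltage = v_start
    power = power_at(voltage)
    direction = 1.0
    for _ in range(iterations):
        candidate = voltage + direction * step   # perturb
        new_power = power_at(candidate)          # observe
        if new_power < power:
            direction = -direction               # power dropped: reverse next step
        voltage, power = candidate, new_power
    return voltage, power

v_mpp, p_mpp = perturb_and_observe(pv_power, v_start=10.0)
```

Once the tracker reaches the peak it keeps oscillating within a few step sizes of it, which is the known trade-off of this technique: a smaller step reduces the oscillation but slows the response to changing conditions.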
1.3 Generation Forecasting
Generation forecasting of solar energy takes into account the atmospheric conditions, knowledge of the sun's path, scattering processes, and the characteristics of the solar energy plant that uses the sun's energy to generate power. A solar PV system transforms the sun's energy into electric power, and its output power depends on the solar radiation incident on the panel. Forecast information is essential for efficient use and management of the electricity grid and for solar energy trading [5].
1.4 Technologies and Theories Previously Implemented
Many researchers around the world have contributed to the progress of renewable energy and remote monitoring; their contributions are discussed in this section.
Xiaocong and Qiang (2007) described a power converter design that regulates the equivalent input "resistance" and the operating point to a value equal to the optimum value of the PV cell. The model uses a closed-loop control system and combines several methods, such as observation, perturbation, and voltage correction, to obtain maximum power. The scheme also complements the algorithm to some extent [6].
Kolhe (2009) carried out an analytical study to obtain an optimal relationship between battery capacity and the PV array for supplying energy to an otherwise uncovered portion of the energy load. This helps to make maximum use of the sun's energy and also improves the reliability of the solar cell array [7].
Datta and Karakoti (2010) noted that the area of the earth between latitudes 40° N and 40° S is generally recognized as the solar energy belt and is assumed to receive abundant solar radiation. Rajasthan receives solar radiation in the range of 5.7 to 5.9 kWh/m², and some areas of New Delhi and Gujarat receive global solar radiation between 5.4 and 5.7 kWh/m². The study finds that these areas of northwest India have higher solar radiation and are ideally suited for collecting solar energy [8].
Yoo et al. (2012) examined a grid-connected domestic system consisting of a PV array, a battery, and grid power under critical peak pricing. The PV array produces solar power as the primary source, and the battery stores surplus solar power and delivers it when required. To ensure reliable energy delivery, the system is connected to the grid.
An effective power management scheme that includes load estimation based on a Kalman filter has been developed. Using a forecasting model together with the management scheme, the proposed system controls the system components and economizes on electricity use by taking the critical peak pricing (CPP) of grid power into account [9].
Jabalameli and Masoum (2013) implemented a battery storage energy management scheme (BS-EMS) for a rooftop PV system to support domestic loads and compensate for the effects of solar variations and moving clouds, while regulating the voltage at the point of common coupling (PCC) and providing the grid with a continuous power output during daylight. The BS-EMS performance was examined through detailed simulations of rooftop PV under various grid and atmospheric conditions [10].
Anil et al. (2013) observed that, given the rising demand for electrical energy and environmental problems such as pollution and global warming, solar energy is one of the best choices for generating clean energy. A PI controller is used to reach the maximum power point of a PV system [11].
Murdoch and Reynoso (2013) designed an MPPT system for an air vehicle; the design employs a boost converter. Improving the MPPT technique improves the productivity of the solar panel, and maximum solar energy is obtained by using an effective optimization algorithm. The system is also more flexible: it transmits data from the panel to a host computer and broadcasts information to the relevant stakeholders without any geographical barrier [12].
Ke (2013) implemented a solar system based on remote monitoring of industrial parameters by means of the IoT, measuring parameters such as current, voltage, temperature, pressure, speed, and gas from sensors. The sensor outputs are passed to the IoT module via PIC controllers; users can observe the measured parameters and can also control devices by giving input to the PIC controller through the IoT module using the Cayenne software. The large amount of data obtained through radio frequency identification, automatic control, information sensing, and IoT wireless communication is handled by an "agricultural information cloud", enabling smart agriculture [13].
Tirupathamma et al. (2014) determined the characteristics of the PV cell and developed a grid-connected solar PV system with hysteresis current control. The inverter tracks the reference current of the solar PV system and delivers it to the utility grid. The hysteresis current-control technique also minimizes the total harmonic distortion (THD) [14].
Deveci and Kasnakoglu (2015) examined a grid-connected domestic system consisting of a PV array, a battery, and grid power under critical peak pricing. The PV array produces solar power as the primary source, and the battery stores surplus solar power and delivers it when required. To ensure reliable energy delivery, the system is connected to the grid. An effective power management scheme that includes load estimation based on remote monitoring has been developed.
The values obtained are useful for forecasting the parameters considered, and the data stored in the cloud can be analyzed with MATLAB [2].
Vidhya (2015) discussed the modeling of a photovoltaic system with storage devices, such as a lead-acid battery, for energy management. Maximum solar energy is obtained by using remote monitoring and an intelligent MPPT algorithm for the PV system, and the remote monitoring technique raises the productivity of the solar panel to its maximum [4].
Parikh et al. (2015) described a power converter design that regulates the equivalent input "resistance" and the operating point to a value equal to the optimum value of the PV cell. The model uses a closed-loop control system and combines several methods, such as observation, perturbation, and voltage correction, to obtain maximum power. The scheme also modifies the algorithm to some extent [15].
Vignesh and Samydurai (2016) designed, programmed, built, and installed a hardware- and software-based data logger as an experimental model at various sites. Combining the IoT with solar systems permits standalone photovoltaic (PV) systems to be monitored from a remote location, which improves the maintenance and performance of the system [16].
Lee et al. (2016) examined a grid-connected domestic system consisting of a PV array, a battery, and grid power under critical peak pricing. The PV array produces solar power as the primary source, and the battery stores surplus solar power and delivers it when required. To ensure reliable energy delivery, the system is connected to the grid. An effective power management scheme that includes load estimation based on a Kalman filter has been developed. Using a forecasting model together with the management scheme, the proposed system controls the system components and economizes on electricity use by taking the critical peak pricing (CPP) of grid power into account [17].
Kurundkar et al. (2017) developed the software and hardware design of a solar monitoring system for remote areas. The system is based on remote monitoring of industrial parameters by means of the IoT, measuring parameters such as current, voltage, temperature, pressure, speed, and gas from sensors. The sensor outputs are passed to the IoT module via PIC controllers; users can observe the measured parameters and can also control devices by giving input to the PIC controller through the IoT module using the Cayenne software [18].
Vignesh and Samydurai (2017) proposed an automatic IoT-based monitoring system for solar power that allows the solar system to be monitored from anywhere over the Internet. An ATmega controller is used to monitor the parameters of the solar panel. The system continuously monitors the solar system and transfers its output to the IoT system over the Internet. Here the IoT is used to pass the parameters of the solar power system via the Internet to the server [19].
Patila et al. (2017) examined a grid-connected domestic system consisting of a PV array, a battery, and grid power under critical peak pricing. The PV array produces solar power as the primary source, and the battery stores surplus solar power and delivers it when required. To ensure reliable energy delivery, the system is connected to the grid.
An effective power management scheme that includes load estimation based on remote monitoring has been developed. The values obtained are useful for forecasting the parameters considered, and the data stored in the cloud can be analyzed with MATLAB [20].
Li et al. (2018) carried out a solar energy assessment using remote sensing technologies. Maximum solar energy is obtained by using an effective optimization algorithm. They also implemented web monitoring of solar power systems; the web-assisted software provides a more flexible system that transmits data from the panel to a host computer and broadcasts information to the relevant stakeholders without any geographical barrier [21].
Madhubala et al. (2018) implemented a solar system based on remote monitoring of industrial parameters by means of the IoT, measuring parameters such as current, voltage, temperature, pressure, speed, and gas from sensors. The sensor outputs are passed to the IoT module via PIC controllers; users can observe the measured parameters and can also control devices by giving input to the PIC controller through the IoT module using the Cayenne software [22].
Katyarmal et al. (2018) proposed an automatic IoT-based monitoring system for solar power that allows the solar system to be monitored from anywhere over the Internet. An ATmega controller is used to monitor the parameters of the solar panel. The system continuously monitors the solar system and transfers its output to the IoT system over the Internet. Here ThingSpeak is used to pass the parameters of the solar power system via the Internet to the ThingSpeak IoT server. It displays the parameters to users through a GUI and alerts the user when the output falls below particular limits [23].
Vargas et al. (2019) designed, programmed, built, and installed a hardware- and software-based data logger as an experimental model at various sites. Combining the "Internet of Things" with solar systems permits standalone photovoltaic (PV) systems to be monitored at a remote location, which improves the maintenance and performance of the system [24].
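Several of the works surveyed above rely on load estimation with a Kalman filter, but the chapter gives no model details. As a rough illustration only, the sketch below runs a scalar Kalman filter over noisy load measurements. The random-walk load model, the variance values, and the function name `kalman_load_estimates` are all assumptions made for this example and are not taken from the cited papers.

```python
def kalman_load_estimates(measurements, process_var=1.0, measurement_var=25.0,
                          initial_estimate=0.0, initial_var=1e3):
    """Scalar Kalman filter: smooth noisy load readings (e.g. in W)."""
    estimate, variance = initial_estimate, initial_var
    estimates = []
    for z in measurements:
        # Predict: the load is assumed to persist (random walk), so only
        # the uncertainty grows between samples.
        variance += process_var
        # Update: blend prediction and measurement by their uncertainties.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= (1.0 - gain)
        estimates.append(estimate)
    return estimates

# Hypothetical load readings in watts.
smoothed = kalman_load_estimates([480.0, 515.0, 495.0, 505.0, 500.0])
```

A larger `measurement_var` makes the filter trust its running estimate more and react more slowly to new readings; a larger `process_var` has the opposite effect.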
2 Methodology of Remote Monitoring
The need for electricity is rising day by day. Owing to the shortage of fossil fuels and the environmental harm caused by conventional power generation, green (renewable) energy is in high demand and increasingly popular. Renewable energy sources offer the benefit of being free from pollution. Solar energy is considered the most appropriate option because it is pollution-free and available all over the earth at no cost. The output of solar energy sources needs conditioning to improve the power quality, performance, and productivity of the PV panel. This requires remote monitoring and optimal load forecasting of the solar system. Although many remote monitoring techniques already exist, generation forecasting and optimal load scheduling are still needed. The remote monitoring scheme is shown in Fig. 2.
Fig. 2 Diagram of remote monitoring scheme
G. Singh and A. K. Manocha
2.1 Methods of Remote Monitoring
2.1.1 Physical/Wired Monitoring
Wired monitoring is the automatic measurement and transmission of data over wires from remote sources to receiving stations for analysis and recording [25].
2.1.2 Wireless Monitoring
Wireless monitoring is the automatic measurement and transmission of data over wireless links from remote sources to receiving stations for analysis and recording [25].
2.1.3 SCADA Monitoring
SCADA (supervisory control and data acquisition) is a system of hardware and software components that permits industrial organizations to control industrial processes locally or at remote locations, and to monitor, collect, and process data in real time [25].
2.1.4 Monitoring Using Cloud Computing
Cloud computing is the on-demand availability of computer system resources, particularly data storage and computing power, without direct active management by the user. The term “cloud computing” generally refers to data centers available over the Internet to many users. Large clouds are predominant nowadays and often have functions distributed over multiple locations from central servers. Cloud computing is used for real-time monitoring and control [25].
2.1.5 Monitoring Using IOT
The term “IOT” (Internet of things) refers to the extension of Internet connectivity to physical devices. Embedded with Internet connectivity, electronics, and other hardware, these devices can interact and communicate with each other over the Internet and can be remotely monitored and controlled [18].
3 Discussion and Comparisons
Deveci and Kasnakoglu [2] designed, in MATLAB simulation, a PV panel, DC-to-DC converters, MPPT, a battery, proportional–integral–derivative (PID) controllers, and a set of codes for constant output of the PV system. Maximum solar energy could be obtained by using remote monitoring with this system. Jabalameli et al. [10] employed a BS-EMS for a rooftop PV system to support domestic loads. Maximum solar energy could be obtained by using an effective MPPT technique. Vidhya [4] discussed the modeling of photovoltaics with storage devices, such as a lead–acid battery, for energy management in the system. Maximum solar energy could be obtained by using remote monitoring. Anil et al. [11] modeled a PI controller-based intelligent algorithm for MPPT in a PV system. The productivity of the solar panel could be maximized by using a remote monitoring technique. Villalva et al. [26] carried out simulation modeling of photovoltaic arrays to obtain the parameters of the nonlinear I–V equation by adjusting the curve at three points: open circuit, short circuit, and maximum power. An effective optimization technique could be used to improve productivity. Kurundkar et al. [18] designed a hardware and software model for monitoring a solar inverter system at a remote location. The system is equipped with a current sensor, a voltage sensor, and a Wi-Fi module for data transmission. The acquired data is displayed on the IOT platform (Cayenne). Maximum solar energy could be obtained by using an appropriate tilt angle and a better MPPT algorithm. Li et al. [21] proposed an OD-PSO MPPT algorithm for a PV power system. This algorithm can quickly find any small region containing the GMPP without detailed information about the PV array; the MPPT speed could be enhanced, which could result in better productivity of the solar panel.
Murdoch and Reynoso [12] designed an MPPT system for an air vehicle, with a design employing a boost converter. The MPPT technique could be improved, which could improve the productivity of the solar panel. Hammera et al. [27] carried out a solar energy assessment using remote sensing technologies. Maximum solar energy could be obtained by using an effective optimization algorithm. Kumar [28] implemented web monitoring of solar power systems; this web-assisted software makes the system more flexible, transmitting data from the panel to a host computer and broadcasting information to the relevant stakeholders regardless of geographical barriers. The MPPT could be improved, which could improve the productivity of the solar panel. Vignesh and Samydurai [16] implemented an IOT-based solar system. The monitoring could be improved by using a better MPPT algorithm and an effective optimization technique. Madhubala et al. [22] implemented a solar-powered system for remote monitoring of industrial parameters by means of IOT, measuring parameters such as current, temperature, pressure, speed, gas, and voltage from sensors. The sensor outputs are passed to the IOT module via PIC controllers; users can observe the measured parameters and can also control devices by sending inputs to the PIC controller by the
means of the IOT module using the Cayenne software. Maximum solar energy could be obtained by using an appropriate MPPT algorithm. Parikh et al. [15] carried out an in-depth study of accessing a solar panel system at a remote location using a WSN. Maximum solar energy could be obtained by using an effective optimization algorithm.
4 Conclusions and Future Scope
In this study, the latest developments in remote monitoring using signal processing for solar energy technologies have been reviewed. Signal processing is necessary for remote monitoring because conditioning makes a signal noise-free and processing removes unwanted signals. Solar energy is available in abundance on the earth, which makes it an attractive substitute for conventional (fuel-based) energy systems in numerous applications. However, the mismatch between demand and supply and the variability in generation make integration with remote monitoring essential. These systems showed improved overall performance when combined with remote monitoring using signal processing, with better system efficiency and reliability. Further studies of remote monitoring, its effects on solar energy systems, and exergetic efficiency are also recommended. The performance of solar systems can be further enhanced through intensive research on remote monitoring. This, in turn, would widen the scope for implementing remote monitoring for real-time forecasting and optimal load scheduling.
References
1. World population. https://en.wikipedia.org/wiki/World_population
2. O. Deveci, C. Kasnakoglu, Control of a photovoltaic system operating at maximum power point and constant output voltage under different atmospheric conditions. Int. J. Comput. Electr. Eng. 7(4), 240–247 (2015)
3. Energy policy of India. https://en.wikipedia.org/wiki/Energy_policy_of_India
4. Vidhya, Energy storage management in grid connected solar photovoltaic system. Int. J. Eng. Res. Appl. 5(4), 1–5 (2015)
5. M.N. Akhter et al., Review on forecasting of photovoltaic power generation based on machine learning and metaheuristic techniques. IET Renew. Power Gener. 13(7), 1009–1023 (2019)
6. W. Xiaocong, S. Qiang, The Research on Mppt Intelligent Solar Charging System. ISES Solar World Congress. 4, 172–1574 (2007)
7. M. Kolhe, Techno-economic optimum sizing of a stand-alone solar photovoltaic system. IEEE Trans. Energy Convers. 24(2), 511–519 (2009)
8. A. Datta, I. Karakoti, Solar resource assessment using GIS and remote sensing techniques, in 11th ESRI India User Conference (2010), pp. 1–20
9. J. Yoo et al., Look-ahead energy management of a grid-connected residential PV system with energy storage under time-based rate programs. Energies 5, 1116–1134 (2012)
10. N. Jabalameli, M.A.S. Masoum, Battery storage unit for residential rooftop PV system to compensate impacts of solar variations. Electr. Electron. Eng. Int. J. (ELELIJ). 2(4), 55–70 (2013)
11. G. Anil et al., PI Controller based MPPT for a PV System. IOSR J. Electr. Electron. Eng. (IOSR-JEEE). 6(5), 10–15 (2013)
12. C.S. Murdoch, S.N. Reynoso, Design and implementation of a MPPT circuit for a solar UAV. IEEE Latin Am. Trans. 11(1), 108–111 (2013)
13. F.T. Ke, Smart agriculture based on cloud computing and IOT. J. Convergence Inf. Technol. (JCIT). 8(2), 1–7 (2013)
14. N. Lakshmi Tirupathamma et al., Matlab simulation of grid connected PV system using hysteresis current control inverter. Int. J. Res. Stud. Comput. Sci. Eng. (IJRSCSE). 1(5), 13–20 (2014)
15. A. Parikh et al., Solar panel condition monitoring system based on wireless sensor network. Int. J. Sci. Eng. Technol. Res. (IJSETR). 4(12), 4320–4324 (2015)
16. R. Vignesh, A. Samydurai, A survey on IoT system for monitoring solar panel. IJSDR 1(11), 114–115 (2016)
17. Y.-T. Lee et al., An integrated cloud-based smart home management system with community hierarchy. IEEE Trans. Consum. Electron. 62(1), 1–9 (2016)
18. S. Kurundkar et al., Remote monitoring of solar inverter (an application of IOT). Am. J. Eng. Res. (AJER). 6(7), 70–74 (2017)
19. R. Vignesh, A. Samydurai, Automatic monitoring and lifetime detection of solar panels using internet of things. Int. J. Innov. Res. Comput. Commun. Eng. 5(4), 7014–7020 (2017)
20. S. Patila et al., Solar energy monitoring system using Iot. Indian J. Sci. Res. 15(2), 149–155 (2017)
21. H. Li et al., An overall distribution particle swarm optimization MPPT algorithm for photovoltaic system under partial shading. IEEE Trans. Ind. Electron. (2018)
22. S. Madhubala et al., Solar POWER based remote monitoring and control of industrial parameters using IoT. Int. Res. J. Eng. Technol. (IRJET). 5(3), 3231–3236 (2018)
23. M. Katyarmal et al., Solar power monitoring system using IoT. Int. Res. J. Eng. Technol. (IRJET). 5(3), 3431–3432 (2018)
24. A. López-Vargas et al., IoT application for real-time monitoring of Solar Home Systems based on Arduino with 3G connectivity. IEEE Sens. J. 19(2), 679–691 (2019)
25. Telemetry. https://en.wikipedia.org/wiki/Telemetry
26. M.G. Villalva et al., Comprehensive approach to modeling and simulation of photovoltaic arrays. IEEE Trans. Power Electron. 24(5), 1198–1208 (2009)
27. A. Hammera et al., Solar energy assessment using remote sensing technologies. Remote Sens. Environ. 86, 423–432 (2003)
28. B.A. Kumar, Solar power systems web monitoring. Renew. Energy Technol. (SoRET) (2011)
An Automatic Thermal and Visible Image Registration Using a Calibration Rig
Lalit Maurya, Prasant Mahapatra, Deepak Chawla and Sanjeev Verma
Abstract Registration of thermal and visible images is an important prerequisite for various medical and industrial image processing tasks. Because of the different imaging principles, contrast variation and texture differ between thermal and visible images. In such cases, automatic registration of the thermal and visible images is a crucial step. In this paper, an automatic calibration rig-based registration algorithm is proposed. The calibration rig is used to extract the correspondence pairs in both images. The proposed algorithm finds the corners and centroid of the square in the calibration rig, which have the same position in world coordinates and the same characteristics in the thermal and visible images. Experimental results show that the proposed approach performs well for automatic registration without any human intervention.
Keywords Image registration · Thermal image · Visible image · Calibration rig
L. Maurya · P. Mahapatra (B) Academy of Scientific and Innovative Research (AcSIR), 201002 Ghaziabad, India e-mail: [email protected]; [email protected]
L. Maurya e-mail: [email protected]
L. Maurya · P. Mahapatra · S. Verma CSIR-Central Scientific Instruments Organisation (CSIR-CSIO), 160030 Chandigarh, India
D. Chawla Government Medical College and Hospital (GMCH), 160030 Chandigarh, India
© Springer Nature Singapore Pte Ltd. 2020 S. Jain and S. Paul (eds.), Recent Trends in Image and Signal Processing in Computer Vision, Advances in Intelligent Systems and Computing 1124, https://doi.org/10.1007/978-981-15-2740-1_5
1 Introduction
Multi-sensor images, such as infrared and visible images, play a significant role in advancing machine perception, knowledge, target detection, and recognition [1]. They have various applications in medical imaging, pattern recognition, target detection, remote sensing, and modern military systems [2]. Infrared imaging, also called thermal imaging, is the visual representation of the amount of infrared energy emitted by an object. It adds a new attribute or dimension of knowledge to imaging, namely the temperature distribution. An infrared camera measures this temperature profile and generates a black-and-white or color palette-based thermal image. Because an abnormal thermal pattern can be easily recognized in a thermal image of the body, thermal imaging can be used effectively in early detection and diagnosis. Combining visible and thermal data may significantly improve region of interest (ROI) selection in medical diagnosis. Since the two images come from different modalities, aligning them is a problem. A thermal image has little texture and uniform intensity wherever the temperature is the same. A visible image, on the other hand, has rich texture and high resolution.
Image registration is the process of aligning two or more images of a common scene or object. These images can be obtained by different sensors, or by the same sensor at different times, distances, and viewing angles. The aim of registration is to align the images so that their combined information content improves for computer vision and common features in the images can be compared. Registration methods can be classified in several ways, broadly as intensity-based and feature-based methods [3, 4]. In an intensity-based method, a metric such as the sum of squared differences, cross-correlation, or mutual information is defined, which evaluates the displacement between corresponding pixels from the different images [5, 6]. In this case, registration becomes an optimization problem: the metric is minimized (or maximized) so that the two images are aligned with each other. In a feature-based method, feature points based on corners, lines, regions, and contours are extracted. The corresponding feature points in the two modalities are matched by a feature matching method. The transformation matrix generated from the matched points then registers the target image into the coordinates of the reference image [7–9].
These matched feature points are called control points and are obtained either manually or automatically. The pixel-based method is computationally costly, while the control point-based method is efficient, robust, and fast. Once accurate match points are obtained, pixel-by-pixel alignment of the thermal and visible images can be performed. These points can be determined using a simple calibration rig with known characteristics. Most of the feature extraction methods used in image registration, such as Harris [10], SIFT [8], and SURF [11], are intensity and gradient based and are therefore sensitive to illumination and contrast variation. The different gradient directions in a visible and thermal image pair cause false descriptions of image subregions. Several methods have been reported to address this problem in multispectral image registration. In [12], the authors developed an edge-oriented histogram (EOH) for feature description by calculating edges in four directions and no direction. Mouats et al. [13] introduced a phase congruency and edge histogram-based descriptor. Further, Ye et al. [14] proposed a histogram of orientated phase congruency (HOPC) descriptor for optical-to-SAR multimodal image matching. However, these methods produce false matches when there are large differences in magnification and field of view (FOV) in the thermal-visible image pair. In recent years, researchers have focused on the use of thermal imaging for medical diagnosis. Facial thermal imaging has been studied for emotion detection, fever detection, eye localization, and respiration detection. For respiration rate detection,
the temperature variation of the nostril area is recorded over time. However, automatic extraction of the nostril area in a thermal image is very challenging because the thermal image has a temperature-dependent texture, which obscures the details of the ROI. The aim of this work is to develop automatic alignment of thermal and visible images for locating the eye and nostril areas in the thermal image. In this application, a large scale difference occurs because of the significant change in focal length between the thermal and visible cameras. Therefore, a rig-based approach is proposed to select the control points on the basis of characteristics common to both thermal and visible images.
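Two of the intensity-based metrics named earlier in this introduction, the sum of squared differences and mutual information, can be illustrated with a short NumPy sketch. It is not part of the proposed method; the bin count and the synthetic test images are assumptions chosen only to show how a histogram-based mutual information estimate still responds when one image is a contrast-inverted copy of the other, whereas a shuffled copy carries almost no shared information.

```python
import numpy as np


def sum_squared_difference(a, b):
    """Sum of squared pixel differences (lower means more similar)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum((a - b) ** 2))


def mutual_information(a, b, bins=32):
    """Histogram-based mutual information estimate in nats (higher is better)."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    pxy = joint / joint.sum()                 # joint probability table
    px = pxy.sum(axis=1, keepdims=True)       # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of b, shape (1, bins)
    nz = pxy > 0                              # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


# Synthetic data: an image, its contrast-inverted copy, and a shuffled copy.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64))
inverted = 255 - image
shuffled = rng.permutation(image.ravel()).reshape(image.shape)

mi_inverted = mutual_information(image, inverted)
mi_shuffled = mutual_information(image, shuffled)
```

In an intensity-based registration loop, such a metric would be evaluated repeatedly while the transformation parameters are optimized, which is the main source of the computational cost mentioned above.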
2 Proposed Method
To align the visible and thermal images, control points are first extracted from both images. These control points have similar characteristics in the real world and are easy to detect in both thermal and visible images. Since thermal and visible images have different textures, it is difficult to find common control points in a common scene for registration. In this work, an easy-to-make calibration rig is used for detecting control points in both thermal and visible image pairs. Image registration can be summarized in the following three steps:
1. Extraction of control points in both the moving and fixed images and determination of the correspondence between them.
2. Determination of the geometrical transformation model that maps the moving image view into the reference image view.
3. Warping of the moving image by means of the transformation function.
The block diagram of the image registration proposed in this paper is shown in Fig. 1. After acquisition of both thermal and visible images, an enhancement is performed to improve the sharpness of edges and the signal-to-noise ratio. The multiscale adaptive smoothing-based unsharp masking (MAS-UM) method proposed in [15] is used. In MAS-UM, adaptive filtering is performed at three levels to boost the higher frequencies in the image pair. The proposed approach uses a rig to extract the control points. The rig, or board, is easy to make and a poor conductor of heat, so that its visual representation is fixed in thermal and visible images. The grid pattern consists of regular-size squares cut out of a thin material. After the enhancement, the calibration square search algorithm is performed. The algorithm consists of the following steps:
1. Generation of the binary image from the enhanced image
2. Region detection and calculation of region features
3. Filtering of the desired calibration square features
4. Control point detection from the calibration square
5. Image alignment with an affine geometric transformation.
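The first step of the algorithm above, binary image generation followed by removal of small regions, can be sketched in plain NumPy. This is a minimal illustration and not the authors' MATLAB code: the threshold is a histogram-based search for the maximum between-class variance in the style of Otsu, the region cleanup is a simple flood fill, and the use of 8-connectivity and 8-bit gray levels are assumptions.

```python
import numpy as np


def otsu_threshold(image, bins=256):
    """Gray level that maximizes between-class variance (8-bit input assumed)."""
    hist, _ = np.histogram(np.ravel(image), bins=bins, range=(0, 256))
    prob = hist.astype(float) / hist.sum()
    levels = np.arange(bins)
    w0 = np.cumsum(prob)                # weight of the class with level <= t
    mu = np.cumsum(prob * levels)       # cumulative mean up to level t
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros(bins)
    between[valid] = (mu[-1] * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(between))


def remove_small_regions(binary, min_size=100):
    """Keep only 8-connected regions with at least `min_size` pixels."""
    binary = np.asarray(binary, dtype=bool)
    keep = np.zeros_like(binary)
    seen = np.zeros_like(binary)
    rows, cols = binary.shape
    for r0 in range(rows):
        for c0 in range(cols):
            if not binary[r0, c0] or seen[r0, c0]:
                continue
            stack, region = [(r0, c0)], []
            seen[r0, c0] = True
            while stack:                # flood fill one connected region
                r, c = stack.pop()
                region.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and binary[rr, cc] and not seen[rr, cc]):
                            seen[rr, cc] = True
                            stack.append((rr, cc))
            if len(region) >= min_size:
                for r, c in region:
                    keep[r, c] = True
    return keep


# Synthetic frame: dark background, one 12 x 12 bright square, one 3 x 3 speck.
frame = np.full((40, 40), 50, dtype=np.uint8)
frame[5:17, 5:17] = 200
frame[30:33, 30:33] = 200
threshold = otsu_threshold(frame)
cleaned = remove_small_regions(frame > threshold, min_size=100)
```

The `min_size=100` default mirrors the 100-pixel limit stated in the text; everything else in the sketch is illustrative.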
Fig. 1 The block diagram of proposed methodology
The first step of the proposed method is the conversion of both thermal and visible images to binary (logical) images. The Otsu method [16] is used to find the threshold in each image. Because small artifacts may be present in the images, a morphological operation is performed to remove small objects: all connected components smaller than 100 pixels are removed from the images. The remaining regions are labeled to distinguish them from each other. Several features, such as area, perimeter, centroid, major axis length, and minor axis length, are calculated for each labeled region. These features are used to design the filtering criteria for detecting the desired calibration square. In this work, two filtering criteria are defined from the characteristics of a square. The area and perimeter of a square with side a are given as follows:
$$\text{area} = a^{2} \tag{1}$$
$$\text{perimeter} = 4a \tag{2}$$
From Eqs. 1 and 2,
$$\sqrt{\text{area}} = \frac{1}{4} \times \text{perimeter} \tag{3}$$
From Eq. 3, the first criterion is defined as
$$sC = \frac{\sqrt{\text{area}}}{\frac{1}{4} \times \text{perimeter}} \tag{4}$$
For a perfect square, the value of sC is equal to 1. However, the shape in the image may not be a perfect square because of pixel error. The second criterion is defined by the absolute difference between the major axis and minor axis lengths of the detected shape. This criterion is used to separate squares from rectangles. When both criteria are satisfied, the required calibration square is filtered out. The outer corners and the centroid of the square are used as the control points in the image pairs. Since they have the same position in world coordinates and the same characteristics in both thermal and visible images, they can be used to register the visible image into the spatial location of the thermal image. The characteristics of the detected shapes in the binary images are calculated with the MATLAB image processing tool. Once a set of corresponding point pairs between the thermal and visible images is available, a 3 × 3 geometrical transformation matrix can be determined. In our acquisition system, the visible camera and the thermal camera are fixed in parallel. The affine transformation, which preserves parallelism, is therefore used as the most appropriate homography model. The affine transformation can exactly measure the geometrical discrepancy between the image pairs if the scene is parallel. It has six degrees of freedom and performs translation, scaling, rotation, and shearing.
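The two filtering criteria can be written as a small predicate. The sketch below assumes that the region features (area, perimeter, major and minor axis lengths) have already been measured; the tolerance values `sc_tol` and `axis_tol` are hypothetical, since the text does not state the thresholds that were used.

```python
import math


def squareness(area, perimeter):
    """First criterion, sC = sqrt(area) / (perimeter / 4); equals 1 for a square."""
    return math.sqrt(area) / (0.25 * perimeter)


def is_calibration_square(area, perimeter, major_axis, minor_axis,
                          sc_tol=0.1, axis_tol=5.0):
    """Accept a region only if both criteria hold.

    sc_tol and axis_tol are illustrative tolerances, not values from the paper.
    """
    sc_ok = abs(squareness(area, perimeter) - 1.0) <= sc_tol
    axis_ok = abs(major_axis - minor_axis) <= axis_tol
    return sc_ok and axis_ok


# A 20 x 20 square passes; a 10 x 40 rectangle with the same area does not.
square_ok = is_calibration_square(400, 80, 28.0, 28.0)
rectangle_ok = is_calibration_square(400, 100, 40.0, 10.0)
```

Using a tolerance around 1 rather than an exact comparison reflects the pixel error mentioned above, which prevents sC from being exactly 1 on real images.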
Suppose (x, y) is a coordinate in the fixed image and (x', y') is the corresponding point in the moving image; then the affine transformation can be expressed as follows:
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{5}$$
At least three control point pairs are required to calculate the affine parameters. Three or more corresponding points are collated as follows:
$$\begin{pmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_1 & y_1 & 1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_2 & y_2 & 1 \\ x_3 & y_3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_3 & y_3 & 1 \end{pmatrix} \begin{pmatrix} a_{11} \\ a_{12} \\ t_x \\ a_{21} \\ a_{22} \\ t_y \end{pmatrix} = \begin{pmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ x_3' \\ y_3' \end{pmatrix} \tag{6}$$
where the coordinates in the moving image can be found using the following equations:
$$x_i' = a_{11} x_i + a_{12} y_i + t_x \tag{7}$$
$$y_i' = a_{21} x_i + a_{22} y_i + t_y \tag{8}$$
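The linear system of Eq. 6 can be solved directly for the six affine parameters. The following sketch stacks two rows per control point pair and solves the system in the least-squares sense with NumPy, so it also accepts more than three pairs; the function name and the sample points are illustrative assumptions.

```python
import numpy as np


def estimate_affine(fixed_pts, moving_pts):
    """Estimate the 3 x 3 affine matrix mapping fixed (x, y) to moving (x', y').

    Each point pair contributes the two rows of Eq. 6; at least three
    non-collinear pairs are needed.
    """
    fixed = np.asarray(fixed_pts, dtype=float)
    moving = np.asarray(moving_pts, dtype=float)
    n = fixed.shape[0]
    design = np.zeros((2 * n, 6))
    design[0::2, 0:2] = fixed        # rows for x': [x, y, 1, 0, 0, 0]
    design[0::2, 2] = 1.0
    design[1::2, 3:5] = fixed        # rows for y': [0, 0, 0, x, y, 1]
    design[1::2, 5] = 1.0
    target = moving.reshape(-1)      # [x1', y1', x2', y2', ...]
    params, *_ = np.linalg.lstsq(design, target, rcond=None)
    a11, a12, tx, a21, a22, ty = params
    return np.array([[a11, a12, tx],
                     [a21, a22, ty],
                     [0.0, 0.0, 1.0]])


# Synthetic check: generate moving points from a known transform and recover it.
true_T = np.array([[1.2, 0.1, 5.0],
                   [-0.2, 0.9, -3.0],
                   [0.0, 0.0, 1.0]])
fixed_points = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
homogeneous = np.hstack([fixed_points, np.ones((4, 1))])
moving_points = (homogeneous @ true_T.T)[:, :2]
estimated_T = estimate_affine(fixed_points, moving_points)
```

With exactly three pairs the system is square and the solution is exact; with the up to eight rig points mentioned later, the least-squares solution averages out small localization errors.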
The registered image is determined from the moving image by coordinate mapping. Interpolation takes place in the moving image so that neither holes nor overlaps occur in the registered image, and the registered image has the same spatial coordinates as the fixed image. Nearest neighbor, bilinear, bicubic, and spline interpolation are the most common interpolation techniques. Here, the experimental simulation adopts bilinear interpolation.
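The coordinate mapping with bilinear interpolation can be sketched as a backward warp: every pixel of the output grid (the fixed image frame) is mapped through the affine matrix into the moving image, and its value is interpolated from the four surrounding pixels. This is a simplified NumPy illustration for a single-channel image, with pixels that map outside the moving image set to zero; it is an assumption about one reasonable implementation, not the MATLAB routine used in the experiments.

```python
import numpy as np


def warp_affine_bilinear(moving, T, out_shape):
    """Resample `moving` on the fixed-image grid using the 3 x 3 affine T.

    T maps fixed coordinates (x, y) to moving coordinates (x', y').
    The moving image must be 2-D with at least 2 rows and 2 columns.
    """
    moving = np.asarray(moving, dtype=float)
    rows, cols = out_shape
    ys, xs = np.mgrid[0:rows, 0:cols].astype(float)
    xm = T[0, 0] * xs + T[0, 1] * ys + T[0, 2]
    ym = T[1, 0] * xs + T[1, 1] * ys + T[1, 2]
    h, w = moving.shape
    inside = (xm >= 0) & (xm <= w - 1) & (ym >= 0) & (ym <= h - 1)
    # Top-left neighbor, clipped so that the +1 neighbor stays in bounds.
    x0 = np.clip(np.floor(xm).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(ym).astype(int), 0, h - 2)
    fx = xm - x0
    fy = ym - y0
    top = (1 - fx) * moving[y0, x0] + fx * moving[y0, x0 + 1]
    bottom = (1 - fx) * moving[y0 + 1, x0] + fx * moving[y0 + 1, x0 + 1]
    out = (1 - fy) * top + fy * bottom
    out[~inside] = 0.0               # no data outside the moving image
    return out


# Small synthetic example: identity mapping and a half-pixel shift in x.
ramp = np.arange(20, dtype=float).reshape(4, 5)
identity = np.eye(3)
shift = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
same = warp_affine_bilinear(ramp, identity, ramp.shape)
shifted = warp_affine_bilinear(ramp, shift, ramp.shape)
```

Because every output pixel receives exactly one interpolated value, the result has neither holes nor overlaps, which is the property the paragraph above describes.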
3 Experiment and Results
The proposed registration of thermal and visible images has been implemented in MATLAB. The experimental simulation was performed on a desktop with an Intel Core i7 processor at 3.40 GHz and 8 GB of RAM. A FLIR E-60 (resolution 360 × 240 pixels) and a Logitech C-170 webcam (resolution: 1024 × 768 pixels) were used to capture the thermal and visible images, respectively. The thermal sensitivity of the thermal camera is less than 0.05 °C. The default White Hot mode is selected as the color palette for the thermal image. Both cameras are mounted on a tripod (Osaka VCT 880) in such a way that the webcam sits on top of the thermal camera, parallel to it. A 3D printed holder is used to fix the webcam to the thermal
Fig. 2 The camera setup
camera. The camera setup is shown in Fig. 2. Two calibration rigs were used during the experiments. One is made by simply cutting hard file paper with a knife and is simple to make; its square size is 5 × 5 cm. The other is made of polylactic acid (PLA) with a 3D printer; its square size is 3 × 3 cm. The rigs used in the experiment are shown in Fig. 3. The performance of the proposed method is evaluated using several thermal and visible image pairs. In Fig. 4, the thermal image, visible image, and the
(a) Calibration Rig made of hard file paper (Square size: 5cm x 5cm)
(b) Calibration rig designed by 3D Printing (Square size: 3cm x 3cm)
Fig. 3 The calibration rigs designed for the experiments
Fig. 4 The input a thermal and b visible image; and c the registered visible image into the coordinate of thermal image
aligned visible image are shown, respectively. The result of the proposed method is compared with the result of manually chosen corresponding points using the MATLAB image processing tool [17] and with the EOH-based method [12]. In the registration process, the thermal image is selected as the fixed image and the visible image as the moving image. Figure 5 shows the matching of control points in thermal and visible image pairs using the proposed algorithm, manually selected control points, and the EOH-based method. In the manual method, a minimum of 10 points are selected in each image pair, and the correspondences were selected by the authors in the experiments. The results show that the proposed method performs well as an automatic method. A large number of mismatches occur in the EOH-based method. As shown in Fig. 5, pair 2 shows an error in the EOH-based method due to many-to-one mapping. Note that the large differences in magnification and FOV of the thermal-visible image pairs cause a large-scale variation problem during feature extraction and description in the EOH-based method. Hence, the proposed approach is a better alternative to manual control point selection in the case of large-scale variation in thermal and visible image pairs. A maximum of 8 points are extracted in each image pair with the help of the calibration rig by the proposed method, although a minimum of 3 points is required for the affine transformation. The algorithm sorts the control points and finds the minimum in each pair to match the corresponding pairs.
4 Conclusion
In this paper, a calibration rig-based algorithm is proposed to register thermal and visible images. The control points in both images are extracted with the help of the calibration rig. The affine transformation parameters are then obtained from the corresponding control pairs to register the visible image into the infrared image coordinates. The rig used in the experiments is simple to make and replaces human intervention in the selection of control points. Experimental results revealed that the proposed automatic registration approach avoids human intervention in the selection of control pairs and performs comparatively better than the manual
Fig. 5 The comparison of proposed method with manual and EOH-based method
approach. However, in some cases, the visible image is affected by illumination, which causes errors in the registration process. In future work, the robustness of the algorithm will be examined with highly dynamic scenes and under rotation of the calibration rig.
Acknowledgements The authors would like to thank the Director, CSIR-CSIO for providing the necessary infrastructure during the investigation. This work is under the fellowship of CSIR-SRF.
References
1. S.G. Kong, J. Heo, F. Boughorbel, Y. Zheng, B.R. Abidi, A. Koschan, M. Yi, M.A. Abidi, Multiscale fusion of visible and thermal IR images for illumination-invariant face recognition. Int. J. Comput. Vis. 71(2), 215–233 (2007). https://doi.org/10.1007/s11263-006-6655-0
2. J. Ma, Y. Ma, C. Li, Infrared and visible image fusion methods and applications: a survey. Inf. Fusion 45, 153–178 (2019). https://doi.org/10.1016/j.inffus.2018.02.004
3. L.G. Brown, A survey of image registration techniques. ACM Comput. Surv. 24(4), 325–376 (1992). https://doi.org/10.1145/146370.146374
4. B. Zitová, J. Flusser, Image registration methods: a survey. Image Vis. Comput. 21(11), 977–1000 (2003). https://doi.org/10.1016/S0262-8856(03)00137-9
5. J.P.W. Pluim, J.B.A. Maintz, M.A. Viergever, Image registration by maximization of combined mutual information and gradient information. IEEE Trans. Med. Imaging 19(8), 809–814 (2000). https://doi.org/10.1109/42.876307
6. C. Studholme, D.L.G. Hill, D.J. Hawkes, An overlap invariant entropy measure of 3D medical image alignment. Pattern Recogn. 32(1), 71–86 (1999). https://doi.org/10.1016/S0031-3203(98)00091-0
7. J. Ma, J. Zhao, J. Tian, A.L. Yuille, Z. Tu, Robust point matching via vector field consensus. IEEE Trans. Image Process. 23(4), 1706–1721 (2014). https://doi.org/10.1109/TIP.2014.2307478
8. D.G. Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004). https://doi.org/10.1023/B:VISI.0000029664.99615.94
9. J. Chen, J. Tian, N. Lee, J. Zheng, R.T. Smith, A.F. Laine, A partial intensity invariant feature descriptor for multimodal retinal image registration. IEEE Trans. Biomed. Eng. 57(7), 1707–1718 (2010). https://doi.org/10.1109/TBME.2010.2042169
10. C.G. Harris, M. Stephens, A combined corner and edge detector, in Citeseer (1988), pp 10–5244
11. H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110(3), 346–359 (2008). https://doi.org/10.1016/j.cviu.2007.09.014
12. C. Aguilera, F. Barrera, F. Lumbreras, A.D. Sappa, R. Toledo, Multispectral image feature points. Sensors 12(9) (2012). https://doi.org/10.3390/s120912661
13. T. Mouats, N. Aouf, A.D. Sappa, C. Aguilera, R. Toledo, Multispectral stereo odometry. IEEE Trans. Intell. Transp. Syst. 16(3), 1210–1224 (2015). https://doi.org/10.1109/TITS.2014.2354731
14. Y. Ye, L. Shen, M. Hao, J. Wang, Z. Xu, Robust optical-to-SAR image matching based on shape properties. IEEE Geosci. Remote Sens. Lett. 14(4), 564–568 (2017). https://doi.org/10.1109/LGRS.2017.2660067
15. M. Lalit, M. Prasant Kumar, K. Amod, A fusion of cuckoo search and multiscale adaptive smoothing based unsharp masking for image enhancement. Int. J. Appl. Metaheuristic Comput. (IJAMC) 10(3), 151–174 (2019). https://doi.org/10.4018/IJAMC.2019070108
16. N. Otsu, A threshold selection method from gray-level histograms. IEEE Trans. Syst., Man, Cybern. 9(1), 62–66 (1979). https://doi.org/10.1109/TSMC.1979.4310076
17. https://in.mathworks.com/help/images/ref/cpselect.html. Accessed 10 Oct 2019
Simple Cycle Gas Turbine Dynamic Analysis Using Fuzzy Gain Scheduled PID Controller
Mohamed Iqbal Mohamed Mustafa
Abstract Heavy Duty Gas Turbines (HDGT), which ensure clean and efficient electrical power generation in grid-connected operation, experience load disturbances on a regular basis. A Proportional plus Integral plus Derivative (PID) controller has been introduced for simple cycle gas turbines rated from 18.2 to 106.7 MW. In addition, a fuzzy gain scheduled PID controller has been proposed, and the dynamic behavior of both is analyzed. The simulation results, in terms of time-domain parameters and error criteria, reveal that the fuzzy gain scheduled PID controller yields a better response during the dynamic and steady-state periods. Further, the stability of the gas turbine is analyzed in this paper from the dynamic performance of various state variables, viz., fuel demand (Wd), valve positioner signal (Vp), fuel supply (Wf2), turbine torque developed (F2), and actual torque (Tt). Analysis of the dynamic response also confirms that the fuzzy tuned PID controller is suitable for implementation with the latest derivative Speedtronic governor control system of HDGT power plants.
Keywords Fuzzy gain scheduled PID controller · Heavy duty gas turbine · Simple cycle operation · Speedtronic governor · Dynamic analysis
M. I. Mohamed Mustafa (B) Department of Electrical and Electronics Engineering, PSG Institute of Technology and Applied Research, Coimbatore, Tamil Nadu, India e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2020 S. Jain and S. Paul (eds.), Recent Trends in Image and Signal Processing in Computer Vision, Advances in Intelligent Systems and Computing 1124, https://doi.org/10.1007/978-981-15-2740-1_6

1 Introduction
Gas turbines have been used extensively in the power industry for the last 40 years. Advanced gas turbine technology, efficient energy conversion, and the flexibility of using multiple fuels are the special features of HDGT [1]. A gas turbine for simple cycle operation consists of a compressor, a combustion chamber, and a turbine. The stability of such a system is affected when it is subjected to severe load disturbances. As the number of gas turbine installations increases, the analysis of gas turbine dynamics becomes increasingly important [2]. Much research is being carried out globally to identify simulation models for gas turbines [2–4]. Among these models, the SPEEDTRONIC Mark IV model developed for General Electric gas turbines has become popular for analyzing the dynamic response [5, 6]. The literature on Speedtronic governor based gas turbine models reveals that the governor can be used in either droop mode or isochronous mode [6, 7]. Because of the droop characteristics of the governor, the steady-state response of HDGT plants becomes poor, and a secondary controller is required to improve the response [7, 8]. A case study of 84 biomass gasifier power plants in Tamilnadu, India also reveals that efficient controllers are needed to improve the stability behavior [9, 10]. Since Proportional–Integral–Derivative (PID) controllers are versatile, highly reliable, and easy to operate, they are commonly used in the control industry [11, 12]. Initially, PID controllers were implemented in a simplified gas turbine model with the gains tuned by conventional methods [13]. It is also possible to embed soft computing control in recent governor control mechanisms [13]. Hence, a fuzzy logic technique is introduced for tuning the PID gain parameters. Since the dynamic performance of an engine can be analyzed based on engine speed, the step response of the gas turbine has been analyzed based on turbine speed (N) against a step load disturbance. As turbine installations are increasing drastically, it is essential to analyze turbine stability [14, 15]. Therefore, the state variables, namely, speed deviation (e), fuel demand (Wd), valve positioner signal (Vp), fuel supply (Wf2), turbine torque developed (F2), actual torque (Tt), and turbine speed (N), have been identified and their behavior is analyzed using MATLAB. On comparing the dynamic responses for a load disturbance, an adaptive controller for stable operation of the gas turbine in grid-connected operation is identified.
2 Modeling of Gas Turbine
The dynamic simulation model of Speedtronic governor based HDGT plants consists of three limiters for controlling the turbine speed, turbine acceleration and exhaust temperature. Even though the speed governor can work as a droop governor or an isochronous governor, droop mode was found to be the suitable mode for an interconnected network [7]. The speed governor controls the turbine speed based on the speed deviation signal, e, which is the difference between the desired speed and the actual speed (N). Based on the speed deviation, e, the speed governor decides the fuel demand, Wd, in order to keep the system stable. The fuel demand signal, Wd, as a function of the speed deviation, e, is expressed in Eq. (1).

$$W_d(s) = \frac{W(Xs + 1)}{Ys + Z}\, e(s) \tag{1}$$
Based on the fuel demand signal, Wd, the fuel supply signal, Wf2, is generated by the valve positioning mechanism and the fuel system actuator. The valve positioning mechanism decides the valve position, Vp, which drives the fuel system actuator whose output is the fuel supply signal, Wf2. Based on their input–output signals, the valve positioner and the fuel system of the gas turbine can be expressed as given in Eqs. (2) and (3), respectively.

$$V_p(s) = \frac{a}{bs + c}\, W_d(s) \tag{2}$$

$$W_{f2}(s) = \frac{1}{1 + sT}\, V_p(s) \tag{3}$$

Fig. 1 Simulation model of HDGT plant
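Equations (2) and (3) are first-order lags, so their time response can be sketched with a simple forward-Euler update. The following is a minimal illustration only: the function names and the numerical values of a, b, c and T are placeholders chosen for the example and are not taken from this chapter.

```python
def lag_step(y, u, gain, tau, c, dt):
    """One forward-Euler step of  tau * dy/dt + c * y = gain * u."""
    return y + dt * (gain * u - c * y) / tau


def simulate_fuel_path(wd, a, b, c, t_fuel, dt, steps):
    """Drive Eq. (2) and Eq. (3) in series with a constant fuel demand wd."""
    vp = 0.0    # valve position, Vp
    wf2 = 0.0   # fuel supply, Wf2
    for _ in range(steps):
        vp_next = lag_step(vp, wd, a, b, c, dt)        # Eq. (2): a / (b s + c)
        wf2 = lag_step(wf2, vp, 1.0, t_fuel, 1.0, dt)  # Eq. (3): 1 / (1 + s T)
        vp = vp_next
    return vp, wf2


# Placeholder parameters (assumed for illustration, not from the text).
vp_final, wf2_final = simulate_fuel_path(
    wd=0.5, a=1.0, b=0.05, c=1.0, t_fuel=0.4, dt=0.001, steps=10000
)
```

For a constant input, both signals settle at the steady-state gain of the respective block, here a/c for the valve positioner and 1 for the fuel system.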
The gas turbine torque, F2, is determined by the fuel supply level, Wf2, and the turbine speed, N. It is expressed as shown in Eq. (4).

$$F_2 = 1.3\,(W_{f2} - 0.23) + 0.5\,(1 - N) \tag{4}$$

The turbine torque will bring the speed error down to zero. A step load disturbance, TL, of 1 p.u. magnitude is applied, and the actual torque (Tt), which is the difference between the developed torque (F2) and the load torque (TL), is obtained. It is then converted into turbine speed, N, by the rotor dynamics block as shown in Eq. (5). The gas turbine rotor time constant, T1, ranges from 12.2 to 25.2 [6].

$$N(s) = \frac{1}{T_1 s}\, T_t(s) \tag{5}$$
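As a quick numerical check of Eqs. (4) and (5), the sketch below evaluates the torque function and integrates the rotor equation with one Euler step. The function names, the step size and the choice of Euler integration are assumptions made for illustration.

```python
def turbine_torque(wf2, n):
    """Developed torque F2 from Eq. (4), all quantities in per unit."""
    return 1.3 * (wf2 - 0.23) + 0.5 * (1.0 - n)


def rotor_step(n, f2, tl, t1, dt):
    """One Euler step of Eq. (5): T1 * dN/dt = Tt, with Tt = F2 - TL."""
    tt = f2 - tl   # actual (accelerating) torque
    return n + dt * tt / t1


# At rated speed, Wf2 = 0.23 p.u. gives zero torque and Wf2 = 1.0 p.u.
# gives approximately rated torque.
f2_no_load = turbine_torque(0.23, 1.0)
f2_rated = turbine_torque(1.0, 1.0)

# With F2 equal to the load torque the speed does not change.
n_next = rotor_step(1.0, f2=1.0, tl=1.0, t1=12.2, dt=0.01)
```

The constant 0.23 in Eq. (4) therefore acts as the fuel level needed to hold rated speed with no load torque.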
The acceleration limiter takes control action only during startup, and its influence during normal operating conditions is negligible. Similarly, the temperature control loop acts only if the exhaust temperature limit is exceeded. Since the acceleration and temperature limiters have little influence during normal operation, they are omitted to simplify the transfer function model [6, 13]. The simplified model, consisting of the dominant speed control loop with all the above dynamic variables, is shown in Fig. 1. It is used for analyzing the dynamic responses of all the state variables of the HDGT plants.
3 Development of PID Controller
The PID controller has been widely used in many process control industries since it was introduced to the market in 1939 [11, 12]. The PID controller, as shown in Fig. 2, delivers the control signal, u(t), based on the PID controller gains, namely, the Proportional (Kp), Integral (Ki), and Derivative (Kd) gains, as shown in Eq. (6).

$$u(t) = K_p\, e(t) + K_i \int_0^t e(t)\,dt + K_d \frac{de(t)}{dt} \tag{6}$$
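A discrete-time version of Eq. (6) can be sketched as follows. This is a minimal illustration, assuming rectangular integration and a backward-difference derivative; the class name and the sample values are hypothetical and the chapter itself implements the controller in MATLAB/Simulink.

```python
class PID:
    """Discrete approximation of Eq. (6) with a fixed sample time dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, error):
        # Integral term: rectangular accumulation of the error.
        self.integral += error * self.dt
        # Derivative term: backward difference, zero on the first sample.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


pid = PID(kp=2.0, ki=1.0, kd=0.5, dt=0.1)
u1 = pid.update(1.0)   # proportional 2.0 + integral 0.1 + derivative 0.0
u2 = pid.update(1.0)   # proportional 2.0 + integral 0.2 + derivative 0.0
```

With a constant error the derivative term vanishes and the output grows only through the integral term.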
In this paper, two tuning procedures, viz., conventional and fuzzy tuning methods are used for developing PID controllers. Initially, PID controllers are tuned by ZN and Performance index methods. Fuzzy logic technique has also been used for tuning the gains of the PID controllers as detailed below.
3.1 Fixed Gain PID Control
Even though many algorithms are available to tune the PID controller gains, the Ziegler–Nichols (ZN) method has been found to be the standard and most widely used procedure [16–18]. In this paper, the tuning rules of the ZN method listed in Table 1 have been applied for PID tuning. The values of the PID parameters are obtained from the gain (Ku) and the oscillation period (Tu) at which the system response oscillates with constant magnitude.

Fig. 2 Structure of PID controller

Table 1 PID gain tuning by ZN method
Tuning method   Kp        Ki          Kd
ZN-C            0.6Ku     2Kp/Tu      KpTu/8
ZN-PI           0.7Ku     2.5Kp/Tu    0.15KpTu
ZN-SO           0.33Ku    2Kp/Tu      KpTu/3
ZN-NO           0.2Ku     2Kp/Tu      KpTu/3

The performance index method is another tuning procedure used to determine the values of the PID parameters of HDGT [19, 20]. The integrals of the error terms, namely squared error (J_ISE), squared error multiplied by time (J_ITSE), and absolute error multiplied by time (J_ITAE), as given in Eqs. (7), (8), and (9), are considered as the performance index criteria for tuning.

$$J_{ISE} = \int e^2\,dt \tag{7}$$

$$J_{ITSE} = \int e^2\, t\,dt \tag{8}$$

$$J_{ITAE} = \int |e|\, t\,dt \tag{9}$$
In this procedure, the gain parameters are obtained by adjusting the gains until the slope of the performance index plot becomes minimum [20]. The PID controllers with gains tuned by these fixed gain tuning procedures are implemented in MATLAB and their dynamic behavior is compared in Sect. 4.
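The Ziegler–Nichols rules of Table 1 can be written as a small lookup, sketched below. The values Ku = 7.36 and Tu = 1.2873 used in the example are not stated in the chapter; they are assumed here because they reproduce the ZN-C gains of Table 3 to three decimals, and should be read as a back-calculated illustration only.

```python
# (factor on Ku for Kp, factor on Kp/Tu for Ki, factor on Kp*Tu for Kd)
ZN_RULES = {
    "ZN-C": (0.6, 2.0, 1.0 / 8.0),
    "ZN-PI": (0.7, 2.5, 0.15),
    "ZN-SO": (0.33, 2.0, 1.0 / 3.0),
    "ZN-NO": (0.2, 2.0, 1.0 / 3.0),
}


def zn_gains(rule, ku, tu):
    """Return (Kp, Ki, Kd) for one of the Table 1 rules."""
    a, b, c = ZN_RULES[rule]
    kp = a * ku
    ki = b * kp / tu
    kd = c * kp * tu
    return kp, ki, kd


# Assumed ultimate gain and period (back-calculated, not from the text).
KU_ASSUMED, TU_ASSUMED = 7.36, 1.2873
kp_c, ki_c, kd_c = zn_gains("ZN-C", KU_ASSUMED, TU_ASSUMED)
```

Keeping the rules in one table makes it easy to compare the four variants for the same Ku and Tu.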
3.2 Fuzzy Gain Scheduled PID Control
Since the control mechanism should be more adaptive for a dynamic system, self-tuning methods have evolved to keep track of the system dynamics [21–25]. In this paper, a fuzzy Sugeno model-based gain scheduling PID controller is presented and its dynamic performance is analyzed in detail. L. A. Zadeh proposed fuzzy set theory in 1965 [26]. The gas turbine speed deviation, e, and the change in speed deviation, Δe, are the driving inputs of the fuzzy gain scheduling block, and the control signal Us(t) is the output signal, as shown in Fig. 3.

Fig. 3 Structure of fuzzy gain scheduled PID controller
Fig. 4 Triangular MF for the input signals of fuzzy PID controller (sets NB, NS, Z, PS and PB; input axis from −1 to 1, membership from 0 to 1)
Fig. 5 Constant MF for the output signals of fuzzy PID controller
In this paper, the Sugeno fuzzy model has been considered with triangular membership functions (MF) for the input signals and constant membership functions for the output signals, as shown in Figs. 4 and 5, respectively. Negative Big (NB), Negative Small (NS), Zero (Z), Positive Small (PS), and Positive Big (PB) are selected as the linguistic variables for the input signals. The linguistic variables Very Big (VB), Medium Big (MB), Big (B), Medium (M), Small (S), Medium Small (MS), and Zero (Z) are chosen for the output signals. Table 2 shows the fuzzy rules using these linguistic variables. The rule base is formulated using IF-THEN rules. Development of the fuzzy gain scheduled PID controller involves three stages, viz., fuzzification, evaluation of rules and defuzzification. During the fuzzification stage, the error and error rate are converted from crisp values to fuzzy values. Based on these values, the antecedent of each fuzzy rule is evaluated by selecting 'PROD' as the fuzzy operator. During the second stage, the output of each fuzzy rule is obtained using 'MIN' as the implication operator. The fuzzy outputs of all rules are then combined by choosing 'MAX' as the aggregation operator. In the last stage, the combined fuzzy output is converted into a crisp value by selecting WTAVER as the defuzzification method. Finally, the self-tuned gains KP,S, KI,S and KD,S are obtained as the products of the fuzzy outputs KP,F, KI,F and KD,F with the corresponding PID gains. The output signal of the F-PID controller, Us(t), is expressed in terms of the self-tuned PID gains as given in Eq. (10). The F-PID controller is then implemented with the Simulink model of the 5001M HDGT and the step response is obtained as shown in Sect. 4.

$$U_s(t) = K_{P,S}\, e(t) + K_{I,S} \int e(t)\,dt + K_{D,S} \frac{de(t)}{dt} \tag{10}$$
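The gain scheduling idea can be illustrated with a zero-order Sugeno inference using product firing strengths and weighted-average defuzzification. The sketch below is hypothetical in several respects: the membership function centers and widths are assumed to be evenly spaced on [−1, 1], the rule table KP_RULES and the base gain are made-up numbers for illustration and do not reproduce Table 2, and only one of the three gain outputs is shown.

```python
CENTERS = [-1.0, -0.5, 0.0, 0.5, 1.0]   # assumed peaks of NB, NS, Z, PS, PB
HALF_WIDTH = 0.5                        # assumed half-width of each triangle


def memberships(x):
    """Degrees of membership of x in the five triangular input sets."""
    x = max(-1.0, min(1.0, x))          # clip to the input range
    return [max(0.0, 1.0 - abs(x - c) / HALF_WIDTH) for c in CENTERS]


def fuzzy_gain(e, de, rules):
    """Zero-order Sugeno output: product AND, weighted-average defuzzification."""
    mu_e, mu_de = memberships(e), memberships(de)
    num = den = 0.0
    for i, wi in enumerate(mu_e):
        for j, wj in enumerate(mu_de):
            w = wi * wj                 # firing strength of rule (i, j)
            num += w * rules[i][j]      # constant consequent of the rule
            den += w
    return num / den


# Hypothetical crisp consequents, rows = e (NB..PB), columns = de (NB..PB).
KP_RULES = [
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.8, 0.8, 0.8, 0.9, 1.0],
    [0.0, 0.0, 0.2, 0.4, 0.4],
    [0.8, 0.8, 0.8, 0.9, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
]

KP_BASE = 2.3                           # illustrative fixed gain
kp_scheduled = fuzzy_gain(0.25, 0.0, KP_RULES) * KP_BASE
```

Because the consequents are constants, the weighted average interpolates smoothly between neighbouring rules as e and Δe move between the set centers.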
Table 2 Rules for Fuzzy PID controller (each cell lists KP,F / KI,F / KD,F)
e \ Δe   NB            NS            Z             PS            PB
PB       VB / M / Z    VB / M / S    VB / M / MB   VB / M / MB   VB / M / VB
PS       B / S / Z     B / S / B     B / S / MB    MB / S / VB   VB / S / VB
Z        Z / MS / M    Z / MS / MB   MS / Z / VB   S / MS / VB   S / MS / VB
NS       B / S / B     B / S / VB    B / S / VB    MB / S / VB   VB / S / VB
NB       VB / M / VB   VB / M / VB   VB / M / VB   VB / M / VB   VB / M / VB
4 Simulation Results and Discussion
The MATLAB/Simulink model of the HDGT is implemented with conventionally tuned and fuzzy-based PID controllers, and the responses are compared to identify an adaptive controller for the gas turbine. The gas turbine model is simulated for a 1.0 p.u. step load variation applied at t = 1.0 s. Initially, the PID controller gains of the 5001M HDGT model are tuned by the ZN and performance index methods; the gain parameters are shown in Table 3.

Table 3 Controller gains by conventional tuning procedures
Tuning rule   Kp      Ki      Kd
ZN-C          4.416   6.861   0.711
ZN-PI         5.152   10      0.995
ZN-SO         2.429   3.774   1.042
ZN-NO         1.472   2.287   0.6317
ISE           2.3     0.06    0.41
ITAE          2.25    0.136   0.325
ITSE          1.65    0.048   0.315

Based on the droop characteristics presented in [27], the HDGT model has been simulated for a 4% droop with the PID controller. In this paper, only the ZN-C and ZN-PI based PID controllers are considered for analysis, because the ZN-SO and ZN-NO methods give a poor dynamic response. Figure 6 compares the step responses obtained with these tuning methods. The transient and steady-state parameters are shown in Table 4.

Fig. 6 Dynamic performance of HDGT plant with PID controllers (speed in p.u. versus time in seconds for ISE, ITAE, ITSE, ZN-PI and ZN-C tuning)

Table 4 Time domain parameters with PID controllers
Tuning rule   Tr (s)   Mp (per unit)   Ts (s)   Ess (per unit)
ZN-PI         0.2681   0.7415          3.6705   0.0
ZN-C          0.2913   0.7453          3.9495   0.00
ISE           0.5674   0.1123          1.8724   0.00
ITAE          0.5416   0.1703          1.982    0.00
ITSE          0.732    0.0642          2.223    0.00

Even though the steady-state deviation (Ess) is almost zero for all the rules, the maximum overshoot (Mp), rise time (Tr) and settling time (Ts) of the ZN-PI based controller are lower than those of the ZN-C method. Further, it is noted that the ISE based PID performs better during the steady-state period. Moreover, the overall comparison of these methods indicates that ISE improves both the steady-state and the transient behavior. It is appropriate to confirm the time-domain performance using performance index terms. Therefore, the integrals of the error terms, namely, square of error (Q_ISE), square of error multiplied by time (Q_ITSE), absolute error (Q_IAE) and absolute error multiplied by time (Q_ITAE), as given in Eqs. (11)–(14), respectively, have been obtained and are shown in Fig. 7.

$$Q_{ISE} = \int e^2\,dt \tag{11}$$

$$Q_{ITSE} = \int e^2\, t\,dt \tag{12}$$

$$Q_{IAE} = \int |e|\,dt \tag{13}$$

$$Q_{ITAE} = \int |e|\, t\,dt \tag{14}$$
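The indices of Eqs. (11)–(14) can be estimated from a sampled error signal by numerical integration. The sketch below uses the trapezoidal rule and, purely as a test signal, the decaying error e(t) = exp(−t), for which the exact values over an infinite horizon are 0.5, 0.25, 1 and 1. The function name and the test signal are assumptions for illustration.

```python
import math


def error_indices(t, e):
    """Trapezoidal estimates of ISE, ITSE, IAE and ITAE from samples of e(t)."""
    ise = itse = iae = itae = 0.0
    for k in range(1, len(t)):
        dt = t[k] - t[k - 1]
        ise += 0.5 * dt * (e[k - 1] ** 2 + e[k] ** 2)
        itse += 0.5 * dt * (t[k - 1] * e[k - 1] ** 2 + t[k] * e[k] ** 2)
        iae += 0.5 * dt * (abs(e[k - 1]) + abs(e[k]))
        itae += 0.5 * dt * (t[k - 1] * abs(e[k - 1]) + t[k] * abs(e[k]))
    return {"ISE": ise, "ITSE": itse, "IAE": iae, "ITAE": itae}


# Test signal: e(t) = exp(-t) sampled every 1 ms over 20 s.
time = [k * 0.001 for k in range(20001)]
error = [math.exp(-tk) for tk in time]
indices = error_indices(time, error)
```

The time-weighted indices penalize errors that persist late in the response, which is why they are sensitive to slow settling.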
Fig. 7 Performance evaluation indices of 5001M with PID controllers (indices plotted for the ISE, ITAE, ITSE, ZN-PI and ZN-C tuning methods)
It is clear that the ZN-PI method yields the better performance among the ZN methods. On comparing this with the performance index methods, the PID controller tuned by ISE gives improved dynamic and steady-state responses. The fuzzy tuned PID controller is then developed as explained in Sect. 3 and the responses are compared as shown in Fig. 8. The values of Mp, Tr, Ts, and Ess are shown in Table 5. It is understood from Mp, Ts, and Ess that the fuzzy tuned PID improves the overall performance. For validation, the error indices are also obtained during simulation and presented in Fig. 9. The values of Q_ISE, Q_ITSE, Q_IAE, and Q_ITAE for the fuzzy gain scheduled PID controller are found to be very low, which confirms its optimal behavior.

Fig. 8 Comparison of step responses of HDGT plant (speed in p.u. versus time in seconds for F-PID, ISE and ZN-PI)

Table 5 Time-domain parameters with F-PID, ISE and ZN-PI tuned controllers
Tuning by   Tr (s)   Mp (per unit)   Ts (s)   Ess (per unit)
F-PID       0.4288   0.00            0.7392   0.00
ISE         0.5674   0.1123          1.8724   0.00
ZN-PI       0.2681   0.7415          3.6705   0.00

Fig. 9 Comparison of performance evaluation indices (F-PID, ISE and ZN-PI)

To ensure the stability of the turbine during a load disturbance, the dynamic behavior of the error (e), fuel demand (Wd), valve positioner (Vp), and fuel supply (Wf2) signals with the ZN-PI, ISE and fuzzy-based PID controllers is obtained as given in Fig. 10. The dynamic simulation results show that the fuel demand, valve position, and fuel supply signals with F-PID react faster than with the ISE and ZN-PI methods. It is also noticed that these responses reach their equilibrium point when the error signal approaches zero.

Fig. 10 Dynamic response of e, Wd, Vp and Wf2 with different controllers (each in p.u. versus time in seconds for F-PID, ISE and ZN-PI)

Based on the load disturbance, and hence the fuel demand signal, the torque is generated by the turbine torque function. As presented in the Simulink model of the HDGT in Sect. 2, the rotor dynamics with the rotor time constant, T1, is responsible for generating the speed signal, N. Figure 11 shows the dynamic responses of Wf2, F2, Tt and N of the 5001M model. Here again, the F-PID controller response is faster than that of the other methods. It is also identified that the torque developed by the HDGT, F2, reaches its normal value. Hence, the actual torque (Tt) required for the given load disturbance reduces the speed deviation signal to 0.0 p.u. and then settles at its equilibrium point of 0.0 p.u. The actual speed (N) of the HDGT plant reaches the desired speed of 1.0 p.u.

Fig. 11 Dynamic response of Wf2, F2, Tt, and N with different controllers (each in p.u. versus time in seconds for F-PID, ISE and ZN-PI)

With respect to all the dynamic responses of the state variables, F-PID reacts faster during the dynamic and steady-state periods. The time-domain parameters and error indices also confirm that the fuzzy logic based PID improves both the transient and steady-state performances. In addition, the response of the state variables indicates that the HDGT plant can be more stable with the fuzzy tuned PID controller. Hence the fuzzy gain scheduled PID controller is found to be an adaptive PID controller for HDGT plants in interconnected operation.
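Time-domain parameters such as those reported in Tables 4 and 5 can be extracted from a sampled step response. The chapter does not state the definitions it uses, so the sketch below assumes a 10–90% rise time, a 2% settling band, and a response that starts at zero and rises to a known final value. It is tested on the first-order response 1 − exp(−t) rather than on turbine data.

```python
import math


def step_metrics(t, y, final, band=0.02):
    """Return (Tr, Mp, Ts, Ess) for a response rising from 0 toward `final`."""
    # Maximum overshoot above the final value (zero if never exceeded).
    mp = max(0.0, max(y) - final)
    # Rise time: first crossing of 90% minus first crossing of 10%.
    t10 = next(tk for tk, yk in zip(t, y) if yk >= 0.1 * final)
    t90 = next(tk for tk, yk in zip(t, y) if yk >= 0.9 * final)
    # Settling time: last instant at which the response is outside the band.
    ts = 0.0
    for tk, yk in zip(t, y):
        if abs(yk - final) > band * abs(final):
            ts = tk
    # Steady-state error taken at the last sample.
    ess = abs(final - y[-1])
    return t90 - t10, mp, ts, ess


time = [k * 0.001 for k in range(10001)]
response = [1.0 - math.exp(-tk) for tk in time]
tr, mp, ts, ess = step_metrics(time, response, final=1.0)
```

For this test signal the expected values are Tr ≈ ln 9, Mp = 0 and Ts ≈ ln 50, which gives a simple way to validate the routine before applying it to simulated speed responses.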
5 Conclusion
HDGT plants with a droop governor require an effective secondary controller to improve the steady-state and dynamic behavior. Modeling of the gas turbine and its control using fixed gain and fuzzy logic tuned PID controllers have been presented in this paper. Simulation results of the 5001M HDGT model, based on time-domain parameters and performance indices, show that the fuzzy tuned PID (F-PID) controller performs better under load disturbances. Further, the reliable operation of the HDGT has been verified by analyzing the behavior of the state variables during uncertainties. The dynamic responses show that the fuzzy gain scheduled PID controller responds faster to load disturbances and hence provides stable operation irrespective of the turbine speed variation and rotor time constant. Therefore, the fuzzy logic-based PID controller is considered to be the adaptive controller for grid-interactive gas turbine plants.
References
1. M.P. Boyce, Gas Turbine Engineering Handbook, 3rd edn. (Gulf Professional Publishing, USA, 2017)
2. S.K. Yee, J.V. Milanović, F.M. Hughes, Overview and comparative analysis of gas turbine models for system stability studies. IEEE Trans. Power Syst. 23(1), 108–118 (2008)
3. P. Centeno, I. Egido, C. Domingo, F. Fernández, L. Rouco, M. González, Review of gas turbine models for power system stability studies, in Proceeding of the 9th Spanish Portuguese Congress on Electrical Engineering, Marbella, 30 June–2 July 2005, pp. 1–6
4. L.N. Hannet, A. Khan, Combustion turbine dynamic model validation from tests. IEEE Trans. Power Syst. 8(1), 152–158 (1993)
5. F. Jurado, M. Ortega, A. Cano, J. Carpio, Neuro-fuzzy controller for gas turbine in biomass based electric power plant. Electr. Power Syst. Res. 60(3), 123–135 (2002)
6. W.I. Rowen, Simplified mathematical representation of heavy duty gas turbines. ASME J. Eng. Power 105(4), 865–869 (1983) (ASME Paper No. 83-GT-63)
7. S. Balamurugan, R. Joseph Xavier, A. Ebenezer Jeyakumar, Selection of governor and optimization of its droop setting and rotor time constant for heavy-duty gas turbine plants. Indian J. Power River Val. Dev. 57, 35–37 (2007)
8. M. Mohamed Iqbal, R. Joseph Xavier, A review of controllers for isolated and grid connected operation of biomass power plants, in Proceedings of International Conference on Renewable Energy Technologies (ICORET-2011), Coimbatore, India, 16–17 December 2011, pp. 366–371
9. M. Mohamed Iqbal, R. Joseph Xavier, D. Arun Kumar, G. Raj Kumar, P. Selva Kumar, C. Tamilarasa, A sample survey on biomass gasifier power plants, in International Conference on Emerging Technologies in Renewable Energy (ICETRE-2010), Ref. No. P-87, Anna University, Chennai, August 2010, pp. 18–21
10. M. Mohamed Iqbal, R. Joseph Xavier, Factors influencing the operation of the biomass gasifier power plants, in Proceedings of Second World Renewable Energy Technology Congress and Expo-2011, Hotel Le Meridien, New Delhi, April 2011, pp. 21–23
11. C.K. Benjamin, Automatic Control Systems, 8th edn. (Prentice Hall, 2007)
12. E.P. Popov, The Dynamics of Automatic Control Systems (Pergamon Press, NY, 2014)
13. S. Balamurugan, R. Joseph Xavier, A. Ebenezer Jeyakumar, Control of heavy duty gas turbine plants for parallel operation using soft computing techniques. Electr. Power Compon. Syst. (37), 1275–1287 (2009)
14. B.W. Bequette, Process Control: Modeling, Design and Simulation (Prentice Hall, 2002)
15. S.K. Yee, J.V. Milanović, F.M. Hughes, Overview and comparative analysis of gas turbine models for system stability studies. IEEE Trans. Power Syst. 23(1), 108–118 (2008)
16. J.G. Ziegler, N.B. Nichols, Optimum setting for automatic controllers. Trans. ASME 64, 759–768 (1942)
17. A.S. McCormack, K.R. Godfrey, Rule-based autotuning based on frequency domain identification. IEEE Trans. Control Syst. Technol. 6(1), 43–61 (1998)
18. D.W. Pessen, A new look at PID-controller tuning. J. Dyn. Syst. Meas. Control 116, 553–557 (1994)
19. G. Stephanopoulos, Chemical Process Control: An Introduction to Theory and Practice (PHI Learning Pvt. Ltd., New Delhi, 2009)
20. B. Anand, A.E. Jeyakumar, Fuzzy logic based load frequency control of hydro-thermal system with non-linearities. Int. J. Electr. Power Eng. 3(2), 112–118 (2009)
21. Y. Fu, T. Chai, Self-tuning control with a filter and a neural compensator for a class of nonlinear systems. IEEE Trans. Neural Netw. Learn. Syst. 24(5), 837–843 (2013)
22. M.M. Ismail, M.A. Mustafa Hassan, Load frequency control adaptation using artificial intelligent techniques for one and two different areas power system. Int. J. Control Autom. Syst. 1(1), 12–23 (2012)
23. W. Jiang, X. Jiang, Design of intelligent temperature control system based on the fuzzy self tuning PID, in International Symposium on Safety Science and Engineering in China. Elsevier's Procedia Eng. 43, 307–311 (2012)
24. M. Ramirez-Ganzalez, O.P. Malik, Self-tuned power system stabilizer based on a simple fuzzy logic controller. Electr. Power Compon. Syst. 38(4), 407–423 (2010)
25. M. Mohamed Iqbal, R. Joseph Xavier, Fuzzy self-tuning PID controller for speedtronic governor controlled heavy duty gas turbine power plants. Electr. Power Compon. Syst. 42(14), 1485–1494 (2014)
26. L.A. Zadeh, Fuzzy sets. Inf. Control 8, 338–353 (1965)
27. M.M. Iqbal, R.J. Xavier, J. Kanakaraj, Optimization of droop setting using genetic algorithm for speedtronic governor controlled heavy duty gas turbine power plants. WSEAS Trans. Power Syst. 11, 117–124 (2016)
An Image-Based Android Application for Colorimetric Sensing of Biomolecules Sibasish Dutta
Abstract In the present work, a smartphone software application for colorimetric quantification of biomolecular samples, and its potential use in biomedical imaging and analysis, is reported. Android, being the most popular and versatile operating system (OS), has been used to develop the software application. The developed app captures images of the sample through the smartphone camera, analyzes them, and displays the concentration of the given biomolecule with good accuracy.
Keywords Smartphone · Android · Biomolecules · Colorimetry · Imaging
S. Dutta (B) Department of Physics, Pandit Deendayal Upadhyaya Adarsha Mahavidyalaya (PDUAM), Eraligool, Karimganj, Assam, India e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2020 S. Jain and S. Paul (eds.), Recent Trends in Image and Signal Processing in Computer Vision, Advances in Intelligent Systems and Computing 1124, https://doi.org/10.1007/978-981-15-2740-1_7

1 Introduction
The growth of mobile phone technology has made the world smaller by making communication between people much easier than before. Early mobile phones supported only audio features, with very little scope for the other features seen in modern smartphones. Mobile phone technology has grown by leaps and bounds, and ordinary feature phones have evolved into what we now call smartphones [1]. Nowadays, mobile phones are used not only for making and receiving calls but also regularly as cameras, music players, browsers and for many other applications, thanks to the variety of robust inbuilt sensors embedded in the smartphone [2]. Sensors such as the camera, proximity sensor and ambient light sensor have been utilized for developing spectrometers [3–5], microscopes [6, 7] and other types of colorimetric sensors [8, 9]. The USB port and Bluetooth connectivity have also been utilized for connecting auxiliary devices for different types of sensing applications [10–12]. In most of these works, a smartphone application interface is developed which, along with optomechanical hardware attachments, can be very useful for developing robust and portable devices for environmental parameter sensing and healthcare diagnostics [8, 13]. Smartphone-based devices have been popular for the past decade. However, very few works have been reported toward an integrated application-based platform for running the biochemical test, displaying the result, and sending the information to decentralized laboratories. Toward this end, the present work focuses on the development of a smartphone application for detection and analysis of three different biomolecular samples on the same platform.

The anatomy of a smartphone can be described in terms of its hardware and software features. The hardware architecture of a smartphone can be roughly classified into the application processor, the modem or baseband processor, and other peripheral components [14]. With rapidly evolving hardware technologies, new software and operating systems are needed to meet the requirements. Above all, the operating system of a smartphone manages all the hardware and software features. The kernel is the central component of an operating system. It acts as an interface between the software and the hardware of the system. Different operating systems currently run on different categories of smartphones. The two big players among smartphone operating systems are Android and iOS. Of these, Android occupies almost 76.03% of the market share, iOS just 22.04%, and the remaining 1.93% is held by others such as RIM and Windows, as shown in Fig. 1. While iOS is the most preferred OS among the developed nations, Android has always dominated the global market share, with only marginal increases and decreases in its share.
Taking into account the huge popularity of Android over iOS and its lion's share of the smartphone market, in the present work an Android app has been developed that can take an image of the sample and estimate the concentration of the target analyte using a pre-defined calibration. The calibration is done using samples of known concentration.

Fig. 1 Market share of different OS currently available in the market

The Android Architecture
Android is an operating system based on Linux, specifically designed for smartphones and tablets. The Android architecture is a software stack composed of the OS kernel, system libraries, application framework and key applications. The Android architecture is illustrated in Fig. 2. The Android structure is divided into five different categories, namely:
(i) Linux kernel
(ii) Libraries
(iii) Android runtime
(iv) Application framework
(v) Applications.

Fig. 2 Software stack of android architecture

The Android software stack is Linux-based and organized into different layers. The Linux kernel, considered to be the heart of Android, is at the bottom of the stack and interfaces the application software with the physical hardware. Like any Linux kernel, this layer provides generic operating system services such as system security, process management, memory management and I/O device management, to name a few. In addition, the Android Linux kernel provides Android-specific services such as power management and inter-process communication. Above the Linux kernel, there is a variety of system libraries, often called native libraries, which are written in C/C++. Examples of system libraries are Surface Manager, Media Framework, SQLite, OpenGL and so on. In addition to the system libraries, there is the Android runtime, which supports writing and running Android applications. The two major components of the Android runtime are the Core Java libraries and the Dalvik virtual machine (DVM). Android applications are written in the Java programming language and need building blocks that are provided by the Core Java libraries. The DVM relies on the Linux kernel, and its function is to run the Android applications written in Java. A Java compiler compiles the Java source code files into Java bytecode files. Thereafter, the Java bytecode files are converted into the Dalvik executable format (.dex). The application framework is the next layer of the Android architecture. It is the toolkit that provides services to build an application and includes textboxes, grids, lists, buttons and an embedded web browser. The final layer of the Android architecture is the application layer. It contains some in-built applications such as the home screen, web browser, phone dialer and email reader.

Android Development Tools
Android development tools are required to develop Android applications. Earlier, the Eclipse plugin served as the integrated development environment (IDE); it was replaced by Android Studio in December 2014. Today, Android Studio is the official IDE for developing Android applications. Setting up the Android application environment involves two steps: first the Java development kit (JDK) is installed, and then Android Studio is installed. Android Studio includes the Android software development kit (SDK).

Installing the Java Development Kit (JDK)
Android Studio requires JDK version 6 or higher. It can be installed from the Sun/Oracle website: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads1880260.html. In the present work, JDK version 1.7.0 has been used and the operating system was Windows.
Installing the Android Software Development Kit (SDK)
After installation of the JDK (version 1.7.0), the Android Studio package was installed from the following site: https://developer.android.com/studio/index.html. When the Android Studio IDE is installed, some default Android SDK packages are installed with it. Missing packages and other necessary components can be installed using the Android SDK manager. The "Tools" section of the SDK manager shows the installed versions of the Android application programming interface (API). The Android API comprises the tools and protocols used to build software for various applications. Different Android devices support different SDK versions, so it is necessary to install several versions of the Android API so that the developed application is supported by a range of Android devices. A lower API level covers more devices, but at the expense of fewer available features. An app developed for API 15 and higher is expected to run on about 90% of the devices currently active on the Google Play Store.
Developing an App in Android Studio
After configuring the system to develop Android apps using the Android Studio IDE and ensuring that all the required packages are installed, an Android app can be developed and tested in the Android emulator or on a real Android device. In the present work, an Asus Zenfone 5 running Android 4.4.2 "KitKat" has been used as the smartphone. The API level was set to 19 before starting to develop the app.
2 Methodology
Custom-designed Android Application for Quantification of Biomolecules (Protein, Enzyme and Carbohydrate)
The Android application was created in Android Studio, the official IDE for developing Android applications. A combination of Java and XML was used for the processing activities and the layout, respectively.
Calibration of the Android Application
The app was calibrated using reagent-treated biomolecular samples of known concentration: a protein (BSA), an enzyme (catalase) and a carbohydrate. Lowry’s reagent was used to treat the protein and the enzyme, while Anthrone’s reagent was used for the carbohydrate samples. For the protein and enzyme samples, standard calibration was done from 0 to 1 mg/mL; for the carbohydrate, calibration was done in the range of 0–140 µg/mL. The reagent-treated samples were poured one by one into a quartz cuvette. With a white background as reference, as shown in Fig. 3, images of the cuvette filled with samples of different concentrations were taken with the mobile phone camera, and the corresponding V-value was calculated for each.
Fig. 3 Snapshots of the reagent-treated samples (carbohydrate) and cropped portion of the cuvette filled with sample solutions whose V-channel value is calculated
S. Dutta
The magnitude of the V-value of HSV for the biomolecular samples treated with specific reagents has been calculated as follows:

\[ V = \frac{\max(R, G, B)}{255} \tag{1} \]
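As a minimal sketch of Eq. (1), the Python snippet below computes the V-value of a single 8-bit RGB pixel. The chapter does not state how the V-values of the pixels in the cropped cuvette region are combined, so the averaging in `mean_v_value` is an assumption made for illustration, and the pixel values are hypothetical.

```python
def v_value(r, g, b):
    """Return the HSV V-channel value in [0, 1] for 8-bit R, G, B values (Eq. 1)."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("channel values must lie in 0..255")
    return max(r, g, b) / 255


def mean_v_value(pixels):
    """Average V over (R, G, B) tuples; averaging is an assumption, not the chapter's stated method."""
    values = [v_value(r, g, b) for r, g, b in pixels]
    if not values:
        raise ValueError("no pixels supplied")
    return sum(values) / len(values)


# Hypothetical pixels standing in for a cropped region of the cuvette image
region = [(120, 90, 60), (118, 92, 58), (122, 88, 61)]
print(round(mean_v_value(region), 4))
```

A darker (more strongly coloured) sample gives a smaller maximum channel value and therefore a smaller V, which is the trend the calibration relies on.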
3 Results and Discussion After calculating the V-channels of each set of biomolecular samples, curve fitting has been done in order to determine the trend of variation between the sample concentration and their corresponding V-channel values. It has been found that the magnitude of V-channel varies linearly with increase in sample concentration (treated with specific reagents) with negative slope as described in Fig. 4 (a), (b) and (c) for BSA, catalase and carbohydrate, respectively.
Fig. 4 plots (images not recoverable from the text): each of the three panels shows the V value of HSV (vertical axis) against sample concentration (horizontal axis), with smartphone data points and a linear fit. Horizontal axes: BSA concentration in mg/mL, catalase concentration in mg/mL and carbohydrate concentration in µg/mL. Panel (c) reports R² = 0.965; the other two panels report R² = 0.992 and R² = 0.978.
Fig. 4 Calibration of (a) BSA, (b) catalase and (c) carbohydrate through the V-value of HSV color space using standard samples of known concentrations
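The curve fitting described above can be sketched as an ordinary least-squares line fit. The snippet below is an illustrative implementation in plain Python, not the tool the author used (the chapter does not name one), and the calibration points are hypothetical values chosen only to show a negative slope.

```python
def linear_fit(x, y):
    """Least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two or more paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination from residual and total sums of squares
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


# Hypothetical calibration points (concentration, V); not the chapter's measurements
conc = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
v = [0.49, 0.43, 0.36, 0.29, 0.23, 0.16]
slope, intercept, r2 = linear_fit(conc, v)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, R^2={r2:.4f}")
```

The returned slope and intercept are the two numbers the app later asks the user to enter.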
Linear fitting gives the following empirical relations between the sample concentrations and the V-channel values:

\[ V_{\mathrm{BSA}} = -0.33677\, x_{\mathrm{BSA}} + 0.49413 \tag{2} \]

\[ V_{\mathrm{catalase}} = -0.12044\, x_{\mathrm{catalase}} + 0.47679 \tag{3} \]

\[ V_{\mathrm{carbohydrate}} = -0.00197\, x_{\mathrm{carbohydrate}} + 0.36459 \tag{4} \]
where VBSA, Vcatalase and Vcarbohydrate indicate the magnitude of the V-value of BSA, catalase and carbohydrate, respectively, and the x terms are the corresponding sample concentrations.
Workflow and Working of the Android App
Figure 5a shows the workflow of the Android application, and Fig. 5b shows screenshots of the app during step-by-step execution of the colorimetric quantification. The app works as follows. When the user taps the application icon, a window pops up prompting the user to enter the intercept and slope of the best-fit calibration line of the standard samples, as obtained from Eqs. (2)–(4). From the calibration curves, the slopes are −0.33677 for BSA, −0.12044 for catalase and −0.00197 for carbohydrate, and the corresponding intercepts are 0.49413, 0.47679 and 0.36459, respectively. After the slope and intercept are entered, a “select photo” button lets the user take a new image or choose a previously recorded one, which can then be cropped. Finally, the user calculates the concentration, and a final window displays the concentration of the selected biomolecular sample. The unknown concentration is calculated using the following formula:

\[ X_{\mathrm{BSA}} = \frac{V_{\mathrm{BSA}} - 0.49413}{-0.33677} \tag{5} \]
where XBSA represents the concentration of BSA and VBSA represents the corresponding V-channel value. Similarly, the unknown concentration of catalase and carbohydrate can be estimated.
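The same inversion applies to all three calibration lines. The sketch below uses the slopes and intercepts reported in Eqs. (2)–(4); the function and dictionary names are hypothetical and are not taken from the app's source code.

```python
# Slope and intercept of the linear fits reported in Eqs. (2)-(4)
CALIBRATION = {
    "BSA": (-0.33677, 0.49413),           # concentration in mg/mL
    "catalase": (-0.12044, 0.47679),      # concentration in mg/mL
    "carbohydrate": (-0.00197, 0.36459),  # concentration in µg/mL
}


def concentration_from_v(v, slope, intercept):
    """Invert V = slope * x + intercept to recover the concentration x (Eq. 5)."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return (v - intercept) / slope


def estimate(sample, v):
    """Look up the calibration line for a sample type and invert it."""
    slope, intercept = CALIBRATION[sample]
    return concentration_from_v(v, slope, intercept)


# A V-value equal to the intercept corresponds to zero concentration
print(estimate("BSA", 0.49413))
```

Because the fits are linear, estimates are only meaningful inside the calibrated concentration ranges.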
Fig. 5 (a) Work process of the Android app and (b) screen snapshots of the Android app
4 Conclusion In summary, the present chapter demonstrates the use of a smartphone camera to perform quantitative analysis of biomolecular samples. As a proof of concept, protein (BSA), enzyme (catalase) and carbohydrate have been used as standard biomolecular samples. Protein and enzyme samples have been treated with Lowry’s reagent and
carbohydrate samples have been treated with Anthrone’s reagent. The change in color of the reagent-treated samples has been quantified through the V-channel of HSV color space using the custom-designed Android application. The utility of the app is not limited to colorimetric quantification of the biomolecules demonstrated here; it can be extended to other types of biomolecular samples. It is also anticipated that the app could be useful for monitoring environmental water quality through a similar colorimetric technique. Such app-based devices could promote a wide range of analytical and bioanalytical sensing in different fields of application. In the present work, only the V-value of HSV color space has been used for colorimetric quantification. In future, colorimetric sensing will be investigated using other color models such as the RGB, CMYK and CIE models. The app would be customized so that the user can select a specific color channel, correlate its variation with the sample concentration and use the best color channel and model for quantitative analysis. One limitation of using a smartphone camera is that camera parameters, such as megapixel count, sensor size, f-number, focus and aperture, vary between smartphone types and manufacturers. In addition, the ambient lighting must be controlled to obtain uniform illumination of the sample solutions inside the cuvettes. However, these limitations can be overcome by using good-quality camera phones and taking the sample images in a dark hood under uniform illumination.
Doppler Ultrasonography in Evaluation of Severe Type 2 Diabetes Mellitus: A Case Study Saurav Bharadwaj and Sudip Paul
Abstract A study of the effects of type 2 diabetes mellitus is incomplete without Doppler ultrasonography, as it indicates the fluid pressure on the walls of the major blood vessels. Doppler USG is a non-invasive imaging technique that detects vessel blockage and blood clots in the arteries. Doppler echocardiography, a further development, uses high-frequency sound waves to create an image of the heart, while the Doppler effect allows the speed and direction of blood flow to be determined. The chapter presents a typical case of a patient with severe type 2 diabetes mellitus who showed symptoms of peripheral neuropathy (i.e. reduced blood flow in narrowed blood vessels that caused severe pain in the lower limbs). The work is analysed in two stages: stage one describes the abnormal systolic and diastolic pressure in the vessel walls of the patient, and stage two depicts the recovery stage, in which glucose is lowered to a normal level. A wearable glucose monitoring system is placed on the patient's arm to monitor the glucose level continuously for 15 days. Keywords Doppler ultrasonography · Type 2 diabetes mellitus · Ankle brachial index · Toe brachial index
Nomenclatures
ABI  Ankle brachial index
BD  Brachial difference
CRP  C-reactive protein
D  Peak diastolic velocity
DF  Damping factor
T2DM  Type 2 diabetes mellitus
ESR  Erythrocyte sedimentation rate
HBP  High brachial pressure
HP  High pressure
LADA  Latent autoimmune diabetes
LBP  Low brachial pressure
LP  Low pressure
MODY  Maturity onset diabetes
P  Pressure
PAD  Peripheral arterial disease
PI  Pulsating index
PR  Pourcelot ratio
RBC  Red blood cell
S  Peak systolic velocity
SBI  Spectral broadening index
RI  Resistance index
TBI  Toe brachial index
TSH  Thyroid stimulating hormone
USG  Ultrasonography
S. Bharadwaj (B) Indian Institute of Information Technology Guwahati, Guwahati, Assam, India e-mail: [email protected] S. Paul North Eastern Hill University, Meghalaya, India e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 S. Jain and S. Paul (eds.), Recent Trends in Image and Signal Processing in Computer Vision, Advances in Intelligent Systems and Computing 1124, https://doi.org/10.1007/978-981-15-2740-1_8
1 Introduction T2DM is one of the major chronic conditions in the world population. It is generally found in patients with abnormal insulin control and response in the conversion of glucose to energy within the cells [1]. Medically, the reaction of insulin in the glucose–energy cycle is described as an autoimmune reaction. The condition raises the blood sugar level when large amounts of sugar, in the form of natural and complex carbohydrates, are consumed in the regular diet [2]. Annually, a large number of patients suffer from cardiac strokes, internal infections, cardiac diseases, hotspots, unconsciousness, sweet-smelling breath, thirst, fatigue, vision disorders, cataract, glaucoma, low work potential, high blood pressure, abnormal functioning of the pancreas, gastroparesis, frequent urination, fat deposition in blood vessels, and foot problems with dry, cracked skin [3]. The International Diabetes Federation states that the number of people affected by the condition has increased to 371 million. Almost 187 million people are unaware that they have the disease. Generally, working people are the most affected, owing to overeating junk food and sugary, complex-carbohydrate foods, limited exercise, and irregular glucose monitoring. The United Kingdom Global Diabetes Community mentions that T2DM occurs in a patient due to abnormal conditions such as excessive production of glucagon (glucagonoma), inflammation of the pancreas
(chronic pancreatitis), a genetic build-up of mucus in the lungs and digestive system (cystic fibrosis), and surgical removal of the pancreas (pancreatectomy). According to the United Kingdom Global Diabetes Community, diabetes is of 12 types, including type 1 diabetes, type 2 diabetes, gestational diabetes, diabetes LADA, diabetes MODY, type 3 diabetes, steroid-induced diabetes, brittle diabetes, secondary diabetes, diabetes insipidus, and juvenile diabetes. Limited to T2DM, the chapter describes Doppler USG and the relevant experimental results for the chronic condition in connection with overall health status. A physical Doppler USG system consists of three segments: a signal extraction module, a processing module, and a real-time display module. In signal extraction, the transducer captures the Doppler frequency shift and identifies the direction of fluid flow in the blood vessels. Doppler USG transducers come in different configurations for diagnosing various parts of the body; those available on the market include convex, linear, phased array, micro convex, T-type linear, biplanar, endocavitary, and intrarectal transducers. A Doppler USG system offers two imaging modes: continuous-wave and pulse-wave Doppler imaging. A continuous-wave Doppler USG transducer consists of transmitting and receiving elements set a distance apart and captures the real-time signal continuously for the complete operation time. A pulse-wave Doppler USG transducer is a single element that transmits and receives alternately in two consecutive time slots. Graphically, the Doppler USG process is used to produce pulse-wave (spectrogram) and colour flow Doppler images.
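How a measured frequency shift translates into flow speed and direction can be illustrated with the standard textbook Doppler relation Δf = 2 f₀ v cos θ / c. This relation and the soft-tissue sound speed of about 1540 m/s are general assumptions that the chapter itself does not state; the function names and the example numbers below are hypothetical.

```python
import math

SPEED_OF_SOUND_TISSUE = 1540.0  # m/s, commonly assumed average for soft tissue


def doppler_shift(velocity, transmit_freq, angle_deg, c=SPEED_OF_SOUND_TISSUE):
    """Frequency shift (Hz) for flow at `velocity` (m/s) and beam-to-flow angle in degrees."""
    return 2.0 * transmit_freq * velocity * math.cos(math.radians(angle_deg)) / c


def velocity_from_shift(shift, transmit_freq, angle_deg, c=SPEED_OF_SOUND_TISSUE):
    """Invert the Doppler relation; the sign of the result gives the flow direction."""
    cos_theta = math.cos(math.radians(angle_deg))
    if abs(cos_theta) < 1e-9:
        raise ValueError("angle too close to 90 degrees; no measurable shift")
    return shift * c / (2.0 * transmit_freq * cos_theta)


# Hypothetical example: 5 MHz probe, 60 degree insonation angle, 0.5 m/s flow
shift = doppler_shift(0.5, 5e6, 60.0)
print(round(shift, 1), round(velocity_from_shift(shift, 5e6, 60.0), 3))
```

In this sketch a positive shift means flow towards the transducer and a negative shift means flow away from it, which is how the direction of flow is recovered.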
The Doppler US spectrogram is used to estimate the accuracy of the frequency range in spectral broadening, the direction of fluid flow, the gate threshold, the maximum and mean scatterer velocity, the highest and lowest detectable velocity, system sensitivity, and angle accuracy; to identify and remove strong signals produced by vessel wall movement; and to estimate the maximum depth of the blood vessel, the minimum detectable signal above the additional noise, the ratio of the maximum cluster signal to the minimum detectable flow signal, the spatial, temporal and velocity resolution, and the degree of colourization in different tissue regions. The Doppler USG technique estimates the ABI and TBI to identify PADs and other cardiac diseases [4] from deviations in the systolic and diastolic waveforms. PADs are found in patients with increased risk factors, including but not limited to age, limited exercise, smoking, and hypertension [4, 5]. If ignored, PAD may lead to wounds, gangrene, and amputation [5, 6]. Experimentally, a high ABI indicates calcification that stiffens the arterial walls and leaves the lower-extremity arteries incompressible. However, a decreased ABI indicates the occurrence of T2DM [7]. When both situations are combined, the ABI can lie in the normal range, leaving PAD undiagnosed [4, 8]. The TBI is the modern index for accurate diagnosis of PAD, sensorimotor neuropathy, and renal failure patients. An accurate TBI provides the derivation of the T2DM than the diagnostic boundary limitation of TBI. The limitation of the TBI is that it can only be used in vascular medical imaging and diagnosis. It has lower sensitivity and specificity than the traditional ABI [9].
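Both indices are ratios of a distal systolic pressure to the brachial systolic pressure. The sketch below assumes the common convention of dividing by the higher of the two arm pressures, which the chapter does not spell out; the pressures are hypothetical and are not the patient's data.

```python
def pressure_index(distal_systolic, brachial_systolic):
    """Ratio of a distal systolic pressure to the brachial systolic pressure."""
    if brachial_systolic <= 0:
        raise ValueError("brachial pressure must be positive")
    return distal_systolic / brachial_systolic


def abi(ankle_systolic, left_brachial, right_brachial):
    """Ankle brachial index, using the higher arm pressure (assumed convention)."""
    return pressure_index(ankle_systolic, max(left_brachial, right_brachial))


def tbi(toe_systolic, left_brachial, right_brachial):
    """Toe brachial index, using the higher arm pressure (assumed convention)."""
    return pressure_index(toe_systolic, max(left_brachial, right_brachial))


# Hypothetical systolic pressures in mmHg
print(round(abi(110, 128, 132), 2), round(tbi(70, 128, 132), 2))
```

Which index values count as normal depends on the clinical guideline in use, so no cut-offs are hard-coded here.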
Fig. 1 Signal processing in the Doppler USG system: the Doppler transducer detects the original signal, which is processed in the signal processing module to reduce background noise. The signal processing module removes the noise, and the modified signal is displayed on the monitor module of the Doppler USG system
Researchers have sought a threshold for diagnosing PAD based on standard TBI measures. Several other parameters, including body temperature and limb positioning, affect the TBI value and reduce its accuracy. The chapter is structured into five sections: Sect. 2 gives a brief survey of Doppler USG imaging in T2DM, Sect. 3 describes the case study of a severe T2DM patient, Sect. 4 elaborates on the results, and Sect. 5 concludes with the important facts of the case study (Fig. 1).
2 Literature Survey and Background Study The present-day Doppler USG technique is the result of the successive development of the Doppler effect in the field of biomedical instrumentation. Its roots lie in the demonstration of the Doppler effect in 1842 by the Austrian mathematician and physicist Christian Andreas Doppler. Later, in the 1950s, the Doppler effect was applied in medicine: the Japanese physicist Shigeo Satomura applied it to medical US diagnosis and invented the first Doppler US system to observe the movement of the heart. In 1959, Robert Rushmer of the University of Washington and Dean Franklin prototyped the first continuous-wave Doppler device. In the 1960s, a Seattle research team developed the first pulsed-wave Doppler device. The literature reveals that a complete diagnosis of a severe T2DM patient is required to identify the effect on the patient's blood vessels [4, 5]. Clinical tests provide an accurate estimate of the fluid components in the patient, but they offer a relatively short observation period, suboptimal dosage schedules, and suboptimal surrogate markers for long-term outcomes. Clinical laboratory techniques fail to provide real-time data on fluid mobility, but the invasive electrophysiological techniques can visualize the real-time effect of fluid movement and pressure variance in abnormal T2DM graphical plots (Figs. 2, 3). The internal organs are the most affected by the chronic condition, and an electrocardiogram is the first step in observing the source pulse produced in the heart [5, 6]. However, an electrocardiogram does not provide accurate information over a narrow region away from the pulse source, and Doppler USG is the right choice for visualizing the variation of the systolic and diastolic peaks of the spectrogram. The authors proposed a method of diagnosing the Doppler spectrogram in the limbs of a severe T2DM patient [6–8] (see Fig. 4). The abnormal pressure values, the TBI and ABI indices, and the deviation in the Doppler spectrogram indicate abnormal blood flow in the different limbs of the patient. After medication, the blood glucose level is monitored continuously for 15 days to ensure complete recovery of the T2DM patient [9].
Fig. 2 Physical modules of the Doppler USG system: the Doppler USG transducer is placed over the skin to extract the Doppler signal, which is filtered from the additional noise using different noise reduction algorithms. Doppler images are generally plotted as 2D, 3D, and 4D spectrograms [3, 4]
Fig. 3 Types of diabetes in patients occur depending on a number of factors: age, genetic hierarchy, insulin secretion, resistance, deficiency, and action [5, 6]
Fig. 4 Doppler USG spectrogram examination of the severe T2DM patient: T2DM patients show an abnormal Doppler spectrogram with a major deviation in systolic and diastolic pressure that leads to different peripheral arterial diseases. Abnormal Doppler USG signals are recorded from the right and left brachial, right and left posterior tibial, right dorsalis pedis, and right and left toe arteries. All the signals shown were recorded with the Doppler USG machine of the clinical laboratory [5, 6, 8, 9]
3 Case Study: Type 2 Diabetes Mellitus The case study of a T2DM patient is based on several tests from a complete check-up of the body. Because the case was severe, the analysis is restricted to the complete blood circulation in the body. Blood, plasma, and serum samples were collected from a 59-year-old male and analysed in the pathology and haematology laboratories [10]. On clinical examination, the glucose oxidase, alkaline picrate, ion-selective electrode, immunoturbidimetric, and chemiluminescent methods were used to determine the elevated levels of fasting and postprandial blood glucose, creatinine, CRP, and TSH and the decreased level of sodium ion in the blood [11] (see Table 1).
Table 1 Pathological experimentation of the patient: laboratory tests are performed using different methods (glucose oxidase, alkaline picrate, ion-selective electrode, immunoturbidimetric, chemiluminescent immunoassay, microscopy, Coulter principle, spectrophotometry, Westergren method) to investigate different pigments present in the blood [2, 3]
Test | Results | Biological reference intervals | Methods
Fasting plasma glucose | 493 mg/dL | 70–100 mg/dL | Glucose oxidase method
Creatinine | 1.5 mg/dL | Infant: 0.2–0.4 mg/dL; Child: 0.3–0.9 mg/dL; Adult: 0.5–1.4 mg/dL | Alkaline picrate method
Sodium | 133.5 mEq/L | 135–155 mEq/L | Ion-selective electrode method
Potassium | 4.28 mEq/L | 3.5–5.5 mEq/L | Ion-selective electrode method
C.R.P. | 39.6 mg/L | 0–5 mg/L | Immunoturbidimetric method
Postprandial plasma glucose | 639 mg/dL
0.4
38
19
Total
199
The sample respondent profile comprised mainly young farmers up to 35 years of age who were engaged only in farming. They were educated up to graduation and had an annual income of up to 0.4 million rupees.
4 Findings of the Study To find major dimensions causing farming distress, Factor Analysis technique has been applied.
4.1 Data Analysis For all analyses, 0.05 was set as the level of statistical significance. The factors produced from the agri-preneurial skills attributes were retained if their eigenvalues were greater than 1. The varimax method was used to rotate the factors. The factors were named according to the variables with the higher factor loadings.
Factor-Based Data Mining Techniques …
Table 2 Agri-preneurial skills statistics
Attribute | Loading
Factor 1: interpersonal skills (eigenvalue 4.321; % of variance 25.418)
Job is better than starting enterprise | 0.711
Inadequate subsidies | 0.785
Requires experience | 0.774
Requires training | 0.707
Expert advice required | 0.684
Not suitable to youth | 0.730
Decreases self-confidence | 0.741
Do not get help | 0.637
Factor 2: critical and creative thinking skills (eigenvalue 3.950; % of variance 23.235)
Better option for the rural poor | 0.821
Only source of self-employment | 0.865
To become a role model | 0.838
Should be a self-motivated | 0.830
Should be optimistic | 0.840
Could not develop analytical skill | 0.745
Factor 3: technology skills (eigenvalue 1.831; % of variance 10.771)
EDP | 0.744
Not essentially a creative activity | 0.783
Seasonal agri-enterprises not remunerative | 0.769
Factor Analysis of Agri-preneurial Skills Attributes. The result of the factor analysis is given in Table 2. The extraction method was Principal Component Analysis (PCA). This procedure short-listed 17 agri-preneurial skills attributes, out of the original 35, to move on to the next stage. The KMO score (0.879) is above 0.50 and Bartlett’s test is significant (χ2 = 2807.495, df = 136). The three factors obtained explained 59.424% of the total variance.
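The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes the standard PCA relations for standardized variables: each factor's share of variance is its eigenvalue divided by the number of variables, and Bartlett's test has p(p − 1)/2 degrees of freedom. The function names are hypothetical.

```python
N_ATTRIBUTES = 17  # attributes retained for the factor analysis

# Eigenvalues reported in Table 2
EIGENVALUES = {
    "Interpersonal skills": 4.321,
    "Critical and creative thinking skills": 3.950,
    "Technology skills": 1.831,
}


def percent_variance(eigenvalue, n_variables):
    """Share of total variance explained by one factor, in percent."""
    return 100.0 * eigenvalue / n_variables


def retained_by_kaiser(eigenvalues, threshold=1.0):
    """Keep the factors whose eigenvalue exceeds the threshold (eigenvalue > 1 rule)."""
    return [name for name, value in eigenvalues.items() if value > threshold]


def bartlett_df(n_variables):
    """Degrees of freedom of Bartlett's test of sphericity."""
    return n_variables * (n_variables - 1) // 2


for name, value in EIGENVALUES.items():
    print(name, round(percent_variance(value, N_ATTRIBUTES), 3))
total = sum(percent_variance(v, N_ATTRIBUTES) for v in EIGENVALUES.values())
print("total", round(total, 3), "df", bartlett_df(N_ATTRIBUTES))
```

Under these assumptions the three shares, their total and the degrees of freedom agree with the values quoted in the text and in Table 2.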
5 Discussions Keeping in view farmer distress, which leads to the suicide of not only the farmer but the entire family [9], an attempt has been made to tap the immense potential of rural youth. Their attitude towards agri-preneurship has been measured. To start their own business in agriculture, they must focus on the following:
1. Identify a Need.
2. Identify Areas to Enjoy some Competitive Advantage.
3. Develop a Business Plan.
4. Seek Mentorship From Experienced Agricultural Entrepreneurs.
5. Join Farmer Associations in your Area.
6. Diversify your Business Portfolio.
7. Outsource where Necessary.
8. Learn to use Relevant Business Development Tools [10].
M. Gupta Vashisht and V. B. Soni
The major agri-preneurial skills that emerged from this study (Table 2) helped in determining the attitude of rural youth towards agri-entrepreneurship for a better farming environment. In agri-preneurship, factors that make the pressure of the ‘chore’ more bearable, such as ‘to become a role model’, are of natural concern to the agri-preneur. The measured attitude of rural youth towards agri-preneurship ultimately converged into three major Agri-preneurial Skills Dimensions.
Important Agri-preneurial Skills Dimensions:
1. Interpersonal Skills: To be a successful agri-preneur, the first question is ‘which business-specific area to enter?’ A person must seek mentorship from experienced agricultural entrepreneurs and join farmer associations in the area. These will offer guidance on financial institutions, supply chain management, the current market scenario and much more, free of cost. The statements ‘I feel job is better’, ‘not suitable to youth’ and ‘decreases self-confidence’ indicate that rural youth need to focus more on skills that give them enough confidence to start an enterprise. ‘Inadequate subsidies’ and ‘non-receipt of any help’ suggest that, although the government keeps introducing plans, their effectiveness and realism need to be evaluated. ‘Experience is required’. Many organizations and NGOs are coming forward with support, as reflected in the statements ‘Requirement of training’ and ‘Expert advice’. Governments at various levels are proposing policies to improve the lives of farmers, with the aim of doubling the average farmer's income by 2022. For this, inefficiencies in the agricultural supply chain need to be removed so that Indian agritech can attract lucrative investments [11].
The reality is that nearly two-thirds (62%) of the interviewed respondents (as per the CSDS survey) are not familiar with the minimum support price (MSP), and those who are aware (64%) were not satisfied with the government-offered price. On the other hand, farmers with a minimum landholding of 10 acres are benefiting from government schemes and policies. Furthermore, farming-related information is not reaching three-fourths (74%) of the respondents via the agriculture department’s officials.
2. Critical and Creative Thinking Skills: One of the basic requirements for an entrepreneur is ‘Need Identification’, followed by ‘Identifying Areas of Competitive Advantage’, on the basis of which an effective and realistic ‘Business Plan’ can be developed. The attitude measurement provides an insight to the budding agri-preneur. This skill dimension provides a sound base to the budding entrepreneur in identifying the area in which they should make
successful entry along with a clear justification, as reflected in the statements ‘better option for the rural poor’ and ‘only source of self-employment’. It also enhances morale and the ability to carry out business activities profitably, as in ‘to become a role model’, ‘should be a self-motivated’, ‘should be optimistic’ and ‘could not develop analytical skill’. This skill dimension provides a sound base for the budding entrepreneur to become independent and to carve a niche for themselves [11].
3. Technology Skills: For a business to be successful in today’s era, it must be backed by technology, whether a person wants to diversify the business portfolio or outsource where necessary. It is becoming mandatory to create a presence on social media, to have an ‘app’ for one's own enterprise, and so on. For this, ‘Entrepreneurship development programs were conducted’. A person is also able to analyze ‘the extent of creativity involved in Entrepreneurship’. The agri-preneur also develops the insight that ‘Seasonal agri-enterprises are not remunerative’. Adhir Jha stated that technologies can be leased to farmers for a time period by startups [11]. Ramesh and Madhavi [7] found that stress due to financial factors was slightly higher than other stresses. Based on these dimensions, innovative ways to learn agri-preneurship have been analyzed.
6 Conclusion The study was undertaken to determine how various agri-preneurial skills rate in rural youth's evaluation of agri-entrepreneurship for a better farming environment. The measured attitude of rural youth towards agri-preneurship ultimately converged into three major Agri-preneurial Skills Dimensions, viz. Interpersonal Skills, Critical and Creative Thinking Skills and Technology Skills. Besides this, other areas such as farm tourism have also been explored. An attempt has been made to motivate youth and women in farming families to support this initiative. AgriTech startups are implementing technology in retail and digital agronomy.
References
1. J. Sood, India’s deepening farm crisis: 76% farmers want to give up farming, shows study (2018), https://www.downtoearth.org.in/news/indias-deepening-farm-crisis-76-farmers-wantto-give-up-farming-shows-study-43728
2. J. Sood, Farmers have decreased, farm labourers increased: census report (2015), https://www.downtoearth.org.in/news/farmers-have-decreased-farm-labourers-increased-censusreport–40940
3. Union Budget 2019 Highlights, Agriculture sector to be one of the prime focus areas, https://www.bankbazaar.com/tax/union-budget-agriculture-sector.html
4. S. Narain, A.K. Singh, S. Gupta, Farmers’ distress in Uttar Pradesh, India - lesson from a research study. Int. J. Bio-resour. Stress Manag. 6(2), 274–279 (2015). https://doi.org/10.5958/0976-4038.2015.00046
5. J.S. Kureshi, K.V. Somasundaram, Assessment of occupational stress among farmers in Aurangabad district, Maharashtra. Int. J. Commun. Med. Public Health 5(4), 1434–1440 (2018)
6. S. Sharma, S. Kaur, W. Chawla, Farmer’s suicide in Punjab: causes and remedies. Int. J. Adv. Educ. Res. 2(4), 96–97 (2017)
7. A.S. Ramesh, C. Madhavi, Occupational stress among farming people. J. Agric. Sci. 4(3), 115–125 (2009)
8. G. Shivacharan, V. Sudharani, R. Vasantha, K. Supriya, Construction of attitude scale for rural youth towards agri entrepreneurship. Int. J. Pure App. Biosci. 5(4), 1117–1121 (2017). http://dx.doi.org/10.18782/2320-7051.5638
9. L. Kaur, P. Sharma, L. Garg, Causes and cure of farmer’s suicide. Indian J. Econ. Dev. 12(1a), 305–310 (2016)
10. Entrepreneurs’ Square, 8 smart steps to becoming a world-class agricultural entrepreneur (2016), https://www.entrepreneurssquare.com/become-successful-agricultural-entrepreneur/
11. Team Inc42, AgriTech in India: how startups are changing the face of Indian agriculture (2017), https://inc42.com/buzz/agriculture-agritech-india-startups/
Detection of Hate Speech and Offensive Language in Twitter Data Using LSTM Model Akanksha Bisht, Annapurna Singh, H. S. Bhadauria, Jitendra Virmani and Kriti
Abstract In today’s world, the internet is an emerging technology with exponential user growth. A major concern accompanying this growth is the increase in toxic online content posted by people of different backgrounds. With the expansion of deep learning, a considerable amount of research has turned to deep neural networks across many disciplines. For natural language processing (NLP) tasks as well, deep networks, specifically the recurrent neural network (RNN) and its variants, are increasingly preferred over traditional shallow networks. This paper addresses the problem of hate speech on social media. We propose an LSTM-based classification system that differentiates between hate speech and offensive language. The system employs word embeddings with LSTM and Bi-LSTM neural networks to identify hate speech on Twitter. The best-performing LSTM network classifier achieved an accuracy of 86%, with an early stopping criterion based on the loss function during training.

Keywords Sentiment analysis · NLP · Deep learning · Hate speech · Offensive language · Bi-LSTM · LSTM · Twitter
A. Bisht · A. Singh · H. S. Bhadauria
G B Pant Institute of Engineering and Technology, Pauri Garhwal, Uttarakhand, India

J. Virmani (B)
CSIR-Central Scientific Instruments Organization, Chandigarh, India

Kriti
Thapar Institute of Engineering and Technology, Patiala, Punjab, India

© Springer Nature Singapore Pte Ltd. 2020
S. Jain and S. Paul (eds.), Recent Trends in Image and Signal Processing in Computer Vision, Advances in Intelligent Systems and Computing 1124, https://doi.org/10.1007/978-981-15-2740-1_17
1 Introduction

The ever-increasing number of messages, posts and comments on social media platforms allows people to express their opinions freely to the world.

“Words are singularly the most powerful force available to humanity. We can choose to use this force constructively with words of encouragement, or destructively using words of despair. Words have energy and power with the ability to help, to heal, to hinder, to hurt, to harm, to humiliate and to humble.” [1]

Indeed, words are the most powerful force of all. With the rise of social media, people have become more expressive. They now have global platforms on which to share their views and opinions about anything or anyone, and the world has never been this transparent. People have gained the power of free speech over social networking sites (SNS). Unfortunately, this has also created an open environment that is vulnerable to toxic content, which can take different forms such as hate speech or offensive language. Hate speech has become quite frequent on the internet and social media. Hate speech and offensive language differ in the subjectivity of the attack.

Different countries also have different legislative norms for dealing with this issue. Although hate speech is protected under free speech provisions in countries like the USA, other countries categorize hate speech as a punishable offense and have laws prohibiting it. Social media services such as Facebook and Twitter have often been criticized for not taking strict measures against toxic speech. In response, on May 31, 2016, the European Union had Facebook, Google, Microsoft and Twitter jointly agree to a code of conduct that compels them to review and remove illegal hate speech posted on their services within 24 h.
The EU defines hate speech [2] as:

All conduct publicly inciting to violence or hatred directed against a group of persons or a member of such a group defined by reference to race, color, religion, descent or national or ethnic origin.
Over the past decade, several sentiment analysis studies related to hate speech have been conducted on social networking and micro-blogging sites such as Facebook, Twitter, Reddit and YouTube. In this work, a sentiment analysis classification system was developed that uses deep learning to study the occurrence of hate speech on Twitter. The study tries to improve classification by using a multi-step classifier and word-embeddings with a deep neural network. The study contributes to the field in three ways:

• It proposes a methodology that employs deep neural networks on textual data to learn deep features, which can later be used for multi-class classification of tweets as hate speech, offensive language or neither.
• It investigates the suitability of adapting pre-trained word-embeddings (GloVe word-embeddings) to convert every token into a vector; together these vectors form what is better known as the embedding matrix.
• It experiments with stacked LSTM and Bi-LSTM networks and delineates their effect on the results. The results show how these experimental setups overcome the drawbacks of the preferred baseline models.

The remaining sections of the paper are organized as follows. A detailed introduction to hate speech, its subtypes and examples is given in Sect. 2. Section 3 presents a concise review of previous sentiment analysis methods, the intent of the study and the algorithms applied to the same datasets that are used in this research. In Sect. 4, the methodology, the proposed system and its workflow are presented in detail. The experimental setup and the results obtained are reported in Sect. 5. Finally, Sect. 6 concludes the paper by highlighting the key points.
2 The Problem of Hate Speech

The internet has given people access to a global platform where they can express their feelings and opinions. An enormous amount of opinionated data is available on the web, and it serves well in areas such as sentiment analysis, text analysis, big data and data mining. People frequently post hate speech or use offensive language to express their views. Such posts on social media may be hurtful to people of a certain religion, race or gender. Unfiltered hate speech has also encouraged group-based hatred against minorities. Hate speech identification has therefore become a significant task in sentiment analysis. Once hate speech is detected, the respective organization can decide how to deal with it. There is no exact definition of hate speech. An online dictionary [3] defines it as:

Hate speech is speech that attacks a person or a group based on protected attributes such as race, religion, ethnic origin, national origin, sex, disability, sexual orientation, or gender identity.

In brief, hate speech is speech directed at a particular social group with the intention of harming it. Hate speech and offensive language differ in the subjectivity of the attack. Different countries also have different legislative norms for dealing with this issue. Under the laws of most European countries and Canada, the use of hate speech is strictly illegal, while in the USA hate speech falls under free speech provisions. Hate speech can be further classified into the subtypes defined in Table 1. A certain English word has been masked as F in the examples that follow.

Social networking organizations deal with hate speech and offensive language frequently, and each has different policies and concerns regarding the use of hate speech by its users. Hate speech can be of any of the kinds defined in Table 1. This section describes several conditions under which Twitter considers tweets to be hate speech or offensive and takes action on the corresponding reports. The Twitter hate speech policy [4] applies to a tweet if it falls under any of the following categories.
Table 1 Subtypes of hate speech and examples

| Class | Description | Targeted words |
|---|---|---|
| Sexism | Hate speech biased toward a particular sex or gender, for instance, girl | Pregnant people, sexist people, cunt |
| Body | Hate speech toward a person based on their body type | Fat man, F tall, ugly people |
| Racism | Hatred toward a person belonging to a certain race | Black people, nigga, nigger, white people |
| Ideology | Hate speech for a person who follows an ideology | Feminism, aesthetic people |
| Religion | Hate speech for a religion | Muslim people, Jewish people |
| Origin | Hate speech based on a person’s place of origin, e.g., hate speech for Asians | Asian people, Canadian people, African people |
| Homophobia | Hate speech toward the sexual orientation of a person | Gay people, Lesbian people, straight people |
| Disability | Hate speech based on a disability | Retard, mental, dyslexic people, bipolar |
Examples of Hate Speech

• Violent threats: Twitter has a zero-tolerance policy for content that attacks any individual with violent threats. These threats are statements that declare an intent to hurt or seriously injure someone, or to inflict death on an identifiable target. A tweet like “I will kill you” is termed a violent threat.
• Wishing or hoping for someone’s harm: Twitter’s hateful conduct policy states that it does not permit any content that wishes, hopes, promotes, encourages or expresses the desire for someone’s harm, death or lasting disease, for example, tweets like “I wish you die of tumor” or “I hope a car ran over him.”
• Provoking fear about a category: Tweets that incite others to fear some protected community or religion, as in “Those [religious group] are just murderers.” The policy also protects against the dehumanization of an individual or a group of people based on their religion or community.
• Hateful imagery: Twitter’s policy on hateful images prohibits users from posting images, logos, banners or icons that promote resentment of a certain religion, community, gender, race, disability or sexual orientation. For instance, the “Nazi swastika” is a historical image of a hate group.

Twitter defines the above as hateful and offensive under its policy. Content is said to be hate speech if it falls under any of the above categories, but the policy is not limited to these. Social media platforms label such messages as sensitive content and try to keep hate speech off their services in a very strict manner.
3 Related Works

The extraction and analysis of text-based data has emerged as an active research field. Owing to the global availability of such data, text analytics has attracted a lot of attention. Several studies have been conducted on these data over the past decade; the only differences are the methods used and the targeted domain. Table 2 presents a brief survey of previous work on hate speech detection.

Table 2 Cartography of existing research in hate speech detection

| Citations | Classes | Model | Dataset used |
|---|---|---|---|
| Djuric et al. [19] | 2 classes (Hate, Non-hate) | NLP | Yahoo |
| Waseem [5] | 3 classes (Sexism, Racism, None) | Empirical | Twitter |
| Jha and Mamidi [6] | 3 classes (Hostile, Benevolent, Others) | SVM, Seq2Seq | Twitter |
| Kwok and Wang [8] | 2 classes (Racist, Non-racist) | NB | Twitter |
| Burnap and Williams [9] | 2 classes (Yes, No) | SVM | Twitter |
| Basile et al. [20] | 2 classes (Hate, Non-hate) | SVM | Twitter |
| Vigna et al. [12] | 2 classes (Hate, Non-hate) | LSTM | Facebook |
| Badjatiya et al. [13] | 3 classes (Sexism, Racism, Neither) | CNN, LSTM, FastText | Twitter |
| Gamback and Sikdar [11] | 4 classes (Sexism, Racism, Both, Neither) | CNN | Twitter |
| Park et al. [10] | 3 classes (Sexism, Racism, Neither) | CNN | Twitter |
| Georgakopoulos et al. [21] | 2 classes (Hate, No-hate) | LR, LSTM, Bi-LSTM | Fox News |
| Zhang et al. [22] | 2 classes (Toxic, Non-toxic) | LR, SVM, CNN | Wikipedia |
| Watanabe et al. [23] | 3 classes (Sexist, Racist, Neither) | SVM, CNN + LSTM | Twitter |
| Nobata et al. [24] | 2 classes (Abusive, Clean) | NLP | Yahoo |
| Pitsilis et al. [25] | 3 classes (Sexism, Racism, Neither) | LSTM | Twitter |
| Mathur et al. [26] | 3 classes (Hate-inducing, Abusive, Neither) | CNN | Twitter |
| Vandersmissen [27] | 3 classes (Sexual, Racist, Irrelevant) | SVM, NB | Twitter |
| Mathur et al. [28] | 2 classes (Hate speech, Abusive) | CNN-LSTM | Twitter |
| Agarwal and Sureka [29] | 1 class (Hate) | KNN, SVM | Twitter |
| Ibrohim and Budi [14] | 3 classes (Abusive, Not abusive, Offensive) | NB, SVM, RF | Twitter |

Note NLP Natural language processing, SVM Support vector machine, Seq2Seq Sequence to sequence, NB Naïve Bayes, CNN Convolution neural network, LSTM Long short-term memory, LR Logistic regression, Bi-LSTM Bi-directional LSTM, KNN K-nearest neighbor, RF Random Forest

As seen in Table 2, most of the studies were carried out on Twitter data, because tweets are readily available and can be crawled using the Twitter API. The majority of the research focuses on identifying hate speech and differentiating it from non-hate (or offensive) texts. The author of [5] devised a classification system for tweets that primarily focuses on the “sexism” class and distinguishes among “Hostile”, “Benevolent” and “Other.” A similar study [6] classifies tweets as racist, sexist or neither by applying the FastText classifier [7] and SVM; it treated “Hostile” tweets as part of the “racist” class.

Deep learning can find better data representations for classification and is therefore widely explored for NLP tasks. The two popular DNNs that have been beneficial in this area are the convolution neural network (CNN) and the recurrent neural network (RNN). This section gives a brief overview of relevant recent studies, paying particular attention to RNN-based hate speech classification. More explicitly, it summarizes the studies conducted in the past decade. In recent years, many efforts have been made to identify hate speech using data crawled from social media sites such as Twitter and Facebook. The studies [8, 9] employ bag-of-words features to classify hate speech in tweets but suffer from a high misclassification ratio.

The study of racist speech in [8] found that 86% of the time, a tweet was identified as racist merely because it contained offensive language. This finding shows how challenging hate speech detection is, since it requires drawing a line between hate speech and offensive language. The major difference often lies in subtleties of language use. In some cases, CNNs used with word embeddings have also emerged as a possible solution for classifying toxic content. Following this approach, the authors of [10] identified abusive (sexist or racist) text on Twitter and distinguished it from non-abusive tweets. Another study [11] used the same kind of network to predict four hate speech classes and achieved a slightly higher score than the character n-gram model. In a subsequent study [12], the authors evaluated Facebook data for binary classification into “hate” and “no-hate” classes. The results showed that a simple LSTM classifier did not greatly improve on the performance of an SVM classifier. The experiments in [13] on Twitter used LSTM with n-gram feature extraction followed by gradient boosted decision trees; this model achieved the highest accuracy of any learning technique used on the same dataset. For the SemEval-2019 task, [14] introduces a method that analyzes Twitter for hate speech against immigrants and women. They applied numerous statistical ML as well as DL classifiers to English and Spanish tweets. The framework in [12] presents an RNN-based neural network classifier to filter offensive tweets. Similarly, the authors of [15] used the LSTM model for the classification of the same classes on data from Twitter and Reddit.

The majority of the studies mentioned above focus on whether speech is offensive or not. It is easier to compile a list of offensive words and classify the rest as non-offensive, as shown in Table 3. The only exception among the above experiments is [16], which measures the severity of the offense a statement invokes. The degree of offense is very subjective and differs from person to person, place to place and dataset to dataset.

Twitter is a popular place to express emotions and opinions online. Researchers repeatedly use Twitter as a data source because datasets are easy to obtain and because it offers an almost endless variety of forms of expression. Tweets are short statements of at most 280 characters, which also limits the number of tokens generated per tweet. The same cannot be said for Facebook, Wikipedia and Reddit. The work presented in this paper is more in line with the studies cited in Table 4. This study tries to identify the negative sentiment of a set of statements while distinguishing the type of sentiment invoked. For example, sadness and melancholy are different states in which a person isn’t happy.
4 Methodology

Hate speech detection is a prominent application of sentiment analysis. Figure 1 therefore outlines the basic steps involved in sentiment classification, or opinion mining, and gives a brief idea of how to create a classifier for predicting sentiments. The steps of the analysis are briefly explained below. Later, Fig. 6 presents the layered architecture of the LSTM-based classifier used in this study.
Table 3 Cartography of existing research in identification of offensive language

| Citations | Classes | Model | Dataset used |
|---|---|---|---|
| Razavi et al. [30] | 2 classes (Flame, Okay) | NB | NSM, UseNet |
| Xiang et al. [31] | 2 classes (Offensive, Not offensive) | SVM, LR, RF | Twitter |
| Zampiere et al. [32] | 2 classes (Offensive, Not offensive) | SVM, LR, RF, LSTM, Bi-LSTM, RNN, GRU | Twitter |
| Xu et al. [33] | 2 classes (Offensive, Not offensive) | NLP | YouTube |
| Bretschneider et al. [16] | 2 classes (Offensive, Severely offensive) | NLP | Facebook |
| Wiedemann et al. [34] | 2 classes (Offensive, Other) | Bi-LSTM – CNN | Twitter |
| Santos et al. [15] | 2 classes (Offensive, Not offensive) | LSTM | Twitter, Reddit |
| Rother et al. [35] | 2 classes (Offensive, Other) | LSTM | Twitter, Wikipedia |
| Mubarak et al. [36] | 3 classes (Offensive, Obscene, Clean) | NLP | Twitter |

Note NB Naïve Bayes, NSM Natural semantic module, SVM Support vector machine, LR Logistic regression, RF Random Forest, LSTM Long short-term memory, Bi-LSTM Bi-directional LSTM, RNN Recurrent neural network, GRU Gated recurrent unit, NLP Natural language processing
Datasets

A good-quality, carefully labeled dataset is essential for the network to learn well. To save time, the study relies on a pre-annotated Twitter dataset that is publicly available. The dataset mentioned in [39] by Cornell University seemed most suitable for this task. This dataset, available from the Crowdflower website, contains tweets crawled from hatebase.org. It contains approximately 25K tweets with three labels: hate speech, offensive language (but not hate speech) and neither. However, analysis of the entire dataset revealed a strong class imbalance: only about 6% of the tweets carry the label “hate speech”, far fewer than in the other two classes. For that reason, another dataset from Crowdflower was also considered. This second dataset has 15K tweets with two classes: hate speech and not hate speech. After considerable experimentation, it was decided to use both datasets and create a new dataset from them.
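As an illustration of how two labeled collections could be combined into a class-balanced set, the following minimal sketch merges toy data and downsamples every class to the same size. The chapter does not state how the two Crowdflower datasets were actually merged, so the helper `balance_by_downsampling`, the label names, the placeholder tweets and the choice to take only the hate speech tweets from the second dataset are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def balance_by_downsampling(samples, per_class, seed=0):
    """Return `per_class` randomly chosen (text, label) pairs for every label.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        items = by_label[label]
        if len(items) < per_class:
            raise ValueError(f"class {label!r} has only {len(items)} samples")
        balanced.extend(rng.sample(items, per_class))
    rng.shuffle(balanced)
    return balanced


# Toy stand-in for the imbalanced three-class dataset (placeholder texts).
three_class = [(f"tweet a{i}", "offensive") for i in range(8)]
three_class += [(f"tweet b{i}", "neither") for i in range(5)]
three_class += [(f"tweet c{i}", "hate") for i in range(1)]

# Toy stand-in for the second dataset; only its hate speech tweets are used
# here to top up the minority class (an assumption of this sketch).
two_class_hate = [(f"tweet d{i}", "hate") for i in range(4)]

merged = three_class + two_class_hate
print("before:", Counter(label for _, label in merged))

balanced = balance_by_downsampling(merged, per_class=4)
print("after: ", Counter(label for _, label in balanced))
```

Downsampling is only one way to balance classes; it discards data from the larger classes, which is acceptable when, as here, the majority classes are much larger than needed.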
Table 4 Cartography of existing research for identification of both hate speech and offensive language

| Citations | Classes | Model | Dataset used |
|---|---|---|---|
| Almeida et al. [37] | 3 classes (Ódio (i.e. hate), Offensivo, Regular) | KNN | Twitter |
| Gaydhani et al. [38] | 3 classes (Hate, Offensive, Clean) | NLP | Twitter |
| Gröndahl et al. [39] | 3 classes (Hate, Offensive, Ordinary) | LR, MLP, CNN + GRU, LSTM | Twitter |
| Gao and Huang [40] | 3 classes (Hate, Offensive, Clean); 6 classes (Toxic, Obscene, Insult, Hate, Severe Toxic, Threat) | CNN, LSTM, Bi-LSTM, Bi-GRU | Twitter, Wikipedia |
| Davidson et al. [17] | 3 classes (Clean, Offensive, Hateful) | RF, LR, SVM, NB | Twitter |

Note KNN K-nearest neighbor, NLP Natural language processing, LR Logistic regression, MLP Multilayer perceptron, CNN Convolution neural network, GRU Gated recurrent unit, LSTM Long short-term memory, Bi-LSTM Bi-directional LSTM, Bi-GRU Bi-directional GRU, RF Random Forest, SVM Support vector machine, NB Naïve Bayes
Fig. 1 The basic workflow of a text-based general sentiment analysis pipeline
Now, the dataset that we created has three classes: hate speech, offensive language and neither. Table 5 gives the data distribution across the classes in the training and testing data. The training data has 9600 tweets, with 3200 tweets in each class. Balanced data prevents the network from becoming biased, so it learns all the classes without discrimination; this is why it is always beneficial to use a balanced dataset. As seen in Table 5, a balanced dataset was used for training. For the test data too, efforts were made to keep the class frequencies comparable. From the original datasets, only the class and the tweet were taken; the remaining fields were discarded as irrelevant to this study. Out of concern for language, a frequently used English curse word is masked as F in this report, but the actual training data is untouched.

Table 5 Description of dataset (Labels)

| | Total tweets | Hate speech (0) | Offensive (1) | Neither (2) |
|---|---|---|---|---|
| Training set | 9600 | 3200 | 3200 | 3200 |
| Testing set | 800 | 249 | 250 | 301 |

Table 6 Sample tweets and their labels from the dataset

| Tweet | Label |
|---|---|
| #WestVirginia is full of White Trash. | Hate speech |
| “Typing like a retard to make the other person look dumber by pretending to agree with them” bazinga | Hate speech |
| I’m finally to the point where you disgust me | Offensive language |
| F hate windows… This F recovery error is pissing me off so much | Offensive language |
| #Yankees #Jeter Let him play the entire inning. That’s fitting | Neither |
| I never did those games, but I was obsessed with Angry birds. Which is why it took me so long to do CCS. I was scared. Lol | Neither |

Pre-processing

The collected data may contain a lot of junk and irrelevant content, so pre-processing is required to clean up the data for the subsequent tasks. The basic pre-processing tasks are:

• Lower-case conversion: Convert all letters to lower case so that differently cased forms of a word are treated alike.
• Removing stop-words and punctuation: Strip all stop-words and punctuation marks [; : ? ” ~ ‘] from the text, as they don’t hold any sentiment.
• Removing special characters: Remove special characters [@ # $ &] to make the data cleaner.
• Tokenization of data: Break sentences and phrases down into smaller tokens.
• Stemming of words: Group together words that derive from the same root. For example, “presentation”, “presented” and “presenting” all stem from “present.”

Textual data comprises several components that might not be beneficial for the target analysis. Hence, a clean-up of this data is mandatory before handing it over to the model. This study applies the above pre-processing tasks to the dataset. Table 7 shows the data before and after pre-processing.

Table 7 Data before and after pre-processing

| Raw tweet | Pre-processed tweet |
|---|---|
| #WestVirginia is full of White Trash | “is” “full” “of” “white” “trash” |
| “Typing like a retard to make the other person look dumber by pretending to agree with them” bazinga | “typing” “like” “a” “retard” “to” “make” “the” “other” “person” “look” “dumber” “by” “pretending” “to” “agree” “with” “them” “bazinga” |
| I’m finally to the point where you disgust me | “I’m” “finally” “to” “the” “point” “where” “you” “disgust” “me” |
| F hate windows… This F recovery error is pissing me off so much | “f” “hate” “windows” “this” “f” “recovery” “error” “is” “pissing” “me” “off” “so” “much” |
| #Yankees #Jeter Let him play the entire inning. That’s fitting | “let” “him” “play” “the” “entire” “inning” “that’s” “fitting” |
| I never did those games, but I was obsessed with Angry birds. Which is why it took me so long to do CCS. I was scared. Lol | “i” “never” “did” “those” “games” “but” “i” “was” “obsessed” “with” “angry” “birds” “which” “is” “why” “it” “took” “me” “so” “long” “to” “do” “ccs” “i” “was” “scared” “lol” |

Dense Word-Embeddings

Operations such as dot products or back propagation cannot be performed on raw strings. Therefore, words are converted into vectors, which are then fed to the network. These vectors are created in such a way that they represent each word by its meaning, context or semantics. For example, the vectors of the two words “love” and “adore” should reside in close proximity (Fig. 2), because words with similar meaning or context must lie close together in the vector space. The vector representation of a word is also known as a word-embedding.

Fig. 2 Related words embedded within proximity (plotted words: Love, Adore, Basketball)

Consider these sentences for the words “love” and “adore”:

I love taking long walks on the beach.
My friends told me that they love popcorn.
The relatives adore the baby’s cute face.
I adore his sense of humor.
From the context of the sentences, it can be seen that both words have positive connotations and generally precede nouns or noun phrases. This indicates that the two words might be synonyms. Context is also significant when considering the grammatical composition of sentences. The most popular pre-trained word-embeddings are Word2Vec and GloVe (global vectors). This work employs the pre-trained GloVe word-level embedding of 100 dimensions. The word-embeddings take the tokens of each sequence and convert them into vectors, so each sequence is represented as a matrix of 100-dimensional word-embedding vectors.

Deep Learning for Text Analysis

The classification model can be trained using a machine learning algorithm or a deep neural network. This study uses deep learning for training and classification. For sequential data, RNNs are generally used, typically in the form of an LSTM or GRU model. RNN and LSTM deep neural networks are described in detail below.

Recurrent Neural Network (RNN)

In NLP, words and sentences are analyzed, where each word in a sentence depends upon the words that come before and after it. The recurrent neural network (RNN) is designed for such dependencies. The RNN is slightly different from the long-established feed-forward neural network, which comprises input nodes, hidden units and output nodes: the RNN differs because of its temporal aspect. In an RNN (Fig. 3), every word in an input sequence is associated with a definite time step, and the total number of time steps equals the maximum length of the sequence.
Fig. 3 Processing a sequence in RNN
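To connect the preceding steps, the sketch below takes a raw tweet through a simplified clean-up, looks each token up in an embedding table and pads the result to a fixed number of time steps, which is the form a recurrent network consumes. It is an illustration under stated assumptions: the 3-dimensional vectors are invented (the GloVe vectors used in this work have 100 dimensions), the helper names `preprocess`, `embed_sequence` and `cosine_similarity` are hypothetical, and stop-word removal and stemming are left out to keep the example short.

```python
import math
import re

# Toy 3-dimensional embedding table; the numbers are invented for illustration.
EMBEDDINGS = {
    "love": [0.9, 0.8, 0.1],
    "adore": [0.8, 0.9, 0.2],
    "basketball": [0.1, 0.2, 0.9],
    "play": [0.3, 0.1, 0.7],
}
DIM = 3
PAD = [0.0] * DIM  # used for padding and for out-of-vocabulary tokens


def preprocess(tweet):
    """Lower-case, drop #hashtag/@mention tokens, strip punctuation, tokenize."""
    text = tweet.lower()
    text = re.sub(r"[#@]\w+", " ", text)       # remove hashtag and mention tokens
    text = re.sub(r"[^a-z0-9'\s]", " ", text)  # keep letters, digits, apostrophes
    return text.split()


def embed_sequence(tokens, max_len):
    """Return exactly `max_len` vectors: one per token, truncated or zero-padded."""
    vectors = [EMBEDDINGS.get(tok, PAD) for tok in tokens[:max_len]]
    return vectors + [PAD] * (max_len - len(vectors))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (closer to 1 means more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


tokens = preprocess("#Yankees #Jeter Let him play the entire inning. That's fitting")
matrix = embed_sequence(tokens, max_len=10)
print(tokens)
print(len(matrix), "time steps x", DIM, "dimensions")

sim_close = cosine_similarity(EMBEDDINGS["love"], EMBEDDINGS["adore"])
sim_far = cosine_similarity(EMBEDDINGS["love"], EMBEDDINGS["basketball"])
print(f"love/adore {sim_close:.2f} vs love/basketball {sim_far:.2f}")
```

Each row of `matrix` corresponds to one time step of the recurrent network; with real GloVe vectors the table would simply be loaded from the pre-trained file instead of being written by hand.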
Fig. 4 A single RNN cell
Each time step is associated with a hidden state vector h_t, which summarizes and encapsulates the information from the previous time steps. This hidden state is a function of the current word vector and the hidden state vector of the previous time step. An RNN cell (one time step of the RNN) takes the previous hidden state vector and the input value x_t, as shown in Fig. 4. The two are combined through a concatenation operation. The combination then passes through a tanh activation to produce a new hidden state h_t, which is forwarded to the next time step.

The tanh activation function regulates the values flowing through the network by squashing its input into the range −1 to 1. While traveling through a neural network, vectors undergo many mathematical operations; without this squashing, some values would become astronomical and make the others insignificant. Even so, over long sequences the RNN suffers from the vanishing gradient problem: layers that receive very small gradient updates stop learning. Thus, RNNs have short-term memory, as they forget information from earlier in long sequences.

Owing to these factors, although RNNs work well for short sequences, other variants are considered for long sequences. The long short-term memory (LSTM) and the gated recurrent unit (GRU) are two variants of the RNN created to resolve the problems of short-term memory and vanishing gradients. Both have gates as an internal mechanism that regulates the flow of information through the network. The LSTM network is described next.

Long Short-Term Memory Units (LSTM)

Long short-term memory (LSTM) networks are a particular kind of RNN capable of analyzing and learning long-term dependencies. LSTMs are explicitly designed to deal with the long-term dependency problem. They can retain a piece of information for as long as it is required, which makes them well suited to working on sequences of sentences.
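As a reference point for the LSTM discussion, the plain RNN update of Fig. 4 (concatenate the previous hidden state with the current input, apply a linear map, squash with tanh) can be sketched in a few lines. The sizes, weights and inputs below are arbitrary values chosen for illustration, and `rnn_step` is a hypothetical helper name rather than code from this study.

```python
import math


def rnn_step(h_prev, x_t, weights, bias):
    """One RNN step: concatenate [h_prev, x_t], apply a linear map, then tanh."""
    combined = h_prev + x_t  # list concatenation
    h_new = []
    for row, b in zip(weights, bias):
        total = sum(w * v for w, v in zip(row, combined)) + b
        h_new.append(math.tanh(total))  # tanh keeps every value in (-1, 1)
    return h_new


# Illustrative sizes and weights (assumptions, not trained values).
hidden_size, input_size = 2, 3
weights = [
    [0.5, -0.3, 0.8, 0.1, -0.6],  # one row per hidden unit,
    [-0.2, 0.4, 0.3, -0.7, 0.9],  # hidden_size + input_size columns
]
bias = [0.0, 0.1]

sequence = [[1.0, 0.0, 0.5], [0.2, 0.9, -0.4], [-1.0, 0.3, 0.7]]
h = [0.0] * hidden_size  # initial hidden state
for t, x_t in enumerate(sequence):
    h = rnn_step(h, x_t, weights, bias)
    print(f"step {t}: h = {[round(v, 3) for v in h]}")
```

The same hidden state `h` is overwritten at every step, which is why information from early inputs fades over long sequences.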
Long short-term memory units are modules that can be used inside recurrent neural networks. At a high level, they make sure that the hidden state h_t can encapsulate information about long-term dependencies in the text, so LSTM is adopted whenever such dependencies must be handled. Let us look at the following example.
Fig. 5 A single LSTM cell
Passage: “The first number is 3. The dog ran in the backyard. The second number is 4.”
Question: “What is the sum of the 2 numbers?”

Here, the sentence in the middle is of no use for the question asked, whereas the first and third sentences are strongly connected. With a traditional RNN, the hidden state vector might store more information about the second sentence than about the first. The additional LSTM unit determines which information is required and more useful.

LSTM is a variant of the RNN, so the basic working of both is the same. An LSTM cell (Fig. 5) is similar to the RNN cell except for the gates and the cell state. LSTM has three gates: the input gate, the forget gate and the output gate. Each gate has its own purpose and computes a different operation. The gates in an LSTM use the sigmoid activation function, which is similar to tanh. As described earlier, tanh squashes values into the range −1 to 1; a sigmoid activation squashes values into the range 0 to 1. This helps in deciding whether to update or forget data. Any value multiplied by 0 is 0 and is therefore forgotten. Likewise, any value multiplied by 1 remains the same and is kept. Hence, values pushed toward 0 are forgotten and values pushed toward 1 are kept.

Bi-directional Long Short-Term Memory Units (Bi-LSTM)

Bi-LSTM is a variant of the LSTM network that processes the sequence in both directions: one pass reads the input forward and the other reads it backward. This is quite beneficial when dealing with natural language, because the meaning of each word in a sentence depends on its neighboring words as well. The meaning of a word can change on account of its surrounding words and the overall sense of the sentence. This is why it is better for the network to be trained in both directions.
Let’s consider the following sentences:

I accessed a bank account.
My house is situated near the bank of the river.
Here, the word “bank” has a different meaning in the two sentences: “bank account” versus “bank of a river.” The meaning depends on the context of its usage. A bi-directional model learns to differentiate between the two by reading the data from left to right and from right to left.

Stacked Neural Networks

A stacked neural network uses more than one layer of the network, stacked one after the other. In this study, a three-layer stacked LSTM/Bi-LSTM network was employed to train the classifier. Like a simple LSTM/Bi-LSTM network, a stacked LSTM/Bi-LSTM evaluates the text to obtain rich contextual information, which in the case of a stacked Bi-LSTM comes from both the previous and the next time steps.
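Looking back at the gate mechanism of the LSTM cell (Fig. 5), the following sketch shows how sigmoid gates close to 0 or 1 decide whether the cell state is forgotten or kept. It uses the standard LSTM update equations in plain Python; the helper names (`lstm_step`, `make_params`) and the extreme bias values are assumptions chosen only to make the gating effect visible, not parameters from this study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vector):
    """Compute weights @ vector + bias using plain lists."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(h_prev, c_prev, x_t, params):
    """One LSTM step with forget (f), input (i) and output (o) gates."""
    z = h_prev + x_t  # concatenated [h_prev, x_t]
    f = [sigmoid(a) for a in affine(*params["f"], z)]    # what to keep of c_prev
    i = [sigmoid(a) for a in affine(*params["i"], z)]    # how much new info to add
    o = [sigmoid(a) for a in affine(*params["o"], z)]    # what to expose as h
    g = [math.tanh(a) for a in affine(*params["g"], z)]  # candidate values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def make_params(forget_bias, input_bias):
    """Hypothetical parameters for hidden size 1 and input size 1."""
    zero_w = [[0.0, 0.0]]  # one row: weights for [h_prev, x_t]
    return {
        "f": (zero_w, [forget_bias]),
        "i": (zero_w, [input_bias]),
        "o": (zero_w, [0.0]),
        "g": (zero_w, [0.0]),
    }


h0, c0, x = [0.0], [0.8], [1.0]
# Forget gate near 1: the old cell state is carried over almost unchanged.
_, c_keep = lstm_step(h0, c0, x, make_params(forget_bias=20.0, input_bias=-20.0))
# Forget gate near 0: the old cell state is multiplied by ~0 and disappears.
_, c_forget = lstm_step(h0, c0, x, make_params(forget_bias=-20.0, input_bias=-20.0))
print("cell state kept:     ", c_keep)
print("cell state forgotten:", c_forget)
```

In a trained network the gate values are learned and usually lie between these two extremes, so the cell state is partly kept and partly overwritten at each step.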
Fig. 6 Layers of the different variants of LSTM/Bi-LSTM-based networks: (a) simple LSTM neural network, (b) stacked LSTM neural network
A. Bisht et al.
Figure 6b shows the architecture of the three-layer LSTM classification system. In contrast to a simple LSTM/Bi-LSTM, which has only a single hidden layer, a stacked network has additional upper layers that allow further feature extraction.
Training a Classifier
A classification model can be constructed using deep neural networks. The model is trained using pre-annotated training data that have already been labelled as hate speech, offensive language or neither. The classification model is then developed to evaluate the data and categorize them into these classes, which allows the model to capture the sentiment of a variety of sentences. In the proposed classification system, we have developed a selection of LSTM/Bi-LSTM-based classifiers. Each classifier has five base layers that are common to all of them. The base layers are:
• Sequence Input Layer: This layer takes the sequential input, in this case the embedding matrix of dimension 100. It is used only to fetch sequential data.
• LSTM/Bi-LSTM Layer: The second layer is the LSTM or Bi-LSTM layer. It takes the input of size 100 from the previous layer and learns from it. Based on the other experiments of this study, the output space dimensionality for this layer was set to 100 hidden units (or nodes). The gates of the hidden layer use sigmoid activation.
• Fully Connected Layer: The fully connected (dense) layer takes the output of the LSTM/Bi-LSTM layer to improve the learning and enhance the stability of the output. It has an output size of three, one for each output class.
• Softmax: This layer applies the Softmax activation function, which maps the values from the dense layer to the output classes. It is generally placed before the output layer. It calculates the probability of each possible class and has the same number of nodes as there are output classes.
• Classification Layer: This is the output layer, which provides the predicted label: hate speech, offensive language or neither.
For the stacked neural networks, both variants have three LSTM/Bi-LSTM layers stacked one after the other, each with 100 hidden units. Figure 6 presents the variants of LSTM-based classifiers used in this study. In total, the data are evaluated with four different setups of the proposed LSTM classifier; the results from each are presented in a later section. Figure 7 illustrates the layered architecture for the simple LSTM and Bi-LSTM classification system. As mentioned above, the network is trained on pre-annotated tweets. The experimental setups are briefly explained below. After the classifier is trained, its accuracy is measured on the testing data to check how well the model generalizes. The model then predicts the output value, in this case hate speech, offensive language or neither. The results obtained are explained and illustrated in Sect. 5.
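The last two base layers described above can be sketched in a few lines. This is an illustrative sketch only: the function names, the example scores and the ordering of the classes in `CLASSES` are assumptions, not details taken from the study.

```python
import math

# Assumed class order; the source does not state how the outputs are indexed.
CLASSES = ["hate speech", "offensive language", "neither"]

def softmax(scores):
    # Subtract the maximum first so that exp() cannot overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores):
    """Softmax layer followed by a classification layer (arg-max)."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs

# Hypothetical outputs of the fully connected layer (size 3).
label, probs = classify([2.0, 0.5, -1.0])
```

The softmax step turns the three dense-layer outputs into probabilities that sum to one, and the classification step simply reports the class with the highest probability.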
Fig. 7 Layer architecture of simple LSTM/Bi-LSTM classification system
5 Experiments and Results
Experimental Settings
Several hyperparameter settings were tried in order to obtain better results. This section reports all the results along with the best-performing network. Of the optimization algorithms tried, adaptive moment estimation (Adam) performed best. Table 8 lists the training settings used with the Adam optimizer to train this network. Adam was chosen because it improved both the learning behaviour and the training accuracy of the classifier. It was also observed that increasing the number of epochs improved the accuracy. To obtain the best-trained model, several runs were made with different numbers of epochs and mini-batch sizes. The resulting model was trained with the parameters listed in Table 8.

Table 8 Training metrics
Parameter            Value
Optimizer            Adam
Initial learn rate   0.01
Max epochs           300
Mini batch size      300
Gradient threshold   1
Max iteration        9600

Stopping Criterion: After about half of the iterations, the training loss became small and appeared to stay constant. Hence, an early stopping criterion was applied and training was stopped manually. The resulting model achieved a training accuracy of 98.24% with minimal loss.
For better exploration, this study performs four experiments for filtering hate speech and offensive language in tweets. The experiments are:
Experiment 1: A simple LSTM classifier is developed with a single LSTM layer, as illustrated in Fig. 6a. First, a sequence input layer fetches the sequential input. This layer then passes the data to the LSTM layer, which has 100 hidden units. The data are then forwarded to the fully connected layer with an output size of 3. Softmax is then applied as the activation function. Finally, the output value is generated by the classification layer.
Experiment 2: The architecture employs a single Bi-LSTM layer along with the above-mentioned base layers, with the same dimensions as in Experiment 1. It uses 100 Bi-LSTM hidden units to create a simple Bi-LSTM model. The rest of the layers and their dimensions are the same as in the other experiments.
Experiment 3: Here, a stacked neural network is implemented. It comprises three stacked LSTM layers placed one after the other (Fig. 6b). Each LSTM layer has 100 hidden units, and its output is passed on to the next layer. The word vector travels from the sequence input layer to the LSTM layers. The value then moves to the fully connected (dense) layer and on to the Softmax activation, which passes it to the next layer. The classification layer analyzes the value and predicts the respective class.
Experiment 4: The sequence input layer takes the word vector as the input to the network. Similar to Experiment 3, this setup also uses a stacked neural network. For learning, three stacked Bi-LSTM layers are created and placed one after the other. The data pass from one Bi-LSTM layer to the next and then to a fully connected layer. The result is produced by the classification layer, which decides which class the sequence belongs to.
Interpretation of the Results
The classifier is now tested on new tweets. The tweets were collected from Crowdflower.com. This study has already taken 9600 tweets from the dataset for training. From the same dataset, 800 random tweets were taken to test the accuracy of the model.
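The settings in Table 8 can be cross-checked against the 9600 training tweets mentioned above. The sketch below assumes that one iteration corresponds to one mini-batch update, which is the usual convention but is not stated explicitly in the source; the variable names are illustrative.

```python
def iterations_per_epoch(num_samples, batch_size):
    # Ceiling division: a final partial batch still counts as one iteration.
    return (num_samples + batch_size - 1) // batch_size

train_tweets = 9600   # training set size stated in the text
mini_batch = 300      # "Mini batch size" in Table 8
max_epochs = 300      # "Max epochs" in Table 8

per_epoch = iterations_per_epoch(train_tweets, mini_batch)
total_iterations = per_epoch * max_epochs
```

Under this assumption there are 32 iterations per epoch, and 32 iterations over 300 epochs give the 9600 maximum iterations listed in Table 8.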
The model processes the testing data and labels each tweet as hate speech, offensive language or neither. This analysis used LSTM and Bi-LSTM to build a sentiment classifier. Quite a few researchers have used stacked neural networks in their classification models. Hence, this work also tried stacking three layers with the same number of hidden units in both the LSTM and the Bi-LSTM networks. Although all the networks reached similar accuracy, as shown in Table 9, the simple LSTM classifier (Experiment 1) outperformed the others with an accuracy of 86%. Experiments 2 and 4 attained accuracies close to that of the best-performing LSTM network.
Table 9 Accuracy measures (in %) acquired from different experiments
Experiments     Accuracy (in %)
Experiment 1    86.00
Experiment 2    84.62
Experiment 3    83.00
Experiment 4    84.00
Sometimes the classifier cannot tell which class a tweet belongs to and assigns it to a different class. The analysis of the above heat map (or confusion matrix) is explained in the following table. Table 10 shows the testing accuracy for each class obtained by the best-performing model. The key aim of this study, which was to improve hate speech accuracy, was achieved, as few tweets of this class were misclassified: the model obtained an accuracy of 89.95% for the “hate speech” class.
Comparison with the Baseline Models
The first baseline model is that of [17], which performed hate speech detection using a variety of techniques on parts of this dataset. Davidson et al. used several statistical classifiers, namely logistic regression, linear SVM, Naïve Bayes and Random Forest. The authors built their sentiment classifier using 25K tweets annotated by workers of Crowdflower.com. Although the authors did not use any deep learning models, their research still serves as a suitable performance baseline for this work. In contrast to the baseline, the present study contributes (1) the use of deep learning for text analytics and sentiment analysis and classification, (2) the use of a word-embedding technique, (3) the development of a multi-class classification system and (4) balanced data. Although the baseline model achieved an overall F1 score of 91%, a major drawback of that work is the strong under-representation of the “hate speech” class compared with the other classes. The results of that study show how heavily imbalanced data affect the outcome. Even though the recall for the most populated classes was high, 91% for “offensive language” and 95% for “neither”, the recall for the “hate speech” class was only 61%. With such confusion on hate speech data, the baseline performs poorly on the class of greatest interest.
Hence, the objective of this model is to improve hate speech accuracy. For that reason, we created a dataset with an equal amount of data in each class. Although this model obtained an overall accuracy of 86%, Table 10 shows that it obtained an accuracy of 89.95% for the hate speech class. This result shows that the model improves on the first baseline model with respect to the hate speech class.
For developing the LSTM models, this research also considered a second baseline. The authors of [18] used parts of this dataset for training their classifiers. They experimented with several deep neural networks, including variants of LSTM. For features, they relied on GloVe and FastText word embeddings. The present analysis was compared with their classifiers that use LSTM and GloVe for classification. That study achieved an F1 score of 78.1% for LSTM (GloVe) and 78.3% for Bi-LSTM (GloVe). In comparison, the present models achieved an accuracy of 86% for LSTM (GloVe) and 80.1% for Bi-LSTM (GloVe). However, a direct comparison cannot be made between the baseline model and the present study, as the authors in [17] used hate speech data from a single data source, in which a class imbalance was observed. To counter this flaw, tweets from other data sources were combined to balance the dataset. The dataset was thus prepared to carry out the present research work and to design a robust LSTM-based classification system for hate speech data.

Table 10 Misclassification results for best performing LSTM classifier
Class                 Total tweets   Predicted tweets   Misclassified tweets   Accuracy (in %)
Hate speech           249            224                26                     89.95
Offensive language    250            179                71                     71.60
Neither               301            279                32                     89.36
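As a worked check of how the per-class figures in Table 10 can be read, the sketch below computes class accuracy as the share of a class's tweets that were predicted correctly. It assumes that the “Predicted tweets” column counts correctly predicted tweets; that reading and the function names are assumptions. The “offensive language” row is used as the worked case because its counts add up exactly (179 + 71 = 250).

```python
def class_accuracy(total, correct):
    # Percentage of the tweets of one class that were labelled correctly.
    return 100.0 * correct / total

def counts_add_up(total, correct, misclassified):
    # A row is internally consistent when the two outcomes cover all tweets.
    return correct + misclassified == total

# "Offensive language" row of Table 10, as printed.
total, correct, misclassified = 250, 179, 71

offensive_accuracy = class_accuracy(total, correct)
offensive_consistent = counts_add_up(total, correct, misclassified)
```

For this row the computed value reproduces the 71.60% given in the table.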
6 Conclusions
The internet has given people access to a global platform where they can express their feelings and opinions. Many a time, people post hate speech or use offensive language to express their views. The problem of unfiltered hate speech has also encouraged group-based hatred against minorities. Hate speech identification has therefore become a significant task in sentiment analysis. This analysis aims to develop a sentiment analysis classification system that uses deep learning to study the occurrence of hate speech and offensive language on Twitter. The present work tries to improve the classification through a multi-step classifier and word embeddings with a deep neural network. The goal of the present work is to overcome the limitations of the baseline models by (1) reducing the effect of imbalanced data on the results and (2) improving accuracy for the LSTM and Bi-LSTM classifiers. This study experiments with a variety of LSTM and Bi-LSTM models, both simple and stacked networks, and attains an accuracy of 86% for the best-performing LSTM model. In comparison to the baseline models, it improves the overall accuracy as well as the accuracy of the “hate speech” class. This work contributes to the study and analysis of the occurrence of toxic content (hate speech, offensive language, virtual assaults, etc.) on social media. With the emergence of the World Wide Web (WWW) and the rapid rise of social media sites, the entire world is becoming a single entity, drawing back the curtains and increasing public transparency. To make this transparent world (i.e., a network of people connected via the internet) safer and better, so that users can freely express their thoughts and opinions, the detection and removal of toxic content is essential.
References
1. Quote by https://medium.com/@katiemaeonline/words-are-the-most-powerful-forceavailable-to-humanity-5a42f09dd1ac
2. A. Hern, Facebook, YouTube, Twitter and Microsoft sign EU hate speech code. The Guardian. Accessed 7 June 2016
3. L. Silva, M. Mondal, D. Correa, F. Benevenuto, I. Weber, Analyzing the targets of hate in online social media. arXiv preprint arXiv:1603.07709 (2016)
4. Hateful Conduct Policy (2017), https://support.twitter.com/articles/. Accessed Feb 2017
5. Z. Waseem, Are you a racist or am i seeing things? annotator influence on hate speech detection on twitter, in Proceedings of the First Workshop on NLP and Computational Social Science (2016), pp. 138–142
6. A. Jha, R. Mamidi, When does a compliment become sexist? analysis and classification of ambivalent sexism using twitter data, in Proceedings of the Second Workshop on NLP and Computational Social Science (2017), pp. 7–16
7. A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)
8. I. Kwok, Y. Wang, Locate the hate: detecting tweets against blacks, in Twenty-Seventh AAAI Conference on Artificial Intelligence (2013)
9. P. Burnap, M.L. Williams, Cyber hate speech on twitter: an application of machine classification and statistical modeling for policy and decision making. Policy Internet 7(2), 223–242 (2015)
10. J.H. Park, P. Fung, One-step and two-step classification for abusive language detection on twitter. arXiv preprint arXiv:1706.01206 (2017)
11. B. Gambäck, U.K. Sikdar, Using convolutional neural networks to classify hate-speech, in Proceedings of the First Workshop on Abusive Language Online, pp. 85–90
12. M. ElSherief, S. Nilizadeh, D. Nguyen, G. Vigna, E. Belding, Peer to peer hate: hate speech instigators and their targets, in Twelfth International AAAI Conference on Web and Social Media (2018)
13. P. Badjatiya, S. Gupta, M. Gupta, V. Varma, Deep learning for hate speech detection in tweets, in Proceedings of the 26th International Conference on World Wide Web Companion (International World Wide Web Conferences Steering Committee, 2017), pp. 759–760
14. M.O. Ibrohim, I. Budi, Multi-label hate speech and abusive language detection in indonesian twitter, in Proceedings of the Third Workshop on Abusive Language Online (2019), pp. 46–57
15. C.N.D. Santos, I. Melnyk, I. Padhi, Fighting offensive language on social media with unsupervised text style transfer. arXiv preprint arXiv:1805.07685 (2018)
16. U. Bretschneider, R. Peters, Detecting offensive statements towards foreigners in social media, in Proceedings of the 50th Hawaii International Conference on System Sciences (2017)
17. T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language. arXiv preprint arXiv:1703.04009 (2017)
18. B. vanAken, J. Risch, R. Krestel, A. Löser, Challenges for toxic comment classification: an in-depth error analysis. arXiv preprint arXiv:1809.07572 (2018)
19. N. Djuric, J. Zhou, R. Morris, M. Grbovic, V. Radosavljevic, N. Bhamidipati, Hate speech detection with comment embeddings, in Proceedings of the 24th International Conference on World Wide Web (ACM, 2015), pp. 29–30
20. V. Basile, C. Bosco, E. Fersini, D. Nozza, V. Patti, F.M.R. Pardo, M. Sanguinetti, Semeval-2019 task 5: multilingual detection of hate speech against immigrants and women in twitter, in Proceedings of the 13th International Workshop on Semantic Evaluation (2019), pp. 54–63
21. S.V. Georgakopoulos, S.K. Tasoulis, A.G. Vrahatis, V.P. Plagianakos, Convolutional neural networks for toxic comment classification, in Proceedings of the 10th Hellenic Conference on Artificial Intelligence (ACM, 2018), p. 35
22. Z. Zhang, D. Robinson, J. Tepper, Detecting hate speech on twitter using a convolution-gru based deep neural network, in European Semantic Web Conference (Springer, Cham, 2018), pp. 745–760
23. H. Watanabe, M. Bouazizi, T. Ohtsuki, Hate speech on twitter: a pragmatic approach to collect hateful and offensive expressions and perform hate speech detection. IEEE Access 6, 13825–13835 (2018)
24. C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, Y. Chang, Abusive language detection in online user content, in Proceedings of the 25th International Conference on World Wide Web (International World Wide Web Conferences Steering Committee, 2016), pp. 145–153
25. G.K. Pitsilis, H. Ramampiaro, H. Langseth, Detecting offensive language in tweets using deep learning. arXiv preprint arXiv:1801.04433 (2018)
26. P. Mathur, R. Shah, R. Sawhney, D. Mahata, Detecting offensive tweets in hindi-english code-switched language, in Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media (2018), pp. 18–26
27. B. Vandersmissen, Automated detection of offensive language behavior on social networking sites. IEEE Trans. (2012)
28. P. Mathur, R. Sawhney, M. Ayyar, R. Shah, Did you offend me? classification of offensive tweets in hinglish language, in Proceedings of the 2nd Workshop on Abusive Language Online (ALW2) (2018), pp. 138–148
29. S. Agarwal, A. Sureka, Using knn and svm based one-class classifier for detecting online radicalization on twitter, in International Conference on Distributed Computing and Internet Technology (Springer, Cham, 2015), pp. 431–442
30. A.H. Razavi, D. Inkpen, S. Uritsky, S. Matwin, Offensive language detection using multi-level classification, in Canadian Conference on Artificial Intelligence (Springer, Berlin, Heidelberg, 2010), pp. 16–27
31. G. Xiang, B. Fan, L. Wang, J. Hong, C. Rose, Detecting offensive tweets via topical feature discovery over a large scale twitter corpus, in Proceedings of the 21st ACM international conference on Information and knowledge management (ACM, 2012), pp. 1980–1984
32. M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, Semeval-2019 task 6: identifying and categorizing offensive language in social media (offenseval). arXiv preprint arXiv:1903.08983 (2019)
33. Z. Xu, S. Zhu, Filtering offensive language in online communities using grammatical relations, in Proceedings of the Seventh Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference (2010), pp. 1–10
34. G. Wiedemann, E. Ruppert, R. Jindal, C. Biemann, Transfer learning from LDA to BiLSTM-CNN for offensive language detection in twitter. arXiv preprint arXiv:1811.02906 (2018)
35. K. Rother, M. Allee, A. Rettberg, Ulmfit at germeval-2018: a deep neural language model for the classification of hate speech in german tweets, in 14th Conference on Natural Language Processing KONVENS 2018 (2018), p. 113
36. H. Mubarak, K. Darwish, W. Magdy, Abusive language detection on Arabic social media, in Proceedings of the First Workshop on Abusive Language Online (2017), pp. 52–56
37. T.G. Almeida, B.À. Souza, F.G. Nakamura, E.F. Nakamura, Detecting hate, offensive, and regular speech in short comments, in Proceedings of the 23rd Brazillian Symposium on Multimedia and the Web (ACM, 2017), pp. 225–228
38. A. Gaydhani, V. Doma, S. Kendre, L. Bhagwat, Detecting hate speech and offensive language on twitter using machine learning: an n-gram and tfidf based approach. arXiv preprint arXiv:1809.08651 (2018)
39. T. Gröndahl, L. Pajola, M. Juuti, M. Conti, N. Asokan, All you need is: evading hate speech detection, in Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security (ACM, 2018), pp. 2–12
40. L. Gao, R. Huang, Detecting online hate speech using context aware models. arXiv preprint arXiv:1710.07395 (2017)
Enhanced Time–Frequency Representation Based on Variational Mode Decomposition and Wigner–Ville Distribution
Rishi Raj Sharma, Preeti Meena and Ram Bilas Pachori
Abstract The Wigner–Ville distribution (WVD) gives a very high-resolution time–frequency distribution, but its usefulness is diminished by the existence of cross-terms. Cross-terms suppression in WVD is crucial to obtain the actual energy distribution in the time–frequency (TF) plane. This chapter proposes a method to remove both inter and intra cross-terms from the TF distribution obtained using WVD. Variational mode decomposition is applied to decompose a multicomponent signal into its mono-components, and inter cross-terms are suppressed due to the separation of mono-components. Thereafter, segmentation is applied in the time domain to remove intra cross-terms present due to nonlinearity in frequency modulation. The WVD of each resulting component is then computed. Finally, all the WVDs are added to get the complete time–frequency representation. The efficacy of the proposed method is checked using the Renyi entropy measure on one synthetic and two natural signals (a bat echo sound and a speech signal) in clean and noisy environments. The presented method works well and gives better results in comparison to the WVD and pseudo WVD techniques.
R. R. Sharma (corresponding author), Department of Electronics Engineering, Defence Institute of Advanced Technology, Pune, India
P. Meena · R. B. Pachori, Discipline of Electrical Engineering, Indian Institute of Technology-Indore, Indore, India
© Springer Nature Singapore Pte Ltd. 2020. S. Jain and S. Paul (eds.), Recent Trends in Image and Signal Processing in Computer Vision, Advances in Intelligent Systems and Computing 1124, https://doi.org/10.1007/978-981-15-2740-1_18

1 Introduction
Signal representation plays an important role in analysis and anomaly detection. In general, the activity of a system is recorded as a function of time, and this expression is known as the time domain (TD) representation. The TD representation does not contain frequency domain (FD) information, so its frequency resolution is zero [1]. Similarly, the FD representation contains frequency information but its time resolution is zero. Simultaneous information in both TD and FD is very helpful in signal analysis; therefore, the time–frequency representation (TFR) is superior to the TD and FD representations. A complete TFR conveys the joint TD and FD information, which provides additional detail for signal analysis. Various methods have been proposed to show the information in the TF plane, including the short-time Fourier transform (STFT) [1], Hilbert–Huang transform (HHT) [2, 3], Wigner–Ville distribution (WVD) [4, 5], wavelet transform (WT) [6], and eigenvalue decomposition of the Hankel matrix. The STFT is the Fourier transform of a TD-segmented signal; it does not give resolution as high as the WVD [6]. The HHT is based on the mean of the amplitude envelope and the Hilbert transform (HT); it becomes less effective due to mixing between components [2, 7]. The WT, based on scaling and shifting, also represents signals in the TF plane [8]. A signal decomposition method based on the eigenvalues of the Hankel matrix combined with the HT has been suggested for TFR [9], and an advanced version has been proposed for signals having complex information [10]. The Fourier transform (FT) gives the information of each frequency accumulated over all time instants, due to which time information vanishes. The autocorrelation provides the correlation between a signal and its shifted version. Therefore, on applying the FT over the autocorrelation of a signal, the output is a function of both time and frequency, which is termed the WVD. Due to the autocorrelation operation, the WVD possesses a quadratic property [11]. The WVD gives very high TF localization, but its performance is diminished by the existence of false components [1, 12].
The presence of false components along with actual components makes it difficult to identify the true information. Identifying and suppressing the false components is therefore necessary for correct signal analysis. The cross-terms existing between two components are termed inter cross-terms. The cross-terms present due to nonlinear frequency modulation are termed intra cross-terms [11, 13]. TFR methods are utilized in several fields with different applications, such as fault diagnosis in gears [14], data classification [15], physiological signal analysis [16], wind signals [10], speech signals [17], and many others. TFR-based features have been applied for coronary artery disease diagnosis [18, 19] and muscle disease detection [20, 21]. Many techniques have been suggested to eliminate the false terms from WVD-based representations [22–24]. A windowing-based technique for eliminating false components is termed pseudo WVD [25]; it fails to eliminate inter cross-terms. Image processing has also been applied over the TF plane to remove cross-terms, but to treat the TF plane as an image, texture-based image characteristics have to be assumed [26]. A technique based on the eigenvalue decomposition of the Hankel matrix has recently been suggested for cross-terms suppression in WVD [13]. In [27], a variational mode decomposition (VMD) based cross-terms reduction method is presented. This method can remove only inter cross-terms and is not able to suppress intra cross-terms. Therefore, an advanced version of [27], which can remove both inter and intra cross-terms, is presented in this work.
The rest of the chapter is organized as follows: the WVD is explained in Sect. 2, the VMD method for signal decomposition is elaborated in Sect. 3, the proposed cross-terms-free TFR method is explained in Sect. 4, and Sect. 5 gives simulation results and discussion. Finally, the chapter is concluded in Sect. 6.
2 Wigner–Ville Distribution
In the process of obtaining the WVD, the autocorrelation of the input signal is computed and thereafter the FT is applied over the resulting signal [1, 28]. The WVD belongs to the family of quadratic distributions and provides very high resolution in comparison to other TFRs such as the scalogram, STFT, and HHT. The WVD of a time series signal m(t) is given as follows [29, 30]:
$$\mathrm{WVD}_m(t, \zeta) = \int_{-\infty}^{+\infty} m\left(t + \frac{\tau}{2}\right) m^{*}\left(t - \frac{\tau}{2}\right) e^{-j\zeta\tau}\, d\tau \tag{1}$$
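Similarly, the WVD can also be computed using the frequency domain signal M(ζ), as follows:
$$\mathrm{WVD}_m(t, \zeta) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} M\left(\zeta + \frac{\xi}{2}\right) M^{*}\left(\zeta - \frac{\xi}{2}\right) e^{+j\xi t}\, d\xi \tag{2}$$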
where $m^{*}(t)$ is the complex conjugate of the signal $m(t)$. From (1) and (2), it can be seen that the WVD is quadratic in the signal, and as a result it produces false components along with the original components; these are known as cross-terms. False terms can arise in two ways: (1) cross-terms occurring between mono-component signals are termed inter cross-terms; (2) cross-terms occurring within a mono-component signal due to nonlinearity in frequency modulation are termed intra cross-terms. Confusing cross-terms with auto-terms leads to false analysis and may misrepresent the actual output. A multicomponent signal $m(t)$ containing $K_0$ mono-component signals can be represented as follows:
$$m(t) = \sum_{k_0=1}^{K_0} m_{k_0}(t) \tag{3}$$
For example, if $K_0 = 2$ in (3), there are two mono-component signals $m_1(t)$ and $m_2(t)$, i.e., $m(t) = m_1(t) + m_2(t)$. The WVD of $m(t)$ is then as follows [28]:
$$\mathrm{WVD}_m(t, \zeta) = \underbrace{\mathrm{WVD}_{m_1}(t, \zeta) + \mathrm{WVD}_{m_2}(t, \zeta)}_{\text{WVD of actual components}} + \underbrace{2\,\mathrm{Re}\left[\mathrm{WVD}_{m_1 m_2}(t, \zeta)\right]}_{\text{WVD of false component}} \tag{4}$$
where Re denotes the real part, and $\mathrm{WVD}_{m_1}(t, \zeta)$ and $\mathrm{WVD}_{m_2}(t, \zeta)$ are the WVDs of $m_1(t)$ and $m_2(t)$. The false components in the TFR are $2\,\mathrm{Re}[\mathrm{WVD}_{m_1 m_2}(t, \zeta)]$. For $K_0$ actual components in a signal, there will be $\binom{K_0}{2}$ false components. The appearance of false components in the WVD obstructs the identification of the actual components of the signal [31, 32]. Therefore, it becomes necessary to suppress or remove the false components for proper interpretation of the actual components of a signal.
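The cross-term in (4) can be observed numerically with a small discrete example. The sketch below is an illustration under stated assumptions rather than the implementation used in this chapter: the discretisation (a sum over M consecutive lags, with frequency bin k mapped to k/(2M) cycles per sample), the function names and the two test tones are all choices made here for the example.

```python
import cmath
import math

def wvd_slice(x, n, M):
    """Discrete WVD of x at time index n, evaluated on M frequency bins.

    Assumed convention: bin k corresponds to k / (2 * M) cycles per sample.
    """
    half = M // 2
    out = []
    for k in range(M):
        acc = 0j
        for tau in range(-half, half):
            # Instantaneous autocorrelation x[n + tau] * conj(x[n - tau]).
            kernel = x[n + tau] * x[n - tau].conjugate()
            acc += kernel * cmath.exp(-2j * math.pi * k * tau / M)
        out.append(acc.real)
    return out

def tone(k, M, N):
    # Complex exponential whose WVD peaks at frequency bin k.
    return [cmath.exp(2j * math.pi * k * t / (2 * M)) for t in range(N)]

M, N = 16, 33
m1 = tone(2, M, N)                      # first mono-component
m2 = tone(6, M, N)                      # second mono-component
m = [a + b for a, b in zip(m1, m2)]     # two-component signal, K0 = 2

W16 = wvd_slice(m, 16, M)               # time slice at n = 16
W20 = wvd_slice(m, 20, M)               # time slice at n = 20
```

With these two tones, the slices show the auto-terms at bins 2 and 6 with a value of about 16 each, and a single cross-term midway at bin 4. The cross-term is about +32 at n = 16 and about −32 at n = 20, so it is twice as large as each auto-term and oscillates in time, which is the behaviour of the false component in (4).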
3 Variational Mode Decomposition (VMD) Method
VMD [33] is an adaptive multicomponent signal decomposition method. It is a non-recursive method used to extract $P$ band-limited modes $v_p(t)$ from a signal $V(t)$. The extraction of every band-limited mode is based on its central frequency: each extracted mode must be concentrated around its central frequency $\omega_p$, where $p = 1, 2, \ldots, P$. For each mode, the bandwidth in the spectral domain is chosen as a specific sparsity property to reconstruct the original signal. The bandwidth of a mode is computed through a constrained optimization problem. This method requires the following steps [33]:
1. The unilateral frequency spectrum is obtained for each mode by applying the Hilbert transform.
2. The unilateral frequency spectrum is shifted to the origin by multiplying with an exponential function tuned to the corresponding estimated center frequency.
3. The bandwidth of the mode is determined through the $H^1$ Gaussian smoothness of the demodulated signal, that is, the squared $L^2$-norm of the gradient [34, 35].
The constrained optimization problem used to determine the bandwidth is given below:
$$\min_{\{v_p\},\{\omega_p\}} \left\{ \sum_{p=1}^{P} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * v_p(t) \right] e^{-j\omega_p t} \right\|_2^2 \right\} \quad \text{subject to} \quad \sum_{p=1}^{P} v_p(t) = V(t) \tag{5}$$
The constrained optimization problem is solved by transforming it into an unconstrained optimization problem through the introduction of a Lagrangian multiplier λ and a penalty factor α in (5). The resulting unconstrained optimization problem is as follows:
$$\mathcal{L}\left(\{v_p\},\{\omega_p\},\lambda\right) := \alpha \sum_{p} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * v_p(t) \right] e^{-j\omega_p t} \right\|_2^2 + \left\| V(t) - \sum_{p} v_p(t) \right\|_2^2 + \left\langle \lambda(t),\, V(t) - \sum_{p} v_p(t) \right\rangle \tag{6}$$
=
Vˆ (ω) −
ˆl (ω) l= p v
+
1 + 2α(ω − ω p )2
ˆ λ(ω) 2
(7)
∞
ω m+1 p
ω|ˆv p (ω)|2 dω = 0 ∞ v p (ω)|2 dω 0 |ˆ
(8)
where vˆ m+1 (ω) denotes the updated mode and ω m+1 corresponding updated central p p ˆ ˆ (ω) show the FD frequency. In (6), (7), and (8), the V (ω), vˆ p (ω), λ(ω), and vˆ (m+1) p (t), respectively. The proposed method representation of V (t), v p (t), λ(t), and vˆ (m+1) p becomes well-conditioned using Wiener filter structure in (6) equation [33–37]. The VMD implementation based on MATLAB is given in [38]. The parameters needed to give as an input in VMD code are the required number of modes, total DC components, the TD signal to be decomposed, the balancing parameter of the data-fidelity constraint (α), time step of the dual ascent (tau), the convergence criterion tolerance (tol), and the center frequency initialization ω (init). In the presented technique, total computation is performed up to 500 iterations for convergence in the VMD method [36].
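The two update rules (7) and (8) can be sketched on a sampled frequency grid as follows. This is a minimal sketch, not the MATLAB code of [38]: the integrals in (8) are replaced by sums over a uniform grid of non-negative frequencies, and the function names, the grid and the test spectrum are assumptions made for the example.

```python
def update_mode(V_hat, others_sum, lam_hat, omegas, omega_p, alpha):
    """Eq. (7): Wiener-filter-like update of one mode, bin by bin.

    others_sum holds the sum of all other modes at each frequency bin.
    """
    return [
        (V - others + lam / 2.0) / (1.0 + 2.0 * alpha * (w - omega_p) ** 2)
        for V, others, lam, w in zip(V_hat, others_sum, lam_hat, omegas)
    ]

def update_center(v_hat_p, omegas):
    """Eq. (8): power-weighted mean frequency of the mode spectrum."""
    num = sum(w * abs(v) ** 2 for v, w in zip(v_hat_p, omegas))
    den = sum(abs(v) ** 2 for v in v_hat_p)
    return num / den

# Hypothetical grid and spectrum: all energy sits in the bin at 0.2.
omegas = [0.0, 0.1, 0.2, 0.3, 0.4]
V_hat = [0j, 0j, 1 + 0j, 0j, 0j]
zeros = [0j] * len(omegas)

v_new = update_mode(V_hat, zeros, zeros, omegas, omega_p=0.1, alpha=100.0)
omega_new = update_center(v_new, omegas)
```

Starting from a center frequency guess of 0.1, the filter in (7) attenuates the spectrum away from the guess (here by a factor of three at 0.2), and (8) then moves the center frequency to 0.2, where the energy actually lies. In a full VMD run these two updates alternate for every mode, together with the update of the multiplier, until convergence.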
4 Proposed Method

The methodology proposed in [27] cannot remove the intra cross-terms that are present when the frequency of a signal component is nonlinearly modulated. The aim of the proposed method is to remove both inter and intra cross-terms from the WVD of nonstationary multicomponent signals. To achieve this, the signal is first decomposed into mono-components using the VMD method, and TD segmentation (TDS) with a Gaussian window function is then applied. The windowing process creates discontinuities at the segment end points. Therefore, the segments are overlapped, with one sample shift and 50% overrun. The components are made analytic using the HT so that only the positive frequencies of the real signals are retained. Finally, the WVD is applied to each segmented component, and the final WVD is the sum of the WVDs of all segmented components. The block diagram of the proposed method is presented in Fig. 1. The WVD is computed using the 'time–frequency toolbox' [39]. The next section presents the results for the considered signals in clean and noisy environments.

Fig. 1 Block diagram of the VMD-based cross-terms suppression methodology
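The processing chain described above (analytic components, Gaussian-windowed overlapping segments, a WVD per segment, and a final sum) can be sketched as follows. This is one possible reading of the description, not the toolbox code of [39]: the segment length, hop size, Gaussian width, and all function names are assumptions, and the components are assumed to have been produced beforehand by VMD.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via an FFT-based discrete Hilbert transform."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    gain[1:(n + 1) // 2] = 2.0         # double the positive frequencies
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist bin once
    return np.fft.ifft(np.fft.fft(x) * gain)

def wvd(z):
    """Discrete WVD of an analytic signal z, shape (frequency bin, time).

    Frequency bin k corresponds to k / (2 * len(z)) cycles per sample.
    """
    n = len(z)
    tfr = np.zeros((n, n))
    for t in range(n):
        half = min(t, n - 1 - t)                   # largest symmetric lag at t
        lags = np.arange(-half, half + 1)
        kernel = np.zeros(n, dtype=complex)
        kernel[lags % n] = z[t + lags] * np.conj(z[t - lags])
        tfr[:, t] = np.fft.fft(kernel).real        # Hermitian kernel, real FFT
    return tfr

def segmented_wvd(components, seg_len=64, hop=32, sigma=None):
    """Sum the WVDs of Gaussian-windowed, 50%-overlapping segments."""
    sigma = seg_len / 6 if sigma is None else sigma        # assumed width
    centered = np.arange(seg_len) - (seg_len - 1) / 2
    window = np.exp(-0.5 * (centered / sigma) ** 2)
    n = len(components[0])
    total = np.zeros((seg_len, n))
    for comp in components:
        z = analytic_signal(np.asarray(comp, dtype=float))
        for start in range(0, n - seg_len + 1, hop):
            segment = z[start:start + seg_len] * window
            total[:, start:start + seg_len] += wvd(segment)
    return total

# Single-tone check: energy should peak at bin 2 * f0 * seg_len = 16.
n = np.arange(256)
tfr = segmented_wvd([np.cos(2 * np.pi * 0.125 * n)])
peak_bin = int(np.argmax(tfr[:, 128]))
```

Because each segment is short, a nonlinearly modulated component is approximately linear in frequency within it, which is the intuition behind using segmentation against intra cross-terms.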
5 Results and Discussion

The presented technique for cross-terms reduction in the WVD extends the methodology proposed in [27], which is not able to remove intra cross-terms. The performance of the proposed technique is evaluated on signals that contain nonlinearly frequency-modulated mono-components. Performance is also assessed in a noisy environment, with additive white Gaussian noise (AWGN) at a signal-to-noise ratio (SNR) of 10 dB.
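For reference, a noisy test signal at a prescribed SNR can be generated as in the following sketch. The helper name `add_awgn` and the use of the measured signal power as the reference are assumptions; the chapter does not state how its noisy signals were produced.

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so that the result has the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)   # SNR = Ps / Pn
    noise = rng.normal(0.0, np.sqrt(noise_power), signal.shape)
    return signal + noise

# Example: a sinusoid corrupted at 10 dB SNR, with a fixed seed for repeatability.
clean = np.sin(2 * np.pi * 0.01 * np.arange(100_000))
noisy = add_awgn(clean, 10, np.random.default_rng(0))
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```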
5.1 Performance Evaluation

To quantify the performance of the proposed method, the Renyi entropy (RenEn) measure is used. The concentration in the TF plane can be judged by comparing the RenEn values of the TFRs of different signals obtained from the WVD, the pseudo WVD, and the proposed method. The values are listed in Table 1.

Table 1 The RenEn values for TFRs measured using WVD, pseudo WVD, and the proposed method for the signals taken

Signal         Noise level   WVD method   Pseudo WVD method   VMD-WVD method (proposed)
Sg1            Clean         5.84         9.77                6.71
Sg1            10 dB         5.92         10.54               7.09
Bat echo       Clean         0.65         4.48                3.13
Bat echo       10 dB         0.86         5.67                4.48
Phoneme /ae/   Clean         3.79         6.29                5.83
Phoneme /ae/   10 dB         4.11         6.56                6.12

The mathematical formulation of RenEn can be expressed as follows [1, 40]:

$$R_\alpha = \frac{1}{1-\alpha} \log_2 \left( \sum_{p=-P}^{P} \sum_{q=-Q}^{Q} \left[ C_s(p, q) \right]^{\alpha} \right) \tag{9}$$

In the above expression, $C_s(p, q)$ is a TF distribution of Cohen's class and $\alpha$ is the order of information. The value of $\alpha$ is chosen as 3 [40]. This measure reflects the complexity and the concentration of the information content of a time-varying signal in its TF plane. A lower RenEn value indicates a more concentrated, and therefore better, TFR.
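Equation (9) translates directly into code. The sketch below is illustrative: the function name `renyi_entropy` is hypothetical, and the optional normalization of the distribution to unit sum is an added assumption that is not part of (9) as written.

```python
import numpy as np

def renyi_entropy(tfr, alpha=3, normalize=False):
    """Renyi entropy of order alpha for a sampled TF distribution, as in (9)."""
    tfr = np.asarray(tfr, dtype=float)
    if normalize:
        tfr = tfr / np.sum(tfr)        # assumption: scale the TFR to unit sum
    # With odd alpha, negative TFR values can make the sum non-positive,
    # in which case the logarithm is undefined.
    return float(np.log2(np.sum(tfr ** alpha)) / (1 - alpha))

# A distribution spread evenly over 16 cells versus one concentrated in one cell.
spread = renyi_entropy(np.ones((4, 4)), normalize=True)
peaked = renyi_entropy(np.eye(1, 16).reshape(4, 4), normalize=True)
```

The evenly spread distribution yields the larger value, in line with the statement that a lower RenEn corresponds to a more concentrated TFR.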
5.1.1 Signal 1
A signal Sg1 has two mono-component signals out of which frequency of one component is linearly modulated and frequency of another component is nonlinearly modulated. Mathematical expression of signal Sg1 is given as follows: 1 cos Sg1 = 70
πn πn n 4πn n 1 + 487 + 28 cos + cos + 27 490 480 256 80 1200 10
(10)

The TD plot of Sg1 is given in Fig. 2 and the WVD of the signal is displayed in Fig. 3. The TFRs of Sg1 obtained using the pseudo WVD and the proposed method are given in Figs. 4 and 5, respectively. The TFR obtained with the proposed method is free from cross-terms. The technique is also evaluated on a noisy signal: the TFRs of Sg1 with AWGN at 10 dB SNR are shown in Figs. 6, 7, and 8. In these figures, it can be seen that the pseudo WVD (PWVD) method produces inter cross-terms and that the AWGN is spread throughout the TF plane. The proposed method gives a better TFR, as depicted in Fig. 8. The RenEn values computed for the TFRs of the clean and noisy Sg1 are given in Table 1. The RenEn for Fig. 8 is lower, and therefore better, than the RenEn computed for Fig. 7. Hence the proposed method is effective in removing inter and intra cross-terms in the WVD.

Fig. 2 TD plot of signal Sg1 (amplitude versus sample number)
Fig. 3 TF plot of signal Sg1 using WVD (normalized frequency versus sample number)
5.1.2 Signal 2

The second signal, Sg2, is a multicomponent bat echo signal. It is a natural signal emitted by the large brown bat, Eptesicus fuscus, and is taken from [41].
Fig. 4 TF plot of signal Sg1 using pseudo WVD
Fig. 5 TF plot of signal Sg1 using enhanced VMD-WVD (proposed method) (normalized frequency versus sample number)
The components of this signal are well separated in the TF domain but overlap in both TD and FD. The duration of the bat signal considered here is 2.5 ms, with a sampling period of 7 µs. The TD plot of Sg2 is given in Fig. 9 and the WVD of the bat echo signal is displayed in Fig. 10. The TFRs of the bat echo obtained using the pseudo WVD method and the proposed method are given in Figs. 11 and 12, respectively. The TFR obtained with the proposed method does not contain cross-terms. The TFRs of the noisy bat echo signal at 10 dB SNR obtained using the WVD, pseudo WVD, and VMD-WVD methods are shown in Figs. 13, 14, and 15, respectively. The RenEn values for the TFRs of the clean and noisy Sg2 are given in Table 1.

Fig. 6 TF plot of signal Sg1 using WVD at 10 dB SNR (normalized frequency versus sample number)
Fig. 7 TF plot of signal Sg1 using pseudo WVD at 10 dB SNR
5.1.3 Signal 3 (Speech Signal)

The third signal, Sg3, is the voiced speech phoneme /ae/ [42], which is used to further evaluate the proposed method. A segment of 800 samples, sampled at 16 kHz, is considered and depicted in Fig. 16. The signal contains four components [13].
Fig. 8 TF plot of signal Sg1 using enhanced VMD-WVD (proposed method) at 10 dB SNR (normalized frequency versus sample number)
Fig. 9 TD plot of bat sound signal (amplitude versus sample number)
The energy of the lowest-frequency component is very high compared with that of the other components. Therefore, only the first component is visible in the TF plane. The TFRs of signal Sg3 obtained using WVD, pseudo WVD, and enhanced VMD-WVD (proposed method) are shown in Figs. 17, 18, and 19, respectively. Because the amplitude of the signal varies strongly, cross-terms appear within the components and the location of high energy is misrepresented. To avoid this situation, an enhanced TFR method based on VMD and WVD, which is an advanced version of [27], is proposed. The RenEn values for the TFRs of the clean and noisy Sg3 are given in Table 1. The proposed method gives a better TFR that is free of cross-terms. Moreover, where the energy of the TD signal decreases, the energy in its TFR decreases at the same time instant, even in the presence of noise (Figs. 20, 21, and 22).

The pseudo WVD method is not able to remove cross-terms completely: it can remove intra cross-terms, but inter cross-terms are not removed. The values of the Renyi information measure are lower for the presented methodology than for the pseudo WVD. This indicates that the proposed methodology gives a better TFR than the pseudo WVD in terms of RenEn, and the TFR plots in the experimental results show its improvement over both the WVD and the pseudo WVD.

Fig. 10 TF plot of bat sound signal using WVD (normalized frequency versus sample number)
Fig. 11 TF plot of bat sound signal using pseudo WVD (normalized frequency versus sample number)
Fig. 12 TF plot of bat sound signal using enhanced VMD-WVD (proposed method)
Fig. 13 TF plot of noisy bat echo sound signal with 10 dB SNR using WVD
Fig. 14 TF plot of noisy bat echo sound signal with 10 dB SNR using pseudo WVD
Fig. 15 TF plot of noisy bat echo sound signal with 10 dB SNR using enhanced VMD-WVD (proposed method)
Fig. 16 TD plot of Sg3, phoneme /ae/ signal (amplitude versus sample number)
Fig. 17 TF plot of Sg3, phoneme /ae/ signal using WVD
Fig. 18 TF plot of Sg3, phoneme /ae/ signal using pseudo WVD
Fig. 19 TF plot of Sg3, phoneme /ae/ signal using enhanced VMD-WVD (proposed method)
Fig. 20 TF plot of noisy signal Sg3, phoneme /ae/ with 10 dB SNR using WVD
Fig. 21 TF plot of noisy signal Sg3, phoneme /ae/ with 10 dB SNR using pseudo WVD
Fig. 22 TF plot of noisy signal Sg3, phoneme /ae/ with 10 dB SNR using enhanced VMD-WVD (proposed method)
(Figs. 12 to 22: normalized frequency versus sample number, except Fig. 16)
6 Conclusion

In this chapter, an advanced version of the VMD-based cross-terms-free WVD method [27] is presented. The proposed technique can remove both inter and intra cross-terms in the WVD. The decomposition of the signal into mono-components using VMD removes the inter cross-terms, and the windowing applied in TD removes the intra cross-terms. Performance is evaluated using the RenEn measure for the considered signals. The method is tested on two natural signals, a bat echo sound signal and a human voiced speech signal. On the basis of the results, the presented method can be useful for the analysis and diagnosis of natural signals for purposes such as event detection, prediction, and classification. However, the input parameters of the VMD method have to be selected by the user, and they affect the performance of the proposed method. Future work will focus on optimizing these parameters for the best TF localization with VMD-WVD, so that the approach becomes free of parameter selection.

Acknowledgements The authors wish to thank Curtis Condon, Ken White, and Al Feng of the Beckman Institute of the University of Illinois for the bat data and for permission to use it in this chapter.
References

1. B. Boashash, Time-Frequency Signal Analysis and Processing: A Comprehensive Reference (Elsevier, Amsterdam, 2003)
2. N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.C. Yen, C.C. Tung, H.H. Liu, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis, in Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 454 (1998), pp. 903–995
3. R.R. Sharma, R.B. Pachori, A new method for non-stationary signal analysis using eigenvalue decomposition of the Hankel matrix and Hilbert transform, in Fourth International Conference on Signal Processing and Integrated Networks (2017), pp. 484–488
4. B. Boashash, P. Black, An efficient real-time implementation of the Wigner-Ville distribution. IEEE Trans. Acoust. Speech Signal Process. 35, 1611–1618 (1987)
5. L. Stankovic, M. Dakovic, T. Thayaparan, Time-Frequency Signal Analysis with Applications (Artech House, Norwood, 2013)
6. S. Kadambe, G.F. Boudreaux-Bartels, A comparison of the existence of 'cross terms' in the Wigner distribution and the squared magnitude of the wavelet transform and the short-time Fourier transform. IEEE Trans. Signal Process. 40, 2498–2517 (1992)
7. N.E. Huang, Z. Wu, A review on Hilbert-Huang transform: method and its applications to geophysical studies. Rev. Geophys. 46(2) (2008)
8. Y. Meyer, Wavelets and Operators, vol. 1 (Cambridge University Press, Cambridge, 1995)
9. R.R. Sharma, R.B. Pachori, Time-frequency representation using IEVDHM-HT with application to classification of epileptic EEG signals. IET Sci. Measur. Technol. 12(1), 72–82 (2018)
10. R.R. Sharma, R.B. Pachori, Eigenvalue decomposition of Hankel matrix-based time-frequency representation for complex signals. Circuits, Syst., Signal Process. 37(8), 3313–3329 (2018)
11. R.B. Pachori, A. Nishad, Cross-terms reduction in the Wigner-Ville distribution using tunable-Q wavelet transform. Signal Process. 120, 288–304 (2016)
12. L. Cohen, Time-frequency distributions-a review. Proc. IEEE 77, 941–981 (1989)
13. R.R. Sharma, R.B. Pachori, Improved eigenvalue decomposition-based approach for reducing cross-terms in Wigner-Ville distribution. Circuits, Syst., Signal Process. 37(08), 3330–3350 (2018)
14. W.J. Staszewski, K. Worden, G.R. Tomlinson, Time-frequency analysis in gearbox fault detection using the Wigner-Ville distribution and pattern recognition. Mech. Syst. Signal Process. 11(5), 673–692 (1997)
15. J. Brynolfsson, M. Sandsten, Classification of one-dimensional non-stationary signals using the Wigner-Ville distribution in convolutional neural networks, in 2017 25th European Signal Processing Conference (2017), pp. 326–330
16. Y.S. Yan, C.C. Poon, Y.T. Zhang, Reduction of motion artifact in pulse oximetry by smoothed pseudo Wigner-Ville distribution. J. Neuro Eng. Rehabil. 2(1), 3 (2005)
17. P. Jain, R.B. Pachori, Marginal energy density over the low frequency range as a feature for voiced/non-voiced detection in noisy speech signals. J. Frankl. Inst. 350, 698–716 (2013)
18. R.R. Sharma, M. Kumar, R.B. Pachori, Automated CAD identification system using time-frequency representation based on eigenvalue decomposition of ECG signals, in International Conference on Machine Intelligence and Signal Processing (2017), pp. 597–608
19. R.R. Sharma, M. Kumar, R.B. Pachori, Joint time-frequency domain-based CAD disease sensing system using ECG signals. IEEE Sens. J. 19(10), 3912–3920 (2019)
20. R.R. Sharma, P. Chandra, R.B. Pachori, Electromyogram signal analysis using eigenvalue decomposition of the Hankel matrix, in Machine Intelligence and Signal Analysis (Springer, Singapore, 2019), pp. 671–682
21. R.R. Sharma, M. Kumar, R.B. Pachori, Classification of EMG signals using eigenvalue decomposition-based time-frequency representation, in Biomedical and Clinical Engineering for Healthcare Advancement (IGI Global, 2020), pp. 96–118
22. C. Xude, X. Bing, X. Xuedong, Z. Yuan, W. Hongli, Suppression of cross-terms in Wigner-Ville distribution based on short-term Fourier transform, in 2015 12th IEEE International Conference on Electronic Measurement and Instruments (ICEMI) (2015), pp. 472–475
23. R.R. Sharma, A. Kalyani, R.B. Pachori, An empirical wavelet transform based approach for cross-terms free Wigner-Ville distribution. Signal Image Video Process. 1–8 (2019). https://doi.org/10.1007/s11760-019-01549-7
24. R.B. Pachori, P. Sircar, A novel technique to reduce cross terms in the squared magnitude of the wavelet transform and the short time Fourier transform, in IEEE International Workshop on Intelligent Signal Processing (Faro, Portugal, 2005), pp. 217–222
25. P. Flandrin, B. Escudié, An interpretation of the pseudo-Wigner-Ville distribution. Signal Process. 6, 27–36 (1984)
26. D. Ping, P. Zhao, B. Deng, Cross-terms suppression in Wigner-Ville distribution based on image processing, in 2010 IEEE International Conference on Information and Automation (2010), pp. 2168–2171
27. P. Meena, R.R. Sharma, R.B. Pachori, Cross-term suppression in the Wigner-Ville distribution using variational mode decomposition, in 5th International Conference on Signal Processing, Computing, and Control (ISPCC-2k19) (Waknaghat, India, 2019)
28. R.B. Pachori, P. Sircar, A new technique to reduce cross terms in the Wigner distribution. Digital Signal Process. 17, 466–474 (2007)
29. N.A. Khan, I.A. Taj, M.N. Jaffri, S. Ijaz, Cross-term elimination in Wigner distribution based on 2D signal processing techniques. Signal Process. 91, 590–599 (2011)
30. T.A.C.M. Claasen, W.F.G. Mecklenbrauker, The Wigner distribution - A tool for time-frequency signal analysis, Part I: continuous-time signals. Philips J. Res. 35(3), 217–250 (1980)
31. R.B. Pachori, P. Sircar, Analysis of multicomponent nonstationary signals using Fourier-Bessel transform and Wigner distribution, in 14th European Signal Processing Conference (2006)
32. R.B. Pachori, P. Sircar, Time-frequency analysis using time-order representation and Wigner distribution, in IEEE Tencon Conference, Article no. 4766782 (2008)
33. K. Dragomiretskiy, D. Zosso, Variational mode decomposition. IEEE Trans. Signal Process. 62(3), 531–544 (2014)
34. S. Mohanty, K.K. Gupta, Bearing fault analysis using variational mode decomposition. J. Instrum. Technol. Innov. 4, 20–27 (2014)
35. A. Upadhyay, M. Sharma, R.B. Pachori, Determination of instantaneous fundamental frequency of speech signals using variational mode decomposition. Comput. Electr. Eng. 62, 630–647 (2017)
36. A. Upadhyay, R.B. Pachori, Instantaneous voiced/non-voiced detection in speech signals based on variational mode decomposition. J. Frankl. Inst. 352(7), 2679–2707 (2015)
37. A. Upadhyay, R.B. Pachori, Speech enhancement based on mEMD-VMD method. Electron. Lett. 53(07), 502–504 (2017)
38. http://www.math.ucla.edu/zosso/code.html
39. F. Auger, P. Flandrin, P. Goncalves, O. Lemoine, Time-Frequency Toolbox, vol. 46 (CNRS France-Rice University, 1996)
40. L. Stankovic, A measure of some time-frequency distributions concentration. Signal Process. 81, 621–631 (2001)
41. R. Baraniuk, Bat Echolocation Chirp, http://dsp.rice.edu/software/TFA/RGK/BAT/batsig.bin.Z/ (2009)
42. R.B. Pachori, P. Sircar, Analysis of multicomponent AM-FM signals using FB-DESA method. Digital Signal Process. 20, 42–62 (2010)
43. J. Burriel-Valencia, R. Puche-Panadero, J. Martinez-Roman, A. Sapena-Bano, M. Pineda-Sanchez, Short-frequency Fourier transform for fault diagnosis of induction machines working in transient regime. IEEE Trans. Instrum. Meas. 66, 432–440 (2017)