Kamal Kant Hiran, Deepak Khazanchi, Ajay Kumar Vyas, Sanjeevikumar Padmanaban (Eds.) Machine Learning for Sustainable Development
De Gruyter Frontiers in Computational Intelligence
Edited by Siddhartha Bhattacharyya
Volume 9
Machine Learning for Sustainable Development Edited by Kamal Kant Hiran, Deepak Khazanchi, Ajay Kumar Vyas and Sanjeevikumar Padmanaban
Editors

Kamal Kant Hiran
Department of Computer Science and Engineering, Sir Padampat Singhania University, Udaipur-Chittorgarh Rd, Bhatewar 313601, India
[email protected]

Ajay Kumar Vyas
Department of Information and Communication Technology, Adani Institute of Infrastructure Engineering, Shantigram Township, Gandhinagar Hwy, Ahmedabad 382421, Gujarat, India
[email protected]

Deepak Khazanchi
College of Information Science and Technology, The Peter Kiewit Institute, PKI 172C, University of Nebraska at Omaha, Omaha 68182, NE, USA
[email protected]

Sanjeevikumar Padmanaban
CTIF Global Capsule, Department of Business Development and Technology, Aarhus University, Birk Centerpark 40, 7400 Herning, Denmark
[email protected]
ISBN 978-3-11-070248-4
e-ISBN (PDF) 978-3-11-070251-4
e-ISBN (EPUB) 978-3-11-070258-3
ISSN 2512-8868

Library of Congress Control Number: 2021935725

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2021 Walter de Gruyter GmbH, Berlin/Boston
Cover image: shulz/E+/getty images
Typesetting: Integra Software Services Pvt. Ltd.
Printing and binding: CPI books GmbH, Leck
www.degruyter.com
Preface

Machine learning (ML) is a branch of artificial intelligence that comprises algorithms and artificial neural networks and exhibits characteristics closely associated with human intelligence. This book focuses on the applications of ML for sustainable development. It provides an understanding of sustainable development and how we can forecast it using ML approaches. The ML models for sustainable development cover weather forecasting, management of clean water, food security, life on land, product design and life cycle, sustainable development in tourism, the policymaking process and e-governance, and renewable energy, with experimental and analytical results. The book contains comprehensive studies of energy demand prediction, agriculture, weather forecasting and medical applications using ML models. It also covers ML approaches for the green Internet of things (IoT) and environmental IoT for sustainable development. The book provides a straightforward approach to ML-based sustainable solutions in various sectors of business and society. It serves as a framework for business opinion leaders and professionals, as well as an orientation for stakeholders in academia. This volume intends to deliberate on some of the latest research findings on the applications of ML. The volume comprises 11 well-versed chapters on the subject. Chapter 1 describes the Internet of nano-things (IoNT) with artificial intelligence (AI) and ML applications. It describes the current state of IoNT research and its implications, proposes how AI tools can be leveraged in IoNT applications, and outlines future research possibilities of IoNT in healthcare, medicine, smart buildings and home automation, utility, environment monitoring and agriculture.
Chapter 2 presents a 360-degree approach to an education model using ML; a conceptual framework is proposed for how ML can be adopted at different stages of higher education to enhance the quality of education delivery and prepare job-ready graduates, highlighting the opportunities and challenges in transforming higher education through ML. Chapter 3 discusses the traditional statistical autoregressive integrated moving average model for time series data analysis and forecasting and, for nonlinear time series data, deep learning approaches based on artificial neural networks (ANNs), namely, multilayer perceptron and long short-term memory recurrent neural network models; these are compared in terms of their salient features, complexity and accuracy. Chapter 4 provides the application of ML for green IoT using effective load recognition based on event-driven processing of smart meter data. It provides a thorough overview of current research outcomes in the study of smart meter data using ML techniques. An application-oriented analysis is addressed, and the main applications, such as load profiling, load forecasting and load scheduling, are considered.
https://doi.org/10.1515/9783110702514-202
Chapter 5 presents a comprehensive study of ML algorithms, covering supervised, unsupervised and combined approaches. Various ML algorithms are demonstrated for image classification and clustering, with a summary and comparison of ML algorithms on datasets such as COREL and a face image database. Chapter 6 deals with a novel approach to identifying healthy and unhealthy plants. This is done by creating an ML model based on the leaf image of the plant, which predicts the healthiness of the plant and thus helps in determining its quality of use. The fast growth and increase in production of crops can be learned from their leaves; the prediction of quality analysis for crops is based on an ML model. Chapter 7 presents the impact of ML on business development and application development through various models and their practicability. This chapter discusses the different data models for real-time ML applications. The analysis of Alzheimer's disease using support vector machines, and ML in agriculture, are the two case studies presented in the chapter. Chapter 8 covers how ML approaches add sustainability to broad categories of agriculture management, namely, crop management, water management and soil management. Water and soil management in agriculture entail substantial effort and play a very significant role in agronomical, hydrological and climatological balance. By applying ML-driven analytics to sensor data, evolving agriculture management systems can substantially assist in disease detection, weed detection, yield prediction, crop quality prediction, water management, nutrition management, soil management and so on, as well as provide recommendations, remedies and valuable insights for timely decisions and action.
Chapter 9 presents the application of ML in simultaneous localization and mapping (SLAM) algorithms, and discusses ANN, k-nearest neighbor and convolutional neural network-based optimization techniques for improving precision in localization, applied to an extended Kalman filter-based SLAM algorithm. Chapter 10 covers the use of regression decision trees, linear regression, clustering techniques, decision tree classification, binary logistic regression and principal component analysis in ML to forecast weather with higher accuracy. Chapter 11 is a case study on applications of conventional ML and deep learning for automation of diagnosis, investigating the suitability of ML and deep learning methods for two different applications, namely, peripheral blood smear images and cervix image analysis.

Kamal Kant Hiran
Deepak Khazanchi
Ajay Kumar Vyas
Sanjeevikumar Padmanaban
Contents

Preface V
About editors IX
List of contributors XI
Anoop Mishra, Abhishek Tripathi, Deepak Khazanchi
Chapter 1 A framework for applying artificial intelligence (AI) with Internet of nanothings (IoNT) 1

Sujith Jayaprakash, V. Kathiresan, N. Shanmugapriya, Manish Dadhich
Chapter 2 Opportunities and challenges in transforming higher education through machine learning 17

Radha Guha
Chapter 3 Efficient renewable energy integration: a pertinent problem and advanced time series data analytics solution 31

Saeed Mian Qaisar, Doaa A. Bashawyah, Futoon Alsharif, Abdulhamit Subasi
Chapter 4 A comprehensive review on the application of machine learning techniques for analyzing the smart meter data 53

Uma Maheswari V., Rajanikanth Aluvalu, Krishna Keerthi Chennam
Chapter 5 Application of machine learning algorithms for facial expression analysis 77

Ammu Anna Mathew, S. Vivekanandan
Chapter 6 Prediction of quality analysis for crop based on machine learning model 97

Mehul Mahrishi, Girish Sharma, Sudha Morwal, Vipin Jain, Mukesh Kalla
Chapter 7 Data model recommendations for real-time machine learning applications: a suggestive approach 115
Ashok Bhansali, Swati Saxena, Kailash Chandra Bandhu
Chapter 8 Machine learning for sustainable agriculture 129

Rohit Mittal, Vibhakar Pathak, Geeta Chhabra Gandhi, Amit Mithal, Kamlesh Lakhwani
Chapter 9 Application of machine learning in SLAM algorithms 147

Shruti Dadhich, Vibhakar Pathak, Rohit Mittal, Ruchi Doshi
Chapter 10 Machine learning for weather forecasting 161

Roopa B. Hegde, Vidya Kudva, Keerthana Prasad, Brij Mohan Singh, Shyamala Guruvare
Chapter 11 Applications of conventional machine learning and deep learning for automation of diagnosis: case study 175

Index 199
About editors

Kamal Kant Hiran works as an assistant professor at the School of Engineering, Sir Padampat Singhania University (SPSU), Udaipur, Rajasthan, India, as well as a research fellow at Aalborg University, Copenhagen, Denmark. He is a gold medalist in MTech (Hons.). He has more than 16 years of experience as an academic and researcher in Asia, Africa and Europe. He worked as an associate professor and academics head at the BlueCrest University College, Liberia, West Africa; head of department at the Academic City College, Ghana, West Africa; senior lecturer at the Amity University, Jaipur, Rajasthan, India; assistant professor at the Suresh Gyan Vihar University, Jaipur, Rajasthan, India; and visiting lecturer at the Government Engineering College, Ajmer. He has several awards to his credit, such as an international travel grant for attending the 114th IEEE Region 8 Committee Meeting at Warsaw, Poland; an international travel grant for Germany from ITS Europe, Passau, Germany; the Best Research Paper Award at the University of Gondar, Ethiopia, and at SKIT, Jaipur, India; the IEEE Liberia Subsection Founder Award; the Gold Medal Award in MTech (Hons.); the IEEE Ghana Section Award – Technical and Professional Activity Chair; IEEE Senior Member Recognition; the IEEE Student Branch Award; and the Elsevier Reviewer Recognition Award. He has published 35 scientific research papers in SCI/Scopus/Web of Science and IEEE Transactions journals and conferences, 2 Indian patents and 9 books with internationally renowned publishers. He is a reviewer and editorial board member of various reputed international journals with Elsevier, Springer, IEEE Transactions, IET, Bentham Science and IGI Global. He is an active member in organizing many international seminars, workshops and conferences.
He has made several international visits to Denmark, Sweden, Germany, Norway, Ghana, Liberia, Ethiopia, Russia, Dubai and Jordan for research exposure. His research interests focus on cloud computing, machine learning and intelligent IoT.

Dr. Deepak Khazanchi is full professor of information systems and quantitative analysis, associate dean for academic affairs, and community engagement and internationalization officer at the College of Information Science & Technology (IS&T), University of Nebraska at Omaha (UNO). Prior to becoming associate dean, he served as chair of the Information Systems and Quantitative Analysis Department at the College of IS&T. He is also an affiliate faculty in UNO's International Studies and Programs and the Leonard and Shirley Goldstein Center for Human Rights (GCHR). He has served as a visiting/adjunct full professor with the Center for Integrated Emergency Management (CIEM) at the University of Agder (Kristiansand, Norway), University of International Business and Economics (Beijing, China) and Management Center Innsbruck (Innsbruck, Austria). Dr. Khazanchi currently serves on the Board of Management for the Sir Padampat Singhania University (Udaipur, India) and the Executive Council for Bennett University (Noida, India). He is also an active advisor and external examiner for a number of universities around the world.
https://doi.org/10.1515/9783110702514-204
Ajay Kumar Vyas has more than 15 years of teaching and research experience. He is presently working as an assistant professor at Adani Institute of Infrastructure Engineering, Ahmedabad (India). He completed his BE from Ujjain Engineering College, Ujjain, his MTech from Shri GS Institute of Technology and Science, Indore, with Honors, and his PhD from Maharana Pratap University of Agriculture and Technology, Udaipur (Raj.). He is a senior member of IEEE and a senior member of IACSIT (Singapore). He is a reviewer for international peer-reviewed journals from Springer, IET, OSA, IGI Global and many more. He is the author of several research papers in peer-reviewed international journals and conferences, as well as textbooks and book chapters published by renowned publishers.
Sanjeevikumar Padmanaban (Member'12–Senior Member'15, IEEE) received his PhD in electrical engineering from the University of Bologna, Bologna, Italy, in 2012. He was an associate professor at VIT University from 2012 to 2013. In 2013, he joined the National Institute of Technology, India, as a faculty member. In 2014, he was invited as a visiting researcher at the Department of Electrical Engineering, Qatar University, Doha, Qatar, funded by the Qatar National Research Foundation (Government of Qatar). He continued his research activities with the Dublin Institute of Technology, Dublin, Ireland, in 2014. Further, he served as an associate professor at the Department of Electrical and Electronics Engineering, University of Johannesburg, Johannesburg, South Africa, from 2016 to 2018. Since 2018, he has been a faculty member with the Department of Energy Technology, Aalborg University, Esbjerg, Denmark. He has authored over 300 scientific papers. S. Padmanaban was the recipient of the Best Paper cum Most Excellence Research Paper Award from IET-SEISCON'13, IET-CEAT'16, IEEE-EECSI'19 and IEEE-CENCON'19, and five best paper awards from the ETAEERE'16-sponsored Lecture Notes in Electrical Engineering, Springer book. He is a fellow of the Institution of Engineers, India, the Institution of Electronics and Telecommunication Engineers, India, and the Institution of Engineering and Technology, UK. He is an editor/associate editor/editorial board member for refereed journals, in particular the IEEE Systems Journal, IEEE Transactions on Industry Applications, IEEE Access, IET Power Electronics, IET Electronics Letters and Wiley – International Transactions on Electrical Energy Systems, a subject editorial board member of Energy Sources – Energies Journal, MDPI, and the subject editor for IET Renewable Power Generation, IET Generation, Transmission and Distribution, and FACTS journal (Canada).
List of contributors

Anoop Mishra
The University of Nebraska at Omaha, USA
Email: [email protected]

Abhishek Tripathi
The College of New Jersey, USA
Email: [email protected]

Deepak Khazanchi
The University of Nebraska at Omaha, Omaha, NE, USA
Email: [email protected]

Sujith Jayaprakash
BlueCrest University College, Accra, Ghana, West Africa
Email: [email protected]

Dr. Kathiresan V.
Dr. S. N. S. Rajalakshmi College of Arts and Science, Coimbatore, Tamil Nadu, India
Email: [email protected]

Dr. Shanmuga Priya N.
Dr. S. N. S. Rajalakshmi College of Arts and Science, Coimbatore, Tamil Nadu, India
Email: [email protected]

Dr. Manish Dadhich
Sir Padampat Singhania University, Udaipur, India
Email: [email protected]

Radha Guha
SRM University, Andhra Pradesh, India
Email: [email protected]

Saeed Mian Qaisar
Energy and Technology Research Center, Communications and Signal Processing Research Lab, College of Engineering, Effat University, 21478 Jeddah, Saudi Arabia
Email: [email protected]
https://doi.org/10.1515/9783110702514-205
Doaa A. Bashawyah
Energy and Technology Research Center, Communications and Signal Processing Research Lab, College of Engineering, Effat University, 21478 Jeddah, Saudi Arabia

Futoon Alsharif
Energy and Technology Research Center, Communications and Signal Processing Research Lab, College of Engineering, Effat University, 21478 Jeddah, Saudi Arabia

Abdulhamit Subasi
Energy and Technology Research Center, Communications and Signal Processing Research Lab, College of Engineering, Effat University, 21478 Jeddah, Saudi Arabia

Uma Maheswari V.
Department of Computer Science and Engineering, Vardhaman College of Engineering, Hyderabad, Telangana, India
Email: [email protected]

Rajanikanth Aluvalu
Department of Computer Science and Engineering, Vardhaman College of Engineering, Hyderabad, Telangana, India
Email: [email protected]

Krishna Keerthi Chennam
Department of Computer Science and Engineering, Muffakham Jah College of Engineering, Hyderabad, Telangana, India
Email: [email protected]

Ammu Anna Mathew
Research Associate, School of Electrical Engineering, VIT University, Vellore, Tamil Nadu, India
Dr. S. Vivekanandan
Associate Professor, School of Electrical Engineering, VIT University, Vellore, Tamil Nadu, India
Email: [email protected]

Girish Sharma
Swami Keshvanand Institute of Technology, Jaipur, Rajasthan, India
Email: [email protected]

Mehul Mahrishi
Swami Keshvanand Institute of Technology, Jaipur, Rajasthan, India
Email: [email protected]

Sudha Morwal
Banasthali Vidyapith, Jaipur, Rajasthan, India
Email: [email protected]

Vipin Jain
Swami Keshvanand Institute of Technology, Jaipur, Rajasthan, India
Email: [email protected]

Mukesh Kalla
Sir Padampat Singhania University, Udaipur, Rajasthan, India
Email: [email protected]

Ashok Bhansali
O. P. Jindal University, Raigarh, Chhattisgarh, India
Email: [email protected]

Swati Saxena
ITM Vocational University, Vadodara, India
Email: [email protected]

Kailash Chandra Bandhu
Acropolis Technical Campus, Indore, India
Rohit Mittal
Arya College of Engineering & I.T, Jaipur, Rajasthan, India
Email: [email protected]

Vibhakar Pathak
Arya College of Engineering & IT, Jaipur, Rajasthan, India
Email: [email protected]

Geeta Chhabra Gandhi
Poornima University, Vidhani, Rajasthan, India

Amit Mithal
Jaipur Engineering College and Research Centre, Jaipur, Rajasthan, India

Kamlesh Lakhwani
Lovely Professional University, Punjab, India
Email: [email protected]

Shruti Dadhich
Noida Institute of Engineering Technology, Noida, Uttar Pradesh, India
Email: [email protected]

Dr. Ruchi Doshi
Department of Computer Science and Engineering, Azteca University, Mexico
Email: [email protected]

Roopa B. Hegde
NMAM Institute of Technology, NITTE, Karnataka, India
Email: [email protected]

Vidya Kudva
NMAM Institute of Technology, NITTE, Karnataka, India
Email: [email protected]
Keerthana Prasad
Manipal School of Information Sciences, Manipal, Karnataka, India
Email: [email protected]

Brij Mohan Singh
Kasturba Medical College, Manipal, Karnataka, India
Email: [email protected]
Shyamala Guruvare
Kasturba Medical College, Manipal, Karnataka, India
Email: [email protected]
Anoop Mishra, Abhishek Tripathi, Deepak Khazanchi
Chapter 1 A framework for applying artificial intelligence (AI) with Internet of nanothings (IoNT)

Abstract: The Internet of things (IoT) is a network of interconnected devices for the exchange of data and information. Since traditional IoT applications are limited by the size of devices, the emergence of nanotechnology has resulted in the development of a new category of connected devices broadly referred to as the Internet of nanothings (IoNT). IoNT provides miniaturized replacements for traditional IoT sensors for communication, data collection, data transfer and data processing. IoNT as a concept is still relatively new and has drawn attention from industry, academic researchers and professional practitioners, but needs much more research in terms of development and applications. In this chapter, drawing from the artificial intelligence (AI), IoT and IoNT literature, we describe the current state of IoNT research and its implications, and propose how AI can be leveraged in future IoNT applications.

Keywords: Internet of things, Internet of nanothings, artificial intelligence, machine learning
1.1 Introduction

The evolution of the Internet of things (IoT), along with the rapid advancement in other related technologies, has enabled the concepts of smart and intelligent environments such as smart homes and smart cities. This growth in IoT usage has paved the way for further innovation, where even smaller devices with more capabilities are becoming part of the landscape. In traditional IoT, communication and data collection are the major areas of emphasis, where tasks like computing, data storage, actuation or sensing are performed. These tasks done via the Internet generate enormous amounts of data from the sensors. Researchers have developed numerous applications using traditional IoT that focus on data collection, data transfer, learning from data, monitoring and smart apps [1, 2, 3]. Additionally, various artificial intelligence (AI) and communication techniques are used with IoT to overcome
Anoop Mishra, The University of Nebraska at Omaha, USA, e-mail: [email protected] Abhishek Tripathi, The College of New Jersey, USA Deepak Khazanchi, The University of Nebraska at Omaha, USA https://doi.org/10.1515/9783110702514-001
challenges associated with resource management, data integrity, periodic data gathering, self-aware protocols, fault tolerance, cybersecurity and communication mechanisms, including techniques such as machine learning (ML), deep learning (DL), game theory, cognitive networks, distributed storage, edge computing, mobile IoT, cellular-like wireless and 4G/5G [3, 1–4]. However, despite these technological advancements, unique application requirements in areas such as biomedical, military, infrastructure health monitoring and agriculture have slowed progress. In particular, there is a need for a robust, secure and reliable communication platform for future IoNT devices. Traditional IoT applications are limited by device size, battery life, power consumption, memory limitations, inability to cover larger areas and the difficulty of accessing unreachable locations due to sensor size [5, 6]. In the past decade, researchers have addressed some of the challenges of traditional IoT with nanotechnology. In December 1959, physicist Richard Feynman gave a lecture, "There's plenty of room at the bottom," in which he proposed the idea of manipulating individual atoms as a more powerful form of synthetic chemistry. This became the underlying vision for the field of nanotechnology, which led to the development of networked components in the nanoscale range between 1 and 100 nm to replace current IoT networks. These nanoscale components include sensors and devices that perform simple computing tasks, including data collection and actuation [7]. According to Akyildiz and Jornet 2010 [8], "the Interconnection of nanoscale devices with existing communication networks and ultimately the Internet, defines a new networking paradigm called 'Internet of Nano-Things.'" Thus, IoNT is a cyber-physical system and a miniaturized replacement for traditional IoT that can be utilized in scenarios where IoT underperforms.
IoNT is “built from inexpensive microsensors and microprocessors paired with tiny power supplies and wireless antennas . . .” [9]. One of the advantages of IoNT is that it can employ the existing network of IoT devices along with nanodevices. Additionally, the IoNT architecture consists of electromagnetic (EM) nanoscale communication using graphene-based nanoantennas using the terahertz (THz) band [8, 10]. IoNT research is still at a nascent stage with regard to commercial applications and work is being done to make these devices available for commercial usage at scale at a reasonable price. It is well accepted that IoNT still has many challenges that need addressing: context management, data operations, service composition and discovery, routing mechanisms, resource management, security and privacy [11, 12]. In IoT, for communication and data operation challenges, AI approaches like ML are useful for improving the operation and performance utilizing the data collected by IoT sensors [2]. In IoNT applications, the THz band will allow high data transmission and large data production from nanosensors [13]. Once data from nanosensors and other nanodevices in an IoNT have been brought together in one place, AI approaches such as ML, DL and neural networks can be used to analyze data, conduct trend analysis and look at patterns and anomalous data points in real-time data.
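As an illustration of the kind of real-time analysis mentioned above, the sketch below flags anomalous data points in a simulated nanosensor stream using a rolling z-score. This is a minimal, hypothetical example: the function name, window size and threshold are illustrative assumptions, not part of any IoNT standard.

```python
# Illustrative sketch: flag anomalous readings in a stream of (simulated)
# nanosensor data aggregated at a gateway. Window size and threshold are
# arbitrary choices for demonstration.
from collections import deque
from statistics import mean, stdev

def detect_anomalies(stream, window=20, threshold=3.0):
    """Return (index, value) pairs for readings that deviate strongly
    from the recent window of readings."""
    recent = deque(maxlen=window)
    anomalies = []
    for i, x in enumerate(stream):
        if len(recent) >= 3:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                anomalies.append((i, x))
        recent.append(x)
    return anomalies

# Simulated stream: steady cyclic readings with one spike injected at index 50.
readings = [20.0 + 0.1 * (i % 5) for i in range(100)]
readings[50] = 35.0
print(detect_anomalies(readings))  # only the spike at index 50 is flagged
```

In a real deployment this logic would sit past the nanomicro-interface, where readings from many nanonodes have already been aggregated; the nanodevices themselves are too constrained to run it.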
In this chapter, we describe how IoNT applications can leverage AI/ML approaches. As with IoT, employing AI/ML approaches will enhance IoNT applications and usage and can help overcome challenges in creating IoNT applications. We review the current state of IoNT research, including IoNT components and architecture, and then propose areas where IoNT can benefit from AI/ML approaches.
1.2 Prior literature

IoNT research is still at a nascent stage and not yet available for wide-ranging commercial applications. We reviewed 31 articles relating to IoNT published between 2010 and 2020. A majority of the articles published on IoNT cover themes containing definitions, architecture, trends, challenges and potential applications. All published articles seem to primarily focus on discussing theoretical applications, frameworks and simulations within the areas of communication, networking, hardware design and architecture. Akyildiz et al. 2010 briefly describe IoNT, its network architecture and communication, including nanonetwork protocols and information processing. They claim that nanonetworks can cover larger areas and could be placed in otherwise unreachable locations. They proposed the THz band for EM communication, which enables high data transmission. They also explain nanonetwork protocols with channel sharing, information routing, reliability issues and network association, and proposed the application of IoNT in many domains such as environment monitoring, healthcare and agriculture. Ian Akyildiz, along with others, proposed molecular communication in biomedical applications as the Internet of bio-nanothings (IoBNT) and, considering multimedia, the Internet of multimedia nanothings [14]. Balasubramaniam et al. discussed two major challenges in IoNT: data collection and middleware development. For the data collection challenge, the proposed solution focuses on system architecture and routing technology; the authors proposed computational topological and graph-based solutions. For the middleware challenge, they focused on the connection between nanosensors and the microgateways. The middleware challenges include system management, data analysis and energy conservation. The authors propose self-aware mechanisms, distributed systems, graph-based interaction and time scheduling as solutions.
A few authors targeted high-level applications of IoNT and theoretically discuss the role of big data, 5G, intelligence and data analysis frameworks. They argue for exploiting the data collected from nanosensors for diagnostics, monitoring and treatment with predictive, preventive, personalized and participatory capabilities. Al-Turjman et al. discussed the challenges in the development of network components, communication and security, and also discussed IoNT architecture in 5G-based big data. Similarly, Dagnaw et al. [15] hypothesized the role of AI with nanotechnology in automobile, medical,
space, communication and military. The key IoNT-related research papers are summarized in Table 1.1.
Table 1.1: Tabular summary of key research papers on IoNT.

Reference | Applications | Relevance to IoNT
[] | Nanonetworks | Introduction, network architecture, electromagnetic communication, terahertz band
[] | Theoretical explanations in biomedical applications, environmental monitoring and defense applications | Introduction, network architecture, communication challenges in nanosensor networks
[] | Healthcare | IoNT middleware management, challenges, theoretical solutions
[] | Healthcare, security, IoBNT | Overview, survey, capabilities, possibilities, trends in IoNT
[] | Biomedical, environmental, industrial | Overview, state of the art, current developments
[] | Healthcare, environmental, agriculture | Overview, challenges in IoNT
[] | Robotics | Nanoscale communication, nanotechnology and artificial intelligence
[] | Healthcare | Wireless nanosensor networks, intrabody disease detection
[] | Food packaging | Detecting antimicrobial agents in integrated packaging using wireless nanosensor networks
[] | 5G/big data | Theoretical framework for IoNT design factors with big data
[] | Nanonetwork communication combining software-defined networking, IoT, fog computing, network function virtualization | Use case for nanoscale communication architecture; research challenges
[] | Industry, biomedical | Surveys, reviews, future and trends of Internet
[] | Biomedical | IoBNT, molecular communication
[] | Healthcare, agriculture, military, public safety | IoNT literature, opportunities and scope
[] | Healthcare, agriculture, military, smart city, multimedia, industry | IoNT constraints, security and privacy challenges
[] | Cloud computing, smartwatches, smartphones | Role of radio frequency identification (RFID) systems in emerging IoT and IoNT technologies
[] | In-body nanocommunication for medical and fitness applications | Challenges and opportunities
[] | Industrial applications | Introduction to IoNT
[] | Healthcare applications | Ubiquitous healthcare ecosystem utilizing IoNT
[] | Security and privacy issues in medicine and healthcare | IoNT introduction in healthcare
[] | Communication and networking | Reliability of connected devices and the protection of transmitted data in IoNT
[] | Healthcare: drug delivery and disease detection | Comprehensive survey of the network models of IoNT
[] | Transmission and optimal scheduling in IoNT | IoNT data delivery: DMDS-α (Distributed Maximum Debt Scheduling) algorithm
[] | Dairy farming | Use case for IoNT
[] | Healthcare, smart cities, food packaging, etc. | Applications of IoNT
[] | IoNT simulation | Study of networking based on molecular communication in IoNT
[] | Biomedical, security and defense | IoMNT
[] | Monitoring purposes | Energy harvesting framework in IoNT
[] | Medical and clinical applications | Workflow, taxonomy, challenges and healthcare architecture in IoNT and IoBNT
[] | Biomedical, defense, environmental and industrial fields | Research challenges, future trends and applications in IoMNT
[] | Robotics | A simulated framework for electromagnetic nanowireless networks in the terahertz band

IoT, Internet of things; IoNT, Internet of nanothings; IoBNT, Internet of bio-nanothings; IoMNT, Internet of multimedia nanothings.
From the literature, it is apparent that IoNT implementation still faces communication and data challenges. In this chapter, a theoretical AI/ML framework for general IoNT applications is proposed. This framework focuses on data processing and illustrates how AI-based ML tools can be leveraged in IoNT applications, along with future research possibilities in IoNT.
1.3 Description

IoNT sensors and devices lie in the nanoscale range between 1 and 100 nm and are connected, along with associated Internet services, to form nanonetworks. As discussed by Akyildiz et al. 2010, wireless nanosensor networks (WNSN) transfer the data via the Internet. Such sensors, when deployed, will consume less power, have the ability to cover larger areas for communication and could be placed in otherwise unreachable locations [Akyildiz et al. 2010, 18]. In the network architecture of IoNT and WNSNs, there are four major basic components: nanonodes, nanorouters, nanomicro-interface devices and gateways. Table 1.2 gives an overview of the IoNT architectural components as drawn from the literature. Nanorouters, interface devices and gateways can transfer data over long distances because this allows the transmission of data to a localized hub that enables them to emit and receive data (Akyildiz et al. 2010). Classical EM communication cannot be used in IoNT because of the smaller size and properties of nanodevices; hence, researchers mainly focus on EM communication in the THz band. The THz band provides high-speed communication and supports data rates of up to 1 Tbps, which is beyond 5G communication [Akyildiz et al. 2010, 10, 39]. Figure 1.1 shows the network architecture in the healthcare and office environments proposed by Akyildiz et al. 2010. Rizwan et al. [40], Balasubramaniam and Kangasharju [11] and others from Table 1.1 describe the application of IoNT in several fields such as healthcare, agriculture, defense, robotics, transportation, environment monitoring, surveillance and smart homes. As concluded from the IoT literature, large data are collected through IoT sensors [2, 3]. Similarly, since large data are produced by nanosensors in such scenarios, AI approaches such as ML will be a worthy and important development tool in IoNT.
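The four-component data path described above (nanonode, nanorouter, nanomicro-interface, gateway) can be sketched as a toy pipeline. All class names and the aggregation scheme below are illustrative assumptions for explanation only, not an implementation of any IoNT protocol.

```python
# Minimal sketch of the four-tier IoNT data path:
# nanonode -> nanorouter -> nanomicro-interface -> gateway.

class Nanonode:
    """Senses one value and transmits it over a short-range nanolink."""
    def __init__(self, node_id, value):
        self.node_id, self.value = node_id, value
    def sense(self):
        return (self.node_id, self.value)

class Nanorouter:
    """Collects readings from nearby nanonodes and forwards an aggregate."""
    def __init__(self, nodes):
        self.nodes = nodes
    def aggregate(self):
        return dict(n.sense() for n in self.nodes)

class NanoMicroInterface:
    """Scales nanoscale traffic up to a microscale packet for the gateway."""
    @staticmethod
    def to_micro_packet(agg):
        return {"count": len(agg), "mean": sum(agg.values()) / len(agg)}

class Gateway:
    """Exposes the nanosensor network over the Internet (here, a stub)."""
    @staticmethod
    def publish(packet):
        return packet

nodes = [Nanonode(i, 20.0 + i) for i in range(4)]
packet = Gateway.publish(NanoMicroInterface.to_micro_packet(Nanorouter(nodes).aggregate()))
print(packet)  # {'count': 4, 'mean': 21.5}
```

The sketch mirrors the division of labor in the text: nanonodes only sense, the nanorouter aggregates and could actuate, the interface converts nano traffic to a microscale packet, and only the gateway is Internet-facing.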
AI/ML methods can be applied for knowledge extraction and for learning from data to develop decision-making support systems. This can be helpful in monitoring systems, security, control systems, routing mechanisms and resource management tasks. Figure 1.2 represents the proposed AI/ML framework for IoNT applications. The proposed framework comprises three major layers: data source, data transmission and storage, and decision-making. Table 1.3 describes the distribution of IoNT components across these layers. Nanonodes and nanorouters are the nanodevices that serve as data sources. Because of the limited data transmission range of nanonodes, nanorouters and nanomicro-interfaces, as suggested in Akyildiz et al. 2010,
Chapter 1 A framework for applying artificial intelligence (AI)
Table 1.2: IoNT architecture overview from the literature.

Communication
– Electromagnetic: Classical communication processed using electromagnetic radiation. The terahertz (THz) band is used because of the small size and properties of nanodevices; it provides high-speed communication and supports data rates of up to 1 Tbps, which is beyond 5G communication.
– Molecular: Information is encoded in molecules and processed during communication, but this is possible only with specific molecules.

Architecture components
– Nanonode: Nanosensors and nanodevices that perform limited computation and transmit data over short distances.
– Nanorouter: Nanodevices with larger computational capability that control the behavior of nanonodes by actuating control commands, for example, on/off.
– Nanomicro-interface: Devices for communication that scale information from nanorouters and nanonodes to the microscale; they can perform both nanocommunication and classical communication.
– Gateway: Devices that enable remote control of the nanosensor network over the Internet.
– Nanolink: Interconnection between nanosensors.
– Microlink: Interconnection between the nanomicro-interface and the gateway.

Applications
– Multimedia, biomedical, security, environment and agriculture, mainly focusing on monitoring tasks, for example, health monitoring, agriculture and smart homes.

Challenges
– Communication; data storage and transfer; commercial reach. More basic research is required to overcome these challenges and make IoNT commercially feasible.
long-distance data transmission uses an Internet gateway. The collected data can be stored and processed further using cloud storage. The AI/ML layer includes data streamline processing and data analytics operations for preprocessing and analyzing the data. The AI/ML engine processes the data using supervised, unsupervised and reinforcement learning algorithms. The desired results then pass actuation information through the control unit to nanomachines for a given IoNT application scenario such as smart homes, environmental monitoring, healthcare and robotics. As discussed by Hartman et al. [1] and Selvaraj et al. [2], the communication challenges in IoT scenarios can be overcome by AI/ML approaches. One of the key ways to accomplish this is by analyzing and processing the historical data
Anoop Mishra, Abhishek Tripathi, Deepak Khazanchi
Figure 1.1: Network architecture of IoNT proposed by Akyildiz et al. 2010.
Table 1.3: Layers in the AI/ML framework within the IoNT application architecture.

– Data collection: nanonodes, nanorouters
– Data transmission and storage: nanomicro-interface, gateway, cloud storage, user interface
– AI/ML: data streamline processing, data analytics, AI/ML engine, control unit
collected by the sensors. Since IoNT can be a miniaturized replacement for IoT, we hypothesize that AI/ML approaches in IoNT, applied through our proposed AI/ML framework, can help overcome challenges such as data collection, routing mechanisms, resource management, security and privacy.
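To make the three-layer flow concrete, the following is a minimal sketch, assuming invented function names, sensor readings and an arbitrary actuation threshold (none of which come from the chapter), of how data might travel from nanonodes through cloud storage to actuation commands:

```python
# Hypothetical sketch of the three-layer AI/ML framework for IoNT:
# data collection -> data transmission and storage -> AI/ML decision-making.
# All names, readings and the 0.8 threshold are invented for illustration.

def collect(nanonodes):
    """Data collection layer: nanonodes and nanorouters emit short-range readings."""
    return [node["reading"] for node in nanonodes]

def transmit_and_store(readings, cloud):
    """Transmission/storage layer: the nanomicro-interface and gateway push data to cloud storage."""
    cloud.extend(readings)
    return cloud

def decide(cloud, threshold=0.8):
    """AI/ML layer: a stand-in 'engine' (a simple rule here) turns data into actuation commands."""
    return ["actuate" if reading > threshold else "idle" for reading in cloud]

nodes = [{"id": 1, "reading": 0.91}, {"id": 2, "reading": 0.42}]
cloud_store = transmit_and_store(collect(nodes), [])
print(decide(cloud_store))  # ['actuate', 'idle']
```

In a real deployment the rule in `decide` would be replaced by a trained supervised, unsupervised or reinforcement learning model, as the framework describes.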
1.4 IoNT use-cases

1.4.1 Robotics

IoT has broad applications in robotics and is used to solve major real-world problems [41]. Nanosensors in IoNT, ranging between 1 and 100 nm, can be integrated into the robot body architecture and design to overcome the constraints that sensor size and weight place on robot autonomy. Weight, assembly, robot size and energy consumption are traditional problems for specific tasks, and communication and networking within a robot or among robot teams is a major challenge too [42]. Boillot et al. [38] discussed that nano-EM communication over graphene nanoantennas will enhance communication within robots, and EM communication will improve transmission facilities. In recent years, AI and ML techniques have been broadly applied in robotics to develop fully automated robots that solve complex problems [43]. Utilizing IoNT in robotics will improve efficacy in terms of communication, assembly and energy consumption. Furthermore, the AI/ML framework proposed in Figure 1.2 can be utilized for more sophisticated developments in robotics such as policy search, smart control and self-assembly.
1.4.2 Bridge health monitoring

Akyildiz et al. 2010 mentioned damage detection systems for bridges and for defense using wireless nanosensors. Here, we discuss developing bridge health monitoring for steel and concrete bridges as a decision-making support system for bridge health inspectors utilizing IoNT. In
Figure 1.2: AI/ML framework for general IoNT application architecture.
IoNT architecture, nanosensors will be able to detect cracks efficiently because of their nanoscale properties. The crack detection data will be processed on the cloud as shown in Figure 1.2. AI/ML approaches such as supervised or reinforcement learning can be applied to model bridge deterioration using crack progression and environmental effects on bridges, such as rust, corrosion and bleeding, as features. These formulations and analyses will help bridge health inspectors take the required maintenance and operational steps.
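As an illustrative sketch only, with invented features (crack width in mm and a corrosion index) and invented training labels that do not come from any bridge dataset, a simple supervised classifier of bridge condition might look like this:

```python
# Hypothetical nearest-centroid classifier of bridge condition, trained on
# invented (crack_width_mm, corrosion_index) feature pairs with labels.
import math

def train_centroids(samples):
    """Compute one centroid per label from (features, label) training pairs."""
    sums, counts = {}, {}
    for feats, label in samples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [v / counts[lbl] for v in acc] for lbl, acc in sums.items()}

def classify(feats, centroids):
    """Assign the label of the nearest centroid (Euclidean distance)."""
    return min(centroids, key=lambda lbl: math.dist(feats, centroids[lbl]))

training = [
    ([0.1, 0.05], "healthy"), ([0.2, 0.10], "healthy"),
    ([2.5, 0.70], "deteriorated"), ([3.0, 0.90], "deteriorated"),
]
centroids = train_centroids(training)
print(classify([2.8, 0.8], centroids))  # deteriorated
```

A production system would use a richer supervised or reinforcement learning model, but the idea is the same: sensor features in, condition label out, feeding the inspector's decision-support system.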
1.4.3 Utility industry

Advanced metering infrastructure (AMI) in the utility industry is an integrated system of smart meters, communication networks and data management systems that enables two-way communication between utilities and customers, together with the collection of periodic data [10]. AMI has applications in smart homes, power grids and forest monitoring. Data collected by AMI are used to calculate per-day power usage, optimize energy efficiency and recover from grid outages. As per the American Council for an Energy-Efficient Economy annual report, most industrial utility providers failed to deliver time-of-use pricing, automated demand response and the true cost of energy [10]. The failures were primarily due to challenges in data collection that resulted in economic loss for customers, which has led companies to stop deploying AMI [10]. Furthermore, in the example of the California forest wildfires in 2019, AMI systems failed to transmit accurate information and power grids could not resist fire damage, resulting in a massive loss of natural and human resources. Prior studies approached these communication and data challenges using IoT, AI and distributed storage techniques that consume large computing and energy resources [44, 45]. However, the problem can be approached at the physical level rather than the system management level by using IoNT as a miniaturized replacement for traditional IoT sensors. IoNT sensors, when deployed, will consume less power, cover larger areas for communication and can be placed in otherwise unreachable locations (Akyildiz et al. 2010). Applying the proposed AI/ML framework will be impactful because, along with IoNT, it allows the transmission of data at periodic intervals.
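As a small illustration of the periodic-interval data AMI collects (the meter readings below are hypothetical, not from any utility), per-day power usage is simply an aggregation over each day's interval readings:

```python
# Hypothetical smart-meter interval readings in kWh; AMI-style per-day
# usage is the sum of each day's periodic readings.
from collections import defaultdict

def per_day_usage(interval_readings):
    """Aggregate (date, kwh) interval readings into total kWh per day."""
    totals = defaultdict(float)
    for date, kwh in interval_readings:
        totals[date] += kwh
    return dict(totals)

readings = [("2019-10-01", 0.5), ("2019-10-01", 0.5), ("2019-10-02", 0.25)]
print(per_day_usage(readings))  # {'2019-10-01': 1.0, '2019-10-02': 0.25}
```

Aggregates like these are the inputs that downstream AI/ML models would use for demand forecasting, time-of-use pricing and outage analysis.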
The AI/ML approaches applied to these data can be used to develop cost-beneficial energy delivery by avoiding data integrity challenges and can potentially improve safety measures during natural incidents such as power outages or severe weather. Mocrii et al. [46] describe the IoT-based smart home as an application of ubiquitous computing that integrates intelligent digital operations for comfort, safety and energy conservation. Kethineni [33] illustrates the application of IoNT in smart cities. Building on the utility-industry discussion, the data collected by IoNT sensors can be explored by the AI/ML engine, and the control unit will manage the operations back to the smart devices. As discussed
above, the IoNT sensors will consume less power and support the development of cost-beneficial energy delivery.
1.4.4 Healthcare

As Table 1.1 shows, the literature examines healthcare as a major IoNT application, including drug delivery, disease detection and health information sharing. Akyildiz et al. [14] proposed the Internet of Bio-Nano Things (IoBNT) targeting healthcare. Al-Turjman [20] discussed the big data opportunities in IoNT, and Fang et al. [47] discussed the large-scale data produced in healthcare. Considering big data in healthcare and IoNT, the proposed AI/ML framework can be very effective in healthcare management, where activities such as data collection, sharing, storage, analysis and the development of decision support systems using ML methods can take place.
1.4.5 Edge artificial intelligence

Edge computing is now being integrated with AI, a combination referred to as edge AI. It is used to accelerate inference and prediction with AI/ML models, including neural networks [48]. Edge AI optimizes response time and accelerates AI computation at the point where data are generated and need to be inferred; processing the data at a cloud data center consumes a lot of bandwidth. Where data are generated or collected and must be inferred simultaneously, edge AI has been widely recognized as a promising solution because the inference is done on the device itself rather than over network connections. This also allows more privacy, lower latency and cost optimization when integrating AI into dedicated devices. It is a favorable approach for IoNT, where the AI/ML methods are applied at the location itself; cloud services can then be used only for data transfer and storage, which can be a one-way transmission. Integrating IoNT with edge AI will have a big effect on real-world scenarios such as robotics, bridge health monitoring, healthcare, the utility industry and smart homes.
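A minimal sketch of the edge-AI pattern described above, assuming a hypothetical on-device model (here just a single linear unit) and an invented payload format: inference happens locally, and only compact results travel one way toward cloud storage:

```python
# Hypothetical edge-AI loop: the model runs on-device, so raw sensor data
# never crosses the network; only compact inference results are uploaded.

def edge_infer(sample, weight=2.0, bias=-1.0):
    """Stand-in for an on-device model: one linear unit with a step output."""
    return 1 if weight * sample + bias > 0 else 0

def process_on_edge(raw_samples):
    """Infer locally and return only the compact results for one-way cloud upload."""
    results = [edge_infer(s) for s in raw_samples]
    return {"n_samples": len(raw_samples), "positives": sum(results)}

cloud_payload = process_on_edge([0.2, 0.9, 0.7])
print(cloud_payload)  # {'n_samples': 3, 'positives': 2}
```

The design choice to upload only `cloud_payload` rather than `raw_samples` is what yields the privacy, latency and bandwidth benefits the section describes.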
1.5 Conclusion

Since IoNT research is still at a nascent stage and far from commercial usage and research practice, most works on IoNT contain similar content: introduction, overview, architecture, trends, challenges and applications. It is also observed that mostly theoretical applications and frameworks are discussed. In this chapter, a brief literature survey was performed to explain the IoNT
introduction and the current state of research. Since IoNT is a miniaturized replacement for traditional IoT, we proposed an AI/ML-based decision-making framework for a general IoNT application along with potential use-case applications. IoNT still requires more research on communication and on hardware developments such as memory and nanoantennas for the THz band. Low power consumption, the ability to cover larger communication areas and the ability to reach inaccessible locations make IoNT a reliable cyberphysical system. It can be used in domains such as healthcare, defense, agriculture, smart automation, robotics, environment monitoring and surveillance. Many researchers have highlighted research challenges in IoNT usage and application, such as data collection, routing mechanisms, resource management, security and privacy. The development of IoNT research across interdisciplinary domains is pervasive and will play a vital role in future applications.
Bibliography
[1] Hartman, W. T., Hansen, A., Vasquez, E., El-Tawab, S., and Altaii, K. (2018, April). Energy monitoring and control using Internet of Things (IoT) system. In 2018 Systems and Information Engineering Design Symposium (SIEDS) (pp. 13–18). IEEE.
[2] Selvaraj, S. and Sundaravaradhan, S. (2020). Challenges and opportunities in IoT healthcare systems: A systematic review. SN Applied Sciences, 2(1), 139.
[3] Shirvanimoghaddam, M., Dohler, M., and Johnson, S. J. (2017, September). Massive non-orthogonal multiple access for cellular IoT: Potentials and limitations. IEEE Communications Magazine, 55(9), 55–61.
[4] Hussain, F., Hassan, S. A., Hussain, R., and Hossain, E. (2020). Machine learning for resource management in cellular and IoT networks: Potentials, current solutions, and open challenges. IEEE Communications Surveys & Tutorials.
[5] Din, I. U., Guizani, M., Hassan, S., Kim, B. S., Khan, M. K., Atiquzzaman, M., and Ahmed, S. H. (2018). The Internet of Things: A review of enabled technologies and future challenges. IEEE Access, 7, 7606–7640.
[6] Younan, M., Houssein, E. H., Elhoseny, M., and Ali, A. A. (2020). Challenges and recommended technologies for the industrial Internet of things: A comprehensive review. Measurement, 151, 107198.
[7] Kolářová, L. and Rálišová, E. (2017, January). The concepts of nanotechnology as a part of physics education in high school and in interactive science museum. In AIP Conference Proceedings (Vol. 1804, No. 1, p. 040005). AIP Publishing LLC.
[8] Akyildiz, I. F. and Jornet, J. M. (2010). The internet of nano-things. IEEE Wireless Communications, 17(6), 58–63.
[9] Al-Rawahi, M. N., Sharma, T., and Palanisamy, P. N. (2018). Internet of nanothings: Challenges & opportunities. 2018 Majan International Conference (MIC), 1–5.
[10] John, S. (2020). Why Most US Utilities Are Failing to Make the Most of Their Smart Meters?. Greentechmedia.
Jornet, J. M. and Akyildiz, I. F. (2013). Graphene-based plasmonic nano-antenna for terahertz band communication in nanonetworks. IEEE Journal on Selected Areas in Communications, 31(12), 685–694.
[11] Balasubramaniam, S. and Kangasharju, J. (2012). Realizing the internet of nano things: Challenges, solutions, and applications. Computer, 46(2), 62–68.
[12] Cruz Alvarado, M. A. and Bazán, P. (2019). Understanding the Internet of Nano Things: Overview, trends, and challenges. E-Ciencias de la Información, 9(1), 152–182.
[13] Akyildiz, I. F. and Jornet, J. M. (2010). Electromagnetic wireless nanosensor networks. Nano Communication Networks, 1(1), 3–19.
[14] Akyildiz, I. F., Pierobon, M., Balasubramaniam, S., and Koucheryavy, Y. (2015). The internet of bionano things. IEEE Communications Magazine, 53(3), 32–40.
[15] Dagnaw, G. A. and Endeshaw, G. G. (2019). Current trends of artificial intelligence in nanosciences application. Nuclear Science, 4(4), 60.
[16] Mohammad, H. and Shubair, R. M. (2019). Nanoscale communication: State-of-art and recent advances. arXiv preprint arXiv:1905.07722.
[17] Nayyar, A., Puri, V., and Le, D. N. (2017). Internet of nano things (IoNT): Next evolutionary step in nanotechnology. Nanoscience and Nanotechnology, 7(1), 4–8.
[18] Lee, S. J., Jung, C., Choi, K., and Kim, S. (2015). Design of wireless nanosensor networks for intrabody application. International Journal of Distributed Sensor Networks, 11(7), 176761.
[19] Fuertes, G., Soto, I., Vargas, M., Valencia, A., Sabattin, J., and Carrasco, R. (2016). Nanosensors for a monitoring system in intelligent and active packaging. Journal of Sensors, 2016.
[20] Al-Turjman, F. (2020). Intelligence and security in big 5G-oriented IoNT: An overview. Future Generation Computer Systems, 102, 357–368.
[21] Galal, A. and Hesselbach, X. (2018). Nano-networks communication architecture: Modeling and functions. Nano Communication Networks, 17, 45–62.
[22] Atlam, H. F., Walters, R. J., and Wills, G. B. (2018, August). Internet of nano things: Security issues and applications. In Proceedings of the 2018 2nd International Conference on Cloud and Big Data Computing (pp. 71–77).
[23] Miraz, M. H., Ali, M., Excell, P. S., and Picking, R. (2018). Internet of nano-things, things and everything: Future growth trends. Future Internet, 10(8), 68.
[24] Singh, R., Singh, E., and Nalwa, H. S. (2017). Inkjet printed nanomaterial based flexible radio frequency identification (RFID) tag sensors for the internet of nano things. RSC Advances, 7(77), 48597–48630.
[25] Dressler, F. and Fischer, S. (2015). Connecting in-body nano communication with body area networks: Challenges and opportunities of the Internet of Nano Things. Nano Communication Networks, 6(2), 29–38.
[26] El-Din, H. E. and Manjaiah, D. H. (2017). Internet of nano things and industrial internet of things. In Internet of Things: Novel Advances and Envisioned Applications (pp. 109–123). Springer, Cham.
[27] Ali, N. A. and Abu-Elkheir, M. (2015, October). Internet of nano-things healthcare applications: Requirements, opportunities, and challenges. In 2015 IEEE 11th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob) (pp. 9–14). IEEE.
[28] Maksimović, M. (2017). The roles of nanotechnology and internet of nano things in healthcare transformation. TecnoLógicas, 20(40), 139–153.
[29] Sicari, S., Rizzardi, A., Piro, G., Coen-Porisini, A., and Grieco, L. A. (2019). Beyond the smart things: Towards the definition and the performance assessment of a secure architecture for the Internet of Nano-Things. Computer Networks, 162, 106856.
[30] Ali, N. A., Aleyadeh, W., and AbuElkhair, M. (2016, September). Internet of nano-things network models and medical applications. In 2016 International Wireless Communications and Mobile Computing Conference (IWCMC) (pp. 211–215). IEEE.
[31] Akkari, N., Wang, P., Jornet, J. M., Fadel, E., Elrefaei, L., Malik, M. G. A., . . . Akyildiz, I. F. (2016). Distributed timely throughput optimal scheduling for the Internet of Nano-Things. IEEE Internet of Things Journal, 3(6), 1202–1212.
[32] Bhargava, K., Ivanov, S., and Donnelly, W. (2015, September). Internet of nano things for dairy farming. In Proceedings of the Second Annual International Conference on Nanoscale Computing and Communication (pp. 1–2).
[33] Kethineni, P. (2017, April). Applications of internet of nano things: A survey. In 2017 2nd International Conference for Convergence in Technology (I2CT) (pp. 371–375). IEEE.
[34] Raut, P. and Sarwade, N. (2016, March). Study of environmental effects on the connectivity of molecular communication based Internet of Nano things. In 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET) (pp. 1123–1128). IEEE.
[35] Jornet, J. M. and Akyildiz, I. F. (2012). The internet of multimedia nano-things. Nano Communication Networks, 3(4), 242–251.
[36] Hassan, N., Chou, C. T., and Hassan, M. (2019). eNEUTRAL IoNT: Energy-neutral event monitoring for Internet of nano things. IEEE Internet of Things Journal, 6(2), 2379–2389.
[37] Pramanik, P. K. D., Solanki, A., Debnath, A., Nayyar, A., El-Sappagh, S., and Kwak, K. S. (2020). Advancing modern healthcare with nanotechnology, nanobiosensors, and internet of nano things: Taxonomies, applications, architecture, and challenges. IEEE Access, 8, 65230–65266.
[38] Boillot, N., Dhoutaut, D., and Bourgeois, J. (2014, May). Using nano-wireless communications in micro-robots applications. In Proceedings of the ACM First Annual International Conference on Nanoscale Computing and Communication (pp. 1–9).
[39] Akyildiz, I. F., Nie, S., Lin, S. C., and Chandrasekaran, M. (2016). 5G roadmap: 10 key enabling technologies. Computer Networks, 106, 17–48.
[40] Rizwan, A., Zoha, A., Zhang, R., Ahmad, W., Arshad, K., Ali, N. A., . . . Abbasi, Q. H. (2018). A review on the role of nano-communication in future healthcare systems: A big data analytics perspective. IEEE Access, 6, 41903–41920.
[41] Afanasyev, I., Mazzara, M., Chakraborty, S., Zhuchkov, N., Maksatbek, A., Yesildirek, A., . . . Distefano, S. (2019, October). Towards the internet of robotic things: Analysis, architecture, components and challenges. In 2019 12th International Conference on Developments in eSystems Engineering (DeSE) (pp. 3–8). IEEE.
[42] Yim, M., Shen, W. M., Salemi, B., Rus, D., Moll, M., Lipson, H., . . . Chirikjian, G. S. (2007). Modular self-reconfigurable robot systems [grand challenges of robotics]. IEEE Robotics & Automation Magazine, 14(1), 43–52.
[43] Mishra, A. (2019). Intelligent Human-aware Decision Making for Semi-autonomous Human Rehabilitation Assistance Using Modular Robots (Doctoral dissertation, University of Nebraska at Omaha).
[44] Ali, S., Mansoor, H., Khan, I., Arshad, N., Khan, M. A., and Faizullah, S. (2019). Hour-ahead load forecasting using AMI data. arXiv preprint arXiv:1912.12479.
[45] Boccadoro, P. (2018). Smart grids empowerment with edge computing: An overview. arXiv preprint arXiv:1809.10060.
[46] Mocrii, D., Chen, Y., and Musilek, P. (2018). IoT-based smart homes: A review of system architecture, software, communications, privacy and security. Internet of Things, 1, 81–98.
[47] Fang, R., Pouyanfar, S., Yang, Y., Chen, S. C., and Iyengar, S. S. (2016). Computational health informatics in the big data age: A survey. ACM Computing Surveys (CSUR), 49(1), 1–36.
[48] Li, E., Zeng, L., Zhou, Z., and Chen, X. (2019). Edge AI: On-demand accelerating deep neural network inference via edge computing. IEEE Transactions on Wireless Communications, 19(1), 447–457.
[49] Jornet, J. M. and Akyildiz, I. F. (2012, April). The internet of multimedia nano-things in the Terahertz band. In European Wireless 2012, 18th European Wireless Conference 2012 (pp. 1–8). VDE.
Sujith Jayaprakash, V. Kathiresan, N. Shanmugapriya, Manish Dadhich
Chapter 2 Opportunities and challenges in transforming higher education through machine learning

Abstract: In the twenty-first century, technology has become the driving force of education. The COVID-19 pandemic has disrupted the education system and made educators rephrase, rethink and re-strategize the current education model. Recently, there has been a paradigm shift toward online teaching and learning, from primary schools right up to universities, and embracing technology for quality higher education is becoming the new norm. Several research works are ongoing to understand how the latest advancements in technology can change the future of education. Industries such as healthcare, financial services, retail and automobile widely use disruptive technologies like big data, artificial intelligence, machine learning, deep learning, virtual reality and augmented reality for marketing, sales, customer engagement and much more. Despite several research works and recommendations in the field of education, there has not been a significant improvement in the use of these technologies to improve the quality of higher education. In this chapter, a 360-degree education model using machine learning is discussed, and a conceptual framework is proposed for how machine learning can be adopted at the different stages of higher education to enhance the quality of education delivery and prepare job-ready graduates.

Keywords: machine learning, education data mining, higher education, COVID-19, framework, artificial intelligence
Sujith Jayaprakash, BlueCrest University College, Accra, Ghana, e-mail: [email protected]
V. Kathiresan, Department of Computer Applications, Dr. S.N.S Rajalakshmi College of Arts and Science, Coimbatore – 641409, Tamil Nadu, India, e-mail: [email protected]
N. Shanmugapriya, Department of Computer Applications (PG), Dr. S.N.S Rajalakshmi College of Arts and Science, Coimbatore – 641409, Tamil Nadu, India, e-mail: [email protected]
Manish Dadhich, Sir Padampat Singhania University, Udaipur, India, e-mail: [email protected]
https://doi.org/10.1515/9783110702514-002
2.1 Introduction

One of the sustainable development goals of the United Nations is quality education. Today, the global literacy rate has reached 90% for all males and 82.7% for all females. Education plays a significant role in the development of a nation, and one of the influential factors in the rising literacy rate is accessibility to education. In the nineteenth and twentieth centuries, information and communication technology (ICT) transformed education delivery and became a catalyst for increasing literacy rates in underdeveloped countries. Increased accessibility to educational materials, rendering local content and providing platforms for teachers' professional development were some of the positive transformations ICT brought to the education system. In the twenty-first century, technology has become the driving force of education. Especially during the COVID-19 pandemic, there has been a paradigm shift toward online teaching and learning, from primary schools to universities. Several research works are ongoing to understand how the latest advancements in technology can change the future of education. Developed countries like the United States and some European countries have adopted advanced technological tools to strengthen academic delivery both on and off campus, and they support EdTech startups that can change the dynamics of the education model, more specifically in the higher education sector [1]. Recently, there has been an increasing amount of research in the field of education data mining using machine learning (ML) algorithms. ML algorithms can bring revolutionary change to the current education system through predictive analysis. Using algorithms, institutions can predict student performance, attract the right students, forecast enrolments, predict faculty performance, enhance academic delivery, predict graduate output and much more.
One approach to experiential learning is the flipped classroom, which requires students to learn the course contents outside the classroom through a Learning Management System (LMS) and apply the knowledge inside the classroom. Naidu et al. [2] used ML algorithms in such teaching and learning processes to analyze student involvement during online learning and to assess their knowledge level. The following are facets of an education model that adopts ML to enhance the higher education system:
– Identifying the right students based on their interests
– Predicting student enrolment
– Personalized learning
– Enhancing the learning behavior of students
– Student retention
– Student prediction
– Improving the learning experience through chatbots and personalized search engines
– Graduation and employment
2.2 Machine learning

Technological advancements are happening at a breakneck pace in every industry, and the major reason behind this is artificial intelligence (AI). There is an increasing argument in every sector that AI will soon replace humans in many jobs. Gone are the days when web platforms depended on telecallers and agents to provide product descriptions and prices; within no time they were replaced by chatbots built using AI. Similarly, repetitive tasks that need utmost accuracy and precision are mostly handled by machines, and even surgical robots are used in the medical industry to perform surgeries that are complicated and require exhaustive repetitive work. All of this is becoming possible because of complex algorithms and the massive amounts of data available for performing operations. ML, a subset of AI, applies algorithms to massive datasets to find meaningful insights. Emerging fields in the education sector are academic analytics and education data mining, which use massive student datasets to find meaningful insights about student performance and course preferences, to cluster students based on their behavior and so on. Data collection in the higher education sector using Enterprise Resource Planning (ERP) and LMS platforms is supporting ongoing research and producing solutions that can be widely implemented in education institutions across the world. Technavio, a market research company based in the United States, has projected that AI in the US education sector will grow at a compound annual rate of 47.7% from 2018 to 2022 [3].
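As a toy illustration of clustering students by behavior (the engagement scores and the choice of two clusters are invented for this sketch, not drawn from any institutional dataset), a plain one-dimensional k-means separates low- and high-engagement groups:

```python
# Hypothetical 1-D k-means over invented student engagement scores
# (hours per week on the LMS), illustrating behavior-based clustering.

def kmeans_1d(values, centers, iters=10):
    """Plain k-means on scalars: assign each value to its nearest center, then recompute means."""
    for _ in range(iters):
        clusters = {c: [] for c in centers}
        for v in values:
            nearest = min(centers, key=lambda c: abs(v - c))
            clusters[nearest].append(v)
        centers = [sum(vs) / len(vs) if vs else c for c, vs in clusters.items()]
    return sorted(centers)

hours_on_lms = [1, 2, 2, 3, 9, 10, 11]
print(kmeans_1d(hours_on_lms, centers=[0.0, 5.0]))  # [2.0, 10.0]
```

The two resulting centers correspond to a low-engagement and a high-engagement group; in practice one would cluster on many features at once with a library implementation rather than this scalar sketch.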
2.3 A conceptual framework

A conceptual framework helps provide better direction to research work: it clearly illustrates the different variables, their associations and how they contribute to the research. In the field of education data mining (EDM), several frameworks have been proposed to identify students at risk, predict their performance, identify the influential factors that can affect student performance and much more. These frameworks guide the progress of research. Adejo and Connolly [4] proposed a holistic framework for predicting student academic performance and intervention in higher education institutions; the framework was designed to produce a comprehensive and unbiased ML model that can help an institution predict the performance of students. Nafea [5] proposed a framework that uses ML to enhance students' personalized learning through virtual assistance: a student can interact with a virtual assistant to clear his or her doubts, and at the back end the virtual assistant uses ML algorithms to construct an appropriate response to the query. This framework helps address student needs immediately and supports academic performance.
Goga et al. [6] proposed a framework for an intelligent recommender system that can predict the performance of students in their first year. Performance is predicted based on age, gender, parents' marital status, education background and the student's grade point average (GPA). Timms [7] proposed that AI in education can be a game-changer, introducing new ways of learning and teaching with the help of data from wearable devices and IoT devices in classrooms; that work examines two main areas, robotics and smart classrooms. Ranjan and Khalil [8] proposed a
Figure 2.1: Conceptual framework. [The figure sketches a 360-degree implementation of AI in education across four layers: a user interface layer (user roles: prospect, student, alumni, admin, professor; applications: learning management system, web apps, mobile apps, chatbot), a design layer (including the smart classroom), an execution layer of machine learning models (classification, association, clustering, prediction, sentiment analysis, visualization, learning analytics) and a database layer (prospect, applicant, student, academic performance, extracurricular activities, skill portfolio, graduate data).]
conceptual framework of the data mining process in management education. The framework was designed to help institutions enhance student recruitment, admissions and courses, the quality of recruits, assessment and evaluation and much more. Considering the reach of AI in the education sector today, this research proposes a conceptual framework model that will help students, teachers and the institution as a whole across various facets of academia. From Figure 2.1, it is evident that AI is the tool to strengthen a futuristic education that focuses more on skill development and employability. Universities across the globe are striving to implement the new education model dubbed "Education 4.0", which focuses on the way students are taught inside the classroom. This futuristic and disruptive education model focuses on improving student learning, enhancing skills and preparing future-ready graduates with the use of technology and advanced gadgets.
2.4 Use of machine learning in student recruitment and student's selection of university
A few years ago, higher education institutions targeted their prospects through conventional marketing tools such as radio, television, billboards and flyer distribution. Today, social media marketing platforms such as Facebook Adverts, Google AdWords and many other digital marketing platforms have changed the scenario. In the past it was hard to identify the source of a lead, but through digital adverts, universities and other businesses can easily target the right audience and generate leads. The underlying concept behind targeting the right people through these digital adverts is AI or ML. Finding the right university to achieve a dream career is a challenging task for any individual, especially when there are many universities offering attractive programs. ML algorithms used in social media adverts target prospects through the following:
a. Search history
b. Keywords
c. Characteristics
d. Age
e. Interests
f. Hobbies
g. Most visited sites
h. Demography
For example, if a potential candidate searches for “Best IT School” in his/her region, Google AdWords directs all the adverts of IT schools around that vicinity which match the interest of the candidate. However, through paid adverts,
Sujith Jayaprakash et al.
recommendations can be manipulative. In this way, a candidate gets everything at hand, and the information is available in time to take the right decision. Data analytics plays an important role in the recruitment process, as the student acquisition cost is predominantly high in many institutions. On average, a higher education institution in the UK or other developed countries spends USD 2,000 or more to recruit a single student. By applying ML algorithms to the leads generated through various adverts, meaningful insights can be drawn that help institutions increase their enrolment rates. ML can identify the influential parameters that lead to student enrolment, so institutions can identify these parameters and rework their financials to reduce student acquisition cost. Using ML, institutions can also identify patterns in student recruitment. For example, a group of students from a particular region with an “A” grade in mathematics may prefer to pursue only the “Mechanical Engineering” program, and this insight can help an institution target more students from that region who hold an “A” grade in mathematics. Likewise, there could be a scenario where parents earning less than 10,000 dollars per year do not admit their wards to the institution; in such cases, the university can identify those candidates and offer need-based scholarships to brilliant students to improve the quality of the recruits. Jamison [9] made a novel attempt to identify the declining rate of accepted students not taking admission into the college. A model was built to identify the yield rate of an admission session to make more informed, tactical admission decisions. Puri and Kohli [10] used neural networks to predict the enrolment of students based on the effectiveness of counseling. Key parameters identified in this research were career guidance, academic strength, location, hostel facilities, infrastructure and market value.
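To make the lead-scoring idea concrete, the sketch below trains a tiny logistic regression on invented lead data; the features, records and numbers are hypothetical illustrations, not taken from the studies cited above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.5, epochs=2000):
    """Fit a logistic regression by plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss for one example
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Hypothetical leads: (open-day visit, normalized grade, ad engagement) -> enrolled?
X = [[1, 0.9, 1], [1, 0.8, 0], [0, 0.4, 0], [0, 0.3, 1], [1, 0.7, 1], [0, 0.2, 0]]
y = [1, 1, 0, 0, 1, 0]

w, b = train(X, y)
score = sigmoid(sum(wj * xj for wj, xj in zip(w, [1, 0.85, 1])) + b)
print(round(score, 2))  # a score close to 1 flags a promising lead
```

In practice an institution would train such a model on its historical lead records and rank incoming leads by predicted enrolment probability, concentrating recruiter time on the highest-scoring prospects.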
Young and Caballero [11] developed a supervised learning model to predict the admission of students to a physics graduate school. The applicant's undergraduate GPA and physics graduate record examination (GRE) score were the influential parameters in student admission. Various works have been carried out on predicting student recruitment; however, more can be achieved using the data obtained from the ERP. Some of the advantages of employing ML algorithms in student recruitment are:
– ML models can generate quality leads to improve student enrolment.
– ML models can predict potential candidates for enrolment and focus more on them to reduce student acquisition cost.
– ML models can identify different patterns of potential recruits and provide meaningful insights to institutions.
– ML models can provide early prediction of a candidate's academic performance and help the institution take a timely decision on admission.
– ML models can predict the retention rate of a potential lead based on records.
2.4.1 Use of machine learning in personalizing learning
As modern education takes different shapes and forms, personalized learning is becoming popular due to its efficacy. Based on the student's interest and knowledge, a customized learning plan is made for every individual through the personalized learning approach. It makes learning attractive and produces quality graduates. However, in implementing this model, the teacher has to make strenuous efforts to develop customized content and provide individual support. ML helps in automating this process. As personalized learning demands that learners take complete control of their learning, a timely feedback and intervention mechanism is required to guide them and achieve the objectives of the course. Through ML, skill gaps can be identified and recommendations made to a learner on a timely basis. By taking the learner's feedback, assessing the learner's pace and reviewing past academic performance, ML models can recommend content that suits the interest of the learner. These mechanisms help curb student dropout due to poor academic performance. As the higher education system evolves day by day, the perception of students toward learning is changing at an equal pace [12]. In this era, students have a consumerist ethos toward higher education, and the demand for “value for money” is bringing about a gargantuan change in teaching as well as learning. New teaching methods, new philosophies and new pedagogies are springing up constantly that can eventually contribute to student success and satisfaction. One such move in recent times is choice-based learning. Aher and Lobo [13] used mining techniques such as clustering and association rule algorithms to build a recommender system that recommends courses to a student based on the choices of other students who opted for a similar program.
The data for this research was taken from the MOODLE LMS platform, and clustering and association mining techniques, namely K-means and Apriori, were used to build the recommender model. The proposed framework used 82 courses in computer science and engineering and information technology, which were then categorized into 13 course categories. For example, data structures, design and analysis of algorithms, and formal systems and automata are classified as theoretical computer science. The Apriori algorithm is used in this model to find the best combinations of courses, and students are then clustered based on their interest. As shown in Figure 2.2, an automated choice-based credit recommender model can be built to aid students in choosing core, elective and ability enhancement courses based on their interest and academic records. This is a personalized approach in which the student can make a decision based on the recommendations from the model. Furthermore, this model can provide the pros and cons of selecting modules based on the learner's strengths and expectations.
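As an illustration of the association-mining step, the following sketch counts frequent course pairs on a toy enrolment table; the course names and support threshold are invented for illustration, not taken from the MOODLE dataset used in [13]. Finding frequent itemsets like this is the core step of the Apriori algorithm.

```python
from itertools import combinations
from collections import Counter

# Toy student enrolment records: each set is the courses one student took.
enrolments = [
    {"Data Structures", "Algorithms", "Automata"},
    {"Data Structures", "Algorithms", "Databases"},
    {"Databases", "Networks"},
    {"Data Structures", "Algorithms", "Networks"},
]

min_support = 3  # a pair must appear in at least 3 student records

# Count every course pair that co-occurs in a student's record.
pair_counts = Counter()
for record in enrolments:
    for pair in combinations(sorted(record), 2):
        pair_counts[pair] += 1

frequent_pairs = {p for p, c in pair_counts.items() if c >= min_support}
print(frequent_pairs)  # {('Algorithms', 'Data Structures')}
```

A recommender built on such counts would suggest “Data Structures” to a student who has chosen “Algorithms,” since the pair clears the support threshold; full Apriori simply repeats this counting for triples and larger itemsets, pruning infrequent candidates at each level.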
Figure 2.2: Choice-based credit model using machine learning.
2.4.2 Use of machine learning in predicting academic performance of students
Predicting the performance of students at an early stage helps in reducing student dropouts and producing high-quality graduates. Many researchers have proposed various ML models to predict the performance of students in tertiary programs. Rastrollo-Guerrero et al. [14] reviewed over 70 research papers to identify the modern techniques used in predicting the academic performance of students. Data
mining techniques are used for taking intrinsic decisions with the help of historic data. In educational data mining, these decisions range from how student success is defined, to the most influential factors that determine the success or failure of students, to the most appropriate algorithm to build a successful model [15]. Research shows that the majority of works focus on tertiary students. Collaborative filtering algorithms are considered the next most successful technique after recommender systems. Factors identified by researchers as influencing the performance of students include gender, parents' social status, traveling distance, use of social media, use of drugs and toxic materials, use of library facilities, study hours, performance in high school and in the first year of college, and many more [16]. These factors are discussed at length in various research works, and solutions have been recommended. Classification algorithms are used in building these models. The majority of the research works are carried out in the e-learning environment, due to the enormous amount of data available, and only a few works are carried out in the traditional teaching and learning environment. Recommender models proposed in various research works help identify the influential factors in student progression and cluster students based on their performance (Figure 2.3). From the recommendations produced by the model, institutions can create a focus group of poorly performing students and train them for academic success. This helps institutions produce quality graduates and improve the student retention rate.
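The classification step can be sketched minimally with a k-nearest-neighbour classifier; the features (weekly study hours, high-school score, first-year GPA) echo the influential factors listed above, but the data, labels and distance choice here are hypothetical, not drawn from the surveyed studies.

```python
def knn_predict(train, labels, x, k=3):
    """Label x by majority vote of its k nearest training points
    (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), lab)
        for row, lab in zip(train, labels)
    )
    top = [lab for _, lab in dists[:k]]
    return max(set(top), key=top.count)

# Hypothetical students: (study hours/week, high-school score, first-year GPA)
train = [(20, 85, 3.5), (18, 80, 3.2), (5, 55, 1.9),
         (6, 60, 2.0), (15, 75, 3.0), (4, 50, 1.7)]
labels = ["pass", "pass", "at-risk", "at-risk", "pass", "at-risk"]

print(knn_predict(train, labels, (7, 58, 2.1)))  # at-risk
```

A real deployment would standardize the features (hours and scores are on very different scales) and validate k on held-out cohorts, but the voting logic is exactly this.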
2.4.3 Use of machine learning in successful alumni management
The alumni community is one of the strongest pillars of any educational institution. As more and more universities proliferate every year, the choices for students also increase. Any higher education institution must maintain a strong alumni base and increase good word of mouth to retain its students and attract more students to the university. Hence, alumni relations play a major role. As graduate unemployment increases at a rapid pace, it is the responsibility of institutions to care for their graduates and find placement opportunities for them. The alumni community also provides its members with a strong network of professionals who could be potential employers. Finding the influential alumni members and connecting them with the job-seeking alumni community is a daunting task for a university with more than 10,000 graduates. ML helps solve these problems through models that cluster alumni who are job seekers and profile members based on their skills. Markovska and Kabaivanov [17] and Vyas et al. [18] used support vector machines and the K-means clustering algorithm to cluster alumni based on their work and target them for a particular event or project. Several recent studies recommend an automated alumni engagement system using AI, which can help the institution focus on and optimize resources
Figure 2.3: Recommender model to predict performance of students.
on alumni, hyperpersonalize the omnichannel experience and target less engaged prospects to make them active in the community. Using ML models, institutions can also identify skill gaps and recommend programs to alumni. Such activities not only enhance the quality of the institution but also help universities improve their revenue.
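The K-means step used in the alumni studies above can be sketched as follows; the two skill scores and the alumni records are invented for illustration, and a real system would cluster many more features.

```python
def kmeans(points, centers, iters=10):
    """Plain K-means: assign each point to its nearest center,
    then recompute each center as the mean of its cluster."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        centers = [
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical alumni profiles: (technical skill score, managerial skill score)
alumni = [(9, 2), (8, 1), (8, 3), (2, 9), (1, 8), (3, 8)]
centers, clusters = kmeans(alumni, centers=[(9, 2), (2, 9)])
print(centers)  # one technically oriented group, one managerially oriented group
```

Once alumni are grouped this way, an institution can, for instance, invite the technically oriented cluster to a hackathon mentorship drive and the managerial cluster to a leadership panel.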
2.5 Challenges in using AI to enhance the education sector
One of the concerns in implementing AI-based models in the higher education system is the unavailability of data. Research conducted in this field has largely focused on predicting student performance, as those data are available to any institution. Student recruitment prediction, alumni management and personalized learning are aspects that lack attention in the current education system. Not many universities are keen on collecting intrinsic data from applicants. Also, in traditional classroom teaching, it is nearly impossible to focus on the individual performance of students during class hours. Even if technical gadgets like CCTV and motion sensor cameras are in place to monitor performance, processing and analyzing such data is a cumbersome process. Also, the installation cost of these gadgets is not affordable for many universities. In the online learning environment, however, it is feasible to implement AI-based models at large scale. In the online environment, it is easy to identify a student at risk of failing, the attrition rate, interest toward a program and other signals. Online education comes with its own challenges, such as security issues and impersonation. There is also a lack of policy on the usage of educational data. Hence, there is no proper guidance or clarity for education institutions on moving ahead with an AI-based solution.
2.6 Conclusion
The future of education will revolve around AI, as it helps attain a student-centric approach to a greater extent. This chapter discussed different aspects of the higher education sector and how AI can be embraced to strengthen each of them. Student enrolment, personalized learning, predicting student performance, addressing dropout issues and engaging alumni are the key aspects discussed in this chapter. The conceptual framework proposed in this chapter helps build a comprehensive system that focuses on student academic life. Many current research works address predicting the performance of students or personalized learning. Only a few research works focus on alumni engagement or student recruitment
using ML algorithms. Lack of data and of a proper system in place are the challenges or setbacks in implementing the proposed model. Today, many universities have slowly started using learning management systems that can provide academic data on students. Some tech companies like IBM, Oracle and Microsoft have come up with cloud-based solutions that can help institutions measure student success, aid lecturers in teaching and learning, provide support in academic research and so on. Not every institution can afford these solutions, due to the cost involved and the lack of resources to capture the required data. More cost-effective solutions that address all aspects of higher education should therefore be introduced by the tech industry. Universities must make concerted efforts to build systems and resources to capture student data and find meaningful insights to strengthen the academic success of students. The main objective of this research was to identify the ways and means of embracing ML in the higher education system to make it sustainable. Due to COVID-19, many institutions have adopted online teaching and learning, which will pave the way for implementing AI solutions in the future. The proposed AI solutions in this chapter help build a system that can strengthen the education system and make it more sustainable.
References
[1] Kant Hiran, K., Henten, A., Shrivas, M. K., and Doshi, R. (2018). Hybrid EduCloud model in higher education: The case of Sub-Saharan Africa, Ethiopia. In 7th International Conference on Adaptive Science & Technology (ICAST). IEEE. doi: 10.1109/ICASTECH.2018.8507113.
[2] Naidu, V. R., Singh, B., Farei, K., and Suqri, N. (2020). Machine learning for flipped teaching in higher education – A reflection. doi: 10.1007/978-3-030-32902-0_16.
[3] Patel, P. (2020). EduPlusNow. Retrieved 30 September 2020, from https://www.eduplusnow.com/blog/how-artificial-intelligence-is-changing-the-future-of-education.
[4] Adejo, O. and Connolly, T. (2017). An integrated system framework for predicting students' academic performance in higher educational institutions. International Journal of Computer Science and Information Technology, 9(3), 149–157. doi: 10.5121/ijcsit.2017.93013.
[5] Nafea, I. (2018). Machine learning in educational technology. doi: 10.5772/intechopen.72906.
[6] Goga, M., Kuyoro, S., and Goga, N. (2015). A recommender for improving the student academic performance. Procedia – Social and Behavioral Sciences, 180, 1481–1488. doi: 10.1016/j.sbspro.2015.02.296.
[7] Timms, M. J. (2016). Letting artificial intelligence in education out of the box: Educational cobots and smart classrooms. The International Journal of Artificial Intelligence in Education, 26, 701–712. doi: 10.1007/s40593-016-0095-y.
[8] Ranjan, J. and Khalil, S. (2008). Conceptual framework of data mining process in management education in India: An institutional perspective. Information Technology Journal, 7, 16–23.
[9] Jamison, J. (2017). Applying machine learning to predict Davidson College's admissions yield. In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education (SIGCSE '17). Association for Computing Machinery, New York, NY, USA, 765–766. doi: 10.1145/3017680.3022468.
[10] Puri, P. and Kohli, M. (2007). Forecasting student admission in colleges with neural networks.
[11] Young, N. T. and Caballero, M. D. (2019). Using machine learning to understand physics graduate school admissions. arXiv:1907.01570 [physics]. http://arxiv.org/abs/1907.01570.
[12] Kandiko, C. B. and Mawer, M. (2013). Student Expectations and Perceptions of Higher Education. Kings Institute, London.
[13] Aher, S. B. and Lobo, L. M. R. J. (2013). Combination of machine learning algorithms for recommendation of courses in e-learning system based on historical data. Knowledge-Based Systems, 51, 1–14. ISSN 0950-7051.
[14] Rastrollo-Guerrero, J. L., Gómez-Pulido, J. A., and Durán-Domínguez, A. (2020). Analyzing and predicting students' performance by means of machine learning: A review. Applied Sciences, 10, 1042.
[15] Alyahyan, E. and Düştegör, D. (2020). Predicting academic success in higher education: Literature review and best practices. The International Journal of Technology in Education, 17(3). doi: 10.1186/s41239-020-0177-7.
[16] Hiran, K. K. and Henten, A. H. (2020). An integrated TOE–DoI framework for cloud computing adoption in the higher education sector: Case study of Sub-Saharan Africa, Ethiopia. International Journal of System Assurance Engineering and Management, 11(2), 441–449. doi: 10.1007/s13198-019-00872-z.
[17] Markovska, V. and Kabaivanov, S. (2017). Improving alumni network efficiency with machine learning. Trakia Journal of Sciences, 15, 115–119. doi: 10.15547/tjs.2017.s.01.021.
[18] Vyas, A., Dhiman, H., and Hiran, K. (2021). Modelling of symmetrical quadrature optical ring resonator with four different topologies and performance analysis using machine learning approach. Journal of Optical Communications. doi: 10.1515/joc-2020-0270.
Radha Guha
Chapter 3 Efficient renewable energy integration: a pertinent problem and advanced time series data analytics solution
Abstract: In this chapter, the pertinent problem of integrating renewable energy sources into the main utility power supply, which requires a smart grid, is brought up, and its solutions are investigated from a data scientist's perspective. A smart grid design aims to integrate renewable energy with the traditional electricity supply through a more efficient, data-driven autonomous control mechanism for better quality of service at an optimal cost to end-users. This digital transformation of the power system requires the reskilling of software engineers as data scientists. From a data scientist's perspective, the problem is to generate accurate forecasts of the energy consumption of a household or a larger region from past observations, for an uninterrupted power supply at optimal cost. In this chapter, the traditional statistical autoregressive integrated moving average model for time series (TS) data analysis and forecasting is discussed first. Then, for more accurate forecasting of nonlinear TS data, deep learning approaches based on artificial neural networks, namely the multilayer perceptron and the long short-term memory recurrent neural network, are discussed and experimented with. To support the discussion, performance metrics for all three algorithms are calculated for four TS datasets of varied length and structure and compared in terms of their salient features, complexity and accuracy.
Keywords: renewable energy integration, time series forecasting, machine learning, ARIMA model, deep learning, MLP-ANN, LSTM-RNN
3.1 Introduction
In recent times, we have realized that we cannot rely on natural resources like coal, oil and gas forever for energy production [1, 2]. Moreover, the burning of coal, oil and gas creates a harmful greenhouse effect that is causing global warming and climate change. To combat climate change, in the 2015 Paris climate agreement the leaders of 189 countries vowed to take steps to cut down on greenhouse gas emissions such as CO2 from fossil fuels and to use alternative renewable energy
Radha Guha, Department of Computer Science Engineering, SRM University AP, Andhra Pradesh, India, e-mail: [email protected] https://doi.org/10.1515/9783110702514-003
sources (RESs) like solar, wind and hydro, which have no harmful effects. Right now, China is the biggest investor in the renewable green energy sector, with India second and the US third. More and more countries are investing in this green energy industry, as it is eco-friendly and will be profitable in the long run; it is now a $1,000 billion industry. India's journey to renewable energy [3] started much earlier, in the 1980s, and the country has the ambitious goal of producing 300 GW of electricity from renewables by the year 2030. Solar photovoltaic (PV) panels, wind turbines and water dams can generate electricity without harmful CO2 emissions and have very low operating costs. This green and sustainable energy can improve our lives with clean air, better health and access to energy in rural India, with more energy security. Even though the operating cost is low, the capital cost of production from green sources is high. Thus, it is not deployed on a large scale right now, and it is also intermittent because of unpredictable weather. Still, we can use it for a local community, for lighting and heating residential and office buildings, powering electric cars or storing energy in batteries. Energy demand is ever-increasing, and only 20% of global electricity demand is currently met by green energy, although over a hundred cities in the world are 75% run by green energy today. The cost of renewable energy technology, that is, the cost of PV panels and wind turbines, is falling over the years, and so its electricity generation capacity is increasing. It is expected that in another 30 years, that is, by 2050, two-thirds of global electricity demand will be generated from renewable clean sources. This shift in technology is changing the physical appearance of building architecture, with rooftop solar panels and vast tracts of land covered with solar panels and wind turbines.
Cities implementing green energy need smart grids to integrate both sources of energy, to obtain an uninterrupted power supply and to optimize resource management through a data-driven control system. Smart grid design is a multidisciplinary field of research, but I take the role of a data scientist to discuss and present this problem in the rest of the chapter. The remainder of this chapter is organized as follows. Section 3.2 discusses the advantages, challenges and motivation of renewable energy integration. Section 3.3 discusses the salient features of a time series (TS) and introduces the traditional autoregressive integrated moving average (ARIMA) model for TS forecasting. The subsequent sections discuss the artificial neural network (ANN)-based multilayer perceptron (MLP) model and the long short-term memory (LSTM) recurrent neural network (RNN) model; introduce five important measures of performance evaluation for TS forecasting, together with the Diebold-Mariano (DM) test for evaluating the statistical significance of forecasting differences between algorithms; present experimental results of the applied algorithms on four datasets of varied size and structure; and conclude by emphasizing that the deep learning approach proves promising for higher accuracy on nonlinear TS data.
3.2 Renewable energy: advantages, challenges and motivation
Traditional power plants burning fossil fuels require expensive infrastructure and a large area, and are built far away from cities and localities. A country has few centralized power plants, and the power is transmitted over long-distance wires to the distribution sites, incurring a huge heat emission loss; this is inefficient. In contrast, RESs are smaller facilities generating smaller amounts of power (kVA). RESs are decentralized and distributed, and are set up near the end-user load. An RES is connected to a local onsite microgrid for efficient control of its components. The capital cost of building many smaller renewable energy facilities is higher, even though their operating cost is low. Besides the positive environmental impact of RESs, there are other operational advantages. An advantage of using decentralized, distributed RESs is that they avoid the long-distance transmission losses of the traditional centralized power system. Also, RESs, being at the distribution site, are more immune to natural disasters like earthquakes and storms, and to man-made sabotage. As solar and wind power generation depends on sunshine and wind speed, an RES can generate a shortfall or an excess of energy. Thus, for a continuous power supply to the load and to avoid voltage and frequency fluctuations, the local onsite microgrid is integrated into the main power grid, called the macrogrid. When the RES generates less power, the macrogrid supplies the remaining power, and when the RES generates excess power, it can sell it to the macrogrid. Excess RES energy can also be stored in lithium-ion batteries, but this is relatively expensive. A microgrid system (Figure 3.1) generally consists of PV panels, wind turbines, battery storage, a backup diesel generator and a backup connection to the macrogrid. As there are so many sources of energy in a microgrid, there is added reliability of full-time energy availability to a site.
As diesel power costs more, its consumption should be avoided as much as possible. To encourage renewable energy penetration, its cost to the user is kept cheaper than power generated from fossil fuels. Some low-price incentives are also given to consumers for restricting power consumption from the macrogrid during peak demand. The consumer load profile varies over the hours of the day and from season to season, for example, summer versus winter. Optimal sizing of a microgrid facility with all its energy sources depends on all the above cost factors and the consumer load profile. Now we look at the economic and technical challenges of smart grid design. The smooth running of the power system by switching between macrogrid and microgrid is challenging. Renewable energy production must be predicted accurately from weather data. Energy production must anticipate the demand of individual households and the aggregated power consumption of a larger region. Overall power production, transmission, distribution and delivery to the consumer have to become more tractable, viable and cost-efficient for all parties, such as stakeholders, government regulators, clients and consumers. This expectation of better quality of service needs advanced
Figure 3.1: Microgrid control system. The diagram shows a ring bus microgrid perimeter connecting the utility connection (interconnection); smart buildings, smart campuses and smart communities; storage; fuel cells; and the microgrid control system, which performs active balancing among energy sources and energy-consuming devices.
monitoring of the power infrastructure. Thus, smart grid infrastructure is seeing digital transformation, with a proliferation of smart meters, smart sensors, smart appliances, smart buildings, drones, boats, electric car charging stations and so on. All these digital devices are put on the internet and are collectively called the Internet of things (IoT). These IoT devices are given unique addresses so that they can communicate both ways over wired and wireless networks and work collaboratively. They are generating an immense amount of digital data these days, and the period is aptly called the big data era. Fine-grained analytics of this huge amount of collected data can help in decision making, from production and load balancing to the optimal pricing strategy of the smart grid. Business transformation through the cloud-based deployment of services is also driving digital transformation. This digital transformation needs the reskilling of personnel. Reskilling at the ground level means personnel understand that emerging IoT technologies and data-driven autonomous control systems can control the grid better than human operators. All parties need to understand the relevance of investing in new technologies. Software is then needed not only to control digital devices but also to optimize present use, predict future demand, detect anomalies and discover faults in the machinery. Data-driven decision making improves business performance. Thus, the new skill required of traditional software developers is data analytics. Software developers should acquire the skill of handling huge amounts of structured and unstructured data, which need to be wrangled, processed with artificial intelligence (AI), machine learning (ML) and natural language-processing algorithms, and visualized easily. Thus, the
role of a data scientist has become very important in the digital transformation era of big data [4]. The application of AI and ML technology in smart grid design is rapidly evolving for the optimal sizing of all components of a smart grid. Optimal sizing of PV panels, wind turbines, backup battery capacity and backup diesel generator capacity, and minimizing the frequency of switching to the backup macrogrid, gives a cost-effective solution for all parties involved. AI and ML are used for single-step or multi-step-ahead forecasting of power consumption, with models that learn from historical data and improve with experience. The long-term aggregated power consumption characteristic of a region is much more stable than the short-term load of an individual household, which is difficult to predict. Because many strategic decisions can be taken from TS forecasting, much research has been devoted in recent decades to improving TS forecasting accuracy.
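Before turning to statistical and neural models, it is worth noting the simple baselines that one-step-ahead load forecasts are usually compared against. The sketch below shows two such baselines, naive persistence and a moving average, on a hypothetical hourly load series (the numbers are invented for illustration).

```python
def persistence(history):
    """Naive one-step-ahead forecast: the next value equals the last one."""
    return history[-1]

def moving_average(history, window=3):
    """One-step-ahead forecast as the mean of the last `window` values."""
    return sum(history[-window:]) / window

load = [98, 101, 97, 103, 105]  # hypothetical hourly consumption in kWh

print(persistence(load))               # 105
print(round(moving_average(load), 2))  # (97 + 103 + 105) / 3 = 101.67
```

A more elaborate model such as ARIMA or an LSTM only earns its complexity if it beats these baselines on the error measures discussed later in the chapter.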
3.3 Time series forecasting via machine learning
TS data modeling and forecasting is an important and dynamic field of research in the current decades [5–7]. Besides electrical energy consumption forecasting for smart grid design, practical applications include stock price forecasting, weather forecasting, product sale forecasting, smart building and industrial process monitoring and so on. TS modeling helps in the compact representation of data, data interpretation, anomaly detection and future data point generation by observing past data patterns. Data forecasting is essential for control, simulation, business decisions and taking precautionary measures. Accurate forecasting is possible when the model fits the past data appropriately. A TS is non-deterministic: its future values cannot be predicted with certainty, so forecasting is a stochastic process and comes with a confidence band. Many models have evolved over the years to improve forecasting accuracy. Here I introduce the salient features of a TS first and then the traditional ARIMA model proposed by statisticians George Box and Gwilym Jenkins in 1970 [8]. A TS is a sequential series of data collected for a single random variable y at a regular time (t) interval and is represented as y(t), where t = 0, 1, 2, 3, . . . [9]. As a single variable is involved, y(t) is called a univariate TS. A TS can be multivariate if more than one variable is recorded over time, for example, when both the temperature of a region and the corresponding ice-cream sales over the years are considered. A TS can be continuous or discrete: electricity consumption data can be collected at every instant of continuous time, or it can be discrete if it is aggregate consumption at the hourly, daily, weekly or monthly level. A TS [10–12] can be decomposed into four components: trend T(t), cyclical C(t), seasonality S(t) and irregular I(t). Trend T(t) is the gradual growth,
Radha Guha
decline or stagnancy of the data over the entire long period. The cyclical component C(t) varies over a long duration, usually more than a year; for example, business cycles of recession and inflation span more than a year. The seasonal component S(t) varies within a year and repeats every year; for example, ice-cream sales are higher during the summer season of every year. Seasonal variation is caused by weather conditions, customs (like Christmas time) and so on. Detecting the trend, cyclical and seasonal patterns in a TS helps business people plan ahead of time. Whatever remains after removing the trend, cyclical and seasonal components of a TS is called the irregular component I(t), which has no defined pattern. These four components can be easily detected visually when the TS is plotted as a graph (Figure 3.2). To mention beforehand, Figures 3.2–3.5 are generated by Python code for this chapter.
Figure 3.2: An example time series (daily electricity consumption).
A daily electricity consumption TS from 1988 to 2016 is depicted in Figure 3.2, where the x-axis is the time scale and the y-axis is the electricity consumption variable. This TS exhibits both an increasing trend and seasonality. It is decomposed in Figure 3.3, where the trend, seasonal and irregular components are shown separately along with the original TS. Now, the way the four constituent components of a TS are combined to represent the original TS y(t) gives rise to two distinct models: (i) the multiplicative model, where y(t) = T(t) * C(t) * S(t) * I(t), and (ii) the additive model, where y(t) = T(t) + C(t) + S(t) + I(t). The multiplicative model implies that the four components are not independent of each other, whereas in the additive model they are independent. Traditionally, univariate TS data is modeled by the simple ARIMA model, introduced by statisticians George Box and Gwilym Jenkins in 1970 [8]. Future value
Chapter 3 Efficient renewable energy integration
Figure 3.3: Decomposition of a time series (observed, trend, seasonal and residual panels).
of the TS is forecasted from this model. The basic assumption of the ARIMA model is that the TS data is linear and normally distributed. Before applying the ARIMA model, the TS must be transformed into a stationary series whose mean, variance and covariance are constant; that is, the series has to be in an equilibrium that does not depend on time. A non-stationary TS can be transformed into a stationary one in many ways: decomposition, smoothing, transformation, de-trending and differencing. Smoothing subtracts the rolling average from the original TS. Nonlinear transformations such as log, square root or cube root can be applied to a TS to make it stationary as needed. A non-stationary TS can also be de-trended, but de-trending requires parameter estimation. Differencing is the easiest way to make a series stationary and can be repeated as many times as needed: the first difference removes a linear trend, the second difference removes a quadratic trend, and so forth. As the statistical properties of a stationary TS remain the same in the future, forecasting it is possible; this is a necessary condition before the ARIMA model can be fitted. In Figure 3.3, after the TS is decomposed, the residual component looks stationary with constant mean, variance and covariance. If all three of mean, variance and covariance are constant, the TS is strongly stationary; if only the mean and variance are constant and the covariance cov(yt, yt−s) varies only with the lag s, the TS is weakly
Results of the Dickey–Fuller test:
Test statistic: −6.748333
p-value: 2.995161e−09
No. of lags used: 14
Number of observations used: 381
Critical value (1%): −3.447631
Critical value (5%): −2.869156
Critical value (10%): −2.570827

Figure 3.4: Stationarity check of a time series.
stationary. Figure 3.4 also shows a TS with its rolling mean and standard deviation superimposed; the TS looks stationary, as the rolling mean and standard deviation are almost constant. The ARIMA model can still be applied to a weakly stationary TS. To mention, this concept of stationarity is used in mathematics to simplify the development of a stochastic process for forecasting. Other than checking visually, there is a statistical test, the augmented Dickey–Fuller (ADF) test, to check the stationarity of a TS. In the ADF test, the null hypothesis H0 is that the TS is non-stationary and the alternate hypothesis H1 is that the TS is stationary. If the test statistic is smaller than the critical value, the null hypothesis is rejected and the alternate hypothesis is accepted. The ADF result shown in Figure 3.4 has a test statistic below the critical values at all levels of confidence (99%, 95% and 90%) and a very small p-value, so the TS is stationary. There is another statistical
Chapter 3 Efficient renewable energy integration
39
test named Durbin–Watson [13], which is sometimes used to check serial autocorrelation in the residual data. After the TS is transformed into a stationary series, the ARIMA model can be applied. The ARIMA model is a combination of a subclass of models: the autoregressive (AR) model, the moving average (MA) model and the autoregressive moving average (ARMA) model. In the AR(p) model, the current value of the TS is assumed to be a linear combination of the past p terms as follows:

yt = c + φ1 yt−1 + φ2 yt−2 + ⋯ + φp yt−p + ϵt    (3.1)

Here, yt is the current value of the TS and c is a constant. φi (i = 1, 2, . . ., p) are the parameters of the model, and p, the number of past observations considered, is called the order of the AR model. ϵt is assumed to be white noise with zero mean and constant variance, that is, ϵt ∼ wn(0, σ²). Because the current value regresses over past values, it is called the AR model. In the MA(q) model, the current value of the TS is assumed to be a linear combination of the past q error terms as follows:

yt = μ + θ1 ϵt−1 + θ2 ϵt−2 + ⋯ + θq ϵt−q + ϵt    (3.2)

Here μ is the mean of the TS and θi (i = 1, 2, . . ., q) are the parameters of the MA(q) model, where the past q error terms are considered. The ARMA(p, q) model is a combination of the AR(p) and MA(q) models, where the current value of the TS is represented as a linear combination of p past observations and q past error terms as follows:

yt = c + ϵt + φ1 yt−1 + φ2 yt−2 + ⋯ + φp yt−p + θ1 ϵt−1 + θ2 ϵt−2 + ⋯ + θq ϵt−q    (3.3)
The ARMA model can be used for stationary TS only. For TS which are not stationary, the generalized ARIMA(p, d, q) model is used. In an ARIMA model, a non-stationary TS is transformed into a stationary TS by differencing its data points d times. Usually, d = 1 is good enough for most cases. When d = 0, the model becomes ARMA(p, q). ARIMA(p, 0, 0) is the same as the AR(p) model and ARIMA(0, 0, q) is the same as the MA(q) model. ARIMA(0, 1, 0) is called a random walk model. The question that follows is how to determine the model parameters p, d and q, all of which are integers greater than or equal to zero. After making a TS stationary by d differencing operations, the (complete) autocorrelation function (ACF) and partial autocorrelation function (PACF) plots [9] determine the remaining model parameters (p, q), that is, the AR and MA orders (Figure 3.5). The ACF is a bar chart of the correlation coefficients of the TS (y-axis) against the time lags (x-axis); the PACF is a bar chart of the partial correlation coefficients of the TS (y-axis) against its time lags (x-axis). Both ACF and PACF are plotted along with their confidence bands. For a non-stationary TS, the ACF plot shows very slow decay. The ACF plot determines the MA(q) order
40
Radha Guha
and the PACF plot determines the AR(p) order; the lag at which the PACF cuts off gives p. For example, in Figure 3.5, p = 2 as the PACF plot cuts off after 2 lags, and as the ACF plot shows geometric decay, the model is ARIMA(2, 0, 0), that is, an AR(2) model. So, in this case:

yt = c + ϵt + φ1 yt−1 + φ2 yt−2    (3.4)
Figure 3.5: ACF and PACF plots to determine the AR(p) and MA(q) orders.
The orders p and q can also be chosen by the criteria that give minimum values of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). AIC and BIC are statistical measures of the goodness of fit of a model to the data; when comparing two models, the (p, q) model with the minimum AIC is the better one. After p and q are known, the parameters φi and θi are estimated by maximum likelihood estimation. Exogenous influencing factors, like temperature, holidays and the COVID-19 pandemic outbreak in 2020, can also be incorporated in the more complex ARIMAX model [12] for better accuracy. But the basic assumption of the ARIMA model is linearity in the TS, which is not always true, and incorporating exogenous factors for forecasting is often not possible. This limitation of the classical statistical ARIMA model can be overcome by nonlinear models like deep learning in ANN for better accuracy, as discussed next.
3.4 Time series forecasting via deep learning neural network

Currently, deep ANNs, known as deep learning [14–17], are showing better prediction accuracy than other classification, decision tree and regression models of ML. By definition, deep learning is an ANN with at least three layers: one input layer, one output layer and at least one hidden layer. Even though the biologically inspired ANN that mimics the human brain was proposed long ago, in the 1940s, it was not used much before because of deficiencies in computing power. Only in the past two decades has the deep learning field gained a lot of attention, with applications from big companies like Google's DeepMind and AlphaGo, IBM's Watson and Apple's Siri [18–21]. To ease the implementation of complex ANN-based ML models, Google has come up with the TensorFlow software platform in the Python language, which is open source (from 2015). A lot of new open-source APIs are being written these days to perform big data analytics tasks in Python using deep learning neural networks. Because of these facilities, researchers can now take a deep learning approach to big data analytics, including large-scale TS modeling, which was not possible earlier. ANNs are in general compute-intensive; however, parallel processing on GPUs is possible. The Python API PySpark also makes it easy to use the Apache Spark distributed cluster computing system for real-time streaming analytics of big data. Google has also created an application-specific integrated circuit, the tensor processing unit (TPU), as an AI accelerator specifically for ANN-based ML algorithms. Because of these conducive factors, namely the open-source TensorFlow framework and the huge parallel processing power of GPUs, TPUs (from 2018) and PySpark distributed computing, large-scale TS modeling can now be researched with deep learning techniques for increased accuracy and faster processing [22–25]. TS data can be very complex and nonlinear.
The deep learning algorithm does not make the linearity assumption on TS data that we saw in the ARIMA model. An ANN is a universal function approximator and can map any input to output even in the presence of noise, that is, missing values and anomalies. An ANN is purely data-driven in learning the structure of the training data: no model or parameters need to be specified, and as the ANN trains, it becomes better with experience. When the training data is transformed from the original scale to a standard scale (between 0 and 1) by min–max standardization or z-score normalization, training happens faster and gives better accuracy. Figure 3.6 shows the architecture of a simple feed-forward fully connected ANN. Each neuron in an ANN aggregates all incoming signals and passes the sum through a nonlinear activation function such as sigmoid [g(x) = 1/(1 + e^(−x))], tanh [g(x) = (e^x − e^(−x))/(e^x + e^(−x))] or ReLU [g(x) = max(0, x)]. The sigmoid function outputs values from zero to one, the tanh function outputs values from minus one to one and the rectified linear unit (ReLU) function outputs values from zero upwards. An output value of zero means
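The three activation functions can be written in a few lines of NumPy, a quick illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # outputs in (0, 1)

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))  # outputs in (-1, 1)

def relu(x):
    return np.maximum(0.0, x)  # outputs in [0, inf); zero blocks the signal

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))
print(tanh(x))
print(relu(x))  # [0. 0. 2.]
```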
the neuron is not letting its input pass to its output. An MLP is a simple feed-forward ANN architecture, as in Figure 3.6, with at least three layers: one input layer, one output layer and at least one hidden layer. The network is fully connected, so it is called a dense NN. The simple MLP architecture with one hidden layer is also called a vanilla MLP. An MLP uses supervised training with a back-propagation algorithm [26] that adjusts the connector weights to minimize a loss function so that input maps to output more accurately. MLPs can classify input data that is not linearly separable, like the XOR function. Since the 1980s, the MLP has been a popular research tool for solving speech recognition, image recognition and language translation problems. Later, different architectures took the lead: the convolutional neural network (CNN) proved more effective in image recognition, and the RNN proved more effective in sequence learning such as speech recognition and machine translation.
Figure 3.6: Multilayer perceptron (MLP) architecture.
I have used an MLP architecture like that of Figure 3.6 for TS forecasting in my experiment. Even this basic MLP architecture gave better accuracy than the ARIMA model for all four datasets presented in Section 3.5. The output neuron of the MLP has no activation function: as this is a regression problem, we want the original numerical values instead of values scaled between minus one and one or between zero and one.
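A minimal sketch of such an MLP regressor in Keras follows. The windowed data here is random and purely illustrative, and the layer sizes are arbitrary choices, not the exact configuration used in the experiment:

```python
import numpy as np
from tensorflow import keras

# Hypothetical windowed data: 8 past lags in, 1 step ahead out
X = np.random.rand(200, 8).astype("float32")
y = X.mean(axis=1, keepdims=True)  # stand-in target

model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),  # one hidden layer
    keras.layers.Dense(1),                      # linear output: regression
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
print(model.predict(X[:2], verbose=0).shape)  # (2, 1)
```

Note the final `Dense(1)` layer has no activation, matching the linear output discussed above.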
As the MLP is fully connected, with each perceptron connected to every perceptron in the next layer, it is computationally very expensive. Another shortcoming of the MLP is that it has no memory; it cannot let the past influence the future. A different ANN architecture, the RNN, which contains a loop, solves this issue. Figure 3.7 shows an RNN architecture (left) and how it looks when unfolded over time steps (right). The RNN architecture is especially suitable for TS analysis. An RNN can capture short-term dependency, but because of the vanishing or exploding gradient problem, capturing long-term dependency is difficult, as the gradient problem makes the network stop learning. The solution is to use an LSTM block instead of the classical neuron in the RNN architecture, as shown in Figure 3.8.
Figure 3.7: Recurrent neural network (RNN) architecture (folded, and unrolled over time steps).
In Figure 3.8, unrolled LSTM memory cells for three time steps are shown with inputs (xt), hidden states (ht) and outputs (ht). The LSTM architecture is specifically designed to retain long-term dependency by avoiding the vanishing and exploding gradient problems of the RNN. The LSTM cell architecture is complex: it uses several gate mechanisms to control the flow of information through it. There are a forget gate, an input gate and an output gate to update the cell state (memory) and pass it to the next cell. At every time step t, each gate is presented with the current input (xt) and the output (ht−1) of the previous time step. Using sigmoid and tanh activation functions, the gates forget irrelevant information, add relevant information to the cell state and output it to the next cell. Figure 3.8 shows a standard LSTM cell; in the literature, there are many variations of the gate architecture in the LSTM cell. More detail on the LSTM cell architecture and its working can be found in references [27, 28]. LSTM is a state-of-the-art architecture for sequence learning, that is, given an input sequence, it will generate an output sequence. This architecture, popularly used for speech recognition and language translation, has recently been proposed for multi-step-ahead TS forecasting. The seq2seq model consists of an LSTM encoder layer and an LSTM decoder layer. The network is iteratively trained by the back-propagation algorithm to adjust the weights of the connectors to minimize a loss function.
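A minimal sketch of such an encoder-decoder LSTM in Keras follows. The window length (24 lags), forecast horizon (6 steps), layer sizes and random data are illustrative assumptions, not the exact configuration used in the experiment:

```python
import numpy as np
from tensorflow import keras

# Hypothetical windowed data: 24 input lags, one feature, 6-step-ahead targets
X = np.random.rand(100, 24, 1).astype("float32")
y = np.random.rand(100, 6, 1).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(24, 1)),
    keras.layers.LSTM(32),                         # encoder: summarize the input window
    keras.layers.RepeatVector(6),                  # repeat the summary for 6 output steps
    keras.layers.LSTM(32, return_sequences=True),  # decoder: unroll the output sequence
    keras.layers.TimeDistributed(keras.layers.Dense(1)),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
print(model.predict(X[:3], verbose=0).shape)  # (3, 6, 1)
```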
Figure 3.8: A long short-term memory (LSTM) RNN architecture.
From the training dataset, several input sequences of multiple time steps, or lags (depending on the window size), are created to produce one-step or multi-step outputs. This is the way a TS is transformed into a supervised learning problem. It is called windowing, and the window size can be tuned for more accuracy. A fixed-size sliding window scans the TS and presents the input sequence to the encoder for internal representation, and the decoder is trained to output the expected sequence. Another way to experiment with any ANN architecture is to make it wider or deeper, adding more neurons to each layer or adding more layers, to see what gives more accuracy; configuring a deep learning architecture is an art. Adjusting the weights of the connectors goes on in batches and is repeated for several epochs. After training, the test dataset is used to generate forecasted data. The forecasted data is inverted back to the original scale to compare with the expected values, and model accuracy is calculated from the difference between them. Every forecast comes with an uncertainty estimate, and the narrower the uncertainty, the better the forecast. Minute-by-minute forecasting for several months ahead of time with low uncertainty is a challenging task and needs a lot of tweaking of the RNN layers.
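The windowing transform just described can be sketched as a small helper function (the name `make_windows` is my own):

```python
import numpy as np

def make_windows(series, window, horizon=1):
    """Slide a fixed-size window over the series to build (input, output) pairs."""
    X, y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append(series[i : i + window])                     # past `window` lags
        y.append(series[i + window : i + window + horizon])  # next `horizon` steps
    return np.array(X), np.array(y)

ts = np.arange(10, dtype=float)  # toy series 0..9
X, y = make_windows(ts, window=3, horizon=2)
print(X[0], y[0])        # [0. 1. 2.] [3. 4.]
print(X.shape, y.shape)  # (6, 3) (6, 2)
```

Tuning `window` and `horizon` corresponds to tuning the window size and the number of steps forecast ahead.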
3.5 Experiment: forecasting and comparison of performance measures

In the above sections, three TS forecasting algorithms were discussed: (i) the statistical ARIMA model, (ii) the MLP ANN and (iii) the LSTM RNN. In this section, all three algorithms are implemented in the Python programming language for experimentation. The aim of the experiment was to generate forecasts and see which algorithm gives better forecasting accuracy. Python's statsmodels module provides TS analysis
with the ARIMA model. Google's TensorFlow is a specialized library for working with ANNs; the Keras API was used on top of TensorFlow. Using all these libraries, the TS analysis program needs very few lines of code. To mention, Figure 3.9, Tables 3.1 and 3.2 and Figures 3.10–3.12 are generated by Python code for this chapter.
Figure 3.9: Consumption forecasts for the dataset with 10,000 data points.
The source of the data is the world's largest data science community, Kaggle: https://www.kaggle.com/twinkle0705/state-wise-powerconsumption-in-india. The electricity consumption dataset of all states of India for the years 2019–2020 is available there. Four datasets of various lengths were curated for the experiment; the lengths were cut short as the algorithms run on a laptop with a single processor. The datasets vary in length and structure: Dataset 1 has 397 data points of daily electricity consumption, Dataset 2 has 503 data points of daily electricity consumption, Dataset 3 has 7,000 data points of minute-by-minute electricity consumption and Dataset 4 has 10,000 data points of minute-by-minute electricity consumption. In training a model, the dataset is first divided into a training set and a test set with a 75:25 split. The training set is used to learn the structure of the data. For the ARIMA model, the entire training set is used at once; for the MLP-ANN and LSTM-RNN, several input sequences and corresponding expected outputs are created beforehand by the windowing method discussed previously. Once the model is trained with the training data, the forecasted dataset is generated from the model, compared with the expected dataset, and model accuracy is computed. Table 3.1 shows the four datasets with their distributions, decompositions, ACF and PACF plots, transformed series and forecasted series against the original series; the best forecasted series out of the three algorithms is shown.
3.6 Performance measures

Figure 3.10 shows the original dataset and the three forecasts generated by the three algorithms: the ARIMA model, MLP-ANN and LSTM-RNN. The difference between forecasted and expected values is the model error. I have aggregated this error with five statistical measurements: (i) mean error (ME), (ii) mean absolute error (MAE), (iii) root mean square error (RMSE), (iv) mean percentage error (MPE) and (v) mean absolute percentage error (MAPE). The comparison between the three algorithms is based on these five error measurements, plotted in Figures 3.11 and 3.12. The five measurements have different characteristics, so all of them are computed. Assume yt is the actual data point, ft is the forecasted data point and n is the total number of data points.
Figure 3.10: Comparison of three forecasts from three algorithms.
ME: It is given as follows, where the sum runs over t = 1, . . ., n:

ME = (1/n) Σt (yt − ft)    (3.5)

ME cancels out positive and negative errors, and the aggregated value gives the final bias: positive, negative or zero. A zero ME does not mean the forecast is perfect for all data points; it just means the errors cancelled out. The mean error depends on the scale of the data points and on how the data have been transformed.
MAE: It is given by the following formula:

MAE = (1/n) Σt |yt − ft|    (3.6)
MAE does not cancel positive and negative errors like ME does. It does not show the direction of bias of the error; it shows the true magnitude of the error. For a good forecast, MAE should be as small as possible. MAE also depends on the data scale and data transformations.
RMSE: It is given by the following formula:

RMSE = sqrt((1/n) Σt (yt − ft)²)    (3.7)
Like MAE, RMSE does not cancel positive and negative errors and does not show the direction of bias; it gives an overall idea of the error in the forecast. RMSE penalizes large errors more. For a good forecast, RMSE should be as small as possible. RMSE is also sensitive to data scale and data transformation.
MPE: It is given by the following formula:

MPE = (1/n) Σt ((yt − ft)/yt) × 100    (3.8)
MPE gives the average forecast error percentage; its other properties are the same as those of ME.
MAPE: It is given by the formula in Equation (3.9):

MAPE = (1/n) Σt |(yt − ft)/yt| × 100    (3.9)
MAPE gives the average absolute forecast error percentage. MAPE is independent of the scale of measurement but is still affected by data transformation; its other properties are the same as those of MAE. Because of the different properties of the five measurements, all five are calculated to compare the three algorithms. The results are shown in Figures 3.11 and 3.12. Figure 3.11 shows the ARIMA model errors on the four datasets; as the ARIMA errors are an order of magnitude higher, they are plotted separately. It can be seen that when there are more data points to train the model, the magnitude of the error is less. Overall, the deep learning methods outperformed the classical ARIMA model. In Figure 3.12, the MLP-ANN and LSTM-RNN errors are compared. It can be seen that more data points gave better accuracy than fewer data points in
both MLP-ANN and LSTM-RNN. Also, it can be seen that Dataset 4, with the highest number of data points (10,000), gave the highest accuracy with the LSTM-RNN model for all five performance metrics.
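The five error measures can be computed in a few lines of NumPy; `forecast_errors` is a hypothetical helper and the numbers are toy values, not the experiment's data:

```python
import numpy as np

def forecast_errors(y, f):
    """Aggregate the five error measures between actual y and forecast f."""
    e = y - f
    return {
        "ME": np.mean(e),                      # signed bias, eq. (3.5)
        "MAE": np.mean(np.abs(e)),             # eq. (3.6)
        "RMSE": np.sqrt(np.mean(e ** 2)),      # eq. (3.7)
        "MPE": np.mean(e / y) * 100,           # eq. (3.8)
        "MAPE": np.mean(np.abs(e / y)) * 100,  # eq. (3.9)
    }

y = np.array([100.0, 110.0, 120.0])  # toy actual values
f = np.array([98.0, 113.0, 120.0])   # toy forecasts
print(forecast_errors(y, f))
```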
ARIMA model performance measures:

                        ME       MAE      RMSE     MPE      MAPE
Dataset 1 (397 pts)     84.05    84.41    85.78    0.24     0.24
Dataset 2 (503 pts)     −8.48    18.91    39.81    −0.01    0.03
Dataset 3 (7,000 pts)   25.10    25.11    25.12    0.01     0.01
Dataset 4 (10,000 pts)  7.32     10.52    10.54    0.01     0.01

Figure 3.11: ARIMA model error measurements.
Comparing MLP versus RNN-LSTM:

                        ANN-MLP                             RNN-LSTM
                        ME      MAE    RMSE   MPE    MAPE   ME      MAE    RMSE   MPE     MAPE
Dataset 1 (397 pts)     0.01    0.04   0.05   0.03   0.07   −0.02   0.06   0.07   −0.04   0.19
Dataset 2 (503 pts)     0.01    0.11   0.14   0.03   0.28   0.00    0.08   0.15   0.01    0.22
Dataset 3 (7,000 pts)   0.00    0.01   0.01   0.00   0.00   0.00    0.07   0.08   0.00    0.01
Dataset 4 (10,000 pts)  −0.03   0.05   0.08   −0.01  0.01   −0.01   0.04   0.07   0.00    0.01

Figure 3.12: MLP and LSTM-RNN error comparison.
3.6.1 Statistical significance evaluation

Even though the abovementioned five measurements, ME, MAE, RMSE, MPE and MAPE, are simple and easily understood in traditional statistics, they have limitations in performance evaluation. Suppose the difference between the MSEs of the two algorithms MLP and RNN-LSTM is small; then it is hard to decide whether the difference is significant or due to some randomness in the datasets. In modern statistics, we have to further establish whether the forecast error differences are statistically significant or occur by chance on specific datasets. So, the DM test [29] is used to quantitatively evaluate the forecast differences. The DM test is performed as follows. If fi and gi are two forecasts generated by two different methods for the original TS yi, then ei = yi − fi and ri = yi − gi are the two forecast errors. In the DM test, a loss differential function di is defined as either di = ei² − ri² or di = |ei| − |ri|; the first loss differential is related to the error statistic MSE and the second to MAE. In practice, MSE and MAE are the most popular error indices. The mean loss differential is defined as d̄ = (1/n) Σi=1..n di, with μ = E[di], and an autocovariance at lag k is defined as γk = (1/n) Σi=k+1..n (di − d̄)(di−k − d̄). Now, for h ≥ 1, the DM statistic is calculated as follows:

DM = d̄ / sqrt((γ0 + 2 Σk=1..h−1 γk) / n)    (3.10)

where it is sufficient to consider h up to h = n^(1/3) + 1. Under the null hypothesis (H0), the two forecasts have the same accuracy, the mean loss differential is μ = 0, and DM follows a standard normal distribution, that is, DM ∼ N(0, 1). Under the alternate hypothesis (H1), the two forecasts are significantly different if |DM| > Zcrit, where Zcrit is the two-tailed critical value of the standard normal distribution. At the 95% confidence level, the null hypothesis is rejected if |DM| > 1.96, which means the two forecasting accuracies are significantly different. If the sample size is small, the DM test tends to reject the null hypothesis too often, so in the Harvey, Leybourne and Newbold (HLN) test the DM value is adjusted as follows:

HLN = sqrt((n + 1 − 2h + h(h − 1)/n) / n) × DM    (3.11)
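A sketch of the DM test with the HLN adjustment, following equations (3.10) and (3.11); `dm_test` is a hypothetical helper and the forecasts below are synthetic, purely for illustration:

```python
import numpy as np
from scipy import stats

def dm_test(y, f, g, h=None, loss="mse"):
    """Diebold-Mariano statistic with the HLN small-sample adjustment (a sketch)."""
    e, r = y - f, y - g
    d = e ** 2 - r ** 2 if loss == "mse" else np.abs(e) - np.abs(r)
    n = len(d)
    if h is None:
        h = int(round(n ** (1 / 3))) + 1
    dbar = d.mean()
    # autocovariances of the loss differential at lags 0 .. h-1
    gamma = [np.sum((d[k:] - dbar) * (d[:n - k] - dbar)) / n for k in range(h)]
    dm = dbar / np.sqrt((gamma[0] + 2 * sum(gamma[1:])) / n)
    hln = np.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n) * dm
    p_value = 2 * (1 - stats.norm.cdf(abs(hln)))
    return hln, p_value

rng = np.random.default_rng(3)
y = rng.normal(size=300)
f = y + rng.normal(0, 0.1, 300)  # accurate forecast
g = y + rng.normal(0, 1.0, 300)  # poor forecast
stat, p = dm_test(y, f, g)
print(stat, p)  # strongly negative statistic, tiny p-value
```

A negative statistic means the first forecast has the smaller loss; |HLN| > 1.96 rejects the null hypothesis of equal accuracy at the 95% level.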
Tables 3.1 and 3.2 show the DM statistics and p-values of the forecast error significance between pairs of algorithms on the same datasets. In Table 3.1, dataset 4 has been forecasted by the ARIMA, MLP and LSTM models, and the DM test is applied for both error indices, MSE and MAD. For each pair of algorithms, all |DM| values are greater than 1.96 and the p-values are very small, so the null hypothesis can be rejected: there are significant differences in forecast accuracy between ARIMA-MLP, ARIMA-LSTM and MLP-LSTM. In Table 3.2, two algorithms,
Table 3.1: DM test statistics for all algorithms: dataset-4.

               MLP versus ARIMA    LSTM versus ARIMA    LSTM versus MLP
DM-MSE         −.                  −.                   −.
P-value MSE    .e-                 .e-                  .e-
DM-MAD         −.                  −.                   −.
P-value MAD    .e-                 .e-                  .e-
Table 3.2: DM test statistics for all datasets: LSTM versus MLP.

           Dataset-1                Dataset-2                Dataset-3                Dataset-4
Criteria   DM statistic   P-value   DM statistic   P-value   DM statistic   P-value   DM statistic   P-value
MAD        .              .e-       −.             .         −.             .         −.             .e-
MSE        .              .e-       .              .         −.             .         −.             .e-
MLP and LSTM, are compared for all datasets. It can be seen that LSTM performed better than MLP for dataset 4, while MLP performed better than LSTM for dataset 1; there is no significant difference in forecasting accuracy between the MLP and LSTM algorithms for dataset 2 and dataset 3.
3.7 Conclusion

In this chapter, I have discussed that smart grid design for integrating RES into the main power grid is an important problem of today and a multidisciplinary research area. I then looked at the problem from a data scientist's perspective, aiming to improve TS forecasting accuracy with statistical models and deep ML algorithms. From my experiment and literature review, I found that deep learning is more effective for tackling nonlinear TS data and obtains higher forecasting accuracy and efficiency on a parallel processing platform. Deep learning in the ANN is a dynamic field of research where new neural network architectures and approaches are introduced very rapidly. Deep learning is proving to be a promising technique, and because of the availability of open-source library functions and parallel programming infrastructure, this field can now be researched thoroughly. In continuation of research in this field, I am next going to look into the scope of parallelism in the algorithms to exploit GPU and TPU power for processing speed-up.
References

[1] Erdinc, O. and Uzunoglu, M. (2012). Optimum design of hybrid renewable energy system: Overview of different approaches. Renewable and Sustainable Energy Reviews, 16(3), 1412–1425.
[2] Spear, B. et al. (2015, May). Role of smart grid in integrating renewable energy. ISGAN Synthesis report, NREL/TP-6A20-63919.
[3] Govt. India MST, DST. (2017, June). India Country Report on Smart Grid-DST.
[4] Guha, R. (2020, September). Impact of artificial intelligence and natural language processing on programming and software engineering. IRJCS: International Research Journal of Computer Science, VII, 238–249.
[5] Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159–175.
[6] Commandeur, J. J. F. et al. (2007). Introduction to State Space Time Series Analysis. Oxford University Press.
[7] Hyndman, R. J. et al. (2018). Forecasting: Principles and Practice.
[8] Box, G. E. P. and Jenkins, G. (1970). Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco, CA.
[9] Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, NY.
[10] John, H. C. (1997). Time Series for Macroeconomics and Finance. Graduate School of Business, University of Chicago, Spring.
[11] Scott Armstrong, J. (2006). Findings from evidence-based forecasting: Methods for reducing forecast error. International Journal of Forecasting, 22, 583–598.
[12] Williams, B. (2001). Multivariate vehicular traffic flow prediction: Evaluation of ARIMAX modeling. Journal of the Transportation Research Record, 1776, 194–200.
[13] Durbin, J. and Watson, G. S. (1951). Testing for serial correlation in least squares regression. Biometrika, 38, 159–177.
[14] Taelab, A. et al. (2017). Forecasting nonlinear time series using ANN. Future Computing and Informatics Journal, 2, 39–47.
[15] Hatcher, W. G. and Wei, Y. (2018, April). A survey of deep learning: Platforms, applications and emerging research trends. IEEE Access, 6.
[16] Gamboa, J. (2017, January). Deep learning for time series analysis. arXiv:1701.01887v1 [cs.LG].
[17] Zhang, G. P. (2007). A neural network ensemble method with jittered training data for time series forecasting. Information Sciences, 177, 5329–5346.
[18] Arel, I., Rose, D. C., and Karnowski, T. P. (2010). Deep machine learning: A new frontier in artificial intelligence research [Research frontier]. IEEE Computational Intelligence Magazine, 5(4), 13–18.
[19] Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166.
[20] Cortez, P., Rio, M., Rocha, M., and Sousa, P. (2012). Multi-scale internet traffic forecasting using neural networks and time series methods. Expert Systems, 29(2), 143–155.
[21] Graves, A. et al. (2012). Supervised Sequence Labelling with Recurrent Neural Networks. Springer, Vol. 385.
[22] Fischer, T. and Krauss, C. (2017). Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research.
[23] Loning, M. et al. (2019, September). sktime: A unified interface for machine learning with time series. arXiv:1909.07872v1 [cs.LG].
52
Radha Guha
[24] Pedregosa, F. et al. (2011). Scikit-learn: Machine learning in python. The Journal of Machine Learning Research, 12, 2825–2830. [25] Alexandrov, A. et al. (2019). Gluonts: Probabilistic time series models in Python. arXiv preprint arXiv:1906.05264. [26] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning Internal Representations by Error Propagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, Foundation. MIT Press. [27] Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117. [28] Hochreiter, S. et al. (1997). Long short term memory. Neural Computation, 9(8), 175–1780. [29] Diebold, F. X. and Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business and Economic Statistics, 13, 253–263.
Saeed Mian Qaisar, Doaa A. Bashawyah, Futoon Alsharif, Abdulhamit Subasi
Chapter 4 A comprehensive review on the application of machine learning techniques for analyzing the smart meter data
Abstract: The deployment of smart meters has grown steadily on the back of continual technical advances. A fine-grained analysis and interpretation of metering data is important to deliver benefits to the multiple stakeholders of the smart grid. The deregulation of the power industry, particularly on the distribution side, continues to move forward worldwide. How to use the broad smart meter data to improve and enhance the stability and efficiency of the power grid is a critical question. Extensive work has already been done on smart meter data processing. This chapter provides a thorough overview of current research on the analysis of smart meter data using machine learning techniques. An application-oriented review is presented, covering the main applications of load profiling, load forecasting and load scheduling. A summary of state-of-the-art machine learning–based methodologies, customized for each intended application, is provided.
Keywords: smart meters, datasets, machine learning, deep learning, load analysis, load forecasting, load management
4.1 Introduction
The smart meter is a digital meter that records power generation and consumption. Smart meters facilitate remote monitoring, control and cutoff by establishing an interface between the utility company and the customer. The technology was initially called automated meter reading (AMR); it supported one-way communication and was able to provide monthly readings, one-way blackout notification,
Acknowledgment: This work was supported by the Effat University, Jeddah, Saudi Arabia, under grant number UC#9/29 April.2020/7.1-22(2)2. Saeed Mian Qaisar, Communication and Signal Processing Lab, Energy and Technology Research Center, Effat University, 22332 Jeddah, Saudi Arabia, e-mail: [email protected] Doaa A. Bashawyah, Futoon Alsharif, Abdulhamit Subasi, Communication and Signal Processing Lab, Energy and Technology Research Center, Effat University, 22332 Jeddah, Saudi Arabia https://doi.org/10.1515/9783110702514-004
tamper detection and simple load profiling. Over time, AMR capability grew to include on-demand or hourly readings and the ability to connect with other devices. This upgraded functionality introduced the term advanced metering infrastructure (AMI), built on two-way communication: it not only measures energy data in real time but also enables extensive control and monitoring of both energy and operating conditions at both ends, customers and utility companies. The smart meter can be installed at any level of the electrical circuits of residential, commercial or industrial buildings [1]. Technological developments have increased the use of smart meters as a replacement for traditional ones [2, 3]. These meters are key components of smart grids that provide significant civic, environmental and economic benefits to various stakeholders [4]. The massive smart meter installations require a huge amount of data collection at the desired granularity [4]. Automated data acquisition, storage, processing and interpretation are the main factors behind the performance of smart meters. The process is demonstrated in the block diagram in Figure 4.1.
Figure 4.1: Components of the smart meter data intelligence chain [4].
The applications and benefits of smart meters are mainly tied to smart grids, driven by the need to establish an intelligent system for power generation, transmission and distribution. This intelligent approach addresses the problems of traditional electric power systems and enables the utilization of clean energy sources such as wind and solar. With this enhanced technology, utility management can obtain sufficient information on peak-load requirements and create pricing mechanisms that depend on consumption requirements [5]. Smart meters benefit consumers by providing accurate and on-time billing and by revealing the usage patterns of electrical equipment during expensive hours. This enables consumers to switch or delay the operation of high-consumption electrical equipment to less expensive hours. Governments also profit from the utilization of smart meters through reduced operating costs and improved security and efficiency of power systems. Smart meters help reduce extensive disruptions and provide enhanced load forecasting for power sectors [6]. Owing to these benefits, smart meter installations are increasing yearly in the United States. According to research at the end of 2018, power sector companies had reached about 70% of residences, installing about 88 million smart meters, with an expectation of reaching 107 million smart meters by the end of 2020 [7]. Additionally, in the European countries, around 34% of all electricity metering points were equipped with smart
meters (99 million smart meters), and an estimated 24 million additional smart meters were to be installed in 2020 [8, p. 28]. Tokyo Electric Power Company planned a large-scale project to install approximately 29 million smart meters for all customers by the end of 2020 [9]. The Kingdom of Saudi Arabia launched its Smart Metering Project in 2020 to install around 10 million smart meters, connect these meters to communication grids and establish a fully automated billing process [10]. The government of Dubai started the operation of 200,000 smart meters in 2016 and expected more than 1 million smart meters to be installed by 2020 [11]. The core technology involved in smart meters is information and communication technology, which makes the infrastructure of smart grids secure and efficient. Collecting and processing high volumes of information is made possible by integrating Internet of things technology with smart power grids. The communication media used between smart meters and utilities can be wired or wireless, depending on the data transmission requirements [12]. Data is transmitted first from the appliances to the smart meters, and second from the smart meters to the utility's data centers. The communication networks comprise WANs (wide area networks), NANs (neighborhood area networks) and HANs (home area networks). A WAN covers a vast geographic zone and transfers data from the power generation stations to the utilities' central collection center. It can be equipped with wired communication technologies, such as power-line communication and optical fiber, and wireless technologies such as WiMAX. A NAN covers a smaller geographic zone by establishing communication among neighboring meters; multiple wired or wireless technologies can be utilized, such as power-line communication, digital subscriber line and wireless mesh.
A HAN interfaces the smart meters with appliances inside the customer premises and with home energy management tools. Given the high demand for electricity in the market, smart meters are very likely to become a part of everyday life, although attention must be paid to challenges such as the deployment process, deployment duration, operational costs and technology availability [13]. Smart meters and other automated devices that form part of smart grids can generate massive amounts of electrical data. To increase the efficiency of data analysis, it is prudent to compress these data to minimize the storage space required in data centers and the pressure on communication lines. Smart metering entails several analysis tasks such as load analysis, load forecasting and load management. Each customer has a load profile different from the others; for that reason, different patterns can be drawn from the varying electricity consumption of various customers. With the necessary insights into the uncertainty and volatility of varying load profiles, load analysis helps select the accurate data that can be used to train a forecasting model. Short- and long-term load forecasting is then used by power distribution companies to enhance their planning and operation processes. Moreover, retail electricity providers make decisions on procurement
and pricing based on the forecasted loads of their customers. The obtained smart meter data can also contribute to the implementation of load management: management can draw on the sociodemographic information of consumers and generate demand response programs that target potential customers [14]. Several machine learning methods have been implemented in various residential and industrial applications to improve the functionality of smart meters. A case study from the UK is a typical example: it covers approximately 27 million electrical power customers with a data generation rate of 13,000 records per second, handled through the Smart Meter Analytics Scaled by Hadoop platform. Another case study involves Kunshan city in China, where load forecasts for 1,312 customers were established using a fuzzy clustering method and a fuzzy cluster validity index (PMBF) [15].
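The compression point raised above (reducing storage and communication pressure) can be sketched with piecewise aggregate approximation, i.e. replacing each non-overlapping window of interval readings with its mean. This is a minimal illustration on synthetic data, not a method from any of the cited case studies:

```python
# Minimal sketch: compressing a half-hourly smart meter stream by
# piecewise aggregate approximation (PAA). Data are synthetic.
import numpy as np

def paa_compress(readings: np.ndarray, window: int) -> np.ndarray:
    """Compress a 1-D series by averaging non-overlapping windows."""
    n = len(readings) // window * window      # drop any ragged tail
    return readings[:n].reshape(-1, window).mean(axis=1)

# One day of half-hourly consumption (48 samples) compressed 4:1.
rng = np.random.default_rng(0)
day = rng.uniform(0.1, 2.0, size=48)          # kWh per half hour
compressed = paa_compress(day, window=4)      # 12 two-hour means
print(len(day), "->", len(compressed))        # 48 -> 12
```

A 4:1 reduction like this trades temporal resolution for storage; the window size would be chosen per application.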
4.2 Contribution and organization
The key publicly accessible smart meter datasets used by different smart meter implementations are reported in this chapter. In addition, the usage of advanced machine and deep learning techniques for efficient smart meter data processing and classification is summarized. Applications like theft detection, nontechnical losses (NTL) and load profiling are taken into account in this context. The usage of machine and deep learning algorithms for forecasting residential, industrial, electric car and hybrid electric vehicle loads is also researched and reported. In addition, the chapter addresses the use of advanced machine learning techniques for automatic load management. The remainder of the chapter is organized as follows: the current publicly accessible smart meter databases are reviewed in Section 4.2.1. The machine and deep learning techniques used for the analysis and classification of smart meter data are reported in Section 4.3. The use of smart meter databases for load forecasting with machine and deep learning techniques is addressed in Section 4.4. Automatic load management approaches built on machine learning techniques are described in Section 4.5. Opportunities and challenges are discussed in Section 4.6. Finally, the conclusion is given in Section 4.7.
4.2.1 Smart meter datasets
Many power companies have kept their smart meter data confidential due to privacy and security issues. However, several governmental and nongovernmental organizations have taken the initiative over the past few years to release some residential or industrial data to the public, in order to assist researchers in studying the performance of smart meters.
4.2.1.1 Commission for Energy Regulation smart metering project
The Commission for Energy Regulation established a smart metering project in 2007 with the help of Irish energy industry participants such as Electric Ireland, SEAI (Sustainable Energy Authority of Ireland) and other energy suppliers. Over 5,000 Irish households and businesses participated in this project from 2009 to 2010 by installing electricity smart meters on their premises. The project aimed to evaluate customer behavior trials on energy consumption patterns. The generated data were made available online to researchers in anonymized format, without any personal or confidential information [16].
4.2.1.2 Low Carbon London Project
UK Power Networks established the Low Carbon London Project to understand energy consumption patterns in London. The dataset covers the operation of 5,567 households from 2011 to 2014, with records acquired every half hour in kWh. The consumers were carefully selected to be representative of the UK population and were divided into two groups. The first group of approximately 1,100 consumers was subjected to energy prices that varied under a dynamic time-of-use tariff; these consumers were informed a day ahead of the specific times when tariff prices would be higher or lower than the normal price. The second group was subjected to a standard plan with a constant flat-rate tariff throughout the day. The generated signals help anticipate future energy supply operations and lessen the stress on distribution grids [17].
4.2.1.3 Pecan Street: energy research
Pecan Street collaborated with over 1,000 volunteer participants in Austin, Texas, to take a more detailed look at the electricity consumption of residential and small commercial properties. The installation comprised Green Button protocols, smart meters and home energy monitoring systems. The properties were equipped with high-consumption electrical appliances such as cooling and heating systems, along with newer technologies such as rooftop solar generation, electric vehicle charging and energy storage. The project aimed to study the effect of these modernizing technologies on energy loads, along with a variety of consumer behavior interventions, including time-of-use pricing [18].
4.2.1.4 Smart dataset for sustainability
The Smart Project commenced a data collection infrastructure in 2012 with the intention of optimizing energy consumption in real households. The project was deployed to gather
comprehensive data from the household environment rather than gathering data from as many households as possible. The data collection comprised electrical, environmental and operational data. The infrastructure supported monitoring of average real and apparent power at the main distribution panels every second, average electricity usage on every circuit and nearly every outlet, dimming and switching events from wall switches, average electricity generation from solar and micro-wind systems, and other important binary events from sensors such as thermostat, motion and door sensors. The project consisted of two datasets: the first, a high-resolution dataset (UMass Smart Home), covered a wide variety of traces from three separate smart households; the second, a low-resolution dataset (UMass Smart Microgrid), covered minute-level electrical data from 443 anonymized households. The data collection and processing were updated in the 2017 release, with periodic releases of the Smart Home dataset expected every 6 months [19].
4.2.1.5 Ausgrid distribution zone substation data and solar home electricity data
Ausgrid operated 180 substations within the Australian distribution and transmission networks, and the load data for each substation were collected by SCADA and metering systems. The historical demand data were sampled every 30 min in megawatts. The data might contain a few issues, such as occasional metering errors and spikes or dips due to switching within the distribution network. The company has released 12 months' worth of interval data every year since 2005 [20]. In addition, Ausgrid provided solar home electricity data for 300 residential consumers from 2010 to 2013. The consumers were selected randomly, without detailed checks for occupancy. The data were measured by a gross metering configuration and recorded every 30 min. Further monthly electricity data from over 2,600 solar homes and over 4,000 nonsolar homes were compiled from 2007 to 2014, with the purpose of distinguishing electricity usage patterns between solar and nonsolar homes [21].
4.2.1.6 ISO New England – zonal information
The New England Independent System Operator (ISO-NE) has published load data for eight load zones across the entire New England system since 2011. The publication was made available for teaching and for researching power market design and performance, in order to help market participants make informed decisions. The data consist of hourly day-ahead and real-time demands, along with daily and monthly summaries of real-time demands. The data are associated with weather variables, regional locational prices, market clearing prices and interchanges with other power systems [22].
4.2.1.7 Tokyo Electric Power Company (TEPCO)
TEPCO has been transparent about both electricity demand and supply. The company provides real-time data sources of historical actual electricity demand recorded at hourly intervals. These data have been available online since 2016 and are updated each year. The company also provides monthly records of the daily actual maximum demand, the consumption rate at peak usage times and that day's maximum capacity [23].
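Several of the datasets above record half-hourly kWh intervals (e.g. Low Carbon London, Ausgrid). A first processing step is often to aggregate these intervals to coarser granularity. A minimal sketch on synthetic data (the 7-day shape and value ranges are illustrative assumptions, not taken from any actual dataset):

```python
# Sketch: aggregating half-hourly kWh interval readings into daily totals.
import numpy as np

rng = np.random.default_rng(1)
# 7 days x 48 half-hour readings per day, in kWh (synthetic).
readings = rng.uniform(0.05, 1.5, size=7 * 48)

daily_kwh = readings.reshape(7, 48).sum(axis=1)   # one total per day
peak_day = int(np.argmax(daily_kwh))              # day with highest usage
print("daily totals:", np.round(daily_kwh, 2), "peak day:", peak_day)
```

Summing preserves total energy, so the aggregated series can be checked against the raw stream for metering errors of the kind noted in the Ausgrid description.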
4.3 Load analysis with machine and deep learning
In a variety of key applications, such as NTL detection, theft detection and load profiling, smart meter data is analyzed using machine and deep learning algorithms. The principle is shown in Figure 4.2.
Figure 4.2: Principle of smart meter data processing for load analysis.
Figure 4.2 shows that the smart meter data is first preprocessed to denoise it and prepare it for the subsequent data analysis, feature extraction, dimension reduction and classification modules. The final outcome of the system is either a load identification, such as home appliances or electric vehicles, or a classification of loss type, such as theft versus technical losses.
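The Figure 4.2 chain (preprocessing, dimension reduction, classification) can be sketched as a scikit-learn pipeline. The data here are synthetic load windows and the two classes are placeholders; this is an illustrative skeleton, not a method from any cited study:

```python
# Sketch of a preprocess -> dimension-reduction -> classify chain.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n, width = 400, 48                         # 48 half-hour samples per window
X = rng.normal(0.0, 1.0, size=(n, width))  # synthetic load windows
y = rng.integers(0, 2, size=n)             # two load classes
X[y == 1] += 0.8                           # class 1 drawn with a higher mean load

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = make_pipeline(StandardScaler(),        # denoise/normalize stand-in
                    PCA(n_components=10),    # dimension reduction
                    LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
print("held-out accuracy:", round(clf.score(X_te, y_te), 3))
```

In practice the scaler would be replaced by application-specific denoising and the logistic regression by whichever classifier each surveyed study uses.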
4.3.1 Nontechnical losses
Due to the massive number of load profiles at the power utilities, gaining a sound understanding of the volatility and uncertainty of these profiles is a challenging issue. The practice of manipulating electricity data in order to reduce the electricity bill is referred to as electricity theft. It causes a significant loss of energy resources and damages the revenue of power utilities and the overall economy. Several approaches have been proposed by researchers to detect electricity theft, employing, for example, machine learning methods.
Common techniques for theft detection have been the SVM (support vector machine) and logistic regression. However, these techniques do not work well with large and extremely imbalanced datasets. Therefore, Khan et al. [24] proposed a novel approach based on advanced supervised learning techniques. The approach involved five stages: data preprocessing, data balancing, feature extraction, classification and validation. Electricity consumption data were preprocessed to resolve missing and impossible data values, and the data were balanced with the help of the ADASYN algorithm. A VGG-16 (visual geometry group) module was used for feature extraction of abnormal patterns in electricity consumption data. Afterward, the refined features were classified with the help of an FA-XGBoost (firefly algorithm-based extreme gradient boosting) module. Various performance metrics were applied to validate the performance of the proposed approach. Hasan et al. [25] contributed to the detection of electricity theft by proposing a hybrid model composed of a CNN (convolutional neural network) and LSTM (long short-term memory) network. The model was used to explore the time-series nature of the electricity consumption datasets of 10,000 consumers in China over a 1-year period. The CNN, a subclass of neural network, was applied for global feature extraction through a number of stacked layers. After that, the LSTM used the refined features to classify normal consumers and electricity theft consumers; the LSTM is a special class of RNN (recurrent neural network) intended to solve the problem of short-term memory in RNNs. Furthermore, the synthetic minority oversampling technique was used to counter the problem of imbalanced datasets and obtain better performance. Zheng et al. [26] presented an approach with two novel data mining techniques: MIC (maximal information coefficient) and CFSFDP (clustering by fast search and find of density peaks).
This approach combined the advantages of clustering and state-based methods to detect numerous types of theft attack with high accuracy. MIC was used to find the correlation between NTL and tampered consumer load profiles with normal behaviors. Compared to Pearson's correlation coefficient, which can detect only the linear correlation between two vectors, MIC is able to detect more sophisticated correlations such as time-variant relations. Unsupervised density-based CFSFDP clustering was applied to detect the theft attacks that cannot be detected by the MIC correlation method; it defines density features in order to capture abnormal behaviors of load profiles. In contrast to existing data-driven electricity theft detection, Gao et al. [27] developed a physically inspired data-driven proposal to detect theft attacks. This unsupervised proposal used a modified linear regression model to leverage the approximate relationship between electricity usage and the voltage magnitude on the distribution secondary, obtained from the smart meters. The regression residuals of the estimated versus true consumption were then calculated for each consumer and used to distinguish dishonest from honest customer activities. Finally, anomaly scores were defined for each consumer in order to rank the thieves or identify the consumers with faulty smart meters. The proposal does
not require training samples of NTL or documentation of network topology information and parameters. Sahoo et al. [28] proposed a novel predictive temperature-dependent model for electricity theft detection, instead of the typical methods based on load profile analysis of consumers; those methods cannot detect electricity theft in the case of a complete bypass of the meter. This proposal therefore used data from branches of the distribution system, together with smart meters or conventional power meters, to detect electricity theft. The proposal started by applying a constant-resistance technical loss model, correctly identifying the network's total losses and estimating the technical losses. NTLs were computed by subtracting the technical losses from the total losses. After that, a temperature-dependent technical loss model accounted for the temperature dependency of resistances from the consumption points to the distribution transformers.
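As a minimal illustration of the supervised theft-detection setting discussed above, the sketch below trains a classifier on a heavily imbalanced synthetic dataset. It uses class weighting in a plain random forest purely for illustration, standing in for the ADASYN/SMOTE resampling and FA-XGBoost or CNN-LSTM models of the cited studies; the data and the 5% theft rate are assumptions:

```python
# Sketch: theft detection as imbalanced binary classification (synthetic).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(3)
n_honest, n_theft = 950, 50                         # ~5% theft: heavily imbalanced
honest = rng.normal(1.0, 0.2, size=(n_honest, 24))  # 24 hourly kWh readings
theft = rng.normal(0.4, 0.2, size=(n_theft, 24))    # under-reported usage
X = np.vstack([honest, theft])
y = np.array([0] * n_honest + [1] * n_theft)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
# Recall on the theft class is the metric that matters most here:
print("theft recall:", recall_score(y_te, clf.predict(X_te)))
```

On real data the classes overlap far more, which is exactly why the surveyed works resort to resampling, density-based clustering or physical models.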
4.3.2 Load profiling
Zhu et al. [29] proposed a method to measure daily load behaviors and to classify irregular energy consumption. Their method consists of three core phases. The first phase preprocesses the time series of a high-frequency building load into a variety of important statistics [29]. These statistics are divided into two groups: one group concerns the load profiles taken from the building, such as the average daily load, the minimum and maximum load, the median load, near-base load, near-peak load, high-load duration, rise time and fall time [29]. The other group concerns the daily weather conditions, such as the mean, mode, median, minimum and maximum of the temperature and humidity. The second phase constructs the prediction models. This phase requires two actions: first, choosing the best method for mining the information to obtain the best examples from the training data; second, choosing the most appropriate predictor to construct the model [29]. The authors constructed three predictors, for cooling, heating and seasonal transitions. The last phase evaluates the residues of the chosen prediction models based on the concept of statistical quality control. They built a control chart for every load profile with a suitable upper bound; these control charts are then used to track everyday load profiles and classify irregular energy usage [29]. Experimental results computed on real-world building and weather data show that the proposed model and methods are successful. Their approach is easy to integrate into current building energy management systems and needs no complex submeters. Stephen et al. [30] presented multiple linear Gaussian models that can efficiently predict many patterns that are typically thought to be homogeneous for residential customers.
The combination of this modeling strategy with advance smart meter data allows a representation that not only expresses load magnitudes at a given time of day but also their variability and how these variabilities influence other times of use. The mixture model framework enables the most likely of multiple behaviors to be used predictively in order to categorize a given home consumer on a given day. In
this way, complex consumer behavior patterns can be recorded as they evolve with seasons or differences in activity. These models have theoretical properties that allow the ready application of sampling techniques, illustrating precision gains compared to existing load profile strategies. These developments are important for the management of smaller and insular power systems. The loss of performance in the Multi Frequency Averaging (MFA) model may have been due to over-fitting of the covariance matrices. Intelligent meters generate detailed data on energy usage. In intelligent power grids, it is very difficult to model energy consumption at each node, especially for systems that require linear processing of nonlinear data. Khan et al. [31] introduced a series of weighted linear profiles as a new way to stochastically model extremely nonlinear smart meter data patterns. An advanced algorithm for clustering data with high intra-cluster pattern similarity is designed to obtain energy consumption profiles. The method linearizes the nonlinear energy data with a set of linear profiles, comparable to ordinary linear approximations but without losing precision. The reliability and accuracy of the approach were demonstrated repeatedly, taking into account multiple cluster cases and repetitive testing. The advantages of this approach include extracting patterns with high intra-cluster similarity, transforming extremely nonlinear functions into linear ones, reducing the huge volume of smart meter data and reducing processing time. The method allows the analysis of nonlinear loads through linear analytical methods and increases the flexibility of modeling smart grid simulation systems.
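A common first step behind the load-profiling work surveyed above is clustering daily profiles into behavioral groups. The sketch below uses plain k-means on synthetic "morning peak" versus "evening peak" households; the cluster count, peak hours and noise level are illustrative assumptions, not values from the cited studies:

```python
# Sketch: clustering daily load profiles into behavioral groups (synthetic).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
hours = np.arange(24)
morning = np.exp(-0.5 * ((hours - 8) / 2.0) ** 2)   # peak around 08:00
evening = np.exp(-0.5 * ((hours - 19) / 2.0) ** 2)  # peak around 19:00
profiles = np.vstack(
    [morning + rng.normal(0, 0.05, 24) for _ in range(30)]
    + [evening + rng.normal(0, 0.05, 24) for _ in range(30)]
)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
labels = km.labels_
# Households with the same peak should land in the same cluster.
print("cluster sizes:", np.bincount(labels))
```

The fuzzy clustering and density-based methods cited in this chapter refine this idea for profiles that do not separate so cleanly.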
4.4 Load forecasting with machine and deep learning
Many potential applications and services require forecasting the demand of different load types, such as industrial loads, residential loads and electric vehicles. This can be achieved effectively by using modern machine and deep learning approaches. The principle is shown in Figure 4.3.
Figure 4.3: Principle of smart meter data processing for load forecasting.
Figure 4.3 shows that the smart meter data is first preprocessed to denoise it and prepare it for the subsequent feature extraction, dimension reduction and classification modules. The overall dataset is carefully divided into two parts: a training set and a testing set. The training set is used to train the intended classifiers while targeting performance criteria such as accuracy and noise thresholds. Once the chosen performance criteria are achieved, the trained classifier is used for load forecasting on the testing dataset. Load categories such as residential, industrial and electric vehicle loads are frequently forecasted by such an approach.
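The training/testing split described above is chronological for load data: the model is fit on the earlier portion of the series and evaluated on the later, unseen portion. A minimal sketch on synthetic hourly load (the sinusoidal daily cycle and sin/cos hour encoding are illustrative assumptions):

```python
# Sketch: chronological train/test split for load forecasting (synthetic).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
t = np.arange(24 * 60)                               # 60 days of hourly load
load = 1.0 + 0.5 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.05, t.size)

# Features: hour-of-day encoded as sin/cos; target: the load itself.
X = np.column_stack([np.sin(2 * np.pi * t / 24), np.cos(2 * np.pi * t / 24)])
split = int(0.8 * len(t))                            # first 80% -> training
model = LinearRegression().fit(X[:split], load[:split])
pred = model.predict(X[split:])
rmse = float(np.sqrt(np.mean((pred - load[split:]) ** 2)))
print("test RMSE:", round(rmse, 3))
```

Shuffling the split instead would leak future information into training, which is why time-ordered splits are the norm in the forecasting studies reviewed next.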
4.4.1 Residential load forecasting
Load forecasting has become increasingly crucial for the energy consumption of residential consumers. Consumption at the residential level is much more irregular than at the transmission or distribution levels. In addition, it is highly correlated with many variable factors such as resident behavior, household size, individual loads and weather conditions. Different statistical and machine learning techniques can be used in order to find a suitable prediction algorithm. Shabbir et al. [32] compared the performance of three machine learning algorithms in load forecasting: linear regression, tree-based regression and SVM-based regression. Linear regression simply models the relationship between the dependent variable and several independent variables. Tree-based regression works very well with a large number of variables and models higher-order nonlinearity with great interpretability. However, SVM was found to perform more accurately than the other algorithms, as it uses more observation values. The performance was validated in terms of the RMSE (root mean square error) value. Humeau et al. [33] adopted linear regression, SVR (support vector regression) and the MLP (multilayer perceptron) for residential load forecasting at the individual meter level and the aggregate level. Among these algorithms, linear regression delivered high performance when forecasting the load of a single household. Nevertheless, SVR and MLP perform better than linear regression when forecasting aggregated load; the study determined an aggregation size (32 households) at which SVR starts to outperform linear regression. The forecasted values were evaluated with the MAPE (mean absolute percentage error) and RMSE as well. Normalized RMSE values decreased with an increase in the number of households per cluster. Ahmadiahangar et al.
[34] introduced the generalized linear mixed-effects model, a machine learning–based regression model. It defines the correlation between a response variable and independent variables using coefficients. The model generates load patterns and predicts the possible flexibility of residential consumers. Different examples of load patterns can be generated from regression models, such as load patterns over a longer period or separate patterns for
Saeed Mian Qaisar et al.
For example, one can obtain load patterns for a longer period, or separate patterns for workdays and holidays. It is possible to achieve a detailed prediction by studying the patterns of different residential applications. The advantage of this model is that it can be used in online and real-time control approaches. Jetcheva et al. [35] presented a building-level, neural network-based ensemble model for short-term (24 h ahead), small-scale electricity load forecasting. Each neural network predicts the load for a specific time point, and the set of predictions forms the load prediction for the whole day. The individual networks composing the ensemble are trained on different data subsets provided by a preceding clustering step, in order to find the most suitable combination of variables for the model. The model's performance was compared with the SARIMA (seasonal autoregressive integrated moving average) model, over which it achieved up to 50% higher forecast accuracy. SARIMA, however, is a linear statistical model that may not be able to predict highly nonlinear residential loads. Kong et al. [36] proposed a deep LSTM recurrent neural network framework for short-term residential load forecasting, addressing the high volatility and uncertainty of residential loads. Some household daily profiles can be organized into major clusters, while others do not cluster at all. The authors compared the forecasting performance of two cases: aggregating the individual forecasts made at the household level, and forecasting the aggregated load at the system level. The results show that aggregating all the individual forecasts yields more precise forecasts than directly forecasting the aggregated load. LSTM performance was benchmarked against the conventional BPNN (back-propagation neural network); LSTM performed better on single-meter load forecasting, as it can establish temporal correlations in loads at a high level of granularity. Lusis et al.
[37] examined the calendar effects, forecasting granularity and training-dataset length that affect the accuracy of residential short-term forecasting. Forecasting techniques such as MLR (multiple linear regression), regression trees, SVR and artificial neural networks (ANNs) were trained and tested, with RMSE and normalized RMSE applied as forecast error metrics. Combining seasonality and weather data with the load profiles yielded very low forecasting performance for calendar effects. An increasing number of households might smooth the load profiles and enhance the forecasting of periodic load patterns. Furthermore, forecast errors can be reduced by using coarser forecast granularity, longer forecast intervals and longer training datasets. Support vector regression showed the best accuracy for one-day-ahead forecasting. Kong et al. [38] studied occupants' life patterns to achieve high forecasting performance at the household level. The specialty of this study was integrating appliance-level consumption of a single household to train the LSTM model. A feed-forward neural network and k-nearest neighbor regression were used to validate the performance of the LSTM. The model improved forecasting performance and established a temporal correlation between consumptions across time intervals.
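RMSE, MAPE and normalized RMSE recur throughout this section as forecast error metrics; as a concrete reference, a minimal implementation could look as follows (function names are ours, not taken from the cited papers):

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean square error: penalizes large deviations quadratically."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def mape(actual, forecast):
    """Mean absolute percentage error: scale-free, undefined for zero loads."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)

def nrmse(actual, forecast):
    """RMSE normalized by the mean load, for comparing households of different sizes."""
    return rmse(actual, forecast) / float(np.mean(actual))

# Example: four hourly load readings (kW) against a day-ahead forecast
actual = [2.0, 1.5, 3.0, 2.5]
forecast = [1.8, 1.6, 2.7, 2.6]
print(rmse(actual, forecast), mape(actual, forecast), nrmse(actual, forecast))
```

Note that MAPE weights errors relative to the actual load, so it over-penalizes errors at low-load hours, which is one reason the surveyed studies report RMSE alongside it.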
Chapter 4 A comprehensive review on the application of machine learning techniques
4.4.2 Industrial load forecasting

Accuracy and robustness in forecasting industrial load profiles are difficult to achieve due to many variable factors. These variables include load measurements, scheduled processes, manufactured units, work shifts, weather conditions and so on. Accurate forecasting improves energy efficiency and reduces operational costs by managing electricity consumption and planning production and maintenance schedules. Almalaq and Edwards [39] have surveyed well-known deep learning algorithms applied to smart grid load forecasting. The algorithms are the autoencoder, RNN, LSTM, CNN, RBM (restricted Boltzmann machine), DBN (deep belief network) and deep Boltzmann machine (DBM). An RBM consists of two layers, of hidden and visible units, with the restriction that there are no connections between units in the same layer. The DBN architecture has several layers of hidden units (stacked RBMs) and is trained with a backpropagation algorithm. The DBM architecture also stacks RBMs, but with bidirectional connections between every two layers. These algorithms showed substantial error reduction in load forecasting. For instance, combining an autoencoder with LSTM predicted the output of renewable energy power plants, a CNN with k-means clustering predicted short-term summer and winter loads, and a DBM predicted wind speed for the smart grid. An ensemble of DBN with SVR had the highest error reduction compared with the other methods. Park et al. [40] have addressed the need for short-term load prediction to operate power generation and CCHP (combined cooling, heating and power) systems efficiently. The approach consists of two stages. Two single algorithms are used in the first stage: gradient boosting and random forest. These algorithms generate good prediction accuracy, as gradient boosting can work with different loss functions and random forest handles high-dimensional data well.
However, gradient boosting is prone to over-fitting during training, and random forest forms its prediction as the average of all predictions from the subset trees; for these reasons, the prediction accuracy of each single algorithm is limited. The predicted values are therefore used as input variables to train a deep neural network (DNN) model and enhance the prediction performance, taking advantage of each algorithm and expanding the domain of applicability and coverage. Several machine learning methods were used to benchmark the proposed approach on one-day-ahead prediction for a factory electric energy consumption dataset; it showed the best performance in terms of the coefficient of variation of RMSE and MAPE (mean absolute percentage error). Ahmad et al. [41] have presented an accurate and fast-converging one-day load-forecasting model for industrial applications in a smart grid. The model is based on MI (mutual information) for feature extraction and an ANN to forecast the future load. The inputs of the historical load time series are ranked through MI. After that, the ranked inputs are passed through filters to remove redundancy and irrelevancy. The selected inputs carrying the most relevant information are given to the ANN to generate training
and validation samples, and thus forecast the load of the next day. Finally, an enhanced differential evolution optimizer reduces the overall forecast error. The limitation of this model is that it was not designed to forecast the load 2 or more days ahead. In addition, the convergence rate was improved, but at the cost of accuracy. Hu et al. [42] have proposed a forecasting model for short-term electric load in the process industry, which helps to optimize electricity consumption and reduce operating costs. A BPNN model was hybridized with GA (genetic algorithm) and PSO (particle swarm optimization). To address the drawback of BPNN falling into local optima, the local search function of GA is combined with the global search function of PSO. GA and PSO optimize the weights and thresholds of the BPNN to counter the overfitting problem, after which the BPNN performs the forecasting task. PSO-BPNN and GA-BPNN were implemented and compared with the hybrid GA-PSO-BPNN; the proposed model achieved the highest accuracy and a low MAPE value. Porteiro et al. [43] focused on studying different models for forecasting the next-hour load. With the model optimized for single-hour prediction, a hybrid strategy was applied to build a complete 24 h ahead hourly electricity load-forecasting model. The hybrid strategy combines the direct strategy and the recursive strategy: the direct strategy develops a different forecasting model for each time step to be predicted, which eventually requires a heavier computational load, while the recursive strategy applies a one-step forecast multiple times, which allows prediction errors to accumulate. Extra trees was selected as the best method for forecasting the next hour, with the lowest training time. The complete model was evaluated against MAPE and showed its effectiveness in 24 h ahead industrial demand forecasting. Bracale et al.
[44] developed a set of MLR models for forecasting industrial loads 24 h ahead for a factory that manufactures medium-voltage (MV)/low-voltage (LV) transformers. Past measurements recorded at 15-min intervals served as candidate quantitative variables and were processed through the MLR model; candidate qualitative variables such as hour of day, type of day and work shift were included as well. Interactions between variables were considered, and the interaction effects between past power measurements and calendar variables were found to be effective. MLR performance was benchmarked against an SN (seasonal naïve) model and an MLR-B model; SN assumes that the unknown load is the same as the load observed at the same time in the last season. Normalized mean absolute error and normalized RMSE were used for the validation and testing procedure. Kim et al. [45] addressed the issue of data absence in small industrial facilities due to the lack of infrastructure. The authors developed a generalized prediction model through the aggregation of ensemble models for peak load forecasting. The industrial datasets were nonstationary with different statistical characteristics; thus, it is impossible to develop a generalized prediction model using a single time-series analysis technique. Machine learning methods such as extra trees, bagging, random forest, AdaBoost and gradient boosting failed at load prediction in terms of RMSE
and MAPE, verifying that past data cannot be used as prediction variables. Therefore, the proposed model utilized the load pattern of the same day to construct a powerful predictor, combining bagging and boosting methods over several decision trees. However, the forecast performance was still weak for practical environments.
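The two-stage idea in Park et al. [40] — first-stage learners whose predictions become the input features of a second-stage model — can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: ordinary least squares and a k-nearest-neighbour average replace gradient boosting and random forest, and a linear meta-model replaces the DNN; only the stacking pattern itself is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic factory-load data: load depends on two drivers, one nonlinearly
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 3.0 * X[:, 0] + np.sin(4.0 * X[:, 1]) + rng.normal(0.0, 0.05, 200)
X_tr, y_tr, X_te, y_te = X[:150], y[:150], X[150:], y[150:]

def ols_fit(X, y):
    # Ordinary least squares with an intercept column
    return np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)[0]

def ols_predict(w, X):
    return np.c_[X, np.ones(len(X))] @ w

def knn_predict(X_ref, y_ref, X, k=5):
    # Average the targets of the k nearest reference points
    d = np.linalg.norm(X_ref[None, :, :] - X[:, None, :], axis=2)
    return y_ref[np.argsort(d, axis=1)[:, :k]].mean(axis=1)

# Stage 1: two base learners produce their own forecasts
w1 = ols_fit(X_tr, y_tr)
p1_tr, p1_te = ols_predict(w1, X_tr), ols_predict(w1, X_te)
p2_tr, p2_te = knn_predict(X_tr, y_tr, X_tr), knn_predict(X_tr, y_tr, X_te)

# Stage 2: a meta-model is trained on the stage-1 predictions
w_meta = ols_fit(np.c_[p1_tr, p2_tr], y_tr)
p_meta = ols_predict(w_meta, np.c_[p1_te, p2_te])

rmse = lambda a, b: float(np.sqrt(np.mean((a - b) ** 2)))
print("OLS alone:", rmse(y_te, p1_te), "stacked:", rmse(y_te, p_meta))
```

In the cited work the meta-learner is a DNN; the stacking structure is identical, with the base predictions forming the second-stage feature matrix.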
4.4.3 Electric vehicles load forecasting

Many studies have proposed new approaches for electric vehicle load forecasting. Dabbaghjamanesh et al. [46] presented a Q-learning method for plug-in hybrid car load forecasting relying on an ANN and an RNN. The Q-learning method uses Markov decision processes, which determine the best approach among all possible actions; it is considered one of the model-free reinforcement learning methods [46]. The forecasts of the ANN and RNN techniques for previous days are used by the Q-learning method to predict the plug-in hybrid car load [46]. Based on the findings of the ANN and RNN, the Q-learning method picks the best possible action to forecast the plug-in hybrid car load over the next 24 h horizon and find an optimal prediction. Various types of plug-in hybrid car charging were considered by the authors, such as intelligent, coordinated and uncoordinated [46]. Their results show that the plug-in hybrid car charge can be accurately predicted by the Q-learning method based on ANN and RNN information [46], and their simulations proved that Q-learning predicts the plug-in hybrid car load more reliably than the ANN and RNN methods alone [46]. Under the worst charging case of plug-in hybrid cars, Q-learning achieves more than 50% improvement over the ANN and RNN methods [46]. The suggested Q-learning methodology will generally track the plug-in hybrid car load more quickly, reliably and flexibly than the ANN and RNN methods [46]. Mansour-Saatloo et al. [47] have followed the transformation of the energy market and the introduction of plug-in electric cars by proposing a new approach for forecasting their load [47]. They used a machine learning technique called the generalized regression neural network.
For training, the researchers used historical data on the arrival and departure times of the plug-in electric cars, from which the distance driven by each electric car could be approximated [47]. Moreover, to extend the dataset based on a suitable distribution function fitted to suitable historical data, Monte Carlo simulation was used in this research [47]. After training and testing, the system learned the movement pattern of each electric car, which is used to forecast the distance traveled by new electric cars. The system predicted new data with 98.99% precision, a mean square error of 1.3165, a root mean square error of 1.1474 and a mean absolute error of 0.8199 [47]. Four deep learning techniques were used by Zhu et al. [48] to forecast the short-term charging load of real electric car charging stations in China. The four techniques
are the DNN, RNN, LSTM and gated recurrent units (GRU). The database used in their research was the big data platform of a company operating a large proportion of the electric car charging stations in China. Their findings indicate that the methods clearly proved their efficiency on the database. In comparison with the other three methods, the gated recurrent unit method with one hidden layer achieved the best performance [48]. These findings cannot, however, indicate which method has the absolute advantage in applications, chiefly because the model training used only limited data from one charging station [48].
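All of the forecasters surveyed above, from Q-learning ensembles to GRUs, consume lagged meter readings; the common preprocessing step is framing the load series as supervised (input window, target) pairs. A minimal sketch of that framing (the function name and parameters are illustrative, not taken from the cited papers):

```python
import numpy as np

def make_windows(series, lags=24, horizon=24):
    """Frame a load series as supervised pairs: each row of X holds `lags`
    past readings; y holds the reading `horizon` steps ahead (e.g. 24 h
    ahead for day-ahead forecasting)."""
    series = np.asarray(series, float)
    X, y = [], []
    for t in range(lags, len(series) - horizon + 1):
        X.append(series[t - lags:t])
        y.append(series[t + horizon - 1])
    return np.array(X), np.array(y)

# Example: 10 days of hourly charging load with a purely daily cycle
hours = np.arange(240)
load = 5.0 + 2.0 * np.sin(2 * np.pi * hours / 24)
X, y = make_windows(load, lags=24, horizon=24)
print(X.shape, y.shape)  # each row: 24 lagged readings -> load 24 h later
```

The resulting (X, y) pairs can feed any of the regressors discussed in this section, from linear models to recurrent networks (which additionally reshape X into sequences).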
4.5 Load management with machine and deep learning

Many studies have been done on load management; the following is a review of some recent ones. Nan et al. [49] introduced a smart home group demand response scheduling model focused on residential load shifting through a load aggregator [49]. Demand response has been extensively deployed in recent years as the key tool for communication between the power grid and the consumers in developing the energy market [49]. In conjunction with an interruptible load program, the model efficiently schedules the residential loads under various demand response programs [49]. The purpose of their model is to reduce the consumer's electricity usage [49]. The whole group of demand response services is optimally planned under various demand response programs without compromising consumer convenience, which lowers user energy costs and at the same time reduces the residential peak pressure, the peak load and the energy usage [49]. Their model greatly stimulates the demand response potential of the residential population. Furthermore, the study suggests that an analysis of the expected load curves can be carried out to examine various demand response programs and determine an optimal one [49]. The model can therefore help in determining electricity prices in line with the growth of the electricity markets [49]. Another study was done by Ahmed et al. [50], using a new binary backtracking search algorithm to handle energy consumption with a real-time home energy management system optimal-schedule controller. In order to limit total load demand and to schedule the operation of home appliances at certain times during the day, the binary backtracking search algorithm provides an optimal schedule for home devices [50].
A smart socket hardware model and graphical user interface applications were developed to illustrate the suggested home energy management system and to provide the load-scheduler interface [50]. A number of the most frequently used household appliances were considered controllable, including the air conditioner, water heater, fridge and washing machine [50]. They implemented their suggested scheduling algorithm in two
cases, where the first case considers operation during the week from 4 PM to 11 PM [50]. To check the precision of the developed controller in the home energy management system, the researchers compared their experimental results with a binary PSO schedule controller [50]. The binary backtracking search algorithm achieved better results in lowering the electricity usage and the overall energy bill than the binary PSO schedule controller, and saves energy at high loads. The binary backtracking search algorithm schedule controller achieved energy savings of 4.87 kWh a day on weekdays and 6.6 kWh a day (26.1%) on weekends [50]. The binary PSO schedule controller's energy savings are 4.52 kWh/day (20.55% daily) on weekdays and 6.3 kWh/day (25%) during demand response hours [50]. Another study was presented by Adika and Wang [51], who proposed both appliance scheduling and smart charging techniques for household electricity management. Through autonomous response to price alerts, scheduling of the electric appliances by the smart meter was achieved [51], so electricity consumers need not be involved in demand-side management directly [51]. This method has been used with the idea of an aggregator for the synchronization of appliance energy demands, and introduces intelligent cluster-based storage charging based upon a daily price forecast [51]. Consumer loads and their associated batteries are divided into clusters that synchronize the load patterns and their respective electricity consumptions with the day's pricing scheme [51]. Their findings indicate that consumers can save substantially on their electricity bills with sufficient scheduled storage equipment and carefully planned real-time prices [51]. The storage of electricity reduces the peak-to-average grid load and therefore helps the electricity providers.
The proposed method presents an effective electricity management approach, particularly for residences with adequate and accurate historical load and electricity price data. Uncoordinated battery charging and discharging could, however, threaten grid stability and lead to customer indifference. Thus, the storage devices must be operated effectively to ensure that every electricity consumer receives full and equal value [51].
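The objective these schedule controllers optimize — running each shiftable appliance in its cheapest permitted hours — can be illustrated with a deliberately simple exhaustive scheduler. The cited works use binary backtracking search and binary PSO over many appliances jointly; the function, tariff and appliance below are hypothetical stand-ins for the single-appliance case.

```python
def schedule_appliance(prices, duration, allowed_hours):
    """Pick the contiguous run of `duration` hours inside `allowed_hours`
    (a range of hour indices) with the lowest total tariff cost."""
    best_start, best_cost = None, float("inf")
    for start in range(allowed_hours[0], allowed_hours[-1] - duration + 2):
        cost = sum(prices[start:start + duration])
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost

# Hypothetical hourly tariff: expensive 16:00-23:00 peak, cheap otherwise
prices = [0.10] * 16 + [0.30] * 7 + [0.10]
# Washing machine: 2 h run, permitted any time between hour 8 and hour 23
start, cost = schedule_appliance(prices, duration=2, allowed_hours=range(8, 24))
print(start, cost)  # the run lands in the cheap pre-peak hours
```

Metaheuristics such as binary PSO replace this exhaustive scan with a search over binary on/off vectors, which scales to many appliances and constraints where enumeration does not.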
4.6 Discussion and challenges

Smart meters measure power usage in real time at the finest granularities and are seen as the basis of future smart grid technological developments, which has driven their adoption in place of traditional meters. Such meters are essential elements of smart grids and provide important advantages, which can be categorized as social, environmental and economic, to various stakeholders. The massive installations of smart meters are producing a large volume of collected data at the desired granularity. Automated information collection, storage, processing and classification are the main factors behind the performance of smart meters.
The collection of fine-grained metering data is important for delivering practical advantages, in terms of performance and sustainability, to the several stakeholders of the smart grid. Different stakeholders have different goals: vendors, for example, would like to reduce the operating expenses involved in traditional meter reading and significantly increase consumer loyalty. The operators of transmission systems and distribution networks want to take advantage of a more responsive demand side that enables greater penetration of low-carbon technology. Governments aim to meet carbon emission reduction targets by increasing energy efficiency on the consumer side, which smart meters enable. Consumers are gaining better energy awareness and therefore expect to benefit from reduced electricity bills. With these goals, it is not surprising that smart meter deployment is growing rapidly. From a generalized viewpoint, identifying the type of device is a difficult task for several reasons. Firstly, there are possible overlaps among a wide variety of types, for example, laptops and tablets. Secondly, a wide range of devices fall under the same group despite their unlike operating mechanisms and the technical variations that occur among appliances. Generally, appliance identification must guarantee the ability to generalize precisely without being restricted to a certain number of appliances. The data collected through smart meters are analyzed in a number of applications. A smart meter's performance depends on the algorithms it can run to perform real-time intelligent operations. A smart meter chipset module can perform the appropriate function through programming within hardware limitations. The smart meter processor typically performs multiple tasks such as measuring electricity, displaying electrical parameters, reading smart cards, handling data and power, detecting malfunctions and interacting with other devices.
For consumers, one of the key benefits of smart metering is helping them save money. Comprehensive knowledge of usage allows consumers to make better decisions about their energy-use schedules. Given the recent increases in electricity prices, consumers should move their high-load household appliances to off-peak hours in order to reduce their energy costs and bills. One feature that would help consumers directly is applications that can offer details on instantaneous electricity and overall energy consumption. In this context, the data can be collected on an hourly, daily or weekly basis. Another useful application is energy disaggregation: breaking down the gross power consumption of a household into individual appliances. In this way, the consumer becomes aware of how much energy every appliance uses, helping them control their consumption better. Smart metering services would allow the distribution system operators (DSOs) to properly control and sustain their networks and deliver electricity more effectively while reducing running costs. Automatic billing systems will decrease billing-related concerns and site visits. They help the dealer to read the consumer meter remotely and to submit correct, timely bills without daily on-site meter
readings. This also offers a two-way link that makes remote issue detection and management possible. Energy theft is one of the DSOs' most critical issues, causing a major loss of profit. A variety of methods for the prevention of energy theft have been suggested by incorporating the AMI into electrical power systems [52]. The installation of smart meters also poses many obstacles, while adding important benefits to the community. One of the challenges in deploying modern metering technology is economic: the construction, implementation and servicing of the AMI entail several problems and involve budgets of many billions of dollars. Therefore, a cost-benefit study is a fair starting point for developing potential smart metering infrastructures. The advantages can be divided into primary and secondary ones. Primary benefits are those that directly influence consumers' bills, while indirect benefits come in terms of efficiency and changes in environmental standards and would have potential economic impacts. In [53], the present value of potential profits with respect to costs is analyzed, assuming a project life of 13 years, including 3 years of execution and 10 years of service. It is reported that, considering both primary and secondary economic advantages, the smart grid provides a favorable economic benefit, with an obtained profit-to-cost ratio between 1.5 and 2.6. There are also security and privacy issues. Through the installation of smart meters and two-way communication capability, customers collaborate with the utilities to control electricity consumption. The details they exchange reveal consumer preferences and behaviors, how they use electricity, the number of people in their homes and the devices in use, which subjects them to privacy breaches. In a smart grid context, a stable and scalable distributed computing network is required.
From this viewpoint, one of the big challenges is to make real-time data from smart meters accessible to all the stakeholders who use these details to meet their requirements. The use of smart meter data is also growing in applications beyond load analysis, load forecasting and load management. Other potential applications are power network link verification, failure management, data compression and privacy analysis. Although considerable attention has been paid to smart meter data analytics and a rich literature has been published in this field, advances in information and communication technologies and in the energy grid itself will undoubtedly lead to new challenges and opportunities. The big data dilemma is one of the main issues, owing to the convergence of multivariate data such as economic statistics, meteorological data and charging data for electric vehicles, in addition to data on energy usage. Furthermore, integration of the bidirectional and efficient smart grid with distributed renewable energy sources and electric vehicles is another obstacle. Additionally, maintaining the confidentiality of smart meter data is another important problem; several encryption algorithms and secure communication architectures have been proposed in this context.
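The energy disaggregation mentioned above can be illustrated, in its most naive combinatorial form, by matching an aggregate meter reading against subsets of known appliance power ratings. The appliance names and wattages below are hypothetical, and practical NILM systems use learned appliance signatures rather than this brute-force search.

```python
from itertools import combinations

def disaggregate(total_watts, appliances, tolerance=10.0):
    """Find the subset of known appliance ratings whose sum best matches
    an aggregate smart-meter reading (naive combinatorial disaggregation)."""
    names = list(appliances)
    best, best_err = (), float("inf")
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            err = abs(total_watts - sum(appliances[n] for n in combo))
            if err < best_err:
                best, best_err = combo, err
    return (best, best_err) if best_err <= tolerance else (None, best_err)

# Hypothetical appliance signatures (W)
appliances = {"fridge": 150, "kettle": 2000, "tv": 120, "washer": 500}
combo, err = disaggregate(2270, appliances)
print(combo, err)
```

The exhaustive search is exponential in the number of appliances, which is precisely why the disaggregation literature turns to statistical and deep learning models on fine-grained meter data.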
4.7 Conclusion

Recent technological advancements in the Internet of things, informatics and communication are driving the deployment of smart meters. In this chapter, a review of the latest research findings on the use of advanced machine and deep learning approaches for smart meter data analysis is presented, with an application-oriented analysis. The major applications considered are load analysis, load forecasting and load management. A description of state-of-the-art machine learning techniques and methodologies tailored to each considered application is presented. It is also noted that smart meter data will be used for other purposes in the future, such as verification of power network links, failure control, data compression and confidentiality analysis. In the framework of smart meter data analysis, key upcoming issues such as big data, the incorporation of distributed renewable energy sources, electric-vehicle-to-grid connectivity, and data privacy and security are also identified. The analysis of smart meter data is a new and exciting research field. We hope that this study provides suitable information to readers about smart meter data analysis based on machine learning and deep learning techniques, its applications and upcoming challenges.
References

[1] Alahakoon, D. and Yu, X. (2016, February). Smart electricity meter data intelligence for future energy systems: A survey. IEEE Transactions on Industrial Informatics, 12(1), 425–436. Doi: 10.1109/TII.2015.2414355.
[2] Darby, S. (2010). Smart metering: What potential for householder engagement? Building Research and Information, 38(5), 442–457.
[3] Sun, Q. et al. (2015). A comprehensive review of smart energy meters in intelligent energy networks. IEEE Internet of Things Journal, 3(4), 464–479.
[4] Alahakoon, D. and Yu, X. (2015). Smart electricity meter data intelligence for future energy systems: A survey. IEEE Transactions on Industrial Informatics, 12(1), 425–436.
[5] Sun, Q. et al. (2016, August). A comprehensive review of smart energy meters in intelligent energy networks. IEEE Internet of Things Journal, 3(4), 464–479. Doi: 10.1109/JIOT.2015.2512325.
[6] Zheng, J., Gao, D. W., and Lin, L. Smart meters in smart grid: An overview, In 2013 IEEE Green Technologies Conference (GreenTech), April 2013, pp. 57–64, Doi: 10.1109/GreenTech.2013.17.
[7] Cooper, A. and Shuster, M. (2019). Electric Company Smart Meter Deployments: Foundation for a Smart Grid (2019 Update). The Edison Foundation, Institute for Electric Innovation, Washington, D.C.
[8] Tounquet, F. and Alaton, C. (2019). Benchmarking Smart Metering Deployment in the EU-28. Tractebel, European Commission, Brussels.
[9] TEPCO. “Smart Meter Project,” 2018. https://www.tepco.co.jp/en/pg/development/domestic/smartmeter-e.html (accessed Oct. 30, 2020).
[10] SEC. “Smart Meters Project,” 2020. https://www.se.com.sa/en-us/customers/Pages/SmartMeters.aspx (accessed Oct. 30, 2020).
[11] DEWA. “DEWA Supports The Smart City Initiative with Smart Meters and Grids,” 2015. https://www.dewa.gov.ae/en/about-us/media-publications/latest-news/2015/07/dewa-supports-the-smart-city-initiative-with-smart-meters-and-grids (accessed Oct. 30, 2020).
[12] Xu, L. D., He, W., and Li, S. (2014, November). Internet of things in industries: A survey. IEEE Transactions on Industrial Informatics, 10(4), 2233–2243. Doi: 10.1109/TII.2014.2300753.
[13] Kuzlu, M., Pipattanasomporn, M., and Rahman, S. (2014, July). Communication network requirements for major smart grid applications in HAN, NAN and WAN. Computer Networks, 67, 74–88. Doi: 10.1016/j.comnet.2014.03.029.
[14] Wang, Y., Chen, Q., Hong, T., and Kang, C. (2019, May). Review of smart meter data analytics: Applications, methodologies, and challenges. IEEE Transactions on Smart Grid, 10(3), 3125–3148. Doi: 10.1109/TSG.2018.2818167.
[15] Zhou, K., Yang, C., and Shen, J. (2017, February). Discovering residential electricity consumption patterns through smart-meter data mining: A case study from China. Utilities Policy, 44, 73–84. Doi: 10.1016/j.jup.2017.01.004.
[16] ISSA. “CER smart metering project,” Irish Social Science Data, 2012. https://www.ucd.ie/issda/data/commissionforenergyregulationcer/ (accessed Oct. 17, 2020).
[17] UK Power Networks. “SmartMeter Energy Consumption Data in London Households – London Datastore,” 2014. https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households (accessed Oct. 19, 2020).
[18] Pecan Street. “Energy research,” Pecan Street Inc., 2014. https://www.pecanstreet.org/work/energy/ (accessed Oct. 19, 2020).
[19] UMass. “Smart* dataset for sustainability,” UMass Trace Repository, 2012. http://traces.cs.umass.edu/index.php/Smart/Smart (accessed Oct. 20, 2020).
[20] Ausgrid. “Distribution zone substation data,” Ausgrid, 2019. https://www.ausgrid.com.au:443/Industry/Our-Research/Data-to-share/Distribution-zone-substation-data (accessed Oct. 21, 2020).
[21] Ausgrid. “Solar home electricity data,” Ausgrid, 2014. https://www.ausgrid.com.au:443/Industry/Our-Research/Data-to-share/Solar-home-electricity-data (accessed Oct. 21, 2020).
[22] ISO New England. “Pricing Reports – Zonal Information,” 2020. https://www.iso-ne.com/isoexpress/web/reports/pricing/-/tree/zone-info (accessed Oct. 21, 2020).
[23] TEPCO. “TEPCO: Electricity Forecast,” 2020. https://www.tepco.co.jp/en/forecast/html/index-e.html (accessed Oct. 30, 2020).
[24] Khan, Z., Adil, M., Javaid, N., Saqib, M., Shafiq, M., and Choi, J.-G. (2020). Electricity theft detection using supervised learning techniques on smart meter data. MDPI Sustainability, 12(8023). Doi: 10.3390/su12198023.
[25] Hasan, M. N., Toma, R. N., Nahid, A.-A., Islam, M. M. M., and Kim, J.-M. (2019, January, Art. no. 17). Electricity theft detection in smart grid systems: A CNN-LSTM based approach. Energies, 12(17). Doi: 10.3390/en12173310.
[26] Zheng, K., Chen, Q., Wang, Y., Kang, C., and Xia, Q. (2019, March). A novel combined data-driven approach for electricity theft detection. IEEE Transactions on Industrial Informatics, 15(3), 1809–1819. Doi: 10.1109/TII.2018.2873814.
[27] Gao, Y., Foggo, B., and Yu, N. (2019, September). A physically inspired data-driven model for electricity theft detection with smart meter data. IEEE Transactions on Industrial Informatics, 15(9), 5076–5088. Doi: 10.1109/TII.2019.2898171.
[28] Sahoo, S., Nikovski, D., Muso, T., and Tsuru, K. Electricity theft detection using smart meter data, In 2015 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, USA, 2015, pp. 1–5, Doi: 10.1109/ISGT.2015.7131776.
74
Saeed Mian Qaisar et al.
[29] Zhu, J., Shen, Y., Song, Z., Zhou, D., and Zhang, Z. (2019, May). Data-driven building load profiling and energy management. Sustainable Cities and Society, 49, 101587. Doi: 10.1016/ j.scs.2019.101587. [30] Stephen, B., Mutanen, A. J., Galloway, S., Burt, G., and Järventausta, P. (2014, February). Enhanced load profiling for residential network customers. IEEE Transactions on Power Delivery, 29(1), 88–96. Doi: 10.1109/TPWRD.2013.2287032. [31] Khan, Z. A., Jayaweera, D., and Alvarez-Alvarado, M. S. (2018, December). A novel approach for load profiling in smart power grids using smart meter data. Electric Power Systems Research, 165, 191–198. Doi: 10.1016/j.epsr.2018.09.013. [32] Shabbir, N., Ahmadiahangar, R., Kütt, L., and Rosin, A. Comparison of machine learning based methods for residential load forecasting, In 2019 Electric Power Quality and Supply Reliability Conference (PQ) 2019 Symposium on Electrical Engineering and Mechatronics (SEEM), June 2019, pp. 1–4, Doi: 10.1109/PQ.2019.8818267. [33] Humeau, S., Wijaya, T. K., Vasirani, M., and Aberer, K. (2013). Electricity Load Forecasting for Residential Customers: Exploiting Aggregation and Correlation between Households. In 2013 Sustainable Internet and ICT for Sustainability (SustainIT), pp. 1–6. Doi: 10.1109/ SustainIT.2013.6685208. [34] Ahmadiahangar, R., Häring, T., Rosin, A., Korõtko, T., and Martins, J. Residential load forecasting for flexibility prediction using machine learning-based regression model, In 2019 IEEE International Conference on Environment and Electrical Engineering and 2019 IEEE Industrial and Commercial Power Systems Europe (EEEIC / I CPS Europe), 2019, pp. 1–4, Doi: 10.1109/EEEIC.2019.8783634. [35] Jetcheva, J., Majidpour, M., and Chen, W.-P. (2014). Neural network model ensembles for building-level electricity load forecasts. Energy and Buildings, 84, 214–223. Doi: https://doi. org/10.1016/j.enbuild.2014.08.004. [36] Kong, W., Dong, Z. Y., Jia, Y., Hill, D. 
J., Xu, Y., and Zhang, Y. (2019). Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Transactions on Smart Grid, 10(1), 841–851. Doi: 10.1109/TSG.2017.2753802. [37] Lusis, P., Khalilpour, K., Andrew, L., and Liebman, A. (2017). Short-term residential load forecasting: Impact of calendar effects and forecast granularity. Applied Energy, 205, 654–669. Doi: 10.1016/j.apenergy.2017.07.114. [38] Kong, W., Dong, Z. Y., Hill, D. J., Luo, F., and Xu, Y. (2018). Short-term residential load forecasting based on resident behaviour learning. IEEE Transactions on Power Systems, 33(1), 1087–1088. Doi: 10.1109/TPWRS.2017.2688178. [39] Almalaq, A. and Edwards, G. “A review of deep learning methods applied on load forecasting,” 2017, pp. 511–516. [40] Park, S., Moon, J., Jung, S., Rho, S., Baik, S. W., and Hwang, E. (2020). A two-stage industrial load forecasting scheme for day-ahead combined cooling, heating and power scheduling. Energies, 13(2), 443. [41] Ahmad, A., Javaid, N., Guizani, M., Alrajeh, N., and Khan, Z. A. (2016). An accurate and fast converging short-term load forecasting model for industrial applications in a smart grid. IEEE Transactions on Industrial Informatics, 13(5), 2587–2596. [42] Hu, Y. et al. (2019). Short term electric load forecasting model and its verification for process industrial enterprises based on hybrid GA-PSO-BPNN algorithm – A case study of papermaking process. Energy, 170, 1215–1227. [43] Porteiro, R., Nesmachnow, S., and Hernández-Callejo, L. “Short term load forecasting of industrial electricity using machine learning,” 2019, pp. 146–161. [44] Bracale, A., Carpinelli, G., De Falco, P., and Hong, T. “Short-term industrial load forecasting: A case study in an Italian factory,” 2017, pp. 1–6.
Chapter 4 A comprehensive review on the application of machine learning techniques
75
[45] Kim, D.-H., Lee, E.-K., and Qureshi, N. B. S. (2020). Peak-load forecasting for small industries: A machine learning approach. Sustainability, 12(16), 6539. [46] Dabbaghjamanesh, M., Moeini, A., and Kavousi-Fard, A. (2020). Reinforcement learningbased load forecasting of electric vehicle charging station using Q-Learning technique. IEEE Transactions on Industrial Informatics, 1–1. Doi: 10.1109/TII.2020.2990397. [47] Mansour-Saatloo, A., Moradzadeh, A., Mohammadi-ivatloo, B., Ahmadian, A., and Elkamel, A. (2020, July). Machine learning based PEVs load extraction and analysis. Electronics, 9, 1150. Doi: 10.3390/electronics9071150. [48] Zhu, J., Yang, Z., Guo, Y., Zhang, J., and Yang, H. (2019, April). Short-term load forecasting for electric vehicle charging stations based on deep learning approaches. Applied Sciences, 9, 1723. Doi: 10.3390/app9091723. [49] Nan, S., Zhou, M., and Li, G. (2018, January). Optimal residential community demand response scheduling in smart grid. Applied Energy, 210, 1280–1289. Doi: 10.1016/j. apenergy.2017.06.066. [50] Ahmed, M. S., Mohamed, A., Khatib, T., Shareef, H., Homod, R. Z., and Ali, J. A. Real time optimal schedule controller for home energy management system using new binary backtracking search algorithm. Energy and Buildings, 138, 215–227. [51] Adika, C. O. and Wang, L. (2014, May). Smart charging and appliance scheduling approaches to demand side management. International Journal of Electrical Power & Energy Systems, 57, 232–240. Doi: 10.1016/j.ijepes.2013.12.004. [52] Singh, S. K., Bose, R., and Joshi, A. (2019). Energy theft detection for AMI using principal component analysis based reconstructed data. IET Cyber-Physical Systems: Theory & Applications, 4(2), 179–185. [53] Pawar, S. and Momin, B. “Smart electricity meter data analytics: A brief review,” 2017, pp. 1–5.
Uma Maheswari V., Rajanikanth Aluvalu, Krishna Keerthi Chennam
Chapter 5 Application of machine learning algorithms for facial expression analysis

Abstract: Facial expression analysis (FEA) has become an important application in fields such as medicine, education, entertainment and crime analysis, because it allows behavior to be analyzed where no verbal communication is possible. FEA is performed after face recognition, and its quality depends on how efficiently features are extracted. Classification therefore plays a vital role in producing the output needed to identify the correct expression. Machine learning (ML) and deep learning algorithms can classify both structured data, such as text, and unstructured data, such as images and videos. Image input is generally preferred, because a face image carries information, such as the texture of facial features, age, gender and shape, that cannot be described adequately by a textual annotation. A system can be built in different ways: deep learning algorithms can be applied to raw data, or ML algorithms can be applied to preprocessed images, depending on user requirements. This chapter discusses the challenges, along with suitable ML and efficient deep learning algorithms, for automatic recognition of human expressions in significant areas such as human-computer interaction, psychology in the medical field and, especially, the analysis of suspects' behavior in crowded areas such as airports. In recent years, ML algorithms have become very popular in data retrieval for improving efficiency and accuracy, and they play an imperative role in image retrieval by narrowing the semantic gap between user expectations and the images available in a database. This chapter presents a comprehensive study of ML algorithms: supervised, unsupervised and combinations of both. Furthermore, it demonstrates the use of various ML algorithms for image classification and clustering, and summarizes and compares ML algorithms on datasets such as COREL and face image databases. Finally, the chapter concludes with the challenges of using ML algorithms in image retrieval and a few recommendations.
Uma Maheswari V., Computer Science and Engineering, Vardhaman College of Engineering, Hyderabad, Telangana, India e-mail: [email protected] Rajanikanth Aluvalu, Computer Science and Engineering, Vardhaman College of Engineering, Hyderabad, Telangana, India e-mail: [email protected] Krishna Keerthi Chennam, Computer Science and Engineering, Muffkham Jah College of Engineering, Hyderabad, Telangana, India, e-mail: [email protected] https://doi.org/10.1515/9783110702514-005
Keywords: facial expression, image processing, image retrieval, machine learning, supervised learning
5.1 Introduction

Machine learning (ML) is a broad and captivating subfield of artificial intelligence (AI). It has become one of the hottest fields to work in, with crucial applications in crime analysis, image processing and face recognition. Its applications span a great diversity of areas, from medicine to entertainment, from military to civilian uses, from satellites to traffic signals and from text to images, and its prominence is growing massively in many fields, driven by an increasing need to manage and understand abundant data in order to classify, analyze and retrieve it. In ML, a system is made to learn from experience so that it can perform a task without explicit programming. Learning strategies can be distinguished by the amount of inference they require: at one extreme, the learner simply operates on statistics supplied by the trainer, while a learner capable of a greater degree of inference reduces the burden on the trainer. Learning can take many forms, such as learning by analogy, learning through evolutionary methods and learning from examples. In human cognition, analogy is believed to be an internal inference process as well as an influential computational mechanism; the analysis involves problem-solving methods comprising decisions on actions that reduce the known differences between a desired state and an existing state. ML plays a vital role in the image domain at the level of feature extraction for image processing and retrieval, face recognition, expression analysis, etc. [1]. Furthermore, taxonomies of ML pay special attention to methods of learning from examples. To cope with the high dimensionality of hyperspectral imagery, band selection and dimensionality reduction, for example with the principal component analysis algorithm, are performed prior to image processing. Unsupervised classification is used in hyperspectral remote sensing because prior information on the categories is often unavailable. For clustering hyperspectral remote sensing data, artificial DNA computation has been employed [2]. Lei Wang and Latifur Khan [3] proposed a new hierarchical clustering for images based on three features: color, texture and shape. Detection of potholes for autonomous vehicles in an unstructured environment, using image data collected from roads, was published in [4]. A face-recognition-based attendance recording system was proposed by S. Viswanadha Raju by applying face recognition methods to a facial image database [5]. Combined color and texture feature extraction was used for content-based image retrieval and then extended to retrieve subimages [6]. Multimodal biometric authentication has been demonstrated to improve accuracy and reliability over unimodal systems [7]. ML algorithms are classified as supervised learning (SL) and unsupervised learning.
5.2 Supervised learning

SL is the branch of ML, shown in Figure 5.1, in which a function is inferred from labeled training data. A set of training examples together with their labels is available in the training dataset for retrieving the required data. In SL, the learning algorithm seeks a function from input objects to target outputs. SL analyzes the training data and fabricates an inferred function that is used to map new examples. When the algorithm must suitably determine the class labels of unknown instances, the task is called a classification problem; if the target output is continuous, it is called a regression problem. Supervised ML techniques have been used for automatic laughter detection from the full-body movement of users [8]. The performance of SL algorithms depends on parameters such as the variance, the quantity of collected training data, the size of the input space and the noise in the target output. Given a set of n training examples {(a1, b1), ..., (an, bn)}, a learning algorithm looks for a required function s: A → B, where A is the input space and B is the output space. The function s is an element of a space S of possible functions (the so-called hypothesis space). Assume k is a scoring function k: A × B → R, such that s is defined as returning the value b that gives the maximum score: s(a) = argmax over b of k(a, b). Most learning algorithms are probabilistic or statistical models, where k takes the form of a conditional probability model k(a) = P(b|a) or a joint probability model k(a, b) = P(a, b).
Figure 5.1: Image retrieval using machine learning algorithms.
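The argmax formulation above can be sketched in a few lines of Python. This is a minimal illustration only: the probability table standing in for the scoring function k(a, b) is invented, not taken from the chapter.

```python
# Sketch of supervised prediction as s(a) = argmax_b k(a, b), where the
# scoring function k is a conditional probability model P(b | a).
# The probability values below are hypothetical, for illustration only.
k = {
    "smile": {"happy": 0.8, "sad": 0.1, "neutral": 0.1},
    "frown": {"happy": 0.05, "sad": 0.7, "neutral": 0.25},
}

def predict(a):
    """Return the label b that maximizes the score k(a, b)."""
    scores = k[a]
    return max(scores, key=scores.get)

print(predict("smile"))  # happy
print(predict("frown"))  # sad
```

The same skeleton covers both views in the text: a conditional model P(b|a) or a joint model P(a, b) can fill the table, since the argmax over b is unchanged.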
5.2.1 Linear regression

Linear regression (LR) is a fundamental algorithm used for predictive analysis, for example to test whether a set of variables correctly predicts an outcome variable and which variables are particularly significant. Regression estimates describe the association between one dependent variable and one or more independent variables by fitting the best line, called the best-fitting line or regression line. The most common method for finding the regression line is the least squares method, which minimizes the sum of the squared vertical deviations of every data point from the regression line. After the best-fit line has been computed for a collection of data, any data point lying far from the regression line is called an outlier and may indicate erroneous data. Yuan Wang and Liang Chen proposed a face recognition algorithm using LR-based dual classification for clustering [9]. Yongsheng Dong and Dacheng Tao suggested shearlets and LR for texture classification and retrieval [10]. Generalized linear regression classification has been proposed to avoid unexpected effects such as limited information, irregular status and continuous changes [11]. A novel approach has been proposed for face identification using LR and pattern recognition [12]. To reduce outlier effects, dual difference regression classification combines difference regression classification with the pixel difference binary pattern [13]:

yi = p xi + q  (5.1)

where yi is the observed dependent (criterion) response variable, p is the regression coefficient, xi is the independent (predictor) variable and q is a constant. Regression methods are usually divided into two types:
1. Simple LR (SLR) algorithm
2. Multiple LR (MLR) algorithm

SLR is a statistical method used to summarize and find the association between continuous variables, as depicted in Figure 5.2. SLR is most frequently used in causal analysis, forecasting and trend analysis. MLR, with two or more independent variables, is more effective for describing such relationships in matrix form. The LR algorithm has been used for face recognition to yield higher recognition accuracy without any preprocessing steps of image localization [12]. A prediction is not always correct; it carries some error, called the prediction error or residual error:

ŷi = p xi + q  (5.2)

where ŷi is the predicted response of experimental unit i, an experimental unit being an object on which the measurements are made.

From Equations (5.1) and (5.2), the prediction error is errori = yi − ŷi. Squaring the errors prevents positive and negative errors from canceling: errori² = (yi − ŷi)² is the squared prediction error for the ith data point, and Σ(i=1..n) (yi − ŷi)² sums the squared prediction errors over all n data points.
Figure 5.2: Linear regression model: red dots represent data points along the regression line, and the green star indicates an outlier data point lying farther from the best-fitting line.
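The least-squares fit described above can be sketched in plain Python using the standard closed-form slope and intercept. The data points are invented so that they lie exactly on a line, which makes the result easy to verify.

```python
# Minimal least-squares fit of y = p*x + q, minimizing the sum of squared
# residuals from Equations (5.1)-(5.2); a sketch, not library code.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # p = covariance(x, y) / variance(x); q = mean_y - p * mean_x
    p = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    q = mean_y - p * mean_x
    return p, q

def sse(xs, ys, p, q):
    # Sum of squared prediction errors: sum_i (y_i - (p*x_i + q))^2
    return sum((y - (p * x + q)) ** 2 for x, y in zip(xs, ys))

xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]   # points lying exactly on y = 2x + 1
p, q = fit_line(xs, ys)
print(p, q)               # 2.0 1.0
print(sse(xs, ys, p, q))  # 0.0
```

On noisy data the same formulas apply, but the sum of squared errors is then strictly positive and any point far off the fitted line is a candidate outlier.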
5.2.2 Logistic regression (LoR)

Logistic regression (LoR) is a statistical method used to analyze a dataset and describe the association between a dichotomous outcome variable and one or more independent variables, resolving the output by finding the best-fitting model. LoR returns a probability, or odds, for a particular value based on a collection of variables taken as predictors. For example, LoR is well suited to analyzing data with binary outcomes, such as classifying students as selected or rejected based on marks. LoR produces coefficients that are used to predict the probability of the presence of the characteristic of interest:

logit(pl) = β0 + β1X1 + β2X2 + ... + βnXn  (5.3)

where pl is the probability of the presence of the characteristic of interest.

Figure 5.3: Logistic regression model: the fitted model separates the data points labeled 1's from those labeled 0's.

Equations (5.4)-(5.6) show that the logit transformation can also be defined in terms of odds:

odds = pl / (1 − pl)  (5.4)

Here, the outcome probability is

pl for the presence of the characteristic, and 1 − pl otherwise  (5.5)

logit(pl) = log(pl / (1 − pl)) = β0 + β1X1 + β2X2 + ... + βnXn  (5.6)
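The logit/odds relations in Equations (5.3)-(5.6) can be sketched directly. The coefficients b0 and b1 below are hypothetical stand-ins for a fitted model, chosen only to make the student-selection example concrete.

```python
import math

# Sketch of Equations (5.3)-(5.6): odds = p/(1-p), logit(p) = log(odds),
# and the inverse (sigmoid) maps a linear score back to a probability.
def odds(p):
    return p / (1.0 - p)

def logit(p):
    return math.log(odds(p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for the marks-based selection example.
b0, b1 = -4.0, 0.1

def prob_selected(marks):
    return sigmoid(b0 + b1 * marks)

print(prob_selected(40))       # 0.5: the linear score b0 + b1*40 is exactly 0
print(sigmoid(logit(0.8)))     # 0.8: logit and sigmoid are inverses
```

Fitting the coefficients themselves is usually done by maximum likelihood, which the sketch deliberately omits.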
5.2.3 Naive Bayes classification

Naive Bayes classification is an SL method, also used as a statistical method for classification. It is named after Thomas Bayes, who proposed the underlying theorem. The classifier uses the probability of each object belonging to each class to formulate a prediction. Naive Bayes classifiers are extremely scalable; they require a number of parameters linear in the number of variables (input features and predictors) in a learning problem. Maximum likelihood training can be done by evaluating a closed-form expression. The naive Bayes classifier technique is built on the popular Bayesian theorem and is particularly well suited to high-dimensional input. The algorithm works with prior probabilities, likelihoods and posterior probabilities; prior probabilities are taken from past experience and are used to predict outcomes before they occur:

probability of required object ∝ no. of required objects / total no. of available objects  (5.7)

The likelihood is calculated from a randomly selected area of the space:

likelihood of particular object ∝ no. of objects in the vicinity / total no. of objects in the space  (5.8)

The posterior probability is the product of the prior probability and the likelihood, from Equations (5.7) and (5.8):

posterior probability of particular object ∝ prior probability of required object × likelihood of particular object  (5.9)
The naive Bayes classifier also handles an arbitrary number of independent variables, whether categorical or continuous. Given a set of variables A = {a1, a2, ..., an}, we evaluate the posterior probability of the possible outcomes O = {o1, o2, ..., on}, where A is the predictor and O contains the categorical levels of the dependent variable:

p(Oi | a1, a2, ..., an) ∝ p(a1, a2, ..., an | Oi) p(Oi)  (5.10)

where p(Oi | a1, a2, ..., an) is the posterior probability of class Oi, that is, the probability that A belongs to Oi  (5.11)

According to the naive Bayes assumption, the conditional probabilities of the independent variables in Equations (5.10) and (5.11) are statistically independent, so the likelihood can be written as

p(A | Oi) ∝ Π(p=1..n) p(ap | Oi)  (5.12)

and the posterior probability as p(Oi | A) ∝ p(Oi) Π(p=1..n) p(ap | Oi). To classify, a new case A is labeled with the class level Oi that achieves the highest posterior probability under this rule. An edge-extraction-based image representation with a naive Bayes classifier has been used for face detection [14].
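The prior × likelihood recipe of Equations (5.7)-(5.12) can be sketched with count-based estimates on a toy labeled dataset. The data and feature names are invented for illustration; add-one smoothing is an assumption added here to avoid zero probabilities.

```python
from collections import Counter, defaultdict

# Toy training data: (feature tuple, class label). Invented for illustration.
data = [
    (("smile", "open-eyes"), "happy"),
    (("smile", "open-eyes"), "happy"),
    (("frown", "open-eyes"), "sad"),
    (("frown", "closed-eyes"), "sad"),
]

priors = Counter(label for _, label in data)          # class counts, Eq. (5.7)
feature_counts = defaultdict(Counter)
for features, label in data:
    for f in features:
        feature_counts[label][f] += 1

def posterior_score(features, label):
    # prior * product of per-feature likelihoods, Eqs. (5.8)-(5.12),
    # with add-one smoothing in each likelihood term.
    total = priors[label]
    score = total / len(data)
    for f in features:
        score *= (feature_counts[label][f] + 1) / (total + 2)
    return score

def classify(features):
    return max(priors, key=lambda label: posterior_score(features, label))

print(classify(("smile", "open-eyes")))   # happy
print(classify(("frown", "closed-eyes")))  # sad
```

Because the scores are only proportional to the true posteriors, comparing them is enough for classification; normalizing constants cancel in the argmax.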
5.2.4 Decision tree

The decision tree is a widely used, simple and direct classification algorithm. It organizes test data and test conditions into a tree structure without requiring domain knowledge. In a decision tree, the root and all internal nodes contain test conditions based on attributes, while all terminal nodes are assigned class labels such as "yes" or "no." Classification of a test record proceeds as follows: once the decision tree has been constructed, the search starts from the root node, the test condition is applied to the record, and the branch followed next depends on the outcome of the test. Decision tree algorithms divide into classification and regression trees [15], which have been used to classify the basic emotional facial expressions (happy, surprise, angry, sad, disgust, fear and the neutral state).

GINI(D) = 1 − Σ(i=1..p) Pr(i|t)²  (5.13)

where D is the dataset, p is the number of classes in dataset D and Pr(i|t) is the relative frequency of class i at node t.
Gini impurity, computed with Equation (5.13), is a common measure used to decide data splits when building the tree; Figure 5.4 depicts a sample decision tree for labeled data. Advantages of decision trees:
– The output is easy to understand without statistical background, and trees are intuitive to use in any area.
– They work well on raw data and are not strongly affected by missing data.
– They handle numerical as well as categorical data.
– No predefined structure is required.
– New instances can be classified by following the existing splits.
Limitations:
– Decision trees are not well suited to continuous variables, because information is lost when they are discretized.
– They suffer from over-fitting, because the model has low bias and high variance.
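Equation (5.13) can be computed directly from the labels reaching a node; the label lists below are invented to show the two extreme cases.

```python
# Gini impurity from Equation (5.13): GINI(D) = 1 - sum_i Pr(i|t)^2,
# computed from the class frequencies at a node; a minimal sketch.
def gini(labels):
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini(["yes", "yes", "yes", "yes"]))  # 0.0: a pure node
print(gini(["yes", "yes", "no", "no"]))    # 0.5: maximally mixed, two classes
```

A split is chosen to minimize the weighted Gini impurity of the child nodes, so pure children (impurity 0) are ideal.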
5.2.5 Random forest

Random forest is one of the most reputed and dominant ML algorithms; it uses a bagging method to construct decision trees from randomized subsets of the dataset. A random forest is a group of classification decision trees. It categorizes a new example by passing its input feature vector to every tree and obtaining a classification from each. The forest contains a number of decision trees that jointly describe the output. Random forest uses the bagging approach to create a set of trees whose responses take the form of class memberships; random samples of the dataset are trained multiple times to attain a refined prediction.

Figure 5.4: Sample decision tree for labeled data, predicting the credit-card status of people in different age groups.

According to the ensemble learning technique used in the random forest, the outputs of all the random decision trees are combined to build the final prediction. This is achieved by polling the individual decision trees and selecting the result that occurs the greatest number of times (Figure 5.4).
Procedure
– Take the training data and generate a whole forest of trees.
– Grow each tree independently on a bootstrap sample of the training data.
– Find the best split over m randomly selected variables, without pruning the tree.
– Grow deep trees and obtain a prediction for a new set of data by voting over the predictions of all the trees.
– Evaluate the model on the out-of-bag data, which serves as a test set for each tree.
– Random forest parameter: the trees are grown on the best splits of m randomly selected variables; start with m = M^(1/2), where M is the total number of variables:

mean error = (observed − tree response)²  (5.14)

– Run a few trees, record the out-of-bag error rate from Equation (5.14), then increase or decrease m until a low error rate is reached.
– Variable importance: random forest grades the prominence of variables by evaluating the number of votes for the correct class on out-of-bag data.

Predictions of the random forest are taken as the average of the decision trees:

random forest prediction p = (1/N) Σ(n=1..N) (nth tree response)  (5.15)

In Equation (5.15), n runs over the N trees in the random forest.
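The aggregation steps above can be sketched with stand-in "trees": classification polls the trees by majority vote, and regression averages their responses as in Equation (5.15). The threshold functions below are hypothetical placeholders, not a full bagging implementation.

```python
# Sketch of random forest aggregation. Each "tree" is a stand-in
# decision function; a real forest would grow each on a bootstrap sample.
trees = [
    lambda x: "yes" if x > 2 else "no",
    lambda x: "yes" if x > 3 else "no",
    lambda x: "yes" if x > 5 else "no",
]

def forest_classify(x):
    votes = [t(x) for t in trees]
    return max(set(votes), key=votes.count)   # majority vote over the trees

def forest_regress(responses):
    return sum(responses) / len(responses)    # average response, Eq. (5.15)

print(forest_classify(4))               # "yes": 2 of 3 trees vote yes
print(forest_regress([1.0, 2.0, 3.0]))  # 2.0
```

In a complete implementation, each tree would also report out-of-bag error, and m, the number of candidate variables per split, would be tuned as described above.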
5.2.6 Support vector machine

Support vector machine (SVM) algorithms are among the most widely used SL algorithms; they analyze data for classification and regression analysis. Frank Rosenblatt introduced a simple linear method, the perceptron, to perform classification.
Furthermore, Vapnik and Chervonenkis proposed another method, the maximal margin classifier. Vapnik then enhanced these ideas by applying the kernel trick, which allows an SVM to classify data that are not linearly separable. Later, Cortes and Vapnik invented the soft margin classifier, which accepts some misclassifications while using an SVM. The aim of an SVM is to locate the best separating hyperplane, the one that maximizes the margin on the training data, as illustrated in Figure 5.5.

Figure 5.5: Image classification using a linear SVM (C = 100) with the optimal hyperplane.
An SVM helps identify the correct hyperplane to separate two classes among many candidates:
– Maximizing the distance between the nearest points of each class and the hyperplane helps decide the correct hyperplane; this distance is called the margin.
– Choosing the hyperplane with the higher margin gives robustness; a hyperplane with a low margin increases the chance of misclassification.
– SVM is robust enough to ignore outliers.

Let (pi, qi), 1 ≤ i ≤ n, be the collection of training samples, where pi ∈ Rⁿ belongs to one of two classes labeled by qi ∈ {−1, 1}. The main objective of the SVM is to separate the samples so that those with the same label lie on the same side of the hyperplane [16]:

qi(m·pi + c) > 0, i = 1, 2, ..., n  (5.16)

It is possible to rescale the parameters m and c so that

min(1≤i≤n) qi(m·pi + c) ≥ 1  (5.17)

Equations (5.16)-(5.18) describe the hyperplane in terms of distances; under Equation (5.17), the distance from the nearest point to the hyperplane is 1/||m||, so

qi(m·pi + c) ≥ 1  (5.18)

SVM is popular for classifying facial expressions based on features extracted from images. It employs a hypothesis function to predict the required output. Kotsia et al. used SVM for the classification of given expressions and also applied Gabor wavelets [17]. Multiclass SVM algorithms have also been introduced to develop rules for better classification [2]. SVM can be used for both linear classification and regression problems [18].
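The constraints (5.16)-(5.18) and the margin 1/||m|| can be checked numerically for a candidate hyperplane. The samples and the hyperplane parameters below are invented for illustration; no training is performed, only verification of the conditions.

```python
import math

# Check the SVM feasibility conditions q_i(m.p_i + c) >= 1 from
# Eqs. (5.16)-(5.18) and compute the geometric margin 1/||m||.
samples = [((2.0, 2.0), 1), ((3.0, 3.0), 1),
           ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]
m, c = (0.5, 0.5), -1.0   # hypothetical separating hyperplane

def satisfies_constraints(samples, m, c):
    return all(q * (m[0] * p[0] + m[1] * p[1] + c) >= 1 for p, q in samples)

margin = 1.0 / math.hypot(*m)   # distance of the closest point, Eq. (5.17)

print(satisfies_constraints(samples, m, c))  # True
print(margin)                                # sqrt(2): about 1.414
```

Training an SVM means searching for the m and c that satisfy these constraints while maximizing the margin, i.e. minimizing ||m||.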
5.2.7 KNN algorithm

The k-nearest neighbor (KNN) algorithm is among the simplest of all ML algorithms. It is used for classification and regression, and is a memory-based statistical model defined by a collection of objects, called examples, whose outcomes (labels) are known. KNN performs classification or regression by finding the k examples nearest to the query point (k = 3 and k = 6 are shown in Figure 5.6). In fact, k is a vital factor of the algorithm, affecting the quality of predictions. To make predictions with KNN, we need a metric to measure the distance between the selected point and the example cases. Equation (5.19) lists the most popular distance measures for KNN, and Equation (5.20) gives the prediction as an average:

D(p, q) = sqrt((p − q)²)  Euclidean
D(p, q) = (p − q)²  Euclidean squared
D(p, q) = |p − q|  city block
D(p, q) = max(|p − q|)  Chebyshev  (5.19)

where p and q are the selected point and an example case, respectively. The KNN regression prediction is the average of the k nearest neighbors' outcomes:

p = (1/k) Σ(i=1..k) Pi  (5.20)

where Pi is the ith example case and p is the prediction for the selected point. In classification, KNN predictions are labels chosen among the neighbors selected with Equation (5.19). A fuzzy KNN classifier and a KNN classifier based on expectation maximization (EM) have been proposed for cattle muzzle matching [19]. The fuzzy variant provides a membership array describing fuzzy membership for every training instance, so that every neighbor contributes through its array [20]. Smiling and nonsmiling datasets have been trained using the KNN algorithm to extract individual part features such as the mouth and eyes.

Figure 5.6: Example of k-nearest neighbors for classes X and Y, with k = 3 and k = 6.
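A minimal KNN sketch using the Euclidean distance from Equation (5.19): classification by majority vote among the k nearest examples, and regression by averaging their outcomes as in Equation (5.20). The example points are invented for illustration.

```python
# KNN on 1-D points: sort examples by squared Euclidean distance to the
# query, take the k nearest, then vote (classification) or average
# (regression, Eq. (5.20)).
examples = [((1.0,), "X"), ((1.2,), "X"),
            ((3.0,), "Y"), ((3.2,), "Y"), ((3.4,), "Y")]

def knn_classify(query, k):
    nearest = sorted(examples, key=lambda e: (e[0][0] - query) ** 2)[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def knn_regress(query, points, k):
    nearest = sorted(points, key=lambda x: (x - query) ** 2)[:k]
    return sum(nearest) / k

print(knn_classify(1.1, 3))                        # "X": two X neighbors beat one Y
print(knn_regress(2.0, [1.0, 2.0, 3.0, 10.0], 3))  # 2.0
```

Note how the choice of k matters: with k = 5 here, every query would be labeled "Y", since class Y holds the majority of all examples.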
5.2.8 Ensemble classifier

The core task of SL is to classify data into a set of categories based on labels. A set of instances is called the training dataset, and the ultimate aim is to build a model that labels new instances. Sometimes a single algorithm is too weak to build a good model from the instances of a particular dataset [21], so ensemble classifiers were introduced to boost the model and obtain better results. The main motive of ensemble classifiers is to evaluate multiple different classifiers and combine them (Figure 5.7) to obtain a classifier that performs better on the dataset; for abundant data in particular, a single classifier may not work properly. In the 1970s, Tukey introduced a combination of two LR models, one fitted to the initial data and the second to the residuals. Dasarathy introduced partitioning of the input data space among more than one classifier when data are scarce. Schapire, working in the probably approximately correct framework, showed that weak classifiers can be combined into a strong one, which led to the AdaBoost algorithm. Ensemble methods can be obtained by manipulating the training set, the input features, the class labels or the learning algorithms, as in boosting for regression. A multilayer perceptron and nearest neighbor classifiers have been combined with a weighted majority vote for color image classification to improve efficiency, and an ensemble of SVM binary classifiers has been used for image classification [22, 23]. A remote sensing image classifier based on an ensemble of extreme learning machines (ELMs) and a stacked autoencoder (SAE), namely SAE-ELM, has been proposed for low-, medium- and high-resolution remote sensing images [24]. A memory management approach based on the Kullback-Leibler divergence has been applied to face recognition [25]. Different classification methods have been used to infer gender from facial images [24, 26, 27]. A mixture of decision trees and SVMs has been used for classification of human faces by ethnicity, gender and pose [28]. Fully automatic facial expression detection has been performed using a new combination of random forest and SVM classifiers for the classification of spontaneous facial expressions [29].
Figure 5.7: Ensemble classifier (combination of different classifiers).
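The combination step pictured in Figure 5.7 can be sketched as a weighted majority vote over heterogeneous classifiers. The base classifiers and their weights below are hypothetical stand-ins for trained models.

```python
# Sketch of an ensemble combining different classifiers by weighted
# majority vote. Each entry is (decision function, weight); in practice
# these would be trained models such as an SVM, a tree and a KNN.
classifiers = [
    (lambda x: "pos" if x >= 0 else "neg", 2.0),
    (lambda x: "pos" if x >= 1 else "neg", 1.0),
    (lambda x: "pos" if x >= -1 else "neg", 1.0),
]

def ensemble_predict(x):
    totals = {}
    for clf, weight in classifiers:
        label = clf(x)
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)

print(ensemble_predict(0.5))   # "pos": total weight 3.0 vs 1.0
print(ensemble_predict(-2.0))  # "neg": all classifiers agree
```

Equal weights reduce this to plain majority voting; boosting methods such as AdaBoost instead learn the weights from each classifier's training error.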
5.3 Unsupervised learning

Unsupervised learning is the field of ML algorithms used to extract implications as outputs from datasets without input labels. It includes clustering, dimensionality reduction, deep learning, recommender systems, etc. In this setting, data are divided into different clusters formed by grouping items with similar features; this is called clustering. Unsupervised learning is less straightforward than SL, because evaluation is not as easy as checking a prediction in the supervised case; on the other hand, it works on unlabeled instead of labeled data, so less human intervention is needed. Furthermore, it scales to large databases, such as those in the medical field, online browsing data and online shopping data.
5.3.1 k-Means

k-Means is the most widely used unsupervised algorithm for solving well-known clustering problems (Figure 5.8). Clustering partitions the data into a small number of groups such that objects in the same group are similar, as in a shopping mall where clothes are arranged in categories such as men, women, children and newborn so they can be found easily. Assume that we have n data points Aj, j = 1, 2, . . ., n, to be partitioned into k clusters; select a center for each cluster and find the
Uma Maheswari V., Rajanikanth Aluvalu, Krishna Keerthi Chennam
distance between the data points and the corresponding cluster centers δj, j = 1, 2, . . ., k, where k is the total number of clusters. The aim is to minimize the distance from each data point to its cluster center; the k-means algorithm most commonly uses the squared Euclidean distance [30]. For example, for k = 3, the data points are partitioned into three clusters according to their distances from the cluster centers.
Figure 5.8: k-Means clustering with three clusters (k = 3) for the COREL database.
The exact clustering problem has no efficient solution and its formulation is NP-hard, so the k-means algorithm only aims at finding a (possibly local) minimum and may get locked into a suboptimal solution:

\[
\operatorname*{argmin}_{c_j} \sum_{j=0}^{k-1} \sum_{a \in c_j} d(a, \delta_j) = \operatorname*{argmin}_{c_j} \sum_{j=0}^{k-1} \sum_{a \in c_j} \lVert a - \delta_j \rVert_2^2 \tag{5.21}
\]

where c_j is the set of data points that belong to cluster j, k is the number of clusters and δ_j is the center position of cluster j.
– Initialize the centers of the clusters δ_j, j = 1, 2, . . ., k.
– Assign each data point to the cluster with the closest center:

\[
c_j = \{\, i : d(a_i, \delta_j) \le d(a_i, \delta_l),\; l \ne j,\; i = 1, 2, \ldots, n \,\} \tag{5.22}
\]

– Update the position of each cluster center by calculating the mean, using Equation (5.23), over all data points that belong to the cluster:

\[
\mu_j = \frac{1}{\lvert c_j \rvert} \sum_{A_i \in c_j} A_i \tag{5.23}
\]
where |c_j| is the number of elements in cluster j; repeat the above two steps until the centers converge. Minimizing the squared mean distance from every data point to its nearest center is known as the squared error distortion [30]. The fuzzy k-means algorithm performs better than plain k-means for clustering when applied to image input datasets [31]. An improved k-means algorithm has been used to handle large datasets; it reduces the sensitivity to the random initial points and the data skew caused by imbalanced data partitioning [32].
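The two alternating steps above (assignment, Eq. 5.22, and mean update, Eq. 5.23) can be sketched in a few lines; this is a minimal 1D illustration with made-up points and initial centers, not the tuned implementations of [30–32].

```python
def kmeans(points, centers, iters=10):
    """Plain k-means: assign each point to its nearest center (Eq. 5.22),
    then move each center to the mean of its assigned points (Eq. 5.23)."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # squared Euclidean distance in 1D is just (p - c) ** 2
            j = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            clusters[j].append(p)
        # empty clusters keep their previous center
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.1]
centers, clusters = kmeans(points, centers=[0.0, 4.0, 8.0])
print(centers)   # converges to roughly [1.0, 5.0, 9.05]
```

The result depends on the initial centers, which is exactly the local-minimum behavior described above.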
5.3.2 Expectation maximization

The EM algorithm was introduced by Hartley in 1958 to evaluate the maximum likelihood from a random population [33]. EM is an unsupervised iterative clustering method similar to k-means; it creates clusters as k-means does, but in elliptical form. Instead of hard-assigning samples to clusters, it uses the means of continuous variables to maximize the separation between clusters when the observations are incomplete [33]. It is used to maximize the overall probability, or likelihood, that the data belong to a cluster. The EM algorithm computes the probability of cluster membership for each data point using probability functions and distributions, and it works on both continuous and categorical datasets [27, 34]. Its main goal is to estimate the means and standard deviations of each cluster so as to maximize the likelihood of the data. The E step evaluates the probability of every data point belonging to each cluster; the M step re-estimates the probability distribution of every class variable for the next iteration, using Equations (5.25) and (5.26). The EM algorithm has been used for color segmentation: the image is first converted from RGB to HSV, and a Gaussian mixture fitted by EM then provides the parameter values that yield the classified image [18, 35]. Assume a training set of n samples {x_1, x_2, . . ., x_n}; Equation (5.24) describes the probability model p(x, y) and the log likelihood as follows:

\[
\ell(\phi) = \sum_{i=1}^{n} \log p(x_i; \phi) = \sum_{i=1}^{n} \log \sum_{y_i} p(x_i, y_i; \phi) \tag{5.24}
\]

E step:

\[
Q_i(y_i) := p(y_i \mid x_i; \phi) \tag{5.25}
\]
M step:

\[
\phi := \operatorname*{argmax}_{\phi} \sum_{i} \sum_{y_i} Q_i(y_i) \log \frac{p(x_i, y_i; \phi)}{Q_i(y_i)} \tag{5.26}
\]
where ℓ is the log likelihood and Q_i is the posterior distribution of y_i given x_i; here y_i is the latent variable and φ is the model parameter. If φ^t and φ^(t+1) are two successive iterates of the EM algorithm run until convergence, then ℓ(φ^t) ≤ ℓ(φ^(t+1)) always holds, showing the monotonic improvement of the log likelihood.
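The E and M steps above can be sketched for the common special case of a two-component 1D Gaussian mixture; the data and initial parameters below are made up for illustration, not taken from the segmentation experiments of [18, 35].

```python
import math

def em_gmm_1d(xs, mu, sigma, pi, iters=50):
    """EM for a two-component 1D Gaussian mixture.
    E step: responsibilities Q_i(y) = p(y | x_i) (Eq. 5.25).
    M step: weighted maximum-likelihood parameter updates (Eq. 5.26)."""
    def pdf(x, m, s):
        return math.exp(-(x - m) ** 2 / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))
    for _ in range(iters):
        # E step: posterior probability that each point came from component k
        resp = []
        for x in xs:
            p = [pi[k] * pdf(x, mu[k], sigma[k]) for k in range(2)]
            z = sum(p)
            resp.append([pk / z for pk in p])
        # M step: re-estimate mixing weights, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = math.sqrt(max(var, 1e-6))  # floor the variance for stability
    return mu, sigma, pi

xs = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
mu, sigma, pi = em_gmm_1d(xs, mu=[0.0, 6.0], sigma=[1.0, 1.0], pi=[0.5, 0.5])
print(mu)   # the two means converge near 1.0 and 5.0
```

Each iteration provably does not decrease the log likelihood, which is the monotonicity property ℓ(φ^t) ≤ ℓ(φ^(t+1)) stated above.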
5.3.3 Hierarchical clustering

Hierarchical clustering is a clustering technique, inspired by the concepts of merge sort and the binary tree, that starts with the N data items as N singleton clusters and successively merges the most similar pairs of clusters level by level until a single cluster is formed. Hierarchical clustering comes in two types, following the bottom-up and top-down approaches:
1. Agglomerative hierarchical clustering (AHC) algorithm
2. Divisive hierarchical clustering (DHC) algorithm

AHC works bottom-up by merging similar clusters, based on the minimum distance between one cluster and another, until a single cluster is formed. Here the linkage distance can be the minimum, maximum, average or centroid distance. Let S = {(A), (B), (C), (D)}.
– Initiate the clustering with each item as its own cluster.
– Evaluate the minimum distance between the pairs of clusters, for example, A and B:

\[
\operatorname{Dis}(A, B) = \min[\operatorname{Dis}(r, c)] \tag{5.27}
\]
Here, Dis(A, B) is the distance between clusters A and B, and r and c are the row and column indices of the minimum entry of the distance matrix.
– Go to the next level and merge clusters A and B into one cluster:

\[
N(c) = \operatorname{Dis}[(A), (B)] \tag{5.28}
\]
– Remove the rows and columns of the merged clusters A and B, and then insert the new cluster into the matrix.
– Then evaluate the distance, using Equation (5.29), between the new cluster (A, B) and the old cluster (C), continuing toward a single cluster:
\[
\operatorname{Dis}[(A, B), (C)] = \min[\operatorname{Dis}(A, C), \operatorname{Dis}(B, C)] \tag{5.29}
\]
– Similarly, it continues until the single cluster is formed.

Lei Wang and Latifur Khan [3] proposed a new hierarchical clustering model for images. It uses three features, namely color, texture and shape, merging objects whose combined feature similarity exceeds a threshold value. Table 5.1 lists various datasets used for face recognition and expression analysis.

Table 5.1: Sample datasets with format and description for face recognition and expression analysis.

| Name of dataset | Format | Application | No. of images | Description |
|---|---|---|---|---|
| FERET (Facial Recognition Technology) | Image | Face recognition, image classification | , | Consists of , different pose images of , different people |
| GFED (Grammatical Facial Expression Dataset) | Text | Facial emotion recognition, gesture identification | , | Facial expression recognition from Brazilian sign language |
| CASIA | Text and images | Classification and face recognition, expression analysis | , | Consists of expressions such as smile, anger, surprise, closing eyes and laugh |
| SoF | Images (mat type) | Age detection, gender classification, face recognition and expression analysis | , | Consists of men and women ( persons) images in different illumination |
| ActivityNet | Images, videos and text | Action classification and detection | , | Consists of images of different activities like reading, sports, etc. |
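The agglomerative merge loop described above can be sketched directly; this is a minimal single-linkage (minimum-distance) version over 1D points, made up for illustration, not the image-specific model of [3].

```python
def ahc_single_linkage(points, target_clusters=1):
    """Agglomerative clustering: start with singleton clusters, repeatedly
    merge the pair with the minimum inter-cluster distance (Eqs. 5.27/5.29)."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > target_clusters:
        # find the closest pair under single linkage: min over all point pairs
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i][:], clusters[j][:], d))
        clusters[i] = clusters[i] + clusters[j]   # merge j into i
        del clusters[j]
    return clusters, merges

clusters, merges = ahc_single_linkage([1.0, 1.2, 5.0, 5.1, 9.0], target_clusters=2)
print(sorted(len(c) for c in clusters))   # → [1, 4]: the outlier 9.0 stays alone
```

Swapping `min` for `max` or a mean in the pair distance gives complete or average linkage, the other options listed above.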
5.4 Conclusion

This chapter surveyed the ML algorithms used for image classification, especially for face recognition and facial expressions such as happiness, surprise, sadness, disgust and anger. Algorithms such as SVM, k-means, LR, naive Bayes and KNN, and especially ensemble (combination) classifiers, work most effectively for image classification, improving accuracy and reducing time. Unfortunately, some of the algorithms have not been used frequently for images but can be
tried in combination with others to improve efficiency. In addition, the dataset is crucial for the essential training and testing phases of ML algorithms for images, and acquiring such datasets is a tedious task. Furthermore, image retrieval, that is, retrieving the correct image from a database using suitable similarity measures, remains a major task; here ML algorithms have contributed greatly to retrieving the required image accurately in less time.
References

[1] Carcagnì, P., Del Coco, M., Leo, M., and Distante, C. (2015). Facial expression recognition and histograms of oriented gradients: A comprehensive study. SpringerPlus, 4(1), 1–25.
[2] Jiao, H., Zhong, Y., and Zhang, L. (2014). An unsupervised spectral matching classifier based on artificial DNA computing for hyperspectral remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing, 52(8), 4524–4538.
[3] Wang, L. and Khan, L. A new hierarchical approach for image clustering. 41–57.
[4] Sachin Bharadwaj, Golla Varaprasad and Murthy (2013). Detection of potholes in autonomous vehicle. IET Intelligent Transport Systems, 543–549.
[5] Borra, Surekha and Raju, S. V. (2017). Attendance recording system using partial face recognition algorithm. 293–319.
[6] Viswanadha Raju, S. and Sreedhar, J. (2011). Query processing for content based image retrieval. International Journal of Soft Computing and Engineering, (1), 122–131.
[7] Viswanadha Raju, S. and Gudavalli, M. (2012). Multimodal biometrics – sources, architecture & fusion techniques: An overview. Proceedings of the International Symposium on Biometrics and Security Technologies, 27–34.
[8] Niewiadomski, R., Mancini, M., Varni, G., Volpe, G., and Camurri, A. (2016). Automated laughter detection from full-body movements. IEEE Transactions on Human-Machine Systems, 46(1), 113–123.
[9] Wang, Y. and Chen, L. (2016). Image-to-image face recognition using dual linear regression based classification and electoral college voting. Proceedings of the International Conference on Cognitive Informatics & Cognitive Computing, 106–110.
[10] Dong, Y., Tao, D., Li, X., Ma, J., and Pu, J. (2015). Texture classification and retrieval using shearlets and linear regression. IEEE Transactions on Cybernetics, 45(3), 358–369.
[11] Chou, Y.-T. and Yang, J.-F. (2016). Identity recognition based on generalised linear regression classification for multi-component images. IET Computer Vision, 10(1), 18–27.
[12] Naseem, I., Togneri, R., and Bennamoun, M. (2010). Linear regression for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(11), 2106–2112.
[13] Piao, N. and Park, R.-H. (2015). Face recognition using dual difference regression classification. IEEE Signal Processing Letters, 22(12), 2455–2458.
[14] Nguyen, D., Halupka, D., Aarabi, P., and Sheikholeslami, A. (2006). Real-time face detection and lip feature extraction using field-programmable gate arrays. IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), 36(4), 902–912.
[15] Zahra, F. and Madani, A. (2016). Facial expression recognition using decision trees. Proceedings of the International Conference on Computer Graphics, Imaging and Visualization, 125–130.
[16] Chapelle, O., Haffner, P., and Vapnik, V. (1998). Support vector machines et classification d'images [Support vector machines and image classification]. Department of Image Processing Research, AT&T, Red Bank, NJ, USA.
[17] Bajpai, A. and Chadha, K. (2010). Real-time facial emotion detection using support vector machines. International Journal of Advanced Computer Science and Applications, 1, 37–40.
[18] Huang, Z.-K. and Liu, D.-H. (2007). Segmentation of color image using EM algorithm in HSV color space. Proceedings of the International Conference on Information Acquisition, 316–319.
[19] Mahmoud, H. A. (2015). Cattle classification system using fuzzy k-nearest neighbor classifier. Proceedings of the International Conference on Informatics, Electronics & Vision, 15–18.
[20] Wang, L., Li, R., and Wang, K. (2013). Automatic facial expression recognition using SVM based on AAMs. Proceedings of the International Conference on Intelligent Human-Machine Systems and Cybernetics, 330–333.
[21] Rokach, L. (2009). Ensemble-based classifiers. Artificial Intelligence Review, 33, 1–39.
[22] Goh, K.-S., Chang, E., and Cheng, K.-T. (2001). SVM binary classifier ensembles for image classification. Proceedings of the International Conference on Information and Knowledge Management, 395–402.
[23] Ahmadi, F. and Sigari, M.-H. A rank based ensemble classifier for image classification using color and texture features. Proceedings of the International Conference on Machine Vision and Image Processing, 342–348.
[24] Lv, F., Han, M., et al. (2017). Remote sensing image classification based on ensemble ELM with stacked autoencoder. IEEE Access, 9021–9031.
[25] De la Torre, M., Gorodnichy, D. O., Granger, E., and Sabourin, R. (2015). Individual-specific management of reference data in adaptive ensembles for face re-identification.
[26] Moghaddam, B. and Yang, M.-H. (2002). Learning gender with support faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 707–711.
[27] Uma Maheswari, V. et al. (2020). Local directional maximum edge patterns for facial expression recognition. Journal of Ambient Intelligence and Humanized Computing.
[28] Gutta, S., Huang, J. R. J., Jonathon, P., and Wechsler, H. (2000). Mixture of experts for classification of gender, ethnic origin, and pose of human faces. IEEE Transactions on Neural Networks, 11(4), 948–960.
[29] Abd El Meguid, M. K. and Levine, M. D. (2005). Fully automated recognition of spontaneous facial expressions in videos using random forest classifiers. IEEE Transactions on Affective Computing, 141–154.
[30] George, T., Sumi, P., Potty, S., and Jose (2014). Smile detection from still images using KNN algorithm. Proceedings of the International Conference on Control, Instrumentation, Communication and Computational Technologies, 461–465.
[31] Dehariya, V. K. and Shrivastava, S. K. (2010). Clustering of image data set using k-means and fuzzy k-means algorithms. Proceedings of the International Conference on Computational Intelligence and Communication Networks, 386–391.
[32] Patel, M. P., Bn, S., and Shah, V. (2013). Image segmentation using k-mean clustering for finding tumor in medical application. International Journal of Computer Trends and Technology, (5), 1239–1242.
[33] Dempster, A. P. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), (1), 1–38.
[34] Uma Maheswari, V., Varaprasad, G., and Viswanadha Raju, S. (2020). Local directional maximum edge patterns for facial expression recognition. Journal of Ambient Intelligence and Humanized Computing.
Ammu Anna Mathew, S. Vivekanandan
Chapter 6 Prediction of quality analysis for crop based on machine learning model

Abstract: Agriculture plays a vital role in livelihoods and in the world economy. More than half of the Indian population depends on agriculture as their source of revenue. The many heterogeneous diseases of plants have an immense effect on agricultural production and cause financial loss. Maintaining steady production of all plants through proper monitoring of plant condition is therefore essential. This chapter presents a novel approach to distinguish healthy from unhealthy plants: a machine learning model based on the leaf image of the plant predicts the plant's healthiness, and thereby helps in determining the quality of the plant for use. The growth and yield of crops can be inferred from their leaves. At present, sorting of crops by quality is done manually by experts. By incorporating technology into agriculture, we can raise crop production to export quality through proper maintenance; besides this, employment opportunities for the younger generations can also be increased.

Keywords: image processing, leaf image, machine learning model, preprocessing, prediction, validation
6.1 Introduction

Agriculture is the main foundation of the Indian economy. The basic needs of all human beings are food and water. India being a land of diversity in culture and food, varieties of plants are cultivated in different parts of India to keep its growing population adequately fed, as shown in Figure 6.1. The issues faced in this sector are decreasing production and the challenges posed by liberalization [1]. The majority of the Indian population lives in rural regions and depends on agriculture as its source of revenue; any trouble in this field affects the whole population directly or indirectly. Therefore, proper maintenance techniques should be introduced to ensure productivity. Indian agriculture is characterized by agro-ecological diversity in soil, temperature, rainfall and plantation.
Ammu Anna Mathew, School of Electrical Engineering, Vellore Institute of Technology, Vellore, India, e-mail: [email protected] S. Vivekanandan, School of Electrical Engineering, Vellore Institute of Technology, Vellore, India, e-mail: [email protected] https://doi.org/10.1515/9783110702514-006
Figure 6.1: Various agricultural fields in India.
There are several plant diseases that adversely affect productivity, and their control is vital to ensure consistent production. Natural as well as cultivated plants have inbuilt disease resistance, but in some cases it becomes ineffective. Plant diseases are caused by a pathogen, a conducive environment and a susceptible host, as shown in the disease triangle in Figure 6.2. The scientific study of plant diseases, called plant pathology, has long been an area of research interest. The causes of plant diseases include fungi, bacteria, viruses and parasites, each of which is individually a wide area of study.
Figure 6.2: Disease triangle.
Most of the diseases are not visible to the naked eye, which leads to destruction of the cultivation. Plant diseases that affect production include scab, black rot, rust, gray leaf spot, leaf blight, black measles, bacterial spot and leaf scorch, as shown in Figure 6.3. The type of disease in a plant depends mainly on the plant variety. Scab and rust occur mostly in ornamental trees such as apple, crabapple and pear; rust is found in crops as well, for example, wheat.
Figure 6.3: Different plant diseases.
In addition to plant diseases, the traditional methods followed also affect productivity in agriculture. The nonavailability of workers in season, water supply, temperature, soil and so on all affect agriculture. This can be solved to an extent by incorporating technology into agriculture. The time consumed by manual identification, taxonomy and classification can be reduced with technology [2]. Advancement in the field of computational technology has made things much easier: automated recognition of plant structure from available databases has brought tremendous change to crop management, and machine learning along with image processing has helped in identifying plant diseases, feature extraction and classification [3]. The traditional way of farming has made youngsters avoid opting for agriculture as a profession. Implementing technology in farming will attract many to this
field, thereby increasing productivity and providing a solution to employment issues. Smart farming has many advantages and is a field of emerging interest where several studies are ongoing [4]. It has led to an upgrade from the conventional way of farming in a cost-effective and user-friendly manner. This chapter introduces a supervised trained model that detects plant diseases by combining machine learning and image processing.
6.2 Related works

Agriculture is not a smooth occupation and faces many daily challenges. As agriculture is the prime sector of food provision, research on advancements in this field is extensive [5]. Technology has been combined with conventional methods to yield high production; several technologies are being trialed, such as the Internet of things, wireless communications, machine learning, artificial intelligence and deep learning [6].

One paper uses the WEKA workbench with a domain expert as the target user. This workbench combines seven different machine learning schemes, and its user interface has the advantages of rapid prototyping and portability. Data are extracted from the input image into a 2D form suitable for the workbench to process in a machine learning program. Accurate information about the problem domain is obtained from the extracted and transformed dataset; once the domains are identified, rules are set for running the machine learning program to get the required output [7].

An SVM (support vector machine) has been used to identify diseases in wheat leaves. The image was captured by a digital camera, and preprocessing was performed on the image to separate diseased from undiseased leaves. The k-means clustering method was used for cluster identification. The SVM classifier used two datasets, training and testing; a comparison study with the test data distinguished diseased and undiseased leaves. Results were reported in terms of statistical variables: mean, median, mode, standard deviation and variance. The technique was robust and feasible for speedy disease identification [8].

A dataset of 54,306 images was trained and validated using two different models, AlexNet and GoogLeNet. The dataset has 38 class labels with 3 types of images: color, grayscale and leaf-segmented. Both training from scratch and transfer learning were implemented, and the accuracy of the work was around 99.34%.
The solver used was stochastic gradient descent with a step learning rate policy. Five train–test set distributions with different combinations were used to verify the accuracy, and the overall accuracy was assessed in terms of mean precision, mean recall and mean F1 score [9].
6.3 Proposed work

The basic process in every image-based problem is image acquisition. Here the image of a leaf is obtained, followed by preprocessing of that image. The workflow is represented by the block diagram in Figure 6.4. The preprocessed image undergoes feature extraction, and the resulting representation is given to a machine learning model, where it is trained and validated using a certain training mechanism to obtain the result in terms of accuracy. If the accuracy does not reach the desired level, the preprocessing of the image is repeated until the desired accuracy is obtained. The processes are explained in detail below.
Figure 6.4: Block diagram of the disease detection process.
6.3.1 Dataset description and performance

A dataset of 2,000 images of plant leaves is analyzed in this work. It is assigned to four class labels based on the strains on the leaf: a healthy class, a multiple-disease class, a rust class and a scab class. Classification based on the strains is done because chemical misuse leads to the emergence
of pathogen-resistant strains. More class labels could be assigned for this purpose, but to reduce complexity we have considered only four classes. Each class label represents a plant–disease pair, and from the uploaded image of a plant leaf we try to predict the disease. Part of the dataset is obtained online (Kaggle: Plant Pathology 2020 – FGVC7). Model optimization and prediction are done on images downscaled to 224 × 224 pixels. Different versions of the dataset are utilized to obtain better accuracy. The other part of the dataset is obtained by maintaining similar conditions, to the extent possible, to reduce accuracy problems. The dataset includes several images of the same leaf in various orientations. Color image datasets are mainly used, as models perform best on color images compared with grayscale [9]. A regularized data collection pattern may introduce inherent bias into the dataset, which can be avoided by removing extra background details. The images are preprocessed and features are extracted before the machine learning algorithm is applied for training and validating the datasets. Almost all potential bias conditions are removed by performing the necessary steps, and it is verified that the neural network learns the notion of the plant diseases rather than inherent biases in the dataset [10]. To check for overfitting and to analyze how the model performs on unseen data, the experiment is run on two train–test splits of the whole dataset: 90–10 (90% of the entire dataset used for training and 10% for validation) and 80–20 (80% for training and 20% for validation). The mappings for each case are obtained. Since images of the same leaf in various orientations are included, all images of the same leaf must fall either in the training set or in the test set of the chosen split.
This must be ensured before training the model. The performance of the model is assessed by the overall accuracy (in percent) at regular intervals during the whole training period. Among the experimental configurations considered, the 90–10 train–test split gives better accuracy, though the difference is not drastic. We discuss the results of the 90–10 train–test split in detail below.
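Keeping all orientations of the same leaf on one side of the split can be done by splitting at the level of leaf IDs rather than individual images. A minimal sketch (the `leaf_of` mapping and image/leaf names are hypothetical):

```python
import random

def group_split(image_ids, leaf_of, train_frac=0.9, seed=42):
    """Split images into train/test so that all images of one leaf
    end up on the same side, avoiding leakage between the sets."""
    leaves = sorted(set(leaf_of[i] for i in image_ids))
    rng = random.Random(seed)
    rng.shuffle(leaves)
    n_train = int(round(train_frac * len(leaves)))
    train_leaves = set(leaves[:n_train])
    train = [i for i in image_ids if leaf_of[i] in train_leaves]
    test = [i for i in image_ids if leaf_of[i] not in train_leaves]
    return train, test

# Hypothetical example: 6 images of 3 leaves, two orientations each.
leaf_of = {"img0": "leafA", "img1": "leafA", "img2": "leafB",
           "img3": "leafB", "img4": "leafC", "img5": "leafC"}
train, test = group_split(list(leaf_of), leaf_of, train_frac=2 / 3)
```

With `train_frac=0.9` this reproduces the 90–10 configuration discussed above, at the granularity of leaves rather than images.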
6.3.2 Image acquisition

The basic step in all image processing is image acquisition, which retrieves an image from some source for processing. An image is the 2D projection of a 3D scene, expressed as a two-variable function f(x, y) [11]. An image is the digitally encoded version of the visual characteristics of an object, especially its physical appearance and internal structure. Usually acquisition is done with the help of an image sensor, and the signal thus produced is digitized [12]. The major part of the dataset is obtained online from the Kaggle Plant Pathology 2020 dataset, and the remaining images are obtained using a single
lens reflex digital camera with a 1.5× lens focal length and the DCF 2.0 file system. All images were downscaled to 224 × 224 pixels.
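The downscaling to 224 × 224 can be approximated by nearest-neighbor index sampling; a minimal NumPy sketch (a real pipeline would typically use a proper resampling filter, and the input shape here is made up):

```python
import numpy as np

def downscale(img, size=(224, 224)):
    """Nearest-neighbor downscaling: pick one source pixel per target pixel."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]   # source row per target row
    cols = np.arange(size[1]) * w // size[1]   # source column per target column
    return img[rows][:, cols]

# A hypothetical camera-sized RGB image.
img = np.random.randint(0, 256, size=(1365, 2048, 3), dtype=np.uint8)
small = downscale(img)
print(small.shape)  # (224, 224, 3)
```

The same function works on grayscale (2D) arrays, since only the first two axes are indexed.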
6.3.3 Image preprocessing

Certain lower-abstraction-level features of images that need to be processed are boosted by image preprocessing. This operation improves the image data without affecting its content [13]. Undesired effects such as distortion and noise are suppressed, and the visual quality of the images is enriched by this process. Several preprocessing techniques are available for different applications [14]; the processes suitable for our particular purpose are canny edge detection, flipping, convolution and blurring.
6.3.3.1 Canny edge detection

The edges in an image are identified using the Canny edge detector, a multistage edge detection algorithm derived with the help of the calculus of variations. It extracts structural information from objects and reduces the amount of data to be processed, while providing reliable results. The generalized criteria for the edge detection process are:
1. Edge detection with a minimum rate of error
2. An edge point identified by the operator should be localized precisely on the center of the edge
3. An edge in the image should be marked only once, and the creation of false edges due to noise should be avoided

There are several adjustable parameters that affect the efficiency and computation time of the algorithm, such as the Gaussian filter size and the lower and higher hysteresis thresholds. The steps of the Canny edge detection algorithm are:
a) Remove noise from the image by applying a Gaussian filter, thereby smoothing the image.
b) Compute the intensity gradients of the image.
c) Apply non-maximum suppression to avoid false responses to edge detection.
d) Recognize potential edges by applying a double threshold.
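Step (b), computing the intensity gradients, is typically done with Sobel kernels. A small NumPy sketch of just that step (the full detector adds the smoothing, non-maximum suppression and double-threshold stages listed above; the test image is made up):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T  # transpose gives the vertical-gradient kernel

def gradients(img):
    """Return gradient magnitude and direction for a 2D grayscale image
    (valid region only: the 1-pixel border is dropped)."""
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * SOBEL_X)
            gy[i, j] = np.sum(patch * SOBEL_Y)
    return np.hypot(gx, gy), np.arctan2(gy, gx)

# A vertical step edge: left half dark, right half bright.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
mag, ang = gradients(img)
print(mag.max())   # strong response (4.0) in the columns straddling the edge
```

The direction array `ang` is what the later non-maximum suppression stage uses to thin the detected edges.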
6.3.3.2 Flipping

Flipping, or mirror imaging, reverses the image pixels about the horizontal or vertical axis. This is usually done by performing a transform, for example, through a Flip class or a bitmap
transform property that specifies the flip mode. The parameters to be considered for this operation are the image object, the direction of flipping, the image processing device and the debugging device.
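With the image held as an array, both flip modes reduce to simple index reversals; a NumPy sketch on a toy 2 × 3 image:

```python
import numpy as np

img = np.array([[1, 2, 3],
                [4, 5, 6]])

h_flip = img[:, ::-1]   # mirror about the vertical axis (horizontal flip)
v_flip = img[::-1, :]   # mirror about the horizontal axis (vertical flip)

print(h_flip.tolist())  # [[3, 2, 1], [6, 5, 4]]
print(v_flip.tolist())  # [[4, 5, 6], [1, 2, 3]]
```

In data augmentation, flipped copies of each leaf or face image are often added to the training set to make the model orientation-invariant.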
6.3.3.3 Convolution

In image processing, a small matrix called a kernel (or convolution matrix) performs blurring, sharpening, edge detection and many other operations. A convolution operation is performed between the kernel and an image: each element of the image is added to its local neighbors, weighted by the kernel. This resembles mathematical convolution. For a symmetric kernel, the origin of the kernel is placed on the current pixel and the kernel overlaps the neighboring pixels around that origin. The overlapped pixel values are multiplied by the corresponding kernel elements and the results are summed; this sum becomes the new value of the current pixel. For an asymmetric kernel, the same operation is performed after flipping the kernel about the horizontal and vertical axes. Where the kernel overlaps pixels outside the image boundaries, edge-handling techniques such as extend, wrap, mirror, crop and kernel crop are used.
6.3.3.4 Blurring

Blurred images look smooth because their edges cannot be detected, whereas an image is said to be sharp when all its details are identified precisely and clearly. The shape of an object is the result of its edges. In blurring, the edge content is reduced and the color transitions become smooth. The number of pixels remains the same for the input and the blurred image. A low-pass filter (LPF) is used for this purpose, as it passes slow changes of pixel value and blocks rapid changes; pixel values change rapidly around edges and so are filtered out. The smoothing effect can be strengthened by choosing a bigger kernel size. LPFs used for blurring include the mean filter, the weighted average filter and the Gaussian filter.
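The kernel walk described in the convolution section, applied with a 3 × 3 mean (box-blur) kernel, gives exactly the mean filter named above; a NumPy sketch using crop-style ("valid") edge handling on a made-up image:

```python
import numpy as np

def convolve2d(img, kernel):
    """Valid-mode kernel convolution: place the (symmetric) kernel origin on
    each pixel, multiply overlapping values elementwise and sum them."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

mean_kernel = np.full((3, 3), 1 / 9)   # box blur: average of the 3x3 neighborhood
img = np.zeros((5, 5))
img[2, 2] = 9.0                        # a single bright pixel
blurred = convolve2d(img, mean_kernel)
print(blurred.tolist())  # every valid 3x3 patch contains the bright pixel, so all outputs are 1.0
```

Replacing `mean_kernel` with a Gaussian-weighted kernel yields the Gaussian filter, and larger kernels give a stronger smoothing effect, as stated above.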
6.3.4 Feature extraction

An image is characterized by visual features such as shape, color and texture, which are represented by feature descriptors. The features and descriptors of a query are compared with those of the database images for ranking. The extraction process transforms rich image content into content features that can be used for selection and classification. Features suitable for the discrimination task are selected and the others discarded; the selected features can affect the
efficacy of the classification task. The feature vectors obtained after feature extraction constitute an image representation and are stored in the feature database. Depending on the query, a one-by-one comparison is made against the database and images with similar features are returned. The application-independent features such as shape, color and texture can be classified further into pixel-level features, local features, global features and domain-specific features. Once the features are extracted, an appropriate classifier should be chosen based on their characteristics. Here the nearest neighbor classifier is utilized, which compares the feature vector of the query with the image feature vectors stored in the database; the distance between two feature vectors determines their similarity. In this work, pixel-level features and local features are considered for feature extraction [15].
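The nearest neighbor comparison of feature vectors can be sketched as follows; the database keys and three-component feature values here are made up for illustration:

```python
import math

def nearest_neighbor(query, database):
    """Return the database key whose feature vector is closest (Euclidean)."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return min(database, key=lambda name: dist(query, database[name]))

# Hypothetical feature database: [shape, color, texture] descriptors per image.
database = {
    "healthy_01": [0.9, 0.2, 0.4],
    "rust_07":    [0.1, 0.8, 0.5],
    "scab_03":    [0.4, 0.4, 0.9],
}
print(nearest_neighbor([0.85, 0.25, 0.35], database))  # → "healthy_01"
```

In practice the same loop runs over thousands of stored vectors, which is why compact, discriminative features matter for retrieval speed and accuracy.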
6.3.5 Machine learning approach A deep convolutional neural network is applied to the classification problem described above. Convolutional networks have proved efficient and accurate for training on datasets when the connections between layers are shorter, near both the input and the output. The architecture used for training and validation of the dataset is DenseNet, an extension of ResNet. Here, each layer is connected to every other layer through dense blocks in a feed-forward pattern, as shown in Figure 6.5 [16]. Feature reuse is highly promoted by this architecture, as every layer can access the feature maps of all its preceding layers [17]. The feature maps of the current layer are used as input by the next layer. For concatenation, the feature maps must be consistent in size [18]; hence, the input and output of the convolutional layers should be of the same size, and layers can also be down-sampled to fit the required size. There are L(L + 1)/2 direct connections in a block with L layers. The number of parameters required is smaller than in other architectures such as ResNet and AlexNet, because redundant feature maps need not be relearned. Other advantages include compactness of the model, alleviation of the vanishing gradient issue and minimal overfitting. The direct connections provide deep supervision for each layer and avoid the information loss that can happen along a shortcut path. Maximum information flow is ensured by this architecture [19], and an explicit distinction is made between information preserved in the network and information newly added to it [20]. Though each layer has only 12 filters, the layers are very narrow and each adds only a small set of feature maps to the network's collective knowledge. Each layer of the network implements a nonlinear transformation, a composite function performing three consecutive operations: batch normalization, rectified linear unit and 3 × 3 convolution.
To aid down-sampling, the network is divided into multiple dense blocks. A transition layer between each pair of blocks performs convolution and pooling, preceded by batch normalization. The sizes of the convolution and
Ammu Anna Mathew, S. Vivekanandan
pooling layers are 1 × 1 and 2 × 2, respectively. The growth rate determines the amount of new information each layer contributes to the global state, which is accessible from any point within the network. This model has high computational efficiency. The DenseNet model used in this work has four five-layer dense blocks with an input image size of 224 × 224 (Figure 6.5). The softmax classifier makes its decision based on all the feature maps in the network. The performance of the architecture is analyzed on the dataset by training the model from scratch, with no layers frozen during training. A deep learning algorithm is used for training on the dataset [21]; the discriminative features in the images are identified by the network itself.
Figure 6.5: DenseNet architecture with a five-layer dense block.
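As a rough illustration of the dense connectivity just described, the per-layer input channel counts and the number of direct connections can be computed as follows; the initial channel count of 64 is an assumption for illustration, since the chapter specifies only the growth rate of 12 filters per layer:

```python
def dense_block_channels(k0, growth_rate, num_layers):
    """Channels seen by each layer of a dense block: layer l receives
    k0 + (l - 1) * k channels, because it concatenates the feature
    maps of every preceding layer."""
    return [k0 + (l - 1) * growth_rate for l in range(1, num_layers + 1)]

def direct_connections(num_layers):
    """An L-layer dense block has L(L + 1) / 2 direct connections."""
    return num_layers * (num_layers + 1) // 2

# Five-layer block with growth rate k = 12; k0 = 64 is assumed.
channels = dense_block_channels(64, 12, 5)  # [64, 76, 88, 100, 112]
links = direct_connections(5)               # 15
```

The linear growth of the channel counts (by only 12 per layer) is what keeps the layers narrow and the parameter count low.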
To summarize, the parameters considered in the configurations are:
a) Choice of deep learning architecture: DenseNet
b) Choice of training mechanism: training from scratch
c) Choice of training–testing set distribution: train 90%, test 10%; and train 80%, test 20%.
Each experiment runs for a total of 20 epochs. This value was selected based on empirical observation: learning converged within 20 epochs in all experiments. Each epoch represents the number of training iterations
done by the neural network to cover the whole training set once. The following hyperparameters were standardized across all experiments:
– Solver type: Adam
– Base learning rate: 0.001
– Learning rate policy: step
– Momentum: 0.99
– Weight decay: 1e-4
– Gamma: 0.1
– Batch size: 64
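The "step" learning-rate policy with gamma = 0.1 listed above can be sketched as follows; the step size of 10 epochs is our assumption, as the chapter does not state it:

```python
def step_lr(epoch, base_lr=0.001, gamma=0.1, step_size=10):
    """'Step' learning-rate policy: multiply base_lr by gamma after
    every step_size epochs. step_size = 10 is an assumed value; the
    chapter fixes only base_lr = 0.001 and gamma = 0.1."""
    return base_lr * gamma ** (epoch // step_size)

lr_start = step_lr(0)    # 0.001
lr_later = step_lr(15)   # stepped down by a factor of 10
```

This single step-down is the "initial step down in the learning rate" after which the experiments are observed to converge.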
6.4 Results and discussion The dataset considered has 2,000 samples with 4 class labels: 0, 1, 2, 3. The index representation for each class is as follows:
a) Index 0: healthy leaf
b) Index 1: leaf with multiple diseases
c) Index 2: leaf with rust
d) Index 3: leaf with scab
Several other class labels could be considered for this application, such as scab, black rot, rust, healthy, gray leaf spot, leaf blight, black measles, bacterial spot and leaf scorch, but in this chapter only the four class labels above are used, to reduce complexity and training and validation time. Two training–testing set distributions were considered, 90–10% and 80–20%, to address the issue of overfitting. The former showed somewhat better results than the latter (not a drastic change); hence, the results for the 90–10% training–testing split are discussed. Out of 2,000 samples, 1,800 were used for training the model and 200 for testing/validation. The overall accuracy of random guessing would be very low on average (25% for four balanced classes), whereas the overall accuracy of the experimental configuration on this dataset is in the range of 97.99–99.97%. These results show that a deep learning approach can be adopted for similar studies. Each experimental configuration was run for 20 epochs and shows convergence after the initial step down in the learning rate. Comparing the different configurations confirmed that overfitting did not contribute to the results obtained, as there is no deviation between validation and training losses. The model was also tested on its ability to learn some of the structural patterns typical of particular diseases; the output was reliable and accurate on the dataset, as the data was collected in a controlled manner.
Figure 6.6: (A) Program for image loading and (B) display of image loading.
Four main programs are run to analyze the disease present in a particular plant (here, only four class labels). A screen is displayed where the image of the leaf needs to be uploaded, as in Figure 6.6. First, the program preprocesses the loaded image; the preprocessed output is passed to feature extraction, whose output is used for training on the dataset, and finally the prediction is reported in terms of accuracy. The output is analyzed using the four index values mentioned above. For every image, four index values are displayed, of which the highest is selected and displayed as the output. If index 0 has the highest value, the result is a healthy plant, as shown in Figure 6.7. If the highest value is for
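The index-selection step just described (pick the highest of the four index values) amounts to an argmax over the class scores; a minimal sketch, with hypothetical network outputs:

```python
def predict_label(scores):
    """Pick the index with the highest value and map it to its class."""
    labels = ("healthy leaf", "leaf with multiple diseases",
              "leaf with rust", "leaf with scab")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, labels[best]

# Hypothetical outputs for one image (Figure 6.7 shows 0.9745 for
# index 0 on a healthy leaf; the other three values are made up)
index, label = predict_label([0.9745, 0.0112, 0.0080, 0.0063])
# -> (0, "healthy leaf")
```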
Figure 6.7: (A) Program screen with index 0 with value 0.9745 and (B) display of healthy leaf.
index value 1, the leaf has multiple diseases, as in Figure 6.8. If index 2 shows the highest value, rust is detected, as shown in Figure 6.9, and if index 3 shows the highest value, scab is detected, as shown in Figure 6.10. The predefined features determine the performance of this approach. Supervised training with a deep learning algorithm using a deep convolutional neural network (CNN) architecture is performed in this work. The work can be extended with a larger number of class labels, thereby increasing its reliability and acceptance. Our goal was to classify the presence and identity of disease in images by training a model on images of plant leaves using a deep CNN architecture. This goal was achieved with an accuracy rate of around 98.98%. The model clearly classifies the disease into 4 possible classes over the 1,800 images. Training the model is time-consuming, but classification is fast, and hence this technique can be implemented on smartphones. There still exist some limitations in the model training at the current
Figure 6.8: (A) Program screen with index 1 with value 0.9519 and (B) display of multiple disease leaf.
stage, which need to be addressed in the future, such as training on images captured under different conditions and orientations and training with an increased number of class labels. From a practical perspective, the complexity of the problem was reduced to an extent by considering a smaller number of class labels. Overall, the proposed approach is reasonable and accurate for many plant diseases and can be improved by training on more data.
Figure 6.9: (A) Program screen with index 2 having value 0.9996 and (B) display of rust.
6.5 Conclusion Proper and timely maintenance of plants will improve crop production in both quality and quantity, thereby improving the agricultural economy. Applying technology to this field also increases job opportunities. Agriculture is an area with job security, as humans need food for survival, and incorporating technology into agriculture is a field of research with wide scope. Deep CNNs can be utilized for real-world applications. The proposed work is a promising approach that applies computational inference to plant diseases. Many heterogeneous plant diseases have had an adverse effect on agriculture, and manual disease identification over large fields is practically impossible; the proposed work is a partial solution to this issue. Due to the promptness of classification, this technique can be implemented
Figure 6.10: (A) Program screen with index 3 with value 0.9999 and (B) display of scab.
in smartphones. This opens a path toward smartphone-assisted agriculture, especially for disease diagnosis on a global level.
References
[1] Rai, C. (2009). "Indian Agriculture & WTO," pp. 1–39. [Online]. Available: https://www.academia.edu/13965470/WTO.
[2] Kaur, K. (2016). Machine learning: Applications in Indian agriculture. 5(4), 342–344. doi: 10.17148/IJARCCE.2016.5487.
[3] Ouf, N. S. (2018). A review on the relevant applications of machine learning in agriculture. 6(8), 1–17. doi: 10.17148/IJIREEICE.2018.681.
[4] Liakos, K. G., Busato, P., Moshou, D., and Pearson, S. "Machine Learning in Agriculture: A Review," pp. 1–29. doi: 10.3390/s18082674.
[5] Dixit, A., and Nema, S. (2018). Wheat leaf disease detection using machine learning method – a review. 7(5), 124–129.
[6] Jha, K., Doshi, A., Patel, P., and Shah, M. (2019). Artificial intelligence in agriculture: A comprehensive review on automation in agriculture using artificial intelligence. Artificial Intelligence in Agriculture, 2, 1–12. doi: 10.1016/j.aiia.2019.05.004.
[7] McQueen, R. J., Garner, S. R., Nevill-Manning, C. G., and Witten, I. H. (1995). Applying machine learning to agricultural data. Computers and Electronics in Agriculture, 12, 275–293.
[8] Nema, S., Nema, S., and Dixit, A. (2019). "Wheat Leaf Detection and Prevention Using Support Vector Machine," pp. 2294–2298.
[9] Mohanty, S. P., Hughes, D. P., and Salathé, M. (2016). Using deep learning for image-based plant disease detection. Frontiers in Plant Science, 7, 1–10. doi: 10.3389/fpls.2016.01419.
[10] Boulent, J., Foucher, S., Théau, J., and St-Charles, P. L. (2019). Convolutional neural networks for the automatic identification of plant diseases. Frontiers in Plant Science, 10. doi: 10.3389/fpls.2019.00941.
[11] "UNIT – 2: Image Sensing and Acquisition."
[12] "Introduction to Signal and Image Processing: Steps in Image Processing."
[13] Beebe, K. R., Pell, R. J., and Seasholtz, M. B. (1998). Chapter 3: Preprocessing. Chemometrics: A Practical Guide, 41–58.
[14] Blitzer, H., Stein-Ferguson, K., and Huang, J. (2008). Image processing tools. Understanding Forensic Digital Imaging, 169–205. doi: 10.1016/b978-0-12-370451-1.00010-x.
[15] Choras, R. S. (2007). Image feature extraction techniques and their applications for CBIR and biometrics systems. International Journal of Biology and Biomedical Engineering, 1(1), 6–15.
[16] Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q. (2017). Densely connected convolutional networks. Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2261–2269. doi: 10.1109/CVPR.2017.243.
[17] Zhu, Y. (2017). DenseNet for dense flow. doi: 10.1109/ICIP.2017.8296389.
[18] Zhang, J., Lu, C., Li, X., Kim, H. J., and Wang, J. (2019). A full convolutional network based on DenseNet for remote sensing scene classification. Mathematical Biosciences and Engineering, 16(5), 3345–3367. doi: 10.3934/mbe.2019167.
[19] Tao, Y. (2018). DenseNet-based depth-width double reinforced deep learning neural network for high-resolution remote sensing image per-pixel classification. doi: 10.3390/rs10050779.
[20] Kamilaris, A. and Prenafeta-Boldú, F. X. (2018). Deep learning in agriculture: A survey. Computers and Electronics in Agriculture, 147, 70–90. doi: 10.1016/j.compag.2018.02.016.
[21] Czum, J. M. (2020). Dive into deep learning. Journal of the American College of Radiology, 17(5), 637–638. doi: 10.1016/j.jacr.2020.02.005.
Mehul Mahrishi, Girish Sharma, Sudha Morwal, Vipin Jain, Mukesh Kalla
Chapter 7 Data model recommendations for real-time machine learning applications: a suggestive approach
Abstract: Machine learning (ML) applications have received much coverage in today's marketplace. From automated business strategies to educational canvases to sports analytics, these systems are included everywhere. Many ML systems have made life easier for company administration; applications include market segmentation, pricing optimization, treatment suggestions for patients and job recommendations. Apart from this, ML has automated churn prediction, text analysis and summarization. There are many applications that ML has simplified in terms of understanding and feasibility. ML has not only influenced how applications are implemented but has also changed storytelling through visualization tools. We now live in the era of prescriptive analytics, and ML has helped develop such applications. This chapter explores the impact of ML on business development, application development through various models and their practicability. One particular model may fit well for one application but may not be suitable for others; this depends on the dataset and on what we want to predict. This chapter attempts to clarify many of the ML models and their particular implementations. Various researchers have indicated that a particular ML paradigm performs well for a specific program. This chapter explains the ML models and their respective applications, that is, where a particular model fits much better than others. It argues that ML makes it possible to improve the reasoning process by using inductive and abductive reasoning, neural networks and genetic algorithms. Keywords: artificial intelligence, machine learning, supervised learning, unsupervised learning, machine learning models, artificial neural networks
Mehul Mahrishi, Department of Information Technology, Swami Keshvanand Institute of Technology, Management and Gramothan, Jaipur, India, e-mail: [email protected] Girish Sharma, Department of Computer Science & Engineering, Swami Keshvanand Institute of Technology, Management and Gramothan, Jaipur, India, e-mail: [email protected] Sudha Morwal, Banasthali Vidhyapith, Niwai, India, e-mail: [email protected] Vipin Jain, Department of Information Technology, Swami Keshvanand Institute of Technology, Management and Gramothan, Jaipur, India, e-mail: [email protected] Mukesh Kalla, Department of Computer Science and Engineering, Sir Padampat Singhania University, Udaipur, India, e-mail: [email protected] https://doi.org/10.1515/9783110702514-007
Mehul Mahrishi et al.
7.1 Introduction Machine learning (ML) is a part of computerized reasoning which fundamentally comprises algorithms and artificial neural networks and displays qualities firmly connected with human insight. While building and studying such a framework, the principal concern is to gain knowledge without being explicitly programmed. Artificial intelligence (AI) centers on improving computer programs that can teach themselves to adapt when presented with new information. ML has been one of the biggest buzzwords for more than a decade. We started with descriptive analytics long ago, and now we have reached prescriptive analytics; that is, we moved from data to information, from information to knowledge, from knowledge to insight, and finally to gaining from the insight. Table 7.1 summarizes this progression:
Table 7.1: Traditional data mining to machine learning.
– Descriptive analytics: summarizing the data (data to information)
– Diagnostic analytics: why did a particular event happen?
– Predictive analytics: future predictions (e.g., what inventory to keep)
– Prescriptive analytics: gaining something from that insight (e.g., to derive profit, what could be recommended)
Predictive analytics involves creating a model: gathering data, finding features, training the model and evaluating it on test data. We have come very far from traditional data mining techniques, where the objective was to extract only a small amount of information. Nowadays, we are not only able to predict the future but also to prescribe the best-optimized solutions for a particular system. Figure 7.1 shows how ML differs from traditional programming: in traditional computing, we give input to a programmed algorithm and it generates output, whereas in ML we give inputs and outputs, and the system learns a program that can generate outputs for new data.
Figure 7.1: Machine learning concept.
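The contrast drawn in Figure 7.1 can be illustrated with a tiny example: instead of hand-coding the rule y = 2x + 1, we let a least-squares fit learn it from input–output examples (the fit_line helper and the toy data are our own):

```python
def fit_line(xs, ys):
    """'Training': infer the program (here, a line y = w*x + b) from
    example inputs and outputs instead of hand-coding it."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - w * mx
    return w, b

# Examples generated by the (unknown to the learner) rule y = 2x + 1
xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]
w, b = fit_line(xs, ys)
prediction = w * 10 + b  # output for new data: 21.0
```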
While tackling a problem, one must remember that ML is not always the appropriate remedy. Without knowing the subtleties of the problem and whether it fits the AI world, one can create more complications than solutions. Therefore, learning algorithms must be based on legitimate mathematical models and must be reinforced by numerical, statistical and algorithmic approaches. For any ML problem, the most significant part is the data: if the data is well shaped, the models can furnish results with higher accuracy. Data gathering is an extremely big challenge, but it is beyond the scope of this chapter. Since we have various data types for various problems (numerical, document, transaction, graph, video, temporal and spatial data, among others), identifying the correct model for a problem leads to a robust solution, and one of this chapter's objectives is to recommend the right model for a specific sort of data science problem. Statistics in ML has given much significance to understanding data, visualizing it and mutating it to produce good models. Many ML techniques have been developed that give good results over different kinds of data. Supervised learning primarily deals with labelled data and is useful in binary or multiclass classification. Another mechanism is regression analysis, where the class variable is continuous and we want to predict a number instead of a label; this kind of analysis is useful in applications such as weather prediction and stock price prediction. This chapter discusses the different types of supervised ML techniques and their practical utilities.
7.2 Types of machine learning ML is categorized as supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning (see Figure 7.2).
[Figure 7.2 maps example applications to each paradigm. Supervised learning covers classification (image classification, identity fraud detection, customer retention, diagnostics) and regression (advertising popularity prediction, weather forecasting, market forecasting, estimating life expectancy, population growth prediction). Unsupervised learning covers clustering (recommender systems, targeted marketing, customer segmentation) and dimensionality reduction (big data visualization, meaningful compression, structure discovery, feature elicitation). Semi-supervised learning covers speech analysis and web content classification. Reinforcement learning covers real-time decisions, game AI, robotics, skill acquisition and learning tasks.]
Figure 7.2: Types of machine learning.
7.3 Decision tree for image classification Mathematically, decision trees can be defined as follows: given a dataset of pairs (x, f(x)), where f(x) represents the label, the decision tree algorithm returns a function (hypothesis) that approximates f for mapping inputs to outputs. The decision tree concept is based on Hunt's algorithm (Figure 7.3), a simple algorithm that uses measures of impurity to check whether a node needs a further split.
The training set has the splitting attributes Refund, Marital Status and Taxable Income, with Cheat as the label:

ID  Refund  Marital Status  Taxable Income  Cheat
1   Yes     Single          125K            No
2   No      Married         100K            No
3   No      Single          70K             No
4   Yes     Married         120K            No
5   No      Divorced        95K             Yes
6   No      Married         60K             No
7   Yes     Divorced        220K            No
8   No      Single          85K             Yes
9   No      Married         75K             No
10  No      Single          90K             Yes

The induced model tests Refund at the root (Yes → NO); otherwise it tests Marital Status (Married → NO; Single or Divorced → test Taxable Income: < 80K → NO, > 80K → YES).
Figure 7.3: A decision tree.
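As a sketch, the tree in Figure 7.3 can be hand-coded as nested feature tests (the dictionary keys are our own naming):

```python
def classify(record):
    """Hand-coded version of the decision tree in Figure 7.3."""
    if record["refund"] == "Yes":
        return "No"                       # root: Refund = Yes -> NO
    if record["marital_status"] == "Married":
        return "No"                       # MarSt = Married -> NO
    # Single or Divorced: test Taxable Income against the 80K threshold
    return "Yes" if record["taxable_income"] > 80 else "No"

# Record 8 of the training set: Refund = No, Single, 85K -> Cheat = Yes
result = classify({"refund": "No", "marital_status": "Single",
                   "taxable_income": 85})  # -> "Yes"
```

Learning, of course, goes the other way: Hunt's algorithm derives these nested tests from the data by repeatedly choosing the split that most reduces node impurity.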
Algorithm 1: Hunt's algorithm for growing a decision tree from dataset D
1. If D contains instances that all belong to the same class y, mark node t as a leaf labeled yt.
2. If D contains instances that belong to more than one class, use a feature test to split the data into smaller subsets.
A few questions to be addressed while creating a decision tree are as follows:
– Which feature will be at the root node?
– Which features are most suitable to serve as internal nodes or leaf nodes?
– How should the tree be divided?
– What is the measure of exactness of a division?
Node impurity is calculated in a decision tree as the measure of class uniformity at each node; it provides the split criterion on a variable by which the data can be divided. Common measures of node impurity are:
– Gini index (Figure 7.4)
– Information gain
– Gain ratio
– Entropy
Figure 7.4: Computation of the GINI index.
7.3.1 GINI impurity measure
The GINI impurity measure at a node t is defined as:

GINI(t) = 1 − Σj [p(j | t)]²    (7.1)
where p(j | t) is the relative frequency of class j at node t. The maximum value, 1 − 1/nc (with nc the number of classes), is reached when records are equally distributed among all classes; the minimum value, 0, is reached when all records belong to one class. Here, C1 and C2 represent the two classes, and the numbers show the associated number of instances for each class.
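Equation (7.1) and the stated maximum/minimum can be checked with a few lines of Python (the gini helper is our own):

```python
def gini(counts):
    """GINI(t) = 1 - sum_j p(j|t)^2, where p(j|t) is the relative
    frequency of class j at node t and counts holds per-class totals."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

even = gini([3, 3])  # 0.5: records split evenly, 1 - 1/nc for nc = 2
pure = gini([6, 0])  # 0.0: all records in one class
```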
7.4 Naive Bayes classifier for text classification Initially used for entertainment, videos are now a major communication medium for social, educational and business activities, and deriving and classifying text from videos is another research area; Google's Tesseract is one of the best-known optical character recognition (OCR) tools available today. For text classification, the naive Bayes classifier is one of the prominent supervised classifiers and has been used for a long time. Some of its applications are spam versus not spam, male versus female author and positive versus negative reviews. The classification problem can be defined as follows:
Algorithm 2: Text classification using naive Bayes
Input: a document d; a fixed set of classes C = {C1, C2, C3, . . ., Cj}; a training set of n labeled documents (d1, C1), (d2, C2), . . ., (dn, Cn)
Output: a predicted class for the document: d → Ci
To understand the Bayes classifier, we represent the words by their frequency in the document. These frequencies are given to the classifier, and Bayes' rule is applied to obtain the document's class.
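A minimal sketch of a multinomial naive Bayes text classifier along these lines, with add-one (Laplace) smoothing; the toy spam/ham documents and all function names are illustrative assumptions:

```python
from collections import Counter
import math

def train_nb(docs):
    """docs: list of (word list, class). Returns class counts,
    per-class word counts and the vocabulary."""
    classes = Counter(c for _, c in docs)
    word_counts = {c: Counter() for c in classes}
    vocab = set()
    for words, c in docs:
        word_counts[c].update(words)
        vocab.update(words)
    return classes, word_counts, vocab

def predict_nb(words, classes, word_counts, vocab):
    """Apply Bayes' rule in log space with add-one smoothing and
    return the most probable class for the word list."""
    n = sum(classes.values())
    best, best_lp = None, float("-inf")
    for c in classes:
        lp = math.log(classes[c] / n)          # log prior
        total = sum(word_counts[c].values())
        for w in words:                         # log likelihoods
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Toy spam-vs-ham training set (hypothetical data)
docs = [(["win", "money", "now"], "spam"),
        (["free", "money"], "spam"),
        (["meeting", "at", "noon"], "ham"),
        (["lunch", "at", "noon"], "ham")]
model = train_nb(docs)
label = predict_nb(["free", "money", "now"], *model)  # -> "spam"
```

Working in log space avoids numeric underflow when documents contain many words.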
7.5 Support vector machine (SVM) as a diagnostic tool for Alzheimer's patients In the traditional healthcare system, specialists recommend drugs using a trial-and-error approach, and reactions to drugs vary from individual to individual. Consequently, an improved healthcare framework ought to be patient-oriented, like a personalized healthcare system (PHS). A PHS incorporates electronic health records, the Internet of Things, web-based information and social media [1]. AI techniques are applied in PHS for tasks such as collecting information on patients and diseases, disease progression and prediction, and self-management [2]. ML methods help create analytical models that improve decision-making capabilities [3]. The models monitor the patient by gathering health
data from numerous sensor devices and identifying the relative patterns. This chapter discusses these patterns from the perspective of PHS. Medical imaging modalities such as magnetic resonance imaging (MRI) and computed tomography are important methods for analyzing human diseases effectively. The traditional method is to perform a manual study of brain tumors based on image assessment by a radiologist/doctor, which can lead to incorrect classification when analyzing a large amount of MRI data. Brain tumors are one of the main causes of human death; if a tumor is identified near the initial stage, the probability of survival can be enhanced. MRI is also important for the identification of Alzheimer's disease (AD). Neurologists make a diagnosis through image analysis or signal examination, and experts currently use clinical decision-support applications based on computational intelligence. These applications show high precision in determining whether a patient is well or has a disease. Presently, several computational intelligence technologies are exploited for the classification of AD. MRI technology is used to study a person's brain, and among classification technologies the support vector machine (SVM) has been projected and applied to classify brain images. Convolutional neural network (CNN) features together with statistics such as mean, standard deviation and homogeneity are used to extract characteristics from MRI imagery. The planned process is based on an SVM that takes the MRI as input and classifies it as AD or normal. Here, we suggest a binary AD diagnostic model supported by the extraction of deep features from MRI for classification [4]. The objective is to classify patients as having AD or not having the disease (Figure 7.5).
Figure 7.5: Diagnosing Alzheimer’s disease using SVM.
A tumor is an uncontrolled proliferation of malignant cells in a part of the body. Various types of tumors have different characteristics and distinct treatment techniques. Brain tumors are grouped into primary cerebral tumors and metastatic cerebral tumors: the first originates in the brain, while the latter originates from cancers in other parts of the body. Segmentation of the brain tumor is one of the key procedures in surgical planning and treatment, and is done by MRI. Brain tumors can be of various sizes and shapes and can appear in different places; changes in tumor intensity in brain MR images make automatic tumor segmentation extremely challenging, and various intensity-based techniques have been proposed to segment tumors in MRI [5]. Texture is one of the most popular features in the classification and retrieval of images. A tumor is an anomalous growth of tissue; brain tumors are masses of unneeded cells that grow in the cerebrum or central spinal canal, and brain cancer can be considered one of the most lethal and unmanageable illnesses. Medical images, such as MRI and computed tomography, are an important way to diagnose human diseases effectively. The traditional method is to perform a manual analysis of brain tumors based on visual inspection by a radiologist/doctor, which can lead to incorrect classification when analyzing a large amount of MRI data; the use of a semiautomatic intelligent classification system is proposed to avoid human error. One of the main causes of human death is brain tumors. If tumors are detected at an early stage, the chances of survival can be increased. MRI technology is used in the study of the human brain [6]. In this process, an SVM is used as the classifier to detect AD. Different detection methods can be used to increase detection efficiency; the best method is then selected, and parameters such as precision, sensitivity, specificity and accuracy are monitored.
Experimental results show that texture characteristics extracted by wave atoms give a better classification rate. A classification technique based on SVM is proposed and applied to brain image classification; the characteristics are extracted from MRI images through standard deviations and CNNs. In this process, the SVM classifier's maximum accuracy is 90% [7].
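For illustration only, a linear SVM of the kind discussed here can be sketched with sub-gradient descent on the hinge loss; the two-feature toy data and all names are our own assumptions, not the chapter's MRI pipeline:

```python
def train_linear_svm(data, epochs=500, lr=0.05, lam=0.01):
    """Minimal linear SVM trained by sub-gradient descent on the
    hinge loss with L2 regularization (a pegasos-style sketch)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:                  # y is +1 (AD) or -1 (normal)
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1:                 # inside the margin: push out
                w = [wi + lr * (y * xi - lam * wi)
                     for wi, xi in zip(w, x)]
                b += lr * y
            else:                          # correct side: only shrink w
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

# Hypothetical 2-feature samples (e.g., two MRI texture statistics);
# +1 = AD, -1 = normal. A real diagnostic model would use many
# deep features and far more data.
data = [([2.0, 2.5], 1), ([2.5, 2.0], 1),
        ([0.5, 0.5], -1), ([0.2, 1.0], -1)]
w, b = train_linear_svm(data)
```

The learned (w, b) defines the separating hyperplane; in the chapter's setting, the input vector x would instead hold CNN-derived features of an MRI scan.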
7.6 Machine learning (ML) in agriculture ML is a trend of present innovation that the advanced agricultural industry employs; ML applications in farming contribute to healthy seeds [8]. AI methods are utilized in agriculture to increase precision and discover answers to problems. Agriculture plays a vital part in the ten largest worldwide economies. Because of population growth, there is constant pressure on the agricultural framework to improve crop productivity and produce more harvests [9]. ML techniques emerge from the process of learning through experience and are methods to perform a specific task. A particular example is defining a set of
attributes, known as variables, each representing a feature in binary, numerical or nominal form. Learning performance is measured on the machine against defined performance measures [10]. ML models gain experience over time; they apply different mathematical, statistical and algorithmic approaches specifically designed to enhance the performance of agriculture. After finishing the learning process, the model can be used to estimate, sort and test. Machine learning functions can be divided into two categories, namely, supervised and unsupervised learning.
Supervised learning: In this approach, each training example is associated with labeled input data. The main purpose of this learning is to create rules for mapping the inputs to the respective outputs. The learned pattern is then used to predict the output for previously unseen (test) data.
Unsupervised learning: In this setting, no labeled data is used, so there is no label-based distinction between training and testing. The purpose of this learning method is to find hidden patterns; there is no response or output variable defined as in supervised learning.
7.6.1 Use of machine learning in agriculture AI is used in many settings, such as the home and office, and also in agriculture. A smart, learned machine used in agriculture can dramatically increase crop production and quality in the field.
Retailers: Seed vendors use this agricultural technology to grow better crops, and pest management methods to identify different bacteria, insects, rodents, etc.
AI is used to increase crop yields: AI is used to assess the crops and circumstances that yield the best results, and to decide which environmental conditions will offer the best results. AI pest detectors such as Rentokil's can detect both insects and pests and can also propose the use of chemicals to kill insects [11].
7.6.2 Prominent applications of ML in agriculture Agricultural robots: Most organizations currently work on programming and designing robots to handle fundamental farming tasks, including harvesting, and to work faster than human workers. Agricultural robots are probably an excellent example of machine learning in farming.
Mehul Mahrishi et al.
Crop and soil monitoring: Organizations now use drones and other software to collect data and monitor crop development and soil condition, and they use software to manage soil fertility [12]. With agricultural technology, farmers can find effective ways to safeguard their crops and protect them from weeds; organizations are building robots and automation tools to accomplish this, and precisely targeted spraying reduces herbicide use. Breeders continually search for desirable traits: characteristics that help plants use water and nutrients more efficiently, or adapt to climate change and disease. Finding the right gene for a desired outcome is difficult, because the exact gene sequence is hard to track down [13].
ML models used in the agricultural industry: Farmers benefit from ML models and related innovations, and the use of AI and ML supports food-technology departments. Online marketplaces built for agricultural businesses use ML and analytical tools to give farmers cost and outcome data.
Robots to manage and monitor crops: Sensors help collect crop information. According to current research, AI and ML will completely transform the picture of agriculture in the coming years.
Opportunities for ML in digital agriculture: Digital agribusiness is expanding, using approaches that demonstrably improve cultivation while reducing environmental impact. Modern agribusiness relies on a variety of sensors that give a better picture of crop condition, soil and climate, as well as of agricultural machinery [14]. These data help us make quick and timely decisions based on the observed results.
To produce more, we must apply machine learning to farming data. Different models are trained on a common dataset and then used for crop yield prediction; the prediction accuracy of each is calculated as follows (Figure 7.6):
Linear regression – A linear model is created using a Python library and trained with a real-time agriculture dataset. The accuracy of the model is calculated at 63.73%.
Random forest regression – This model is based on the concept of decision trees: it builds multiple decision trees to predict the output and finally combines them to obtain the final prediction. This model is trained with the same agriculture dataset, and its accuracy is calculated at 87.08%.
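The chapter does not reproduce the training code; the following is a minimal sketch of the comparison it describes, using scikit-learn. The features, the synthetic data and the resulting scores are stand-ins, not the chapter's real-time agriculture dataset or its reported accuracies:

```python
# Sketch of the linear-regression vs. random-forest comparison described
# above. Features and data are synthetic stand-ins for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical features: rainfall (mm), temperature (C), area (ha)
X = rng.uniform([300, 15, 1], [1200, 35, 50], size=(500, 3))
# Yield with a mild non-linear term, so the forest has something to gain
y = 2.0 * X[:, 0] + 30.0 * X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(0, 50, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lin = LinearRegression().fit(X_tr, y_tr)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# score() returns R^2 on held-out data, the role the "accuracy"
# figures play for the chapter's own dataset.
print(f"linear regression R^2: {lin.score(X_te, y_te):.3f}")
print(f"random forest R^2: {rf.score(X_te, y_te):.3f}")
```

Evaluating both models on the same held-out split, as here, is what makes the two accuracy figures comparable.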
Chapter 7 Data model recommendations for real-time ML applications
Figure 7.6: Data processing proposed model (flow diagram; stages: Data Collection, Data Preprocessing, Data Splitting, Algorithm Fitting, Testing Accuracy, Data Post-Processing).
This research predicted the production of various crops by ML using open-source data on previous years' crops. With such accurate predictions, we can advise each farmer on how much of a crop to grow, based on the following factors:
– Crop type
– Location of crop production, such as district name
– Production area
– Production of the crop in that area
7.7 Neural networks (NN)
Modern systems are still very far from matching what humans can do and how they sense the world. However, we can give systems some basic human-like capabilities, and artificial neural networks are just the beginning in this regard. We can develop systems such as self-driving cars that to some extent replicate human behavior, and we can recognize handwritten characters and perform speech recognition, image processing, video processing, indexing and much more. Neural networks, which underlie much of ML, try to make systems behave like humans [15].
A convolutional neural network (CNN) is a form of neural network that looks at windows of pixels rather than single pixels, applying a filter to each window to create a feature; this operation is called a "convolution." The network uses a considerable amount of data to decide which windows and filters to use. Some filters help distinguish edges in the image; others identify features such as borderlines and horizontal lines, or even more complex artifacts such as eyes or ears. These features make it possible to take an image of many pixels and produce a reduced number of features, a step called "pooling." The network combines these convolution and pooling operations to produce its output.
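The two operations can be illustrated directly with NumPy. The 3×3 edge filter and the toy input below are invented for the example; in a real CNN the filters are learned from data:

```python
# Minimal illustration of convolution and max pooling with NumPy.
# The filter and input are illustrative; a real CNN learns its filters.
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over every window of the image (no padding)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Keep the maximum of each size x size window (stride = size)."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A 6x6 "image" with a vertical edge between columns 2 and 3
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A vertical-edge filter (Prewitt-like)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = convolve2d(image, kernel)   # strong response along the edge
pooled = max_pool(feature_map)            # fewer values, edge still visible
print(feature_map.shape, pooled.shape)    # (4, 4) (2, 2)
```

The feature map responds only where the window straddles the edge, and pooling then reduces the 4×4 map to 2×2 values, exactly the dimensionality reduction described above.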
Figure 7.7: Heading detection using custom YoloV4 Object Detection Model.
7.7.1 YoloV4 darknet object detection model
YOLO stands for "You Only Look Once." Most modern, accurate detection models require a large number of GPUs and a large minibatch size for training, without which training becomes sluggish and impractical [16]. YOLO tackles this problem by making an object detector that can be trained on a single GPU with a smaller minibatch size. To achieve high precision, YOLO integrates various functions and a large number of verified features that improve the accuracy of both the classifier and the detector. In this section, we demonstrate the application of YoloV4 as a custom object detector through Google Colaboratory. The major steps include gathering a custom dataset, customizing the configuration files necessary for custom training, testing the custom object detector and evaluating its accuracy in real time. Figure 7.7 shows the detection of heading text on video frames from our dataset.
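One piece of this pipeline that can be shown compactly is the detector's post-processing: YOLO emits many candidate boxes, which are filtered by confidence and reduced by non-maximum suppression (NMS). The sketch below implements standard IoU-based NMS in NumPy; the boxes, scores and thresholds are illustrative, not taken from the chapter's heading-detection model:

```python
# Standard confidence filtering + non-maximum suppression (NMS), the
# post-processing applied to a detector's raw candidate boxes.
# Box format: (x1, y1, x2, y2); all values below are illustrative.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
    """Drop low-confidence boxes, then suppress overlapping duplicates."""
    order = [i for i in np.argsort(scores)[::-1] if scores[i] >= conf_thresh]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Three overlapping candidates for one heading plus one separate detection
boxes = np.array([[10, 10, 110, 40], [12, 12, 112, 42],
                  [11, 9, 109, 39], [200, 50, 300, 90]], dtype=float)
scores = np.array([0.9, 0.75, 0.6, 0.8])

print(nms(boxes, scores))  # keeps the best of the cluster plus the lone box
```

Here the three near-duplicate boxes collapse to the single highest-scoring one, while the unrelated box survives.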
7.8 Conclusion
In this chapter, the authors suggested machine learning models for various real-time applications. The chapter started with an introduction to basic machine learning concepts and then discussed two major machine learning models: decision trees for image recognition and naive Bayes for text classification. The latter part of the chapter discussed real-time applications of machine learning through three case studies: analysis of Alzheimer's disease using a support vector machine, machine learning in agriculture and neural networks for automatic heading detection.
References
[1] Tyagi, S. K. S., Mukherjee, A., Pokhrel, S. R., and Hiran, K. K. (2020). An intelligent and optimal resource allocation approach in sensor networks for smart agri-IoT. IEEE Sensors Journal.
[2] Lewy, H., Barkan, R., and Sela, T. (2019). Personalized health systems – past, present, and future of research development and implementation in real-life environment. Frontiers in Medicine, 6, 149. https://doi.org/10.3389/fmed.2019.00149.
[3] Vyas, A. K., Dhiman, H., and Hiran, K. K. (2021). Modelling of symmetrical quadrature optical ring resonator with four different topologies and performance analysis using machine learning approach. Journal of Optical Communications, 000010151520200270.
[4] Beheshti, I. and Demirel, H. (2015). Probability distribution function-based classification of structural MRI for the detection of Alzheimer's disease. Computers in Biology and Medicine, 64, 208–216.
[5] Anandh, K., Sujatha, C. M., and Ramakrishnan, S. (2014). Analysis of ventricles in Alzheimer MR images using coherence enhancing diffusion filter and level set method. 2014 International Conference on Informatics, Electronics & Vision (ICIEV), 1–4.
[6] Bron, E. E., Smits, M., et al. (2015). Standardized evaluation of algorithms for computer-aided diagnosis of dementia based on structural MRI: The CADDementia challenge. NeuroImage, 111, 562–579.
[7] Chaves, R. (2009). SVM-based computer-aided diagnosis of the Alzheimer's disease using t-test NMSE feature selection with feature correlation weighting. Neuroscience Letters, 293–297.
[8] Veenadhari, S., Misra, B., and Singh, C. (2014). Machine learning approach for forecasting crop yield based on climatic parameters. 2014 International Conference on Computer Communication and Informatics, 1–5.
[9] Manjula, E. and Djodiltachoumy, S. (2017). A model for prediction of crop yield. International Journal of Computational Intelligence and Informatics, 298–305.
[10] Treboux, J. and Genoud, D. (2019). High precision agriculture: An application of improved machine-learning algorithms. 2019 6th Swiss Conference on Data Science (SDS), 103–108.
[11] Malpani, P., Bassi, P., and Mahrishi, M. (n.d.). A novel framework for extracting geospatial information using SPARQL query & multiple header extraction sources.
[12] Gupta, S., Starr, M. K., Farahani, R. Z., and Matinrad, N. (2016). Disaster management from a POM perspective: Mapping a new domain. Production and Operations Management, 25(10), 1611–1637. https://doi.org/10.1111/poms.12591.
[13] Treboux, J. and Genoud, D. (2018). Improved machine learning methodology for high precision agriculture. 2018 Global Internet of Things Summit (GIoTS), 1–6.
[14] Majumdar, J., Naraseeyappa, S., and Ankalaki, S. (2017). Analysis of agriculture data using data mining techniques: Application of big data. Journal of Big Data, 4, 1–15.
[15] Mahrishi, M. and Morwal, S. (2020). Index point detection and semantic indexing of videos – a comparative review. In Advances in Intelligent Systems and Computing (Vol. 1154). Springer Singapore. https://doi.org/10.1007/978-981-15-4032-5_94.
[16] Mahrishi, M., Hiran, K. K., Meena, G., and Sharma, P. (2020). Machine learning and deep learning in real-time applications. IGI Global. https://www.igi-global.com/book/machine-learning-deep-learning-real/240152.
Ashok Bhansali, Swati Saxena, Kailash Chandra Bandhu
Chapter 8 Machine learning for sustainable agriculture
Abstract: For centuries, agriculture has been at the center of a sustainable economy and a sustainable world. Like industry and business, farming is going digital, and data-driven agri-technology and precision farming are the new scientific approaches to sustainable agriculture, enhancing productivity while minimizing environmental impact. Driven by analytics of historical data, satellite data and real-time sensor inputs, machine learning (ML) plays a pivotal role in developing models for sustainable digital agriculture. This chapter explains the different applications, approaches and algorithms of ML used in agricultural production systems, and covers how ML approaches add sustainability to the broad categories of agriculture management: managing the soil, water and crops. Managing soil and water in agriculture effectively and efficiently entails substantial effort and is essential for achieving agronomical, hydrological and climatological equilibrium. Evolving artificial intelligence (AI) and analytics-driven agriculture systems can apply different algorithms to sensor data and help greatly in detecting diseases, identifying weeds and predicting crop quality and yield, along with water management, nutrition management, soil management and so on. They can provide recommendations, remedies and valuable insights for timely decisions and action. The chapter also discusses the most important and popular ML models for different activities of agriculture management and how they help in sustainability. Adoption of technology, especially AI, ML and the Internet of things, in agriculture is admittedly slow, but the value these technologies add to agricultural sustainability has drawn researchers, governments and agri-business companies to invest in this domain for a greater and better tomorrow.
Keywords: sustainable agriculture, machine learning, smart farming, agriculture management system, digital agriculture
Ashok Bhansali, O. P. Jindal University, Raigarh, India, e-mail: [email protected] Swati Saxena, ITM Vocational University, Vadodara, India Kailash Chandra Bandhu, O. P. Jindal University, Raigarh, India https://doi.org/10.1515/9783110702514-008
8.1 Introduction
Agriculture is the backbone of civilization and plays a pivotal role in maintaining the sustainability of the entire world. Agriculture not only provides healthy food to people; it is also a foundation of the world's economy and contributes significantly to global GDP. The world population is growing continuously, and it is estimated that by 2050 we will need 60–110% more food than is currently being produced (Tilman et al., 2011; Pardey et al., 2014). In order to produce more food and fiber at lower cost, agricultural practices changed dramatically after World War II and began exploiting natural resources in an irresponsible manner. Excessive use of pesticides and chemicals, topsoil depletion, groundwater contamination, unthoughtful mechanization, greenhouse gas emissions, the decline of family farms and the industrialization of the agriculture and food sector are the main reasons behind a fast-depleting ecological system and the unsustainable socioeconomic life of farmers. It is an alarming scenario and poses a great challenge to future generations. It is high time to make a strategic shift from the current paradigm of exploiting agricultural productivity to one of agricultural sustainability (Rockstrom et al., 2017). Sustainable agriculture practices not only focus on enriching agricultural productivity but also help to reduce harmful environmental impacts (Kuyper and Struik, 2014; Adnan et al., 2018).
The agriculture life cycle has a colossal footprint on environmental sustainability and leaves a huge impact on the entire ecosystem: it contributes to water shortage, climate change, deterioration of land, denuding of forests and damage to other ecological processes. Traditional agricultural processes affect the environment to a large extent and are, at the same time, impacted by these changes.
Sustainable agriculture redefines traditional farming using approaches that do not adversely affect natural resources needed in the future while producing enough for the current generation. Indeed, sustainable agricultural practices are critical both for the present and for future generations. Sustainable farming techniques revolve around using natural resources optimally while not harming the environment in any way; the key is to find the right equilibrium between food production and preservation of the ecosystem. Traditional approaches that produce good crop yields persistently but disturb the ecological equilibrium are not a sustainable form of agriculture. The prominent challenges for sustainable agriculture include traditional cultivation approaches, inappropriate land and water management and a lack of awareness of the degradation of the environment and natural resources. Moreover, poor government policies on agriculture and agri-marketing, poorly designed support systems and services, unstable frameworks and policies and poor technology penetration among farmers make sustainable agriculture even more difficult. Today's technologies, especially machine learning (ML), the Internet of things (IoT) and data analytics, are the backbone of digital agriculture and are helping the paradigm shift toward sustainable agriculture [1]. Digital agriculture uses data-driven decisions and technologies to improve agricultural productivity and reduce hostile environmental
impacts. There are several factors behind the agricultural sector's decision to embrace artificial intelligence (AI) and related technologies to improve decision making. At the forefront is the undoubted increase in the amount of data available, as well as improved ease of access to those data. This is made possible through developments within the sector, such as increased use of sensors, better access to satellite images, reduced cost of data loggers, increased use of drones and better access to government data repositories. AI-enabled technologies can help the agricultural community adopt sustainable ways and means of using resources effectively and efficiently to get better production from the land. ML, a subset of AI, is shifting the way decisions are made and technology is implemented in agriculture, for increased productivity, improved efficiency and better management of uninvited natural calamities (Saxena, 2020). An agriculture system equipped with evolutionary ML and analytics, working on data from field sensors, can really help in identifying, detecting and even preventing diseases and weeds, in yield and crop quality prediction, and in water, nutrition and soil management. Driven by analytics of historical data, satellite data and real-time sensor inputs, ML plays a pivotal role in developing models that deliver recommendations, remedies and valuable insights for timely decisions and action to make agriculture sustainable. Effectively managing the water, soil and crop life cycle can greatly improve the sustainability of agriculture. Figure 8.1 illustrates how AI/ML technology can help agriculture become more sustainable. This chapter covers how different ML algorithms and approaches can be applied to handle each of these broad categories of agriculture management.
The chapter also discusses the most important and popular ML models for different activities of agriculture management and how they help in sustainability.
8.2 Soil management
Soil is one of the most important constituents of the agricultural ecosystem. There are many different kinds of soil, with different features and nutrient values, suitable for different crops under different environmental circumstances. Soil management is easier to understand if we actually know the constituents of the soil we are trying to cultivate. Soil management constituents can be classified as:
1. Physical classifiers: density of the soil, mineral composition, pore-size distribution, moisture content and water-retention capacity.
2. Chemical classifiers: soil pH and soil organic matter.
3. Nutrient variables: non-mineral elements (carbon, hydrogen, oxygen), macronutrients (nitrogen, phosphorus, potassium, calcium, magnesium, sulfur) and micronutrients (iron, manganese, molybdenum, chlorine, zinc, copper, boron, nickel).
Figure 8.1: Transitioning toward sustainable agriculture.
All the above variables play a vital role in preparing the soil for agriculture. The major issue faced for years has been the depletion of soil nutrients, which leads to the disruption of the natural ecosystem. Preserving this natural resource is very important for ensuring that future populations are healthy and well fed [3]. Applying ML algorithms to different soil properties and figuring out the appropriate relationships between them could be crucial for preserving soil properties, including properties that directly or indirectly impact the growth and yield of crops, such as organic matter, essential micronutrients and plant nutrients. Apart from the detection and identification of nutrient variables, certain other properties help determine whether the soil is cultivable; the most important of these are soil dryness, temperature conditions and moisture content. Different types of soil suit different crops, and there is always a need to figure out the features and characteristics of each soil type to determine this. Although these parameters can be measured directly, direct measurement is challenging, may introduce errors and is expensive. ML techniques are well suited to this problem.
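As a toy illustration of the classification approaches summarized in Table 8.1 (which lists classification trees and K-nearest neighbors for soil-dryness estimation), here is a minimal K-nearest-neighbor classifier written with NumPy. The meteorological features, thresholds and labels are invented for the example, not taken from any of the cited studies:

```python
# Toy K-nearest-neighbor classifier for soil dryness, in the spirit of
# the meteorology-based approaches cited in this section. Features and
# labels are invented: (rainfall mm, temperature C, relative humidity %),
# label 1 = dry, 0 = moist.
import numpy as np

train_X = np.array([
    [2.0, 34.0, 20.0], [0.0, 38.0, 15.0], [1.0, 36.0, 25.0],    # dry
    [25.0, 22.0, 80.0], [40.0, 19.0, 85.0], [30.0, 24.0, 75.0], # moist
])
train_y = np.array([1, 1, 1, 0, 0, 0])

def knn_predict(x, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    return int(np.round(nearest.mean()))

print(knn_predict(np.array([3.0, 35.0, 22.0])))   # 1 (dry)
print(knn_predict(np.array([28.0, 21.0, 78.0])))  # 0 (moist)
```

In practice the features would be standardized first, since K-NN distances are sensitive to the scale of each variable.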
8.2.1 Traditional methodology for soil management
Traditionally, farmers adopt different approaches to regain soil fertility and productivity. Soil farming is one such approach, in which biodegradable materials and residues are mixed with the upper soil layer and assimilated by the soil over time. It is a natural rejuvenating process to revive the soil, but its biggest drawback is a very slow rate of materialization; this is why soil is categorized as a non-renewable resource and must be preserved with caution for the coming centuries [4]. The rate at which soil is created is highly variable under different environmental conditions and is difficult to estimate, but scientific research suggests that 0.025–0.125 mm of soil is created every year by natural soil-forming processes [4–8]. Another widely adopted methodology is crop rotation: after every crop yield, a different crop is cultivated so that the soil regains its fertility; in other words, crops are rotated so that the soil does not lose its fertility. But what if, after applying all the traditional methodologies, the soil does not regain its fertility, or has reached a saturation point that marks it as unfit for agriculture [9]? Traditional methods of managing the soil are responsible for loss of essential nutrients, soil erosion and land becoming non-fertile, and this leads to unsustainable soil management. Many researchers are working on different methods and technologies to maintain soil sustainability; the approach we highlight here is the use of ML techniques to sustain soil properties.
8.2.2 Sustainable soil management by machine learning
Sustainability in agriculture can be achieved by applying ML approaches at the level of the soil's master variables. To meet the challenges described above, that is, to restore the soil's master variables, various ML approaches play a vital role in maintaining agricultural sustainability. For each master variable of soil, ML algorithms learn from past experience in order to manage the contributing variables such as soil composition, soil pH, water retention, nutrients and many others. ML approaches help us determine the actual parametric values of the soil variables so that the soil can be prepared for cultivation, and even how to maintain soil fertility by adding the appropriate amounts of nutrients.
Before moving to the different ML approaches, let us consider the importance of the dataset. A dataset is a set of parametric values, a collection of attributes, and it is the key ingredient fed to an ML algorithm: the more data, the faster the training, and the better the data, the better the learning. Many companies, public and private, are explicitly working to record such data and make it available [10]. In our context, a dataset holds the significant meteorological variable values of the soil with which we can train an ML algorithm to produce efficient and accurate results. Table 8.1 summarizes ML techniques for sustainable soil management.
8.3 Crop management
The crop management phase of agriculture begins with the sowing of the seed, continues with maintaining and nourishing the crop with fertilizers, and ends with harvesting and distribution. Crop management focuses on the following subphases:
1. First, predicting which crop to sow with respect to the suitability of the soil.
2. Second, after identifying the crop, estimating the crop yield based on the condition of the land on which the crop is cultivated and various other atmospheric parameters.
3. Third, continuously monitoring and protecting the crop against diseases.
8.3.1 Traditional method of crop management
Crop management has been the core of agriculture for centuries. The traditional methods adopted by farmers for sowing seed are broadcasting (seeds are simply spread in the soil), dibbling (seeds are placed in holes or pits at equal distances), drilling
Table 8.1: Machine learning for sustainable soil management.

– Paper: Machine learning assessments of soil drying for agricultural planning []. Technique: classification tree, K-nearest neighbor. Data: meteorological data. Property: estimation of soil dryness. Outcome: the model was trained with a classification tree, K-NN and a boosted perceptron to observe soil dryness; the accuracy attained is –%.

– Paper: Machine learning based prediction of soil total nitrogen, organic carbon and moisture content by using VIS-NIR spectroscopy []. Technique: least squares support vector machines (LS-SVM), Cubist. Data: field samples (wet samples were collected). Property: estimation of soil total nitrogen (TN), organic carbon (OC) and moisture content (MC). Outcome: compared the predictive performance of two machine learning algorithms and two linear multivariate methods for TN, OC and MC; prediction observed: LS-SVM for OC, RMSEP = .%, RPD = .; LS-SVM for MC, RMSEP = .%, RPD = .; Cubist for TN, RMSEP = . and RPD = ..

– Paper: Soil moisture modeling based on stochastic behavior of forces on a no-till chisel opener []. Technique: artificial neural network (ANN), multiple linear regression model. Data: dataset of forces acting on a chisel and speed. Property: estimation of soil moisture. Outcome: analysis showed that autoregressive error-function parametric values combined with ML computational models are best suited for calculating soil moisture content.

– Paper: Using self-adaptive evolutionary algorithm to improve the performance of an extreme learning machine for estimating soil temperature []. Technique: extreme learning machine. Data: collected from two Iranian stations, Bandar Abbas and Kerman. Property: soil temperature estimation. Outcome: soil temperature was observed in two different climatic conditions (the Iranian regions of Bandar Abbas and Kerman) at six different depths (–, , , , , and cm).
(placing seeds and covering them with soil), sowing behind the country plough (seeds are placed into furrows ploughed in the field), planting (placing seeds in the soil for germination) and transplanting. Using these techniques, farmers always struggle to meet the estimated yield.
Crop yield prediction is the decision-making subphase of crop management that determines profit and gain. Yield prediction depends on the soil as well as on various atmospheric and weather conditions. Apart from sowing and yield prediction, nourishing and maintaining the crop is also done manually, which is not efficient. Traditionally, botanists and farmers identify plant species and diseases by analyzing the leaves of the plants. Often crops suffer from a disease that the farmers cannot identify or detect, and even when they do identify it, they may be unable to take appropriate measures to remove it. The traditional techniques used for agro-chemicals are inefficient because they rely on broadcasting: spreading pesticides and chemicals over the land without proper direction often pollutes the water and the environment. Conventional methods of drought detection use equipment for infrared temperature, spectral reflection and chlorophyll measurement, which is bulky and expensive. To meet these challenges of sustaining the different phases of crop management, various researchers are working on ML and data-driven technologies to make it more sustainable. In this section on crop management, we mainly focus on crop yield prediction and disease detection.
8.3.2 Sustainable crop management by machine learning
8.3.2.1 Crop yield prediction
Crop yield prediction is a vital phase of sustainable agriculture, as it is the decision-making phase for crop productivity, which determines profit or loss. In other words, crop yield prediction is the measurement of the food grains produced. With the help of ML approaches, various models have been developed that take as input variables the soil, weather conditions, crop name, pest details and water level, and perform an analysis whose output states the expected crop yield. Table 8.2 summarizes ML techniques for sustainable crop yield prediction.
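To make the input-output framing concrete, here is a minimal regression sketch in NumPy: it fits yield as a linear function of a few of the input variables named above (rainfall, temperature, water level) via least squares. All values are synthetic stand-ins, not data from any of the studies in Table 8.2:

```python
# Minimal yield-regression sketch: fit
#   yield = w0 + w1*rainfall + w2*temperature + w3*water_level
# by least squares. All numbers are synthetic stand-ins for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 200
rainfall = rng.uniform(300, 1200, n)      # mm per season
temperature = rng.uniform(15, 35, n)      # mean C
water_level = rng.uniform(0.5, 3.0, n)    # irrigation depth, m

# "True" relationship used to generate the synthetic yields (t/ha)
y = 1.5 + 0.004 * rainfall + 0.05 * temperature + 0.8 * water_level \
    + rng.normal(0, 0.2, n)

X = np.column_stack([np.ones(n), rainfall, temperature, water_level])
w, *_ = np.linalg.lstsq(X, y, rcond=None)  # solves min ||Xw - y||^2

# Predict yield for one hypothetical field
x_new = np.array([1.0, 800.0, 25.0, 2.0])
print(f"coefficients: {np.round(w, 3)}")
print(f"predicted yield: {x_new @ w:.2f} t/ha")
```

The models in Table 8.2 (ANNs, SVMs, neuro-fuzzy systems) replace the linear map here with more flexible learned functions, but the input and output roles are the same.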
8.3.2.2 Disease detection and identification
Disease and pests in the crop are factors that quite often lead to loss of production, and there is a need to contain them. Pesticides often help retain crop productivity but simultaneously degrade crop quality. Therefore, there is a
Table 8.2: Machine learning for sustainable crop yield prediction.

– Paper: Wheat yield prediction using machine learning and advanced sensing techniques []. Technique: counter-propagation artificial neural networks (CP-ANNs), supervised Kohonen networks (SKNs) and XY-fused networks (XY-Fs). Data: deployed sensors and satellite images. Crop: wheat. Outcome: predicted wheat yield using multilayer soil data and satellite images; achieved accuracies of .% for SKN, .% for CP-ANN and .% for XY-F.

– Paper: Machine learning methods for crop yield prediction and climate change impact assessment in agriculture []. Technique: artificial neural network. Data: corn dataset from the US Midwest. Crop: corn. Outcome: predicted corn yields using classical statistical methods and a fully nonparametric neural network.

– Paper: Identification and determination of the number of immature green citrus fruit in a canopy under different ambient light conditions []. Technique: support vector machine (SVM). Data: field images. Crop: citrus fruit. Outcome: Tamura texture features with a support vector machine were used to detect immature fruits in the tree canopy under natural outdoor conditions.

– Paper: Support vector machine-based open crop model (SBOCM): case of rice production in China []. Technique: support vector machine (SVM). Data: soil database of the Chinese Academy of Sciences. Crop: rice. Outcome: predicted rice yield using a support vector machine; an open crop model was developed and improved accuracy was observed.

– Paper: Modeling managed grassland biomass estimation by using multitemporal remote sensing data – a machine learning approach []. Technique: multiple linear regression (MLR), artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS). Data: satellite remote sensing data. Crop: vegetation. Outcome: MLR, ANN and ANFIS models were developed to predict estimated grassland biomass.
necessity of an efficient way of detecting and identifying the crop disease and pest required. ML approaches have played important role in building up of the sustainable agriculture. Table 8.3 summarizes the ML techniques for the disease detection and identification. Table 8.3: Machine learning for disease detection and identification. Paper title
Key points
Detection and classification of rice diseases: an automated approach using textural features []
–
–
– Recognition of diseases in paddy leaves using KNN classifier []
–
– – Automated lesion segmentation from rice leaf blast field images based on random forest []
– – –
A review of visual descriptor and classification technique used in leaf species identification []
– –
–
Combination of machine learning algorithm, that is, SVM with textural features was used for detection and classification of rice plant disease. Aims to detect brown spot, bacterial leaf, blight, false smut from the collected field samples. SVM parameter gamma and nu were optimized and accuracy of 94.16% achieved. Aims to recognize only fungal diseases, that is, rice blast, brown spot present in paddy leaves with the KNN approach of machine learning. RGB images of field sample were converted into HSV color images for segmentation. The accuracy achieved is 76.59% but was limited to fungal diseases. SLIC super pixel algorithm was used for segmentation of collected images. 32 regional features were extracted from each super pixel. Automatic lesions are extracted from rice leaf blast images with complex background with the help of random forest classifier. It aims to detect and classify the species of plants from the leaf images. Machine learning approaches ANN, KNN and SVM were applied to the features and visual descriptors of collected leaf species. Efficiency attainable was more as compared to other approaches as no single method is efficient to identify species.
Chapter 8 Machine learning for sustainable agriculture
Table 8.3 (continued)

Detection of maize drought based on texture and morphological features []
– Traditional methods often rely on equipment for spectral reflection, infrared temperature and chlorophyll measurement for drought detection; such equipment is bulky and expensive, so it was not widely used.
– This automated system for drought detection was built using SVM.
– For feature extraction, Tamura texture and grey-level co-occurrence methods were used.

Crop disease classification using texture analysis []
– A comparative analysis of ML approaches (multinomial logistic regression, KNN, SVM and naive Bayes) was performed to achieve the best detection of sunflower disease.
– The results showed that multinomial logistic regression is the best method.

A lightweight mobile system for crop disease diagnosis []
– A standalone application was built for the diagnosis of six wheat diseases (brown rust, yellow rust, septoria, powdery mildew, tan spots) on top of the support vector machine.
– The accuracy attained by this application was approximately 88%.
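The studies in Table 8.3 typically pair texture features with a classifier such as SVM or KNN. As an illustration of that idea only (not any cited paper's pipeline), the following self-contained NumPy sketch computes simple GLCM-style texture features on synthetic "leaf" images and separates the two classes with a nearest-centroid stand-in for the classifier; all images, class names and parameters here are invented for the example.

```python
import numpy as np

def texture_features(img, levels=8):
    """GLCM-style texture features (contrast, energy, homogeneity)
    computed from horizontally adjacent pixel pairs of an image in [0, 1]."""
    q = (img * (levels - 1)).astype(int)          # quantize grey levels
    glcm = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[a, b] += 1                           # co-occurrence counts
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    contrast = np.sum(glcm * (i - j) ** 2)
    energy = np.sum(glcm ** 2)
    homogeneity = np.sum(glcm / (1.0 + np.abs(i - j)))
    return np.array([contrast, energy, homogeneity])

class NearestCentroid:
    """Tiny stand-in for the SVM/KNN classifiers used in the papers."""
    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = {c: np.mean([x for x, t in zip(X, y) if t == c], axis=0)
                           for c in self.classes_}
        return self
    def predict(self, X):
        return [min(self.classes_,
                    key=lambda c: np.linalg.norm(x - self.centroids_[c]))
                for x in X]

rng = np.random.default_rng(0)
# Synthetic data: "healthy" leaves are smooth, "diseased" ones speckled.
healthy = [np.clip(0.5 + 0.05 * rng.standard_normal((32, 32)), 0, 1) for _ in range(10)]
diseased = [np.clip(0.5 + 0.4 * rng.standard_normal((32, 32)), 0, 1) for _ in range(10)]
X = [texture_features(im) for im in healthy + diseased]
y = ["healthy"] * 10 + ["diseased"] * 10
clf = NearestCentroid().fit(X, y)
print(clf.predict([texture_features(healthy[0]), texture_features(diseased[0])]))
```

Speckled lesion regions raise the contrast feature and lower energy and homogeneity, so the two centroids separate cleanly even with this crude classifier.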
8.4 Irrigation water management

Water is of marked importance in human life, and it is just as important for the life cycle of crops. Along with soil, water is a vital natural resource that needs to be preserved together with its nutrient properties. Before moving forward, let us understand the difference between two related but distinct terms: irrigation management and irrigation water management. Irrigation management deals with artificial ways of distributing water to crop fields where there is a scarcity of water due to insufficient rainfall or other reasons. Irrigation water management, in contrast, aims at supplying water to the fields regularly and on time, in order to satisfy the water requirement of the crop without wasting water, energy or soil nutrients. Irrigation water management can be considered the phase of agriculture in which water nourishes the crop. The main aim of irrigation is therefore to supply the crop with a sufficient amount of water to produce a good yield and a good harvested crop.
Irrigating the land with the necessary quantity of water depends on the climatic conditions, the type of soil, the type of crop sown and how much water can be retained by the roots [27]. Crops require water for the evapotranspiration process, a combination of evaporation and transpiration that supports the growth of the plant [28]. Managing water in the field requires considerable effort and helps in maintaining the hydrological and climatological cycles [29].
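Evapotranspiration is usually quantified as a reference evapotranspiration (ET0), and empirical equations serve as baselines for the ML models discussed later in this chapter. One widely used baseline is the Hargreaves–Samani equation, sketched below; the input values in the example are invented, and Ra must be supplied as extraterrestrial radiation expressed as equivalent evaporation in mm/day.

```python
def hargreaves_et0(t_min, t_max, ra):
    """Hargreaves-Samani reference evapotranspiration (mm/day).

    t_min, t_max : daily minimum/maximum air temperature (deg C)
    ra           : extraterrestrial radiation, expressed as equivalent
                   evaporation (mm/day)
    """
    t_mean = (t_min + t_max) / 2.0
    return 0.0023 * ra * (t_mean + 17.8) * (t_max - t_min) ** 0.5

# Example: a warm day with Ra ~ 16.5 mm/day equivalent (invented values).
print(round(hargreaves_et0(t_min=18.0, t_max=32.0, ra=16.5), 2))
```

The equation needs only daily temperature extremes, which is why it is a common feature baseline when sensor data is sparse.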
8.4.1 Traditional ways of irrigation

For ages, farmers have depended on rainfall to irrigate their fields. Traditionally, the water needed to irrigate a field depends on the rainfall pattern and the terrain of the region, and the design and structure of traditional irrigation methods are decided on the basis of these two factors [30]. It has been observed that irrigation techniques are adopted according to the scale of the land. For instance, the conventional techniques for large-scale land are divided into three categories: basin (the land is surrounded by embankments to form a basin that can be filled with water), border (rectangular land is divided into strips, with water flowing between the strips) and furrow (similar to the border method, with a slight variation in how the channels divide the water), each further divided into subcategories; for small-scale land, techniques generally included the conservation of water in ponds, tanks, wells, mist systems and pots [30].
8.4.2 Sustainable irrigation water management by machine learning

Water is the most nourishing element for the field and plays a vital role in maintaining evapotranspiration and the hydrological and climatological cycles. Traditional methods of irrigation tend to waste water, and given its scarcity we cannot continue to rely on them. There is an urgent need to develop and practice irrigation systems that employ water-efficient methods to achieve sustainability. Sustainable irrigation water management can be achieved by training ML algorithms on various properties such as daily/weekly/monthly estimates of evaporation and transpiration, dew-point temperature prediction, humidity, and maximum and minimum temperature. Various models have been built for the estimation of maximum and minimum temperature, humidity and other factors associated with the hydrological cycle and evapotranspiration. Table 8.4 summarizes the ML techniques for irrigation water management.
Table 8.4: Machine learning for irrigation water management.

Estimation of monthly reference evapotranspiration using novel hybrid machine learning approaches []
– Estimation of monthly reference evapotranspiration was driven by five hybrid ML algorithms: artificial neural networks embedded with the grey wolf optimizer, multiverse optimizer, particle swarm optimizer, whale optimization algorithm and ant lion optimizer.
– Data was collected from the Ranichauri (India) and Dar El Beida (Algeria) stations.
– Models used for comparison: Valiantzas-1, 2 and 3; Nash–Sutcliffe efficiency; Pearson correlation; Willmott index.
– Results are reported for the input variables Tmin, Tmax, RH, US and RS.
– Estimation shows RMSE = 0.0592/0.0808, NSE = 0.9972/0.9956, PCC = 0.9986/0.9978 and WI = 0.9993/0.9989.

An IoT-based smart irrigation management system using machine learning and open source technologies []
– A system combining IoT and hybrid ML algorithms with open-source technologies was developed.
– The system predicts the irrigation requirements of a field using parameters such as soil moisture, soil temperature and environmental conditions, along with weather-forecast parameters such as humidity, precipitation, air temperature and UV.
– The system was deployed at pilot scale.

Irrigation runoff volume prediction using machine learning algorithms []
– Three ML algorithms (multiple linear regression, decision tree and artificial neural network) were used to predict the irrigation runoff volume.
– Dataset values were evaluated from two farms in real time.
– The decision tree algorithm was best suited, with the highest R-squared value and lowest mean squared error, whereas multiple linear regression showed the lowest R-squared value and highest mean squared error.

Machine learning based soil moisture prediction for Internet of things based smart irrigation system []
– A comparative analysis of ML algorithms (multiple linear regression, elastic net regression, gradient boosting regression tree and random forest regressor) was carried out to predict soil moisture.
– Sensor-based data was collected from fields; parameters included air temperature, air humidity, soil moisture, soil temperature and other forecast data.
– Results show that the gradient boosting regression algorithm provides more accurate results than the others.
Table 8.4 (continued)

Smart irrigation system using machine learning and IoT [, ]
– The application is an approach to determine the most favorable crop to grow in a particular area.
– ML algorithms such as artificial neural networks and multilinear regression were used for rainfall prediction.
– Romyan's method is used for calculating when the crop needs to be watered.

Water irrigation decision support system for practical weir adjustment using artificial intelligence and machine learning techniques []
– An ML-based decision support system was developed for practical weir adjustment.
– The water level was predicted by an artificial neural network.
– Data was collected from field surveys and telemetry stations in Chiang Rai Province, Thailand.
– The water level was predicted with high accuracy: the standard error of prediction was 2.58 cm and the mean absolute percentage error was 7.38%.

Using MARS, SVM, GEP and empirical equations for estimation of monthly mean reference evapotranspiration []
– Aims to investigate the performance of empirical equations and soft computing approaches.
– ML algorithms including the support vector machine (polynomial and radial basis function kernels) and multivariate adaptive regression splines were used for estimating the monthly mean reference evapotranspiration.
– Data was collected from 44 stations in Iran.
– The performance attained by the radial-basis-function SVM and multivariate adaptive regression splines was better than that of the empirical equations.
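Several entries in Table 8.4 predict soil moisture from weather and field parameters, with gradient-boosted regression reported as the most accurate. The sketch below illustrates the gradient-boosting idea from scratch on synthetic data, using depth-1 regression stumps as base learners; the feature set and the moisture relationship are hypothetical, purely for demonstration.

```python
import numpy as np

class Stump:
    """Depth-1 regression tree: one threshold on one feature."""
    def fit(self, X, r):
        best = (np.inf, 0, 0.0, 0.0, 0.0)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left, right = r[X[:, j] <= t], r[X[:, j] > t]
                if len(left) == 0 or len(right) == 0:
                    continue
                lm, rm = left.mean(), right.mean()
                err = ((left - lm) ** 2).sum() + ((right - rm) ** 2).sum()
                if err < best[0]:
                    best = (err, j, t, lm, rm)
        _, self.j, self.t, self.lm, self.rm = best
        return self
    def predict(self, X):
        return np.where(X[:, self.j] <= self.t, self.lm, self.rm)

def boost(X, y, n_rounds=100, lr=0.1):
    """Least-squares gradient boosting: each stump fits the residuals."""
    pred = np.full(len(y), y.mean())
    stumps = []
    for _ in range(n_rounds):
        s = Stump().fit(X, y - pred)       # fit current residuals
        pred += lr * s.predict(X)
        stumps.append(s)
    base = y.mean()
    return lambda Xn: base + lr * sum(s.predict(Xn) for s in stumps)

# Synthetic field data: soil moisture driven by temperature and humidity
# (a hypothetical relationship, for illustration only).
rng = np.random.default_rng(1)
temp = rng.uniform(15, 40, 200)
hum = rng.uniform(20, 90, 200)
moisture = 0.5 * hum - 0.8 * temp + rng.normal(0, 1.0, 200)
X = np.column_stack([temp, hum])
model = boost(X, moisture)
rmse = np.sqrt(np.mean((model(X) - moisture) ** 2))
print(f"train RMSE: {rmse:.2f}")
```

Real systems would use a library implementation with deeper trees and a held-out validation set; the point here is only the residual-fitting loop that defines gradient boosting.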
8.5 Conclusion

Looking at soil management, crop management and water management, many different ML models have been developed using algorithms and approaches such as Bayesian networks, decision trees, ensemble learning, regression, clustering, deep learning, ANN and SVM, but the most commonly used models are based on artificial neural networks. Though the usage of ML approaches in the agriculture domain is at a nascent stage, it is gaining traction fast. At this point in time, the work and research cited above operate at the individual level, address individual problems and are not embedded systematically into the decision-making process for improving production, reducing water waste or avoiding environmental impacts. Higher investment in digital technologies
and ML approaches demands that government policies support ML research and subsidize digital tools for farmers. Faster adoption of ML and analytics-driven approaches will definitely help make agriculture more sustainable for future generations.
References

[1] Mahrishi, M., Kant Hiran, K., Meena, G., and Sharma, P. (Eds.) (2020). Machine Learning and Deep Learning in Real-Time Applications. IGI Global. Doi: https://doi.org/10.4018/978-1-7998-3095-5.
[2] Vyas, A., Dhiman, H., and Hiran, K. (2021). Modelling of symmetrical quadrature optical ring resonator with four different topologies and performance analysis using machine learning approach. Journal of Optical Communications. Doi: https://doi.org/10.1515/joc-2020-0270.
[3] Chabbi, A., Lehmann, J., Ciais, P., Loescher, H. W., Cotrufo, M. F., Don, A., SanClements, M., Schipper, L., Six, J., Smith, P., and Rumpel, C. (2017). Aligning agriculture and climate policy. Nature Climate Change, 7, 307–309. Doi: https://doi.org/10.1038/nclimate3286.
[4] Parikh, S. J. and James, B. R. (2012). Soil: The foundation of agriculture. Nature Education Knowledge, 3(10), 2.
[5] Lal, R. (1984). Soil-erosion from tropical arable lands and its control. Advances in Agronomy, 37, 183–248. Doi: https://doi.org/10.1016/S0065-2113(08)60455-1.
[6] Montgomery, D. R. (2007). Dirt: The Erosion of Civilizations. University of California Press.
[7] Pimentel, D., Allen, J., Beers, A., Guinand, L., Linder, R., McLaughlin, P., Meer, B., Musonda, D., Perdue, D., Poisson, S., Siebert, S., Stoner, K., Salazar, R., and Hawkins, A. (1987, April). World agriculture and soil erosion. BioScience, 37(4), 277–283. Doi: https://doi.org/10.2307/1310591.
[8] Wakatsuki, T. and Rasyidin, A. (1992, March). Rates of weathering and soil formation. Geoderma, 52(3–4), 251–263. Doi: https://doi.org/10.1016/0016-7061(92)90040-E.
[9] Narayana Rao, T. V. and Manasa, S. (2019, January). Artificial neural networks for soil quality and crop yield prediction using machine learning. International Journal on Future Revolution in Computer Science & Communication Engineering, 5(1), 57–60. http://www.ijfrcsce.org.
[10] Balducci, F., Impedovo, D., and Pirlo, G. (2018). Machine learning applications on agricultural datasets for smart farm enhancement. Machines, 6(3), 38. Doi: https://doi.org/10.3390/machines6030038.
[11] Coopersmith, E. J., Minsker, B. S., Wenzel, C. E., and Gilmore, B. J. (2014). Machine learning assessments of soil drying for agricultural planning. Computers and Electronics in Agriculture, 104, 93–104. Doi: https://doi.org/10.1016/j.compag.2014.04.004.
[12] Morellos, A., Pantazi, X.-E., Moshou, D., Alexandridis, T., Whetton, R., Tziotzios, G., Wiebensohn, J., Bill, R., and Mouazen, A. M. (2016, December). Machine learning based prediction of soil total nitrogen, organic carbon and moisture content by using VIS-NIR spectroscopy. Biosystems Engineering, 152, 104–116. Doi: https://doi.org/10.1016/j.biosystemseng.2016.04.018.
[13] Johann, A. L., De Araújo, A. G., Delalibera, H. C., and Hirakawa, A. R. (2016, February). Soil moisture modeling based on stochastic behavior of forces on a no-till chisel opener. Computers and Electronics in Agriculture, 121, 420–428. Doi: https://doi.org/10.1016/j.compag.2015.12.020.
[14] Nahvi, B., Habibi, J., Mohammadi, K., Shamshirband, S., and Al Razgan, O. S. (2016, June). Using self-adaptive evolutionary algorithm to improve the performance of an extreme learning machine for estimating soil temperature. Computers and Electronics in Agriculture, 124, 150–160. Doi: 10.1016/j.compag.2016.03.025.
[15] Pantazi, X. E., Moshou, D., Alexandridis, T., Mouazen, A. M., and Whetton, R. L. (2016, February). Wheat yield prediction using machine learning and advanced sensing techniques. Computers and Electronics in Agriculture, 121, 57. Doi: https://doi.org/10.1016/j.compag.2015.11.018.
[16] Crane-Droesch, A. (2018, 26 October). Machine learning methods for crop yield prediction and climate change impact assessment in agriculture. Environmental Research Letters, 13(11). IOP Publishing Ltd.
[17] Sengupta, S. and Lee, W. S. (2014, January). Identification and determination of the number of immature green citrus fruit in a canopy under different ambient light conditions. Biosystems Engineering, 117, 51–61. Doi: 10.1016/j.biosystemseng.2013.07.007.
[18] Su, Y. X., Xu, H., and Yan, L. J. (2017). Support vector machine-based open crop model (SBOCM): Case of rice production in China. Saudi Journal of Biological Sciences, 24(3), 537–547. Doi: https://doi.org/10.1016/j.sjbs.2017.01.024.
[19] Ali, I., Cawkwell, F., Dwyer, E., and Green, S. (2017). Modeling managed grassland biomass estimation by using multitemporal remote sensing data – a machine learning approach. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(7), 3254–3264. Doi: 10.1109/JSTARS.2016.2561618.
[20] Bashir, K., Rehman, M., and Bari, M. (2019). Detection and classification of rice diseases: An automated approach using textural features. Mehran University Research Journal of Engineering and Technology, 38(1), 239–250. Doi: 10.22581/muet1982.1901.20.
[21] Suresha, M., Shreekanth, K. N., and Thirumalesh, B. V. (2017). Recognition of diseases in paddy leaves using KNN classifier. 2017 2nd International Conference for Convergence in Technology (I2CT), Mumbai, pp. 663–666.
[22] Mai, X. and Meng, M. Q. (2016). Automatic lesion segmentation from rice leaf blast field images based on random forest. 2016 IEEE International Conference on Real-time Computing and Robotics (RCAR), pp. 255–259. Doi: https://doi.org/10.1109/RCAR.2016.7784035.
[23] Thyagharajan, K. K. and Kiruba Raji, I. (2019). A review of visual descriptors and classification techniques used in leaf species identification. Archives of Computational Methods in Engineering, 26, 933–960. Doi: https://doi.org/10.1007/s11831-018-9266-3.
[24] Jiang, B., Wang, P., Zhuang, S., Maosong, L., Zhenfa, L., and Gong, Z. (2018). Detection of maize drought based on texture and morphological features. Computers and Electronics in Agriculture, 151, 50–60. Doi: 10.1016/j.compag.2018.03.017.
[25] Pinto, L. S., Ray, A., Reddy, M. U., Perumal, P., and Aishwarya, P. (2016). Crop disease classification using texture analysis. 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, pp. 825–828. Doi: 10.1109/RTEICT.2016.7807942.
[26] Siricharoen, P., Scotney, B., Morrow, P., and Parr, G. (2016). A lightweight mobile system for crop disease diagnosis. ICIAR. Doi: https://doi.org/10.1007/978-3-319-41501-7_87.
[27] Martin, D. L. and Gilley, J. R. (1993). Irrigation water requirements. Soil Conservation Service (SCS).
[28] Hanks, R. J., Gardner, H. R., and Florian, R. L. (1969). Plant growth evapotranspiration relations for several crops in the central Great Plains. Agronomy Journal, 61(1). Doi: https://doi.org/10.2134/agronj1969.00021962006100010010x.
[29] Brouwer, C. and Heibloem, M. (1986). Irrigation water management: Irrigation water needs. Training Manual.
[30] Sirisha, A. (2016). Traditional to smart irrigation methods in India: Review. American Journal of Agricultural Research, 1:2. Doi: 10.28933/ajar-08-1002.
[31] Tikhamarine, Y., Malik, A., Kumar, A., Souag-Gamane, D., and Kisi, O. (2019). Estimation of monthly reference evapotranspiration using novel hybrid machine learning approaches. Hydrological Sciences Journal, 64(15), 1824–1842. Doi: 10.1080/02626667.2019.1678750.
[32] Goap, A., Sharma, D., Shukla, A., and Krishna, C. (2018). An IoT based smart irrigation management system using machine learning and open source technologies. Computers and Electronics in Agriculture, 155, 41–49. Doi: https://doi.org/10.1016/j.compag.2018.09.040.
[33] Khan, M. and Noor, S. (2019). Irrigation runoff volume prediction using machine learning algorithms. European International Journal of Science and Technology, 8(1).
[34] Singh, G., Sharma, D., Goap, A., Sehgal, S., Shukla, A. K., and Kumar, S. (2019). Machine learning based soil moisture prediction for Internet of things based smart irrigation system. 2019 5th International Conference on Signal Processing, Computing and Control (ISPCC), Solan, India, pp. 175–180. Doi: 10.1109/ISPCC48220.2019.8988313.
[35] Kondaveti, R., Reddy, A., and Palabtla, S. (2019). Smart irrigation system using machine learning and IoT. 2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN), Vellore, India, pp. 1–11. Doi: 10.1109/ViTECoN.2019.8899433.
[36] Tyagi, S. K. S., Mukherjee, A., Pokhrel, S. R., and Kant Hiran, K. (2021). An intelligent and optimal resource allocation approach in sensor networks for smart agri-IoT. IEEE Sensors Journal. Doi: https://doi.org/10.1109/JSEN.2020.3020889.
[37] Suntaranont, B., Aramkul, S., Kaewmoracharoen, M., and Champrasert, P. Water irrigation decision support system for practical weir adjustment using artificial intelligence and machine learning techniques. Sustainability, 12(5), 1763. Doi: 10.3390/su12051763.
[38] Mehdizadeh, S., Behmanesh, J., and Khalili, K. (2017). Using MARS, SVM, GEP and empirical equations for estimation of monthly mean reference evapotranspiration. Computers and Electronics in Agriculture, 139, 103–114. Doi: 10.1016/j.compag.2017.05.002.
Rohit Mittal, Vibhakar Pathak, Geeta Chhabra Gandhi, Amit Mithal, Kamlesh Lakhwani
Chapter 9 Application of machine learning in SLAM algorithms

Abstract: Simultaneous localization and mapping (SLAM) is a computational technique with which a robotic system moves through an unknown environment within a fixed or predefined map. The objective of SLAM is to localize the robot in the unknown environment of a given or predefined map. Researchers have defined and designed SLAM algorithms in their own ways, among which mSLAM, vSLAM, FullSLAM and extended Kalman filter (EKF)-based Gmapping SLAM are prominent. Of these, we found that the EKF-based SLAM algorithm performs better, and it can be scaled for better precision by adding other curve algorithms; a mixture or serial combination of these algorithms may lead to better optimization of the system. In our experiments, we improved the precision of localization in the environment by applying a few parametric-curve algorithms on top of EKF. To optimize further, we are now experimenting with machine learning (ML)-based optimization of EKF SLAM. A SLAM algorithm enables a computer system to predict and update robot poses as the robot moves through a given map, which helps in localization; however, these predictions are not very precise owing to the mathematical approximations involved, and to overcome this an ML approach may be applied for better precision. In this chapter, we discuss artificial neural network (ANN)-, k-nearest neighbor (kNN)- and convolutional neural network (CNN)-based techniques for optimizing localization precision, applied on the EKF-based SLAM algorithm. To improve precision, an augmentation with ML can be examined; in this respect, we are designing and testing an ML system based on EKF. The system incorporates augmentation and learning modules based on ML and deep learning. To evaluate the effectiveness of the proposed learning-to-prediction model, we have developed an ANN-based learning module.
For the model trained with deep ML modules, the machine readjusts itself during navigation of the robot on different surfaces (marble flooring, boulder tile, rough tile), so that friction is accounted for and better path-prediction results are obtained. Various optimization algorithms are available in deep learning, among which kNN and
Rohit Mittal, Poornima University, India, e-mail: [email protected] Vibhakar Pathak, Arya College of Engineering & IT, India Geeta Chhabra Gandhi, Poornima University, India Amit Mithal, Jaipur Engineering College Research Center, India Kamlesh Lakhwani, Lovely Professional University, India https://doi.org/10.1515/9783110702514-009
Rohit Mittal et al.
ANN are the most prominent and widely used. Of these, the ANN is a shallow network able to track the complex SLAM problem, whereas kNN gives lower accuracy for tracked mobile robots. The deep learning-based approach shows a significant increase in performance.

Keywords: B-spline, Bezier, machine learning, SLAM
9.1 Introduction

The simultaneous localization and mapping (SLAM) algorithm is a localization system that works on a known map. Traditional SLAM algorithms focus on the movement of a robot based on prediction of the next step or pose; they use various sensors and can be augmented with a given map for movement in a known environment [1, 2]. In order to improve the precision of the system, techniques such as parametric curves, data science (DS) and machine learning (ML) can be applied [3]. Today, most SLAM algorithms use prediction algorithms; few of them study ML algorithms [4]. ML algorithms have grown widely owing to a significant reduction in computation and an increase in system performance [5]. In predictive SLAM algorithms such as extended Kalman filter (EKF)-based Gmapping, applying methods like parametric curves (Bezier and B-spline) has a modest impact on the precision of motion, yielding better localization in the map. These algorithms are computationally intensive, require specialized hardware and are only slightly scalable in terms of sensors. To overcome this, ML techniques can be applied to SLAM algorithms. A trained prediction model captures the relationship between the input and output parameters, which is then used to estimate the outcome for any given input data. This is because DS/ML is a data-driven approach, and it has brought developments to numerous computer automation and robotic tasks [5]. Various neural network methods work as prediction algorithms, such as the artificial neural network (ANN), convolutional neural network (CNN) and recurrent neural network (RNN), of which the ANN is a powerful prediction algorithm used here to evaluate the effectiveness of the proposed learning-to-prediction model [6, 7]. These are general-purpose learning algorithms that can be used to predict a path with better prediction results.
The conventional EKF-based SLAM algorithm requires input preprocessing, normalization and feature extraction using a state model and a measurement model, which together determine its performance and accuracy. Combining ML, DS and EKF-based SLAM in sequence can result in better path prediction for a robot moving in a known map. Hence, a high-performance prediction architecture consisting of an ANN module alongside the EKF-based SLAM algorithm has to be evaluated for effectiveness during navigation of the robot on different types of surfaces [8].
9.2 Experimental setup

We designed a wheeled robot with an Arduino microcontroller connected to multiple sensors, namely Sound Navigation and Ranging (SONAR), infrared (IR) and inertial measurement unit (IMU) sensors, for capturing localization data, as shown in Figure 9.1 [9, 10]. Compatibility and other issues related to the sensors connected to the wheeled robot were tested in a simulator/laboratory environment.
Figure 9.1: Sensor interfacing and design of the wheeled robot: a 37.5 cm diameter wheeled robot with a 14-pin microcontroller and an L298 motor driver; SONAR, IR and IMU sensors interfaced; wireless transmission through Xbee modules (one working as data router, the other as coordinator); the EKF-based SLAM algorithm is applied and optimized with blending functions on Bezier and B-spline curves together with data science tools.
For transmitting the data, Xbee modules based on the Zigbee protocol are used; one module acts as coordinator and the other as router. Any sensor attached to the Arduino board works as a sensor object, which is used to retrieve information from the attached sensor synchronously. The EKF-based SLAM algorithm is used for data association and prediction; it has the ability to deal with uncertainty.
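To make the EKF step concrete, here is a minimal, hedged sketch of one predict/update cycle for a planar robot. It is not the chapter's implementation: the motion model (simple additive odometry), the range measurement to a known landmark, and all noise values are assumptions chosen for the example.

```python
import numpy as np

def ekf_step(mu, Sigma, u, z, landmark, R, Q):
    """One EKF predict/update cycle for a planar robot.

    State mu = (x, y); motion model mu' = mu + u (additive odometry, so
    the motion Jacobian F is the identity); measurement z is the range
    to a known landmark, which is nonlinear in the state.
    R is the motion-noise covariance, Q the measurement-noise variance.
    """
    # --- predict ---
    mu_bar = mu + u
    Sigma_bar = Sigma + R
    # --- update ---
    dx, dy = mu_bar - landmark
    r = np.hypot(dx, dy)                      # expected range h(mu_bar)
    H = np.array([[dx / r, dy / r]])          # Jacobian of h at mu_bar
    S = H @ Sigma_bar @ H.T + Q               # innovation covariance
    K = Sigma_bar @ H.T @ np.linalg.inv(S)    # Kalman gain
    mu_new = mu_bar + (K @ np.array([z - r])).ravel()
    Sigma_new = (np.eye(2) - K @ H) @ Sigma_bar
    return mu_new, Sigma_new

# Toy run: robot steps in +x and measures its range to a landmark at origin.
mu, Sigma = np.array([1.0, 1.0]), np.eye(2) * 0.5
R, Q = np.eye(2) * 0.01, np.array([[0.1]])
mu, Sigma = ekf_step(mu, Sigma, u=np.array([0.5, 0.0]), z=1.9,
                     landmark=np.array([0.0, 0.0]), R=R, Q=Q)
print(mu, np.diag(Sigma))   # posterior mean and (shrunken) variances
```

The update step contracts the covariance most strongly along the measurement direction, which is exactly the "ability to deal with uncertainty" referred to above.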
The map holds a large vector of data captured from the sensors, with various landmark states, and provides an 8-DOF estimate for a robot moving on ground with various surfaces such as marble floor, loose stone and glazed tile:

(acc_x, acc_y, acc_z, gyro_x, gyro_y, gyro_z, distance, infra)

In this work, to smooth the predicted path of the robot as it navigates between control points, we apply Bezier and B-spline cubic curves, each defined by four control points. The path-optimization curve lies inside the convex hull of the four control points and can be optimized with the help of these points by applying interpolation and extrapolation techniques and blending functions. The points are blended in a nonlinear way: by adding the four terms with their weights, blending can be achieved for better path prediction. In Figure 9.2, a Bezier cubic curve segment runs from P1 to P4, with upward and downward forces such as momentum, impulse and friction acting on P2 and P3. The Bezier cubic curve generally passes through only the first and last control points, since B(t = 0) = P1 and B(t = 1) = P4.

Figure 9.2: Bezier cubic curve segment.
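A cubic Bezier segment like the one in Figure 9.2 can be evaluated directly from its Bernstein form. The following small sketch (illustrative, with invented control points) confirms the endpoint property stated above.

```python
import numpy as np

def bezier_cubic(p1, p2, p3, p4, t):
    """Evaluate a cubic Bezier segment at parameter t in [0, 1]
    using the Bernstein basis; the curve passes through p1 and p4."""
    p1, p2, p3, p4 = map(np.asarray, (p1, p2, p3, p4))
    return ((1 - t) ** 3 * p1 + 3 * (1 - t) ** 2 * t * p2
            + 3 * (1 - t) * t ** 2 * p3 + t ** 3 * p4)

# Endpoint behaviour matches the text: B(0) = P1 and B(1) = P4.
P1, P2, P3, P4 = [0, 0], [1, 2], [3, 2], [4, 0]
print(bezier_cubic(P1, P2, P3, P4, 0.0))   # -> [0. 0.]
print(bezier_cubic(P1, P2, P3, P4, 1.0))   # -> [4. 0.]
print(bezier_cubic(P1, P2, P3, P4, 0.5))   # interior point pulled toward P2, P3
```

Because the interior points P2 and P3 only attract the curve, moving them reshapes the segment without changing its endpoints.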
9.2.1 Relationship between parametric curves (Bezier and B-spline)

A B-spline curve (a piecewise polynomial) with 8 DOF consists of four control points (P1 to P4) and four Bezier curve segments. The problem with the Bezier curve is its lack of local control: if the number of control points is increased, the curve becomes complex and it is difficult to map the path. This is easily resolved by the B-spline, whose local modification scheme allows changing the position of a control point so that it affects the curve only over a limited interval, which leads to a smoother path. B-splines are piecewise polynomial curves that are differentiable; a B-spline is a combination of control points and basis functions.
These control points are defined in Equation (9.1), where M_bs is the B-spline basis matrix and G_bs is the B-spline geometry matrix:

S(t) = T \cdot M_{bs} \cdot G_{bs}
     = \begin{pmatrix} t^3 & t^2 & t & 1 \end{pmatrix}
       \frac{1}{6}
       \begin{pmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 0 & 3 & 0 \\ 1 & 4 & 1 & 0 \end{pmatrix}
       \begin{pmatrix} P_i \\ P_{i+1} \\ P_{i+2} \\ P_{i+3} \end{pmatrix}
       \quad (9.1)
9.2.2 Path optimization due to blending function based on B-spline

The path-optimization approach is used to obtain a better estimate of the predicted path of the robot. The measurements are obtained from the EKF, which predicts the robot's path for SLAM and has the ability to deal with uncertain (nonlinear) movement of the robot. The EKF-based SLAM algorithm used by the robot for localization estimates its position. The EKF is used where the state-model and measurement-model dynamics are nonlinear; it solves the mapping/localization problem in the frame of a linear filter. The EKF can be viewed as a variant of a Bayesian filter: EKFs provide a recursive estimate of the state of a dynamic system or, more precisely, solve an unobservable, nonlinear estimation problem. The state of the system at time T can be considered a random variable, with the uncertainty about this state represented by a probability distribution. The B-spline path optimizer is applied to the path obtained from the EKF-based SLAM system by blending it with the B-spline blending function based on four control points. The vector that contains the basis splines B_bs is shown in Equation (9.2):

B_{bs}(t) = \frac{1}{6}
            \begin{pmatrix} -t^3 + 3t^2 - 3t + 1 \\ 3t^3 - 6t^2 + 4 \\ -3t^3 + 3t^2 + 3t + 1 \\ t^3 \end{pmatrix}
            \quad (9.2)
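The blending idea can be sketched as follows: a sliding window of four control points is pushed through the basis matrix (here with the conventional 1/6 normalization of the uniform cubic B-spline basis) to sample a smoothed path. The example poses are invented; this is an illustration of the technique, not the chapter's code.

```python
import numpy as np

# Uniform cubic B-spline basis matrix (conventional 1/6 factor included).
M_BS = np.array([[-1,  3, -3, 1],
                 [ 3, -6,  3, 0],
                 [-3,  0,  3, 0],
                 [ 1,  4,  1, 0]]) / 6.0

def bspline_point(ctrl, t):
    """S(t) = [t^3 t^2 t 1] . M_bs . [P_i .. P_{i+3}] for t in [0, 1]."""
    T = np.array([t ** 3, t ** 2, t, 1.0])
    return T @ M_BS @ np.asarray(ctrl, dtype=float)

def smooth_path(points, samples=10):
    """Blend a predicted (e.g. EKF) path: slide a 4-point window along
    the poses and sample each spline segment, yielding a smoothed path."""
    out = []
    for i in range(len(points) - 3):
        for t in np.linspace(0, 1, samples, endpoint=False):
            out.append(bspline_point(points[i:i + 4], t))
    return np.array(out)

raw = [[0, 0], [1, 2], [2, -1], [3, 3], [4, 0]]   # invented noisy poses
print(smooth_path(raw).shape)   # -> (20, 2)
```

Note that at t = 0 each segment starts at (P_i + 4P_{i+1} + P_{i+2})/6, i.e. near but not on the control points, which is exactly the smoothing effect exploited for path optimization.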
Furthermore, an ANN can be applied to the optimized path for the purpose of better prediction, as shown in Figure 9.3.
Figure 9.3: ANN applied to predict a better path from predictive SLAM (EKF SLAM). (Blocks in the diagram: inertial measurement unit readings Ax, Ay; stochastic process X(t); measurement model of the EKF; Kalman gain K(t) and measurement z(t); robot orientation measured using gyroscope and accelerometer parameters Gx, Gy; physical parameters such as IR, SONAR, friction variable and angular velocity; artificial neural network; sectorial error-probability prediction; error-probability update in the process; state update X(t+1).)
9.3 Datasets, experiments and results

The dataset was generated by the EKF-SLAM, B-SLAM and BS-SLAM systems in a simulated environment with over 500 runs. In order to validate our approach under complex and realistic conditions, we used the differential characteristics of the sensor and compared it with a high-precision system from TI [8], which has a 3D accelerometer with a high degree of precision along with an in-built accelerometer feature [5, 11]. The IMU sensor is used for the dynamic acceleration and orientation of the robot in motion along the X, Y and Z axes. To predict the path of the robot with a nonlinear transition model, the EKF predicts and updates the robot states. The datasets are arranged by surface category: marble flooring, glaze tile, rough tile and boulder tile. The work was based on geophysical parametric data captured by the robot, and mechanical parameters used to identify force, impulse and friction were observed. After obtaining the geophysical information, the robot can localize itself more accurately. The optimization approach is based on an ANN, which requires a learning phase; for this purpose, 20% of the results obtained in the first simulation set are used.
9.4 Results

The robot was navigated in an experimental arena of 4 × 6 ft through a remote area network. In the experiments, we analyzed the robot's locality and accuracy under B-SLAM and BS-SLAM with respect to the actual pose given by EKF-based SLAM. The experiments were conducted by moving the robot on the arena with a total of 120 pulses, as shown in Table 9.1.

Table 9.1: Robot's locality using EKF SLAM, B-SLAM + DS and BS-SLAM + ML (x-axis and y-axis coordinates, in cm, recorded for each of the three methods).
Rohit Mittal et al.
9.5 Discussion

Table 9.2 reports two error measurements. Mean absolute error (MAE) evaluates the disagreement between the B-SLAM prediction and the actual value and describes the average model-performance error: it is the average, over the test sample, of the absolute differences between prediction and actual observation, where all individual differences have equal weight. Root mean squared error (RMSE) is the square root of the average of squared differences between prediction and actual observation. Together they quantify the improvement made by each module; the lower these values, the better the prediction. The global path of the robot's movement obtained by applying EKF-based SLAM, B-SLAM + DS (SLAM on a Bezier curve along with DS tools) and BS-SLAM + ML (SLAM on a B-spline curve along with ML) is illustrated in Figure 9.4. For EKF SLAM applied to the actual path (blue points in Figure 9.4), the MAE is 6.41 and the RMSE is 14.63. For B-SLAM + DS, the robot's movement is superimposed up to the end point of the path, as the data gets trained/learned by applying an ANN along with a blending function on the Bezier curve to smooth the path; the error is therefore reduced, with an observed MAE of 2.66 and RMSE of 9.86. For BS-SLAM + ML, the B-spline visits all control points on the curve and shows the correlation between control points and curve segments; the observed MAE is 4.48 (40% more than B-SLAM + DS and 30% less than EKF SLAM) and the RMSE is 10.95 (9% more than B-SLAM + DS and 25% less than EKF SLAM).

Table 9.2: Comparison among EKF SLAM, B-SLAM + DS and BS-SLAM + ML based on MAE and RMSE.

Method          MAE     RMSE
EKF SLAM        6.41    14.63
B-SLAM + DS     2.66     9.86
BS-SLAM + ML    4.48    10.95
Figure 9.4: Navigation of robot based on EKF-SLAM, B-SLAM + DS and BS-SLAM + ML.
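The two error measures used in the discussion can be computed directly. A minimal sketch, where the pose values are illustrative and not taken from Table 9.1:

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: sqrt of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

# Illustrative poses (cm); not the chapter's data.
actual    = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(mae(actual, predicted))    # 2.0
print(rmse(actual, predicted))   # ~2.12
```

Because RMSE squares each difference before averaging, it penalizes large deviations more heavily than MAE, which is why the two measures can rank methods differently.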
9.6 Conclusion

SLAM enables a robot to predict and update its position as it moves within a given map. To optimize the localization, or the predicted path, in the given map, several techniques are employed; parametric-curve-based optimization, which smooths the curve based on control points, is found to be sub-optimal and computationally intensive. To overcome this, an ML/DS-based approach was examined. The ANN-based approach is found to be better at optimizing the predicted path with least-to-moderate computational load. To obtain stable experimental observations, various surfaces such as marble floor, loose stone and glazed tile were utilized. The predicted paths obtained from the EKF were applied for optimization to B-spline and Bezier-curve-based path-smoothing systems and to the ANN system. Stabilization of the experiment was taken care of by observing deviations in the results. The stable results of the dataset were taken further for analysis, and it was found that the ANN outperforms the others: ANN-based path prediction improves on the others by a factor of 10% to 20%, depending on the steering angle of movement. The quality of the data (predicted path) is measured by the MAE and RMSE parameters, and the DS/ML approach gives encouraging results in terms of data quality, as discussed in the results section of this chapter. After analysis of the experimental results, it is recommended to investigate the ML/DS technique further for SLAM and similar algorithms. This optimization approach can be further investigated with CNNs, long short-term memory and other learning algorithms; hyperparameter and other optimizations can also be applied. It is also suggested to apply it on aerial vehicles if the researcher finds that suitable.
References

[1] Bailey, T. and Durrant-Whyte, H. (2006). Simultaneous localization and mapping (SLAM): Part II. IEEE Robotics & Automation Magazine, 13(3), 108–117.
[2] Durrant-Whyte, H. and Bailey, T. (2006). Simultaneous localization and mapping: Part I. IEEE Robotics & Automation Magazine, 13(2), 99–110.
[3] Kümmerle, R., Steder, B., Dornhege, C., Ruhnke, M., Grisetti, G., Stachniss, C., and Kleiner, A. (2009). On measuring the accuracy of SLAM algorithms. Autonomous Robots, 27(4), 387–407.
[4] Mittal, R., Pathak, V., Goyal, S., and Mithal, A. (2020). A Novel Approach to Localized a Robot in a Given Map with Optimization Using GP-GPU. In: Sharma, H., Pundir, A., Yadav, N., Sharma, A., and Das, S. (Eds.) Recent Trends in Communication and Intelligent Systems. Algorithms for Intelligent Systems, Springer, Singapore. https://doi.org/10.1007/978981-15-0426-6_17.
[5] Alsayed, Z., Bresson, G., Verroust, A., and Nashashibi, F. (2018). 2D SLAM correction prediction in large scale urban environments. https://doi.org/10.1109/ICRA.2018.8460773.
[6] Mahrishi, M., Hiran, K. K., Meena, G., and Sharma, P. (Eds.) (2020). Machine Learning and Deep Learning in Real-Time Applications. IGI Global. https://doi.org/10.4018/978-1-7998-3095-5.
[7] Vyas, A., Dhiman, H., and Hiran, K. (2021). Modelling of symmetrical quadrature optical ring resonator with four different topologies and performance analysis using machine learning approach. Journal of Optical Communications. https://doi.org/10.1515/joc-2020-0270.
[8] Mittal, R., Pathak, V., Mishra, N., and Mithal, A. (2019). Localization and Impulse Analysis of Experimental Bot Using eZ430-Chronos. In: Mishra, S., Sood, Y., and Tomar, A. (Eds.) Applications of Computing, Automation and Wireless Systems in Electrical Engineering. Lecture Notes in Electrical Engineering, Vol. 553, Springer, Singapore. https://doi.org/10.1007/978-981-13-6772-4_61.
[9] Montemerlo, M. and Thrun, S. (2007). FastSLAM 2.0. Springer Tracts in Advanced Robotics, 27, 63–90.
[10] Burgard, W., Stachniss, C., Grisetti, G., Steder, B., Kümmerle, R., Dornhege, C., Ruhnke, M., Kleiner, A., and Tardós, J. D. (2009). A comparison of SLAM algorithms based on a graph of relations. In: 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), IEEE, pp. 2089–2095.
[11] Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013). Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research, 32(11), 1231–1237.
Shruti Dadhich, Vibhakar Pathak, Rohit Mittal, Ruchi Doshi
Chapter 10
Machine learning for weather forecasting

Abstract: In weather forecasting, we predict atmospheric conditions for a specific location and time. Meteorology plots atmospheric change for a desired location by collecting and accumulating data, which can be processed thoroughly in the backend to obtain accurate weather predictions. Data science plays a major role in this quantitative data processing. Many fields, organizations and businesses rely on the accuracy of weather forecasting, and the current weather situation is equally important for individuals. Decisions on plantation, irrigation and harvesting in agriculture, air traffic control, construction work and many other occupations depend on climatic conditions. Forecasting the weather is all about the collection, processing and analysis of data. Weather models are essential for reproducing and estimating previous data; it is therefore important to have the correct information in order to come close to exact decisions. Over the most recent decade, machine learning (ML) has been introduced and implemented in atmospheric science. With the support of the relative predictors and accessible data, weather information is calculated by ML. From the consolidated results of ML and predictive modeling, we can obtain more accurate results on how ML helps to improve physically grounded models. ML and sophisticated models are run on big computer systems, together with physical models and estimated information, to forecast the weather. With the help of a Python application programming interface (API), we can improve the creation and collection of information and read the meteorological information that has been produced, and with TensorFlow we can create artificial neural network models. ML provides unsupervised and supervised learning techniques to forecast weather with an insignificant error.
In this chapter, we focus on how to generate more accurate weather forecasts over large timeframes. The chapter covers the use of regression decision trees (DTs), linear regression, clustering techniques, DT classification, binary logistic regression and principal component analysis in ML to forecast weather with higher accuracy.
Shruti Dadhich, Noida Institute of Engineering Technology, Greater Noida, India, e-mail: [email protected]
Vibhakar Pathak, Arya College of Engineering & I.T., Jaipur, India, e-mail: [email protected]
Rohit Mittal, Arya College of Engineering & I.T., Jaipur, India, e-mail: [email protected]
Ruchi Doshi, Department of Computer Science & Engineering, Azteca University, Mexico, e-mail: [email protected]
https://doi.org/10.1515/9783110702514-010
Keywords: artificial neural networks, TensorFlow, machine learning, linear regression, clustering
10.1 Introduction

Due to the large number of samples, weather forecasting is a complex and difficult but essential task for researchers. Weather forecasting determines the state of the weather for a place and time from a large number of data samples. Forecasts are made by collecting quantitative information at a given location within a time frame, both for current use and for predicting upcoming changes and behavior in the atmosphere over time. The forecast of climate parameters is important for different applications like climate monitoring, severe weather prediction, drought detection, the aviation industry, agriculture and production, pollution dispersal, energy industry planning, communication [1] and so on. The exact condition of the environment is a significant factor for organizations and individuals, and many businesses are directly or indirectly connected to weather conditions. Agriculture depends on precise weather prediction for when to plant, harvest and irrigate; other occupations, such as airport control towers and their respective authorities, construction and other businesses, also depend on the climate. With weather forecasting, an organization can work without interruption [2]. Data readings of wind speed, humidity, barometric pressure and temperature are required for weather forecasting. The volume of collected data is increasing day by day, and it is not possible to handle such a large amount of data for analysis and processing on a single system. The dynamic nature of the environment makes accurate and exact forecasting using weather parameters difficult. Many techniques like multilayer perceptrons, linear regression, radial basis functions and autoregression networks are used to estimate atmospheric parameters such as rainfall, temperature, wind speed, atmospheric pressure and meteorological pollution [1, 3–7].
The dynamic nature of the atmosphere and its control is more likely to be explained by nonlinear operator equations. Many forecast approaches have been developed with artificial neural networks (ANNs) in the recent past [8].
10.2 Data science for weather prediction

Quantitative data are examined by data science. Different subprocesses are engaged in the total data science process for weather forecasting, as shown in Figure 10.1.
Figure 10.1: Data science behind weather prediction.
10.2.1 Machine learning and predictive modeling

Weather models are important both for prediction and for reproducing historical data. Machine learning (ML) has increasingly been applied to climate science over the recent past [9]. ML can help develop models that hold weather-based data and create a relation between the available information and the predictors, and accurate results can be obtained by integrating the two methodologies [10]. ML and sophisticated models are utilized to predict the upcoming weather; a big computer system is used in these models to establish a balance between physical models and measured data. Weather data need to be consumed in time, as previously occurred data are not as useful as current data, with the help of which predictions can be made for upcoming weather. As a result, data must fall in and fall out easily and be reused in time.
10.2.2 Data: an important piece of weather prediction

To come close to accurate choices, it is important to have the right data. The details should be recorded carefully, since it is important to remember the time and position at which each reading is noted. Today, with gyrometers, barometers and a large variety of sensors, all gadgets are IoT enabled; thus, the location of anything, from one point to another, is well accessible. Consequently, mobile phones eventually turned out to revolutionize the weather analytics industry; this business has truly been transformed by them. There is a need to use weather data in a short time frame of under a minute, as no one needs to know only what occurred before: what matters is what is going on now and what will occur later on. Within a few minutes, the data must fall in and fall out quickly and be reused quickly.
10.2.3 Weather data

– Forecasting of natural disasters: Using data analytics with different models, natural disasters like floods can be forecast. This requires gathering data such as the yearly rainfall of a particular area and the surrounding road conditions.
– Sports: Weather like heavy rain will postpone or stop a cricket or other match in the middle. Predicting the weather helps in selecting the best time for matches, eliminating the risk of delaying the game.
– Prediction for asthma patients: Weather information may be used for serious medical conditions, for example asthma. The inhalers used around the time of an asthma attack have sensors in them that can collect data to guarantee that the inhalers are used appropriately. They gather information on the temperature, air quality, humidity and dust presence where the patient spends time. By predicting when asthma will be triggered, this knowledge can help to minimize the risk of an attack.
– Car sales prediction: Weather information can also be used by vehicle vendors/merchants to make sense of vehicle sales in a specific climatic scenario; for example, in stormy weather people feel timid but need to go out for work or other purposes, and thus end up purchasing a vehicle.
10.2.4 Sensor data and satellite imagery

Today, satellite imagery is the basic foundation of atmospheric science. Satellite imagery comes in various shapes and sizes: certain satellites operate in the black-and-white spectrum, a few others may be helpful to recognize and measure clouds, and some quantify winds over the seas. Most data researchers rely on satellite imagery to produce short-term predictions, to decide whether a prediction is right and to validate models as well [11]. ML is used here for pattern matching: when it matches a pattern that has appeared before, it can be utilized to anticipate what will occur later on. Sensor data are generally used to ground forecasts at a local level in the last ground-level reality of the climate model [9].
Chapter 10 Machine learning for weather forecasting
165
10.3 Machine learning types

1. Supervised learning: In this type of learning, input along with the desired output is given to the machine, and for classification the input and output data are labeled. The target variable is anticipated from the set of predictors provided. In supervised learning, algorithms such as regression, logistic regression, decision trees (DTs), random forests, boosting algorithms, KNN (k-nearest neighbors) and others are used. Some types and their examples are mentioned in Table 10.1.

Table 10.1: Supervised learning types and examples.

Type                            Usage example in business
Neural network                  Predicting financial outcomes and detecting fraud
Classification and regression   Investigating fraud and spam filtering
Decision tree                   Problem management
2. Unsupervised learning: This involves drawing inferences from datasets that have no labeled responses, only input data [12]; there is no supervisor to direct the learning. Algorithms used in unsupervised learning include k-means, adaptive resonance theory, the a priori algorithm and the self-organizing map (SOM) [13, 14]. Some types and their examples are mentioned in Table 10.2.

Table 10.2: Unsupervised learning types and examples.

Type                       Usage in business
Cluster analysis           Financial transactions, streaming analytics in IoT
Pattern recognition        Spam recognition, identity management
Association rule learning  Bioinformatics, assembly and manufacturing
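The unsupervised algorithms listed above can be illustrated concretely with a minimal one-dimensional k-means sketch in plain Python; the sample readings and the min/max initialization are illustrative assumptions, not part of the chapter's method:

```python
# Minimal 1D k-means (Lloyd's algorithm), illustrating the clustering
# idea listed under unsupervised learning. Data and the min/max
# initialization strategy are illustrative.

def kmeans_1d(points, k=2, iters=20):
    # Initialize centroids spread across the data range.
    centroids = [min(points), max(points)] if k == 2 else points[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the nearest centroid.
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Recompute each centroid as the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups: low values vs. high values.
print(kmeans_1d([1.0, 2.0, 10.0, 11.0]))  # [1.5, 10.5]
```

Real implementations repeat the assign/recompute loop until assignments stop changing and typically use smarter initialization than the min/max used here.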
10.4 Weather forecasting

The process of anticipating the climate is very complicated [15]. The mechanism of this cycle rests on two main foundations: on one side, there is a large volume of data expected from a multitude of sources, and on the other, there are climate simulations and investigations that attempt to make sense of all the incoming data. A large collection of technologies and devices is available for collecting weather information.
There is a lot of information going around, ranging from essential equipment like anemometers, barometers and thermometers to complex instruments extending from weather radar systems and balloons to environmental and deep-space satellites. It is fairly evident that data availability is not an issue when it comes to weather estimates. Thus, the climate models used to distinguish insights and patterns from these datasets must be studied; they achieve the finest outcomes while operating in real time. The environment is continuously evolving, and it is challenging to produce a forecast over a long period; it is truly commendable that these models can deliver seven-day weather forecasts with an estimated 80% reliability [15]. In such a dynamic situation, a lot of data needs to be processed non-stop for better forecasting. Since data science is already an estimation method in weather forecasting, it is up to new methodologies to drive that 80% up. ML is a strong support of both data science and artificial intelligence. The introduction of ML-based simulations will significantly support weather models, as this technology can accommodate a lot of climate information and refines itself the more it is used, yielding more detailed forecasts. For weather analysis, the primary benefit of using ML is that we can make instant correlations and distinguish patterns on the fly. Development services in Java, Python and R are coming up with new platforms and solutions capable of gathering and comparing information from satellites, weather stations and radars against previous weather forecast data. From this, the software is trained to detect inaccuracies and anomalies based on previously known data and the current circumstances. Over time, weather forecasting can obtain more detailed forecasts from deep learning. Deep learning algorithms investigate data in the same manner as a human brain; the only difference is that these algorithms operate much better than humans do.
Some models show the potential that deep learning and ML from data science can bring to the table. Bayer's Climate Corporation is aggressively involved in artificial intelligence (AI) for climate prediction applied to the agricultural sector; its FieldView solution provides weather forecasts using ML algorithms, which benefits the decision-making of farmers enormously. The final innovation that has progressed toward weather forecasting is smartphones. By extracting knowledge from the individual locations of their users, mobile devices improve data science in the field: weather services can tailor predictions and provide a high level of accuracy, so users who access weather applications are integrated into a feedback loop that improves the climate data precisely because they use the application in the first place [15]. ML is a component of AI: a computer learns from input and data using different algorithms, without having to be specially programmed, and the algorithms can modify themselves [12]. The phases involved in ML are as follows:
1. Data collection: The gathering of information is an extremely vital step in terms of consistency and quantity; it determines the power, and thus the accuracy, of our predictive model. The information is gathered and organized in a tabular format; training data is the term used to describe this knowledge.
2. Preparation of data: Data is put into the proper form and prepared for use in ML training. Data is categorized into two sections: the first is the training segment of the information and the other is the testing data. This set of data is used to enhance the performance of a model.
3. Selecting a model: In this step, a model that has been developed over time by researchers and data scientists is chosen. It is fundamental to pick the right model for the task to be fruitful.
4. Training: To build the model's capability, a significant volume of training data is used; this cycle also includes the initialization of arbitrary values for the model parameters A and B. We can forecast the model's performance using certain values: the expected value is compared to the model's predictions after this stage, and the values are then adjusted to match the most recently predicted values. 70–30%, 60–40%, 65–35% and so on are the percentages into which we partition training and test information to check our model.
5. Evaluation: This involves comparing the model to the data and then adding, beyond the data from the training process, the actual test data to validate our model.
6. Parameter tuning: If necessary, this phase is used to enhance the training even further. One such parameter is the learning rate, which determines how far the line is moved in each step.
7. Prediction: This is responsible for answering queries; the values achieved can be utilized by the model to guess the result [12].
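The preparation and evaluation phases above can be sketched as a train/test split plus a simple accuracy score. The split ratio, the toy records and the threshold "model" are all illustrative assumptions:

```python
# Illustrative 70-30 train/test split and evaluation, matching the
# "preparation of data" and "evaluation" phases. Data are made up,
# and a real pipeline would shuffle the records before splitting.

def train_test_split(records, train_fraction=0.7):
    """Split records into a training and a testing portion."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]

def accuracy(pairs, model):
    """Fraction of (feature, label) pairs the model labels correctly."""
    correct = sum(1 for x, y in pairs if model(x) == y)
    return correct / len(pairs)

# Ten toy (temperature, "rain"/"dry") records.
data = [(14, "rain"), (15, "rain"), (16, "rain"), (17, "rain"),
        (18, "rain"), (25, "dry"), (26, "dry"), (27, "dry"),
        (28, "dry"), (29, "dry")]
train, test = train_test_split(data)            # 7 and 3 records
model = lambda t: "rain" if t < 20 else "dry"   # trivial threshold "model"
print(len(train), len(test), accuracy(test, model))
```

Holding out the test portion is what lets the evaluation phase detect overfitting: a model scored only on its own training data would look deceptively good.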
10.5 Techniques applied for weather forecasting

10.5.1 Artificial neural networks

ML algorithms have found a vast range of applications [16] in recommender systems, AI, computer vision, data tracking and search engines in games. ANNs are used as ML algorithms [17]. ANNs are mechanisms that mimic how the brain operates: in their simplest structure they consist of neurons and synapses, where the neurons are the functions and the synapses are the weights. ANN architectures like backpropagation and basis function neural networks are established for forecasting monsoon rainfall and other parameter-prediction phenomena [18]. TensorFlow is a library of mathematical models and computational graphs [19], developed by Google for designing ML algorithms. Effortless usage and smooth integration into heterogeneous systems are some of the benefits of TensorFlow. Models in TensorFlow are described with directed graphs, as in Figure 10.2; the input or output of each TensorFlow operation in this graph is zero or more tensors.
Figure 10.2: Artificial neural network and its TensorFlow depiction.
Without previous hardware design details, the API allows data scientists to work with massive models and large data in a distributed environment. ANNs are trained iteratively [20]: in each loop, the ANN performs a feed-forward phase and a backpropagation phase that moves in reverse. A collection of data samples (a mini-batch or batch) is fed to the ANN in each loop, and the parameters (biases and weights) are altered so as to decrease a given loss function [21]. An ANN is a group of activation functions and perceptrons; the perceptrons are connected to form hidden layers or units, and the hidden units make up the nonlinear basis. In a lower-dimensional space, this nonlinear basis interconnects the input and output layers, which is what is known as an ANN. ANNs are an interconnection of input to output; the interconnection is calculated with the help of bias values and weights, and this architecture is known as the model [22]. The training cycle decides the values of the biases and weights. The model values are initialized with random constraint values at the start of training; the error is computed using a loss function by contrasting the output with the ground truth, and the weights are tuned at every level based on the computed loss. Training is halted when the error cannot be decreased further. The training process learns the features during training; the features are a better representation than the raw images. In TensorFlow, modeling of discrete time series predicts the weather based on prior observations of the same variable: the ANN inputs and outputs are the same variable, used separately at different times [23].
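The feed-forward/backpropagation loop described above can be illustrated at its smallest scale with a single neuron trained by gradient descent in plain Python; the learning rate, data and zero initialization are illustrative assumptions, and a real TensorFlow model would express the same loop through its graph API:

```python
# One-neuron "network" trained by gradient descent on a squared loss,
# illustrating the feed-forward + backpropagation cycle in miniature.
# Learning rate, initialization and data are illustrative.

def train(samples, lr=0.05, epochs=1000):
    w, b = 0.0, 0.0                    # simple zero initialization
    for _ in range(epochs):
        for x, y in samples:           # one "mini-batch" of size 1
            pred = w * x + b           # feed-forward
            err = pred - y             # gradient of 0.5*(pred - y)**2
            w -= lr * err * x          # backpropagate to the weight
            b -= lr * err              # ... and to the bias
    return w, b

# Learn y = 2x + 1 from four points.
w, b = train([(0, 1), (1, 3), (2, 5), (3, 7)])
```

Each pass adjusts the weight and bias against the loss gradient; stacking many such neurons into layers, with nonlinear activations between them, gives the hidden-layer structure described in the text.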
10.5.2 Linear regression model

In comparison to physical models of weather prediction, ML may represent a realistic alternative. First, two ML algorithms were applied: linear regression and a variant of functional regression. The input to the algorithms is the weather information of the previous 2 days, including the mean humidity, high temperature, atmospheric classification for each day, mean atmospheric pressure and minimum temperature. The output is the highest and lowest temperatures for each of the next 7 days [24]. We analyze their application to climate forecasting to possibly create exact weather figures over longer periods. A variant of functional regression is used to capture patterns in the weather, as in linear regression. Both models were surpassed by professional weather forecasting services; however, for later days the discrepancy between our models and the professional ones decreased quickly, and our models may beat the professional ones over considerably longer time scales. The linear regression model performed better than the functional regression model, suggesting that 2 days is insufficient to catch significant climate patterns for the last day; basing the predictions on climate information for 4 or 5 days might cause the functional regression model to perform better than the linear regression model. The linear regression algorithm forecasts the low and high temperatures as a linear combination of the features. Linear regression cannot be used for categorical data, so the weather description for the day was not included in this algorithm. Therefore, only eight features were used for each of the previous 2 days: mean humidity, high temperature, low temperature and mean air pressure. Thus, for the ith pair of consecutive days, p^(i) ∈ R^9 is a nine-dimensional feature vector, where p_0 = 1 is the intercept term. For each pair of consecutive days, there are 14 quantities to be predicted: the low and high temperatures for each of the next 7 days. For the ith pair of consecutive days, let the 14-dimensional vector containing these quantities be denoted by q^(i) ∈ R^14. The forecast of q^(i) given p^(i) is h_θ(p^(i)) = θ^T p^(i), where θ ∈ R^(9×14).

The cost function that linear regression attempts to minimize is

$$ J(\theta) = \frac{1}{2}\sum_{i=1}^{m} \left| h_\theta\left(p^{(i)}\right) - q^{(i)} \right|^2 \tag{10.1} $$

where m is the number of training examples. Letting P ∈ R^(m×9) be defined by P_ij = p^(i)_j and Q ∈ R^(m×14) by Q_ij = q^(i)_j, the value of θ that minimizes the cost in Equation (10.1) is

$$ \theta = \left(P^{T} P\right)^{-1} P^{T} Q \tag{10.2} $$

The other algorithm utilized was a variant of functional regression: it searches for previously recorded climate trends that are similar to the current weather patterns and then predicts the weather based on these historical patterns [24].
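Equation (10.2) can be evaluated directly for a toy problem. A sketch with one input feature plus the intercept term, so that PᵀP is just a 2 × 2 matrix solvable by hand; the data points are illustrative, not the study's weather records:

```python
# Normal equation theta = (P^T P)^-1 P^T Q for a toy case: one input
# feature plus an intercept column, one output. Data are illustrative.

def normal_equation(xs, ys):
    """Fit y = t0 + t1*x by solving the 2x2 normal equations."""
    m = len(xs)
    # Entries of P^T P and P^T Q, where each row of P is (1, x).
    s_x, s_xx = sum(xs), sum(x * x for x in xs)
    s_y, s_xy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = m * s_xx - s_x * s_x                 # determinant of P^T P
    t0 = (s_xx * s_y - s_x * s_xy) / det       # intercept
    t1 = (m * s_xy - s_x * s_y) / det          # slope
    return t0, t1

# Points lying exactly on y = 3 + 2x are recovered exactly.
t0, t1 = normal_equation([0, 1, 2, 3], [3, 5, 7, 9])
```

With the full nine-dimensional feature vector and 14 outputs of the chapter, the same closed form applies but requires a general matrix inverse (or, better, a least-squares solver) instead of this hand-unrolled 2 × 2 case.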
10.5.3 Decision trees

DTs play an important role in weather prediction, together with classification rules. Attributes are essential for weather forecasting, as is the class each attribute can take. The classes and attributes applicable to this exploration are represented in Table 10.3 [25].

Table 10.3: Classes and attributes of decision trees.

S. no.   Class                    Attribute
1        High, normal             Humidity
2        High, mild, cool         Temperature
3        Sunny, overcast, rainy   Outlook
4        True, false              Windy
A DT is a decision-support device [26]. It uses a graph similar to a tree model of choices and their possible results, including utility, asset costs and the chances of event outcomes; it is one way to represent an algorithm. In a DT, every node shows a test on an attribute, the outcome of the test is assigned to each branch, and leaf nodes hold the class distribution. An explicit set of "if-then" rules gives the decision-tree structure, making the results easy to interpret [27]. In tree structures, branches represent feature conjunctions and leaves represent the classifications that follow from those conjunctions. Entropy characterizes the gain in information. To further improve the accuracy and generalization of classification and regression trees, frameworks such as boosting and pruning were introduced. Boosting improves the accuracy of a predictive model by repeatedly combining and arranging weak learners, weighting the output of each so that the forecast's aggregate error is reduced, or by producing separate independent trees simultaneously and joining them only after all the trees have been created. Pruning reduces the span of the tree to limit overfitting, which is a problem in detailed single-tree models, as the model begins to fit the noise in the data; such a model would not be able to generalize when applied to data not included in the model's design. There are many algorithms related to DTs, such as LogitBoost, C4.5, the LogitBoost alternating DT (LAD), the alternating DT and the classification and regression tree. A DT builds classification models in a decomposed tree structure: it breaks datasets into smaller and smaller subsets while an associated DT is incrementally developed. The outcome is a tree with two kinds of nodes, decision nodes and leaf nodes; a decision node may have two or three branches, for example overcast, rainy and sunny, as mentioned in Figure 10.3.
The leaf node corresponds to a classification or decision. The root node is the tree's topmost decision node and corresponds to the best predictor. DTs can handle both categorical and numerical data [26]. The DT is the most commonly used methodology for investigating meteorological data. The neuron model in a DT consists of three parts: interfacing links, a summing arrangement and a consideration dependent on its own strength/weight.
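The entropy mentioned above, which drives the choice of split attributes, can be computed directly. A small sketch using the classic 9-versus-5 class distribution of the play-weather dataset as an illustrative example:

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total)
                for c in counts if c > 0)

# A 14-day yes/no split of 9 vs. 5 gives about 0.94 bits of entropy,
# while a pure node ([14, 0]) gives 0 - the outcome a DT split prefers.
print(round(entropy([9, 5]), 3))   # 0.94
print(entropy([14, 0]))            # 0.0
```

A DT learner picks the attribute whose split lowers the weighted entropy of the child nodes the most, i.e., the attribute with the highest information gain.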
Chapter 10 Machine learning for weather forecasting
Figure 10.3: Decision tree created by training datasets.
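The decision-tree procedure described above can be sketched with scikit-learn; the toy "outlook" dataset, feature encoding and threshold values below are illustrative assumptions, not the chapter's weather data.

```python
# Illustrative decision-tree sketch (scikit-learn); the toy "outlook"
# dataset below is an assumption for demonstration, not the chapter's data.
from sklearn.tree import DecisionTreeClassifier

# Encode the categorical "outlook" attribute numerically.
outlook = {"overcast": 0, "rainy": 1, "sunny": 2}
X = [[outlook["overcast"], 65], [outlook["rainy"], 90],
     [outlook["sunny"], 70], [outlook["sunny"], 95],
     [outlook["overcast"], 80], [outlook["rainy"], 60]]  # [outlook, humidity %]
y = [1, 0, 1, 0, 1, 1]  # 1 = favorable weather, 0 = unfavorable

# Entropy as the split criterion; limiting depth acts like pruning to curb overfitting.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf.fit(X, y)
print(clf.predict([[outlook["sunny"], 60]]))
```

Restricting `max_depth` plays the role of pruning discussed above: it prevents the tree from growing deep enough to fit noise in the training data.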
10.5.4 Logistic regression Logistic regression is the appropriate regression analysis to conduct when the dependent variable is binary (dichotomous). Like other regression analyses, logistic regression is a predictive analysis: it is used to describe data and to explain the relationship between a binary dependent variable and one or more nominal or ordinal independent variables. When the dependent variable has more than two outcome categories, the data can be analyzed with multinomial logistic regression or, if the categories are ordered, with ordinal logistic regression [28, 29]. The binary logistic regression model, based on the logistic function [30], is used to analyze the dependency of a dichotomous response variable (Y) on categorical or continuous explanatory variables (X1, X2, ..., Xk). Logistic regression has long been used in epidemiology, but its use has recently spread to meteorology.
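A minimal sketch of binary logistic regression for a dichotomous rain/no-rain response, using scikit-learn; the humidity and pressure values are invented for illustration, not observations from the chapter's dataset.

```python
# Hedged sketch of binary logistic regression for a dichotomous response
# (rain / no rain); the humidity and pressure values are invented examples.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[85, 1002], [90, 998], [95, 995],     # rainy days
              [60, 1015], [55, 1020], [50, 1018]])  # dry days: [humidity %, pressure hPa]
y = np.array([1, 1, 1, 0, 0, 0])                    # dichotomous response Y

model = LogisticRegression().fit(X, y)
prob_rain = model.predict_proba([[88, 1000]])[0, 1]  # P(Y = 1 | X)
print(round(prob_rain, 3))
```

The model outputs a probability between 0 and 1 via the logistic function, which is then thresholded to obtain the binary forecast.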
10.5.5 Clustering Weather forecasting is the prediction of future weather conditions based on historical datasets. Information extraction uses two kinds of models, predictive and descriptive; clustering is a method of the descriptive model [31]. Clustering is the most common type of unsupervised learning and is used to find structure in data without any labeling. It is a powerful tool used in various forecasting tasks such as time-series and flood forecasting, real-time storm detection and so on. Clustering normally involves grouping similar data. While predicting future weather, for example, it is difficult to have a target attribute that remains the same during the entire process. Since
Shruti Dadhich et al.
weather causes frequent changes in the environment, it is difficult to use classification for weather forecasting. The dataset used contains various attributes, including a date in a specific format, summary, temperature, humidity, precipitation type, wind speed, daily summary, cloud coverage and pressure; clustering is therefore applied to this dataset in this model. The simple steps that must be taken before running the clustering algorithms are called preprocessing of the data. We used k-means, hierarchical and density-based clustering methods, defined as follows:
1. Simple k-means: splits the n observations into clusters, each observation assigned to the cluster with the closest mean, by minimizing

J(v) = \sum_{i=1}^{z} \sum_{j=1}^{z_i} \left( \lVert a_i - b_j \rVert \right)^2    (10.3)

where \lVert a_i - b_j \rVert is the Euclidean distance between a_i and b_j, z_i is the number of data points in the ith cluster and z is the number of cluster centers.
2. Hierarchical: constructs a hierarchy of clusters from the average linkage between distinct clusters r and s

L(r, s) = \frac{1}{n_r n_s} \sum_{i=1}^{n_r} \sum_{j=1}^{n_s} D(x_{ri}, x_{sj})    (10.4)

3. Density-based: creates clusters from the \epsilon-neighborhood of each point p

N_\epsilon(p) = \{ q \mid d(p, q) \le \epsilon \}    (10.5)
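The k-means objective of Equation (10.3) can be sketched with scikit-learn; the two synthetic "weather regimes" (warm/dry vs cold/humid) are assumptions for illustration, not the chapter's dataset.

```python
# Sketch of Equation (10.3) in practice: scikit-learn's KMeans minimizes the
# within-cluster sum of squared Euclidean distances. The two synthetic
# "weather regimes" below are assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
warm = rng.normal([30, 40], 2, size=(20, 2))   # (temperature °C, humidity %)
cold = rng.normal([5, 80], 2, size=(20, 2))
X = np.vstack([warm, cold])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # one center near each regime
print(km.inertia_)          # the minimized objective J(v) of Eq. (10.3)
```

With well-separated groups, the fitted labels recover the two regimes exactly; `inertia_` reports the value of the minimized within-cluster sum of squares.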
10.6 Conclusion – Major variation in temperature with latitude causes atmospheric disturbances, floods and heavy rains. – K-means is the most suitable algorithm for the dataset provided. – To improve model accuracy, ML needs algorithms such as binary logistic regression; accuracy of weather forecasting can be improved further with NWP and by applying the products of those models. – Weather parameters such as the maximum and minimum temperatures of the year, month and day are used for classification with a DT algorithm. In the field of weather forecasting, DTs have proved to be an important decision-making tool.
– Both functional regression and linear regression were outperformed by professional weather prediction services. Linear regression proved to be a low-bias, high-variance model, whereas functional regression proved to be a high-bias, low-variance model. Since linear regression is essentially a high-variance algorithm, owing to its sensitivity to outliers, adding more data is a way to strengthen the model. Functional regression was high bias, indicating that the model choice was poor and that its predictions cannot be improved by collecting further data. This bias could stem from the design choice of forecasting from the weather of only the last 2 days, which may be too short to capture the trends functional regression requires; its bias would be reduced if weather data from the past 4 to 5 days were used. – TensorFlow offers effortless use and seamless integration into heterogeneous systems.
References
[1] Pal, N. R., Pal, S., Das, J., and Majumdar, K. (2003). SOFM-MLP: A hybrid neural network for atmospheric temperature prediction. IEEE Transactions on Geoscience and Remote Sensing 41(12), 2783–2791.
[2] https://www.softwebsolutions.com/weather-forecasting-using-data-science.html
[3] Min, J. H. and Lee, Y.-C. (2005). Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications 28(4), 603–614.
[4] Mohandes, M. A., Halawani, T. O., Rehman, S., and Ahmed Hussain, A. (2004). Support vector machines for wind speed prediction. Renewable Energy 29, 939–947.
[5] Yu, P.-S., Chen, S.-T., and Chang, I.-F. (2006). Support vector regression for real-time flood stage forecasting. Journal of Hydrology 328(3–4), 704–716.
[6] Osowski, S. and Garanty, K. (2007). Forecasting of the daily meteorological pollution using wavelets and support vector machine. Engineering Applications of Artificial Intelligence 20(6), 745–755.
[7] Wei-Zhen, L. and Wang, W.-J. (2005). Potential assessment of the "support vector machine" method in forecasting ambient air pollutant trends. Chemosphere 59(5), 693–701.
[8] Radhika, Y. and Shashi, M. (2009). Atmospheric temperature prediction using support vector machines. International Journal of Computer Theory and Engineering 1(1), 55.
[9] https://data-flair.training/blogs/data-science-for-weather-prediction/
[10] Mahrishi, M., Hiran, K. K., and Sharma, P. (Eds.) (2020). Machine Learning and Deep Learning in Real-Time Applications. IGI Global. https://doi.org/10.4018/978-1-7998-3095-5
[11] Hiran, K. K., Doshi, R., Fagbola, T., and Mahrishi, M. (2019). Cloud Computing: Master the Concepts, Architecture and Applications with Real-World Examples and Case Studies. BPB Publications.
[12] Agarwal, A. K., Shrimali, M., Saxena, S., Sirohi, A., and Jain, A. (2019, April). Forecasting using machine learning. International Journal of Recent Technology and Engineering (IJRTE) 7(6C). ISSN: 2277-3878.
[13] Gupta, S., Indumathy, K., and Singhal, G. (2016). Weather prediction using the normal equation method and linear regression techniques. International Journal of Computer Science and Information Technologies (IJCSIT) 7(3), 1490–1493.
[14] Vyas, A., Dhiman, H., and Hiran, K. (2021). Modelling of symmetrical quadrature optical ring resonator with four different topologies and performance analysis using machine learning approach. Journal of Optical Communications. https://doi.org/10.1515/joc-2020-0270
[15] https://www.europeanbusinessreview.com/how-data-science-can-enhance-weatherforecasting/
[16] Mitchell, T. (1997). Machine Learning. McGraw-Hill Education.
[17] LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521(7553), 436–444.
[18] Shrivastava, G. et al. (2012). Application of artificial neural networks in weather forecasting: A comprehensive literature review. International Journal of Computer Applications 51(18).
[19] Abadi, M. et al. (2016). TensorFlow: A system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16).
[20] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature 323(6088), 533–536.
[21] Albaqsami, A., Hosseini, M. S., and Bagherzadeh, N. (2018). HTF-MPR: A heterogeneous TensorFlow mapper targeting performance using genetic algorithms and gradient boosting regressors. 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE.
[22] Shanmugamani, R. Deep Learning for Computer Vision.
[23] Abrahamsen, E. B., Brastein, O. M., and Lie, B. (2018). Machine learning in Python for weather forecast based on freely available weather data.
[24] Holmstrom, M., Liu, D., and Christopher, V. (2016, December 15). Machine Learning Applied to Weather Forecasting. Stanford University.
[25] Krishnaveni, N. and Padma, A. (2020). Weather forecast prediction and analysis using sprint algorithm. Springer-Verlag GmbH Germany, part of Springer Nature.
[26] Siddharth, S. B. and Hubballi, R. G. (2016, May). Weather prediction based on decision tree algorithm using data mining techniques. International Journal of Advanced Research in Computer and Communication Engineering 5(5).
[27] Holmes, R. Using a decision tree and neural net to identify severe weather radar characteristics.
[28] https://www.linkedin.com/pulse/regression-analysis-weather-forecasting-chonghua-yin
[29] Pang, G. et al. (2019). A binary logistic regression model for severe convective weather with numerical model data. Advances in Meteorology 2019.
[30] Kleinbaum, D. (1994). Logistic Regression. Springer-Verlag, New York, NY, USA.
[31] Nalluri, S., Ramasubbareddy, S., and Kannayaram, G. (2019). Weather prediction using clustering strategies in machine learning. Journal of Computational and Theoretical Nanoscience 16, 1977–1981.
Roopa B. Hegde, Vidya Kudva, Keerthana Prasad, Brij Mohan Singh, Shyamala Guruvare
Chapter 11 Applications of conventional machine learning and deep learning for automation of diagnosis: case study Abstract: Computerized diagnostic methods have in recent years become common as initial screening methods for the detection of diseases. Machine learning (ML) and deep learning (DL) are extensively used in the medical field in the investigation of various computer-aided diagnostic applications, and many researchers have made remarkable progress toward the development of such systems in various branches of medicine using the ML approach. In the present work, we compare and contrast the suitability of ML and DL for two different kinds of applications. The suitability of the ML and DL methods was evaluated separately for the categorization of white blood cells into the five normal types and an abnormal type, and for the classification of uterine cervix images into cancerous and normal. Both techniques performed equally well on peripheral blood smear images, owing to the visibly clear features of white blood cells. On uterine cervix images, DL performed better than ML, because the features in these images are not easily distinguishable. From this study, we conclude that conventional ML is useful when features are visible and the availability of data and computing resources is a constraint, whereas DL is useful when feature engineering is difficult and resource availability is not a constraint. Keywords: conventional machine learning, deep learning, computer-aided diagnosis, peripheral blood smear, cervical cancer screening, classification
Roopa B. Hegde, NMAM Institute of Technology, NITTE, India, e-mail: [email protected] Vidya Kudva, NMAM Institute of Technology, NITTE, India, e-mail: [email protected] Keerthana Prasad, Manipal School of Information Sciences, MAHE, Manipal, India, e-mail: [email protected] Brij Mohan Singh, Department of Hematology, Kasturba Medical College, MAHE, Manipal, India, e-mail: [email protected] Shyamala Guruvare, Department of Obstetrics and Gynecology, Kasturba Medical College, MAHE, Manipal, India, e-mail: [email protected] https://doi.org/10.1515/9783110702514-011
Roopa B. Hegde et al.
11.1 Introduction Computerized diagnostic tools have in recent years become common as initial screening methods for the detection of diseases. Advances in imaging technologies and computer-aided tools have increased the possibility of using automated diagnosis methods in the medical field. Many research groups have made remarkable progress toward the development of such systems in various branches of medicine, using either the conventional machine learning (ML) approach or the deep learning (DL) approach. The conventional ML approach follows the image-processing pipeline: the final decision depends on segmentation and feature engineering, so the extraction of regions of interest from images and the selection of suitable features play an important role. Hence, this technique is suitable for cases where features are visible. In the DL technique, by contrast, the image is given directly as input and the network learns the features by itself; this method is therefore suitable for complex scenarios where features are not evident. However, it requires a large number of training samples and more computing resources than the conventional ML technique. Computer-aided decision support systems reduce the burden on experts and help provide objective results. Many research groups have tried to introduce automation into screening procedures using conventional ML techniques. Recently, the possibility of lung cancer detection using CT images was investigated by Wason and Nagarajan, Makaju et al. and Nadkarni and Borkar [1–3]. Brain tumor detection using MRI images was carried out by Vadmal et al. and Ray and Thethi [4, 5]. Challenges and opportunities in the development of automated brain tumor detection are described by Martin et al. [6]. Conventional ML has been used for the analysis of histopathological images by Komura et al. and Zarella et al. [7, 8].
Peripheral blood smear (PBS) images are used for the analysis of the morphology of blood cells by Rodellar et al. [9]. ML methods have been used for the detection of sickle cell anemia [10] and leukemia [11–16], and the development of cervical cancer screening using an ML approach has been attempted by several research groups [17–20]. Recently, DL has gained tremendous traction in the medical field. Figure 11.1 shows the state of the art in DL for the analysis of medical images [21]. It is clear from the figure that numerous researchers have explored the usability of DL for various types of medical images and for various tasks, depending on the application. DL using CNN is commonly applied to classification and segmentation. Heasm et al. [22] reviewed the challenges and achievements of DL techniques for the segmentation of medical images, considering CT and MRI images. Weiming et al. [23] designed a deep network (DN) for forecasting mild cognitive impairment to Alzheimer's disease conversion using MRI data. The applicability of DL to neuroimaging techniques for the identification of many diseases is explained by Zhu et al. [24]. Juanying et al. [25] analyzed histopathological images using DL for automated diagnosis of breast cancer
[Figure 11.1 presents bar charts of the number of papers applying deep learning to medical images, by year (2012–2017), task (segmentation, detection, classification, registration), network type (CNN, RBM, RNN, AE), imaging modality (MRI, microscopy, CT, ultrasound, X-ray, mammography, color fundus photos) and anatomical area (pathology, brain, lung, abdomen, cardiac, breast, bone, retina).]
Figure 11.1: Application of deep learning in the medical domain [21].
and Izuka et al. [26] for the detection of gastric and colonic epithelial tumors. DL using CNN has been applied to sickle cell anemia detection [27], leukemia detection [28, 29] and the classification of WBCs [30, 31]. The usability of DL has also been explored for cervical cancer screening [32–34]. In this work, we compare and contrast the suitability of conventional ML and DL for two kinds of applications, namely PBS images and uterine cervix images. A PBS comprises erythrocytes (RBCs), leukocytes (WBCs) and platelets. In this work, we categorized WBCs into five normal types and an abnormal type. The other application is cervical cancer screening using cervix images obtained through the visual inspection with acetic acid (VIA) procedure; cervix images are categorized into cancerous and normal cases. In the conventional ML approach, classification was achieved using an image-processing pipeline, whereas in the DL approach, a convolutional neural network (CNN) is designed with an input layer, convolution layers, activation layers, pooling layers, fully connected layers and an output layer. Input images of a specific dimension are provided to the network through the input layer. In the convolution layers, the convolution operation extracts features from the input image. In the activation layers, the rectified linear unit (ReLU) is commonly used; it is a nonlinear function that replaces negative values in the convolution layer output with zero. The pooling layers reduce the size of the convolution layer output and thereby minimize the computation in subsequent layers. The fully connected layers are the last few layers; they compile the features extracted by the previous layers. In this study, a DN using CNN was designed separately for the categorization of WBCs and the categorization of cervix images.
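The generic CNN pipeline just described can be sketched in Keras; the layer counts and sizes below are arbitrary placeholders, not the networks designed in this chapter.

```python
# Generic Keras sketch of the CNN pipeline described above (input ->
# convolution -> ReLU -> pooling -> fully connected -> output). Layer sizes
# are arbitrary placeholders, not the networks used in this chapter.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),           # input images of fixed dimension
    layers.Conv2D(16, 3, activation="relu"),   # convolution extracts features; ReLU zeroes negatives
    layers.MaxPooling2D(2),                    # pooling shrinks feature maps
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),       # fully connected layers compile features
    layers.Dense(2, activation="softmax"),     # e.g. cancerous vs normal
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
```

Each pooling step halves the spatial resolution, which is what "minimizing computations in subsequent layers" amounts to in practice.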
11.2 Method The methods used for the development of decision support systems for the two applications are described in the following sections. Section 11.2.1 explains the methodology for the classification of WBCs, and Section 11.2.2 explains the methodology for building a decision support system for cervical cancer screening.
11.2.1 Classification of white blood cells There are five types of WBCs, namely lymphocytes, monocytes, neutrophils, eosinophils and basophils, as shown in Figure 11.2. A PBS is evaluated under a microscope by pathologists, a laborious procedure in which they look for any abnormality in the shape, size, count and distribution of blood cells. WBCs consist of an inner nucleus and outer cytoplasm; it is therefore important to analyze nuclear and cytoplasmic changes while evaluating WBCs. Any irregularity in the count and/or appearance of WBCs indicates an abnormal condition.
[Figure labels: platelet, band, RBC, eosinophil, neutrophil, lymphocyte, basophil, monocyte.]
Figure 11.2: Microscopic view of the peripheral smear.
To investigate the usability of the conventional ML and DL approaches for the classification of WBCs, we carried out experimentation using both. The PBS image dataset was acquired from Kasturba Medical College (KMC), Manipal, India. To conduct an extensive examination, Leishman-stained peripheral smears were captured in JPEG format using an OLYMPUS CX51 microscope at a resolution of 1,600 × 1,200. We accumulated 1,159 PBS images consisting of 170 lymphocytes, 109 monocytes, 297 neutrophils, 154 eosinophils, 81 basophils and 607 abnormal WBCs. A few images contained multiple WBCs, so we obtained 1,418 single-WBC images from the 1,159 PBS images.
The dataset holds images of varying brightness levels and color variations. Also, the images include erythrocytes, leukocytes, platelets and staining artefacts.
11.2.1.1 Conventional machine learning approach Classification of WBCs was carried out using image-processing steps. One unique feature of WBCs is the presence of a nucleus, which appears dark in PBS images; this aids the segmentation of nuclei and hence of WBCs. The proposed nucleus segmentation method involved automatic cropping, histogram-based thresholding and post-processing. WBCs consist of a dark nucleus surrounded by a region of cytoplasm. The cytoplasm appears pale and, in most cases, its color is similar to that of the background, which makes accurate detection of WBCs difficult; variations in the shape and size of WBCs add further complexity to the development of an automated method. In this study, the shape and color variations of WBCs were addressed by using a color transfer method, generation of an adaptive mask and the active contour method for accurate detection of WBCs. Features were extracted from the nuclei as well as the cytoplasm regions for classification of WBCs into lymphocytes, monocytes, neutrophils, eosinophils, basophils and abnormal WBCs. Categorization is a two-step technique in which basophils are first classified using a support vector machine (SVM) classifier based on characteristics of the nuclei only; a neural network (NN) is then employed to categorize the other types of WBCs. The main blocks of the applied methodology are shown in Figure 11.3, in which block N corresponds to nucleus segmentation and block W to segmentation of WBC regions. To limit the regions of processing, we performed automatic cropping of the original image: approximate regions of nuclei were detected using an arithmetic operation, and the location of the approximate nucleus was used to crop the image automatically. The green component of the original image was contrast enhanced, and the contrast-enhanced image was added to the original G component image.
The result of the addition of the two images was a grayscale image. To detect the approximate region of the nucleus, a threshold value of 110 was used; this approximate region was then utilized for cropping the original images using the bounding box method. The result of automatic cropping is a small image containing a single WBC, as shown in Figure 11.3(a). The cropped image was used for the segmentation of the nucleus and the WBC region. We used histogram-based thresholding for nucleus segmentation: a histogram of the cropped image was obtained to estimate an appropriate threshold value, which was applied to the contrast-enhanced G component of the cropped image to obtain the region of the nucleus. Across the dataset, the threshold values thus obtained ranged from 4 to 137. The nucleus segmentation steps are shown in Figure 11.3(b–d). We used the active contours approach for the segmentation of leukocytes. Active contours
Figure 11.3: The methodology of white blood cell classification.
operate on the principle of energy minimization. To use this property effectively, we applied color transfer and background removal: the color transfer method provides good contrast between the cytoplasm region and the background, as shown in Figure 11.3(e), and the background of the color-transferred image was removed, as shown in Figure 11.3(f), for efficient use of the active contours method. Initialization of the mask plays an important role in active contours, so a novel adaptive mask generation method was introduced for correct detection of WBCs. The mask was formed depending on the region and circularity of the nucleus; as a result, its size varied with the size of the WBC, so the mask generation method addressed the size and shape variations. To obtain the area of cytoplasm, we subtracted the area of the nucleus from the area of the WBC. Color variations are common in the cytoplasm regions of the different types of WBCs, so the mean and variance of the L, A and B components of the CIE-Lab color space representation were used. Texture variations occur in both nuclei and cytoplasm regions; hence, texture features were extracted from both. The feature set consists of 93 features, which include 9 shape features obtained from the nuclei, 6 color features obtained from the cytoplasm regions and 39 texture features obtained from both the nuclei and the cytoplasm regions. These features represent the various characteristics of WBCs required for classification. Classification was carried out using a hybrid classifier consisting of an SVM and an NN. In the first stage, WBCs were classified into basophils and "others." Basophils contain dark granules throughout their region, so the nucleus and cytoplasm cannot be differentiated, and the nucleus segmentation method detected the whole region of the basophil.
Texture features derived from the spatial gray-level dependence matrix (SGLDM) were extracted from the nuclei; these texture features distinguish basophils from the other types of WBCs. An SVM classifier with a polynomial kernel of order 3 was used to separate basophils from the other WBC types, and basophils were then removed from the dataset before further processing such as extraction of WBC regions. In the second stage, categorization of the remaining WBCs into lymphocytes, monocytes, neutrophils, eosinophils and abnormal WBCs was performed using an NN classifier; to handle this multiclass problem, we used an NN with one hidden layer of 24 nodes. The dataset comprises 1,418 cropped images, divided 80–20% into training and testing sets, respectively. An average classification rate over 100 trials was computed by randomly selecting the images for training and testing.
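The two-stage hybrid classifier described above can be sketched with scikit-learn; the random 93-dimensional feature vectors, the placeholder "granule darkness" rule for the stage-1 labels and the random stage-2 labels are all assumptions for demonstration, not the chapter's real features or labels.

```python
# Hedged sketch of the two-stage hybrid classifier described above: an SVM
# with a degree-3 polynomial kernel separates basophils first, then a
# one-hidden-layer NN (24 nodes) classifies the rest. The random feature
# vectors and the "granule darkness" labeling rule are placeholder assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.random((200, 93))                      # 93 shape/color/texture features
is_basophil = (X[:, 0] > 0.8).astype(int)      # stage-1 labels (placeholder rule)
five_class = rng.integers(0, 5, 200)           # stage-2 labels (placeholder)

stage1 = SVC(kernel="poly", degree=3).fit(X, is_basophil)
mask = stage1.predict(X) == 0                  # keep cells not flagged as basophils
stage2 = MLPClassifier(hidden_layer_sizes=(24,), max_iter=500,
                       random_state=0).fit(X[mask], five_class[mask])
print(stage2.predict(X[mask][:5]))
```

In a real pipeline the features would come from the segmentation steps above, and the stage-2 network would only ever see cells that stage 1 did not classify as basophils.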
11.2.2 Deep learning approach In this study, the DL approach to the categorization of WBCs was investigated by constructing a DN using CNN. The size of the dataset used for the conventional ML
approach was too small for the DL approach using CNN, so the dataset was enlarged by augmentation. Of the 1,418 cell images, 221 were reserved for testing the designed network, leaving a training set of 1,197 cell images, which is too few for training a DN. Hence, to expand the dataset, we performed data augmentation, introducing color and brightness variations to the original images; these two methods were selected because color and brightness variations are common in PBS images. The augmented training set consists of 2,697 cropped images: 300 lymphocytes, 250 monocytes, 500 neutrophils, 300 eosinophils, 201 basophils and 1,146 abnormal WBCs. The training set was used for full training of the CNN, and the test set of 221 images, with 40 lymphocytes, 21 monocytes, 29 neutrophils, 16 eosinophils, 16 basophils and 99 abnormal WBCs, was used to test the designed network. We designed a DN from scratch for the categorization of WBCs; the constructed DN using CNN, referred to as "WBCnet," is listed in Table 11.1. The six-layered network consists of four convolution layers and two fully connected layers. In this design, a training set of 2,696 images of size 227 × 227 × 3 was used. Classification of WBCs into six classes, namely lymphocytes, monocytes, neutrophils, eosinophils, basophils and abnormal WBCs, was considered; hence, the final fully connected layer has six nodes. The DN was trained for 1,000 epochs with a minibatch size of 20.

Table 11.1: CNN architecture ("WBCnet").

Layer 1: Convolution layer [ , ], depth , stride ( , ), padding ( , ); ReLU; max-pooling layer ( , stride )
Layer 2: Convolution layer [ , ], depth , stride ( , ), padding ( , ); ReLU; max-pooling layer ( , stride )
Layer 3: Convolution layer [ , ], depth , stride ( , ), padding ( , ); ReLU; max-pooling layer ( , stride )
Layer 4: Convolution layer [ , ], depth , stride ( , ), padding ( , ); ReLU
Layer 5: Fully connected layer ( )
Layer 6: Fully connected layer ( )
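The color/brightness augmentation used to enlarge the WBC training set can be sketched in NumPy; the jitter ranges and the synthetic stand-in image are assumptions, as the chapter does not specify its augmentation code.

```python
# Minimal NumPy sketch of the color/brightness augmentation described above;
# the jitter ranges and the synthetic image are assumptions, as the chapter
# does not specify its augmentation code.
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(227, 227, 3), dtype=np.uint8)  # stand-in cell image

def augment(image, brightness=20, channel_shift=10):
    out = image.astype(np.int16)
    out += rng.integers(-brightness, brightness + 1)                 # global brightness jitter
    out += rng.integers(-channel_shift, channel_shift + 1, size=3)   # per-channel color cast
    return np.clip(out, 0, 255).astype(np.uint8)

augmented = [augment(img) for _ in range(4)]   # one original -> several variants
```

Applying such randomized variants turns each original cell image into several training samples while keeping the label unchanged.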
11.3 Cervical screening Cervical cancer is one of the major causes of morbidity and mortality among women worldwide. VIA is a cost-effective screening technique for cervical cancer in low-resource settings. Pre-cancerous and cancerous lesions turn white after the application of acetic acid; these are called acetowhite (AW) lesions. Cervix images before and after applying acetic acid are shown in Figure 11.4.
Figure 11.4: Cervix images (a) before (b) after application of acetic acid.
The crucial features used by experts to identify VIA-positive lesions are the intensity, margin and texture of the AW lesion. Substantial expertise is required to identify these features, so the person performing the test must be highly skilled to obtain accurate results. Further, "evaluation of VIA images suffers from high interobserver variability" [35]. The dataset comprises 2,198 images, with 1,108 VIA-positive and 1,090 VIA-negative. Of these, 231 images were acquired in screening camps conducted by KMC, Manipal; the remaining images were collected from the National Cancer Institute (NCI) and National Institutes of Health (NIH) database. Institutional Ethics Committee approval was obtained for image acquisition. Section 11.3.1 describes the methodology for cervical cancer screening using conventional ML, and Section 11.3.2 describes the DL approaches, using transfer learning and full training of a CNN.
11.3.1 Conventional machine learning approach A block diagram of cervix image categorization using the conventional ML method is depicted in Figure 11.5.
Figure 11.5: Cervix image classification using conventional machine learning.
Cervix images may contain areas such as vaginal folds, medical devices and cotton swabs, which must be eliminated from the analysis; the region of interest (ROI) thus comprises only the cervix region. The method described by Kudva, Prasad and Guruvare [36] was used for ROI detection. To determine the distinguishing features to be extracted, a detailed study of how gynecological oncology experts manually evaluate the cervix during the VIA procedure was carried out; it concluded that the categorization requires three types of information, namely the "whiteness information, margin information and texture information" of AW regions. A study of color spaces revealed that a combination of the saturation (S) component of the HSV representation, the green (G) component of the RGB representation and the lightness (L) component of the CIE-Lab representation could highlight the acetowhite areas distinctly. To obtain a good distinction between AW regions and the background, we computed a representation based on Equation (11.1), which we refer to as the "feature image":

F = (1 − S) × G × L    (11.1)
Further, irrelevant background information was eliminated by setting pixels in the feature image with values less than or equal to the median of F to zero; the result is referred to as the "acetowhite image" or "AW image." The mean, standard deviation (SD), lower quartile (LQ), median, upper quartile (UQ) and interquartile range (IR) of the AW image, computed using the technique proposed by Tukey [37], were used as AW features capturing whiteness information. The AW image was filtered using a contrast operator, which computes the difference between the mean of those pixels whose intensity is greater than the center pixel and the mean of those whose intensity is less than the center pixel [38], to obtain the "margin image." The number of nonzero pixels in the margin image, together with its LQ, median, UQ and IR, were used as margin features of the AW regions. Texture features such as SGLDM features [39], neighborhood gray-tone difference matrix features [40] and features based on the neighboring gray-level dependence matrix [41] were extracted from the AW image and the local binary pattern (LBP) image; histogram features were also extracted from the LBP image. In total, 152 features representing the acetowhite, margin and texture properties of the AW region were extracted and ranked using the ReliefF algorithm [42].
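Equation (11.1) and the median-based AW masking can be sketched in NumPy; note that the Lab lightness below is approximated directly from luminance (skipping sRGB linearization) and the input image is a random placeholder, not a cervix image.

```python
# NumPy sketch of Equation (11.1) and the AW image: (1 - S) from HSV, G from
# RGB and L from CIE-Lab are combined, then pixels at or below the median are
# zeroed. The Lab lightness here is approximated from luminance (skipping
# sRGB linearization), and the input image is a random placeholder.
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.random((64, 64, 3))                       # placeholder image in [0, 1]

mx, mn = rgb.max(-1), rgb.min(-1)                   # HSV saturation S = (max - min) / max
S = np.where(mx > 0, (mx - mn) / np.where(mx > 0, mx, 1), 0)

Y = rgb @ np.array([0.2126, 0.7152, 0.0722])        # approximate luminance
fY = np.where(Y > (6 / 29) ** 3, np.cbrt(Y), Y / (3 * (6 / 29) ** 2) + 4 / 29)
L = (116 * fY - 16) / 100                           # CIE-Lab lightness, scaled to [0, 1]

G = rgb[..., 1]                                     # green channel
F = (1 - S) * G * L                                 # feature image, Eq. (11.1)
aw = np.where(F > np.median(F), F, 0.0)             # acetowhite (AW) image
```

Acetowhite regions are bright (high L, high G) and desaturated (low S), so all three factors in F are large there, while the median threshold suppresses the darker background.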
11.3.2 Deep learning approach

To obtain better classification performance for uterine cervix images, a DL method was used. Both the full training and the transfer learning methods of DL were explored. The full training approach involves designing and training a CNN from scratch; one can design a deep or a shallow CNN depending on the number of layers included in the architecture. The transfer learning approach reuses a CNN pretrained on a different application.
Roopa B. Hegde et al.
11.3.2.1 Full training

In the full training approach, we considered two kinds of CNN architectures, deep layer and shallow layer, whose architectures are listed in Tables 11.2 and 11.3, respectively. The deep layer CNN is similar to the AlexNet architecture. Filter weights in all convolutional layers were initialized to random numbers drawn from a zero-mean Gaussian distribution with an SD of 0.01. The CNN was trained for 100 epochs using the stochastic gradient descent algorithm with a batch size of 50 and a momentum of 0.9; a learning rate of 0.001 was used for all layers during training. The filter weights of the shallow layer CNN were initialized in the same manner, and the same training parameters and the same numbers of training and testing images as for the deep layer CNN were used.
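The initialization and optimizer settings above can be written down framework-agnostically. The sketch below uses the chapter's hyperparameters (zero-mean Gaussian initialization with SD 0.01, learning rate 0.001, momentum 0.9) with the classical SGD-with-momentum update rule; it is an illustrative sketch, not the authors' training code.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(shape, std=0.01):
    """Filter weights: zero-mean Gaussian with SD 0.01, as used for both CNNs."""
    return rng.normal(loc=0.0, scale=std, size=shape)

def sgd_momentum_step(w, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update: v <- m*v - lr*grad; w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity
```

During training, one such step is applied per mini-batch of 50 images, repeated over 100 epochs.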
11.3.2.2 Transfer learning using fine-tuning

AlexNet was used for transfer learning, as it has relatively few layers compared with other pretrained networks. Because cervix image classification is a two-class problem, the final fully connected layer was replaced with a new fully connected layer with two neurons. Fine-tuning was initiated at the last layer, and the preceding layers were included incrementally until the desired performance was achieved. Experimentation was carried out with different batch sizes, depths and numbers of training epochs; the final fully connected layer was trained with a learning rate of 0.01 and the other layers with 0.001, and the parameters that provided the best performance were selected. The details of these experiments and their results were provided by Kudva, Prasad and Guruvare [43].

Table 11.2: Deep layer CNN architecture.

Layer name | Layer details
input | image input, "zerocenter" normalization
conv | convolution
relu | ReLU
pool | max pooling
conv | convolution
relu | ReLU
pool | max pooling
Chapter 11 Applications of conventional machine learning
Table 11.2 (continued)

Layer name | Layer details
conv | convolution
relu | ReLU
conv | convolution
relu | ReLU
conv | convolution
relu | ReLU
pool | max pooling
fc | fully connected
relu | ReLU
fc | fully connected
relu | ReLU
fc | fully connected
SM | softmax
output | classification output
Table 11.3: Shallow layer CNN architecture.

Layer name | Layer details
input | image input, "zerocenter" normalization
conv | convolution
relu | ReLU
pool | max pooling
conv | convolution
relu | ReLU
pool | max pooling
fc | fully connected
relu | ReLU
fc | fully connected
prob | softmax
output | classification output
11.4 Results and discussion

The outcomes of the conventional ML and DL approaches for the two kinds of applications are discussed in this section. The outcome of the categorization of leukocytes using both approaches is discussed in Section 11.4.1, and the results obtained for the classification of uterine cervix images are discussed in Section 11.4.2.
11.4.1 Results of classification of WBCs

Classification of WBCs into lymphocytes, monocytes, neutrophils, eosinophils, basophils and abnormal WBCs was performed using both the conventional ML and the DL approaches.
11.4.2 Results of conventional ML

Segmentation is a two-step process consisting of the segmentation of nuclei followed by the segmentation of whole WBCs. The mean and SD of the performance metrics are listed in Table 11.4. The average accuracies of the nucleus detection and WBC detection methods are 98% and 96%, respectively. Segmentation of nuclei was more accurate than that of whole WBCs, because nuclei appear dark in PBS images whereas cytoplasm regions appear pale and are hard to distinguish from the background.

Table 11.4: Performance of the segmentation method (mean (%) and SD of the Dice score, accuracy, precision rate and recall rate, for nuclei detection and for WBC detection).
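Segmentation quality in Table 11.4 is reported via the Dice score, among other metrics. As a reminder of how that metric is computed for binary masks, a minimal numpy sketch (not the authors' code):

```python
import numpy as np

def dice_score(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    # Two empty masks agree perfectly by convention.
    return 2.0 * intersection / total if total else 1.0
```

A score of 1.0 indicates a perfect overlap between the segmented region and the ground-truth mask, and 0.0 indicates no overlap at all.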
Classification is also a two-step process. In the first step, detection of basophils was carried out based on the texture features of the nuclei; accuracies of 100% and 99.6% were obtained for training and testing, respectively, with one basophil misclassified as another WBC type. The mean and SD of the accuracy, sensitivity and specificity of the NN classifier are listed in Table 11.5. An overall accuracy of 99.4% was obtained.
Table 11.5: Performance of the classifier (mean (%) and SD of the accuracy, sensitivity and specificity for each WBC type: lymphocyte, monocyte, neutrophil, eosinophil and abnormal).
11.4.2.1 Results of DL

For full training of the CNN, a set of 2,697 cropped images was used for performance evaluation. This dataset was further divided into training and validation sets: 80% of the images were used for training and the remaining 20% for validation. We obtained a training accuracy of around 99% for the designed "WBCnet." The saved "WBCnet" was then tested on a test set of 221 cropped images. The training, validation and testing performances are given in Table 11.6; it can be observed that the values for the training and testing sets are almost identical. An overall classification accuracy of around 98% was obtained for the classification of WBCs.

Table 11.6: Performance of "WBCnet" (accuracy, sensitivity and specificity for the training, validation and testing sets).
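The 80/20 training/validation split described above can be sketched with a plain numpy shuffle-split. This is a generic sketch, not the authors' exact procedure:

```python
import numpy as np

def train_val_split(n_samples, val_fraction=0.2, seed=42):
    """Shuffle sample indices and split off a validation fraction."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return idx[n_val:], idx[:n_val]  # training indices, validation indices

# Split the 2,697 cropped WBC images as described in the text.
train_idx, val_idx = train_val_split(2697)
```

Shuffling before splitting avoids any ordering bias in the dataset, such as images from the same smear being grouped together.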
Overfitting is a common problem in DL. Whether a designed network suffers from overfitting can be judged by comparing the training and validation performances in terms of accuracy and loss. The accuracy and loss plots are shown in Figure 11.6: plot (a) shows training accuracy in blue and validation accuracy in black, and plot (b) shows training loss in orange and validation loss in black. It can be observed from the curves that the validation accuracy and loss closely follow the training accuracy and loss, and that the loss decreases and converges. This indicates that the obtained results are free from overfitting.

Figure 11.6: Evaluation of training performances of "WBCnet": (a) plot for accuracy and (b) plot for loss.
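The visual check above has a simple programmatic proxy: if the validation loss keeps falling and stays close to the training loss, overfitting is unlikely. The heuristic below is our own illustrative sketch (the function, window and tolerance are assumptions, not part of the chapter's method):

```python
def looks_overfit(train_loss, val_loss, window=5, gap_tol=0.1):
    """Heuristic overfitting check on per-epoch loss histories.

    Flags overfitting when, over the last `window` epochs, validation loss
    is rising while training loss still falls, or when the final
    validation-training gap exceeds `gap_tol`.
    """
    t_tail, v_tail = train_loss[-window:], val_loss[-window:]
    val_rising = v_tail[-1] > v_tail[0]
    train_falling = t_tail[-1] < t_tail[0]
    gap = v_tail[-1] - t_tail[-1]
    return (val_rising and train_falling) or gap > gap_tol
```

For curves like those in Figure 11.6, where both losses converge together, such a check returns False.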
11.4.2.2 Results of cervical cancer screening

For both conventional ML and DL, we used 2,198 images, of which 85% were used for training and 15% for testing. The conventional ML method for classification of uterine cervix images provided an accuracy of 77%, a sensitivity of 84% and a specificity of 70%. Results of full training using the deep and shallow CNN architectures are shown in Table 11.7.

Table 11.7: Classification results of the deep CNN and shallow CNN (accuracy, sensitivity and specificity for each architecture).
For the transfer learning method using fine-tuning, experimentation with parameters such as the batch size, the number of layers to be trained and the number of training epochs indicated that a batch size of 50, 700 epochs and tuning of the last two fully connected layers provided optimum results. This method provided an accuracy of 93.3%, a sensitivity of 93.3% and a specificity of 93.4%.
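The parameter search just described amounts to a small grid search that keeps the best-scoring configuration. A sketch of that loop follows; the `evaluate` function and the candidate values other than those named in the text are hypothetical placeholders for whatever fine-tunes the network and returns validation accuracy.

```python
import itertools

def grid_search(evaluate, batch_sizes=(25, 50, 100),
                n_tuned_layers=(1, 2, 3), epochs=(300, 500, 700)):
    """Try every combination and keep the one with the best accuracy.

    `evaluate(batch_size, n_layers, n_epochs)` is assumed to fine-tune the
    network with those settings and return its validation accuracy.
    """
    best_cfg, best_acc = None, -1.0
    for cfg in itertools.product(batch_sizes, n_tuned_layers, epochs):
        acc = evaluate(*cfg)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc
```

With the settings reported in the text, such a search would select the configuration (batch size 50, last two fully connected layers, 700 epochs).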
11.5 Discussion

It can be noticed from the results of conventional ML that classification of WBCs was more accurate than that of cervix images. This is because the features of WBCs are visibly clear, whereas those of cervix images are not, making feature extraction challenging. Hence, conventional ML is suitable for the classification of WBCs, although segmentation of the cytoplasm regions must be performed accurately to obtain accurate classification results. A PBS consists of blood cells, and these cells are evaluated to rule out any abnormality. This is laborious, and the results depend on the skill of the examiner; automation of PBS analysis therefore helps in providing objective results and reduces the burden on pathologists. The three types of blood cells, RBCs, leukocytes and platelets, differ in shape, size, color and texture, and similar differences can be observed among the types of WBCs. Because these features are visibly clear, an automated system can be developed that could be used as a decision support system. The design of computerized
systems requires a large training dataset for accurate results. A PBS image contains a large number of RBCs and a few WBCs and platelets; hence, many sub-images can be obtained from a single PBS image, which helps in building a large dataset. Although an adequate number of PBS images can thus be obtained, segmentation of blood cells is difficult owing to color and brightness variations. This affects the classification result in the case of conventional ML and calls for images captured in a controlled environment with, for example, uniform lighting. Therefore, accurate segmentation of the ROI is a significant step in developing an automated system; this limitation can be overcome by designing a DN, subject to the availability of resources. The acquired cervix images were of dimension 4,164 × 3,120, and the NCI/NIH images had a dimension of 2,891 × 1,973. During ROI detection, images were resized to increase the speed of the algorithm, and to match the input layer size of the CNN the ROI-detected images were further resized to 227 × 227, which leads to a loss of information. Obtaining uterine cervix images is very hard, as women are reluctant to volunteer for imaging. We interacted with various gynecologists to understand how they evaluate cervix images as positive or negative, and observed considerable differences of opinion among them on the features relevant for this classification. There are several reasons for white patches appearing on the cervix after the application of acetic acid, and not all of them are due to cervical cancer. As a result of this complex and unclear nature of the features, feature engineering is difficult in cervix image analysis and requires a large amount of heuristic evaluation. In this study, the classification of WBCs and the categorization of cervix images using conventional ML and DL approaches were considered.
We illustrated the results for the categorization of leukocytes using a hybrid classifier consisting of an SVM and an NN, and also considered the design of a DN using a CNN for the same task. Segmentation of nuclei and WBCs was performed, with overall accuracies of 98% and 96% obtained for the segmentation of nuclei and WBCs, respectively. The SVM classifier was used for the detection of basophils using texture features derived from the SGLDM. The NN classifier was used for the classification of WBCs into lymphocytes, monocytes, neutrophils, eosinophils and abnormal WBCs, and an overall accuracy of 99.6% was obtained. Evaluating cervix images is a very complex task owing to ambiguous features involving a lot of heuristics, which may explain the poor performance, about 77% accuracy, of the conventional ML method. Since DL automates feature engineering, we considered DL; we also wanted to see how a CNN performs on a limited dataset for uterine cervix image analysis. Both full training and transfer learning perform better than conventional ML owing to the automation of feature engineering. Full training performance is poor compared with that of the shallow layer CNN and transfer learning, which may be due to the limited dataset. Transfer learning outperforms the other approaches owing to the availability of a large number of pre-learnt
features at different layers of AlexNet; we obtained an accuracy of 93% using the transfer learning approach. In this study, better performance was observed for the classification of cervix images because a CNN operates directly on the original images. Hence, CNNs are better suited to images that are already small than to large images that must be resized, since reducing image size risks losing features. We therefore cropped the PBS images into small images each containing a single WBC, whereas in the cervix images the entire region was the ROI, so resized images had to be used as input in that case.
11.6 Observations from comparison

It can be observed that PBS analysis is a lower complexity task, with distinct features for the various cell types, and hence is suitable for the ML approach. Also, individual cells are small and a large number of images is available, so it is suitable for the DL approach too. In the case of cervical cancer screening, the identification of suitable features is a complex task; mimicking the heuristic evaluation involved in decision-making is not easy, and hence the performance of the ML approach is low. However, the DL approach offers a significant increase in performance, with an accuracy of around 80%. In conventional ML, classification results depend on the intermediate steps. Segmentation of the ROI plays a significant role in extracting relevant features, and identification and selection of appropriate features is an important step for obtaining accurate classification results. Training images are required for designing a classifier, and obtaining ground truth to evaluate the segmentation and classification results plays an important role in designing any decision support system. In DL, feature engineering is not required: the network performs the task of identifying suitable features. However, DNs require a large number of training images and their ground truth. Also, the DL approach is computationally resource-intensive, and the limitation on the size of the images being processed requires resizing of large images, which leads to a loss of information. In this work, we demonstrated the performance of conventional ML and DL for the two applications. DL requires high computation resources compared with conventional ML. Hence, when feature engineering is not complex and computation resources are a constraint, conventional ML can be used; this method may also be preferred whenever both approaches perform equally well. DL offers automated identification of suitable features but requires high resources.
Hence, DL can be used whenever identification of features is a difficult task and computation resources are not a constraint. However, when less data is available, transfer
learning or a shallow network can be chosen instead of a highly complex network. Full training could be preferred when the availability of a dataset is not a constraint.
11.7 Conclusion

In the present study, an effort was made to investigate the suitability of conventional ML and DL methods for two different kinds of applications, namely PBS image analysis and cervix image analysis, and to discuss the challenges and applicability of these techniques in analyzing medical images. To bring out the usability of the two techniques, we carried out experiments on the two applications separately. The first, the classification of WBCs from PBS images, is a case in which a sufficient number of cell images with clearly visible features was available. In contrast, cervical cancer screening is a complex case in which the features are not visibly clear and obtaining a sufficient number of cervix images is difficult. We conclude that DL should be used in cases where feature engineering is a difficult task, because conventional ML fails in such cases. Conventional ML is suitable if the dataset is small and the features are clear; where features are visible, conventional ML and DL are equally useful. In addition, DL is suitable for cases with small image sizes.
References
[1] Wason, J. V. and Nagarajan, A. (2019). Image processing techniques for analyzing CT scan images towards the early detection of lung cancer. Bioinformation, 15(8), 596–599. Doi: 10.6026/97320630015596.
[2] Suren Makaju, P. W. C., Prasad, A. A., Singh, A. K., and Elchouemi, A. (2018). Lung cancer detection using CT scan images. Procedia Computer Science, 125, 107–114. Doi: 10.1016/j.procs.2017.12.016.
[3] Nadkarni, N. S. and Borkar, S. (2019). Detection of lung cancer in CT images using image processing. Third International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, pp. 863–866. Doi: 10.1109/ICOEI.2019.8862577.
[4] Vadmal, V., Junno, G., Badve, C., Huang, W., Waite, K. A., and Barnholtz-Sloan, J. S. (2020). MRI image analysis methods and applications: an algorithmic perspective using brain tumors as an exemplar. Neuro-Oncology Advances, 2(1), vdaa049. Doi: 10.1093/noajnl/vdaa049.
[5] Ray, A. K. and Thethi, H. P. (2017). Image analysis for MRI based brain tumor detection and feature extraction using biologically inspired BWT and SVM. International Journal of Biomedical Engineering, 2017, Article ID 9749108. Doi: 10.1155/2017/9749108.
[6] Martin, C., Masa, H., Gerald, Y., and Spencer, D. (2018). Survey of image processing techniques for brain pathology diagnosis: Challenges and opportunities. Frontiers in Robotics and AI, 5. Doi: 10.3389/frobt.2018.00120.
[7] Komura, D. and Ishikawa, S. (2018). Machine learning methods for histopathological image analysis. Computational and Structural Biotechnology Journal, 16, 34–42.
[8] Aeffner, F., Zarella, M. D., Buchbinder, N., Bui, M. M., Goodman, M. R., Hartman, D. J., Lujan, G. M., Molani, M. A., Parwani, A. V., Lillard, K., Turner, O. C., Vemuri, V. N., Yuil-Valdes, A. G., and Bowman, D. (2019). Introduction to digital image analysis in whole-slide imaging: A white paper from the digital pathology association. Journal of Pathology Informatics, 9.
[9] Rodellar, J., Alférez, S., Acevedo, A., Molina, A., and Merino, A. (2018). Image processing and machine learning in the morphological analysis of blood cells. International Journal of Laboratory Hematology. Doi: 10.1111/ijlh.12818.
[10] Delgado-Font, W., Escobedo-Nicot, M., González-Hidalgo, M. et al. (2020). Diagnosis support of sickle cell anemia by classifying red blood cell shape in peripheral blood images. Medical and Biological Engineering and Computing, 58, 1265–1284. Doi: 10.1007/s11517-019-02085-9.
[11] Biji, G. and Hariharan, S. (2017). An efficient peripheral blood smear image analysis technique for leukemia detection. International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, pp. 259–264. Doi: 10.1109/I-SMAC.2017.8058350.
[12] Hegde, R. B., Prasad, K., Hebbar, H. et al. (2020). Automated decision support system for detection of leukemia from peripheral blood smear images. Journal of Digital Imaging, 33, 361–374. Doi: 10.1007/s10278-019-00288-y.
[13] Kazemi, F., Najafabadi, T. A., and Araabi, B. N. (2016). Automatic recognition of acute myelogenous leukemia in blood microscopic images using k-means clustering and support vector machine. Journal of Medical Signals and Sensors, 6(3), 183–193.
[14] Shafique, S. and Tehsin, S. (2018). Computer-aided diagnosis of acute lymphoblastic leukaemia. Computational and Mathematical Methods in Medicine.
[15] Dhal, K. G., Gálvez, J., Ray, S. et al. (2020). Acute lymphoblastic leukemia image segmentation driven by stochastic fractal search. Multimedia Tools and Applications, 79, 12227–12255. Doi: 10.1007/s11042-019-08417-z.
[16] Meghana, M. R. and Prabhu, A. (2019). An efficient technique for identification of leukemia in microscopic blood samples using image processing. International Journal of Research in Pharmaceutical Sciences, 10(3), 2409–2416. Doi: 10.26452/ijrps.v10i3.1487.
[17] Xu, T., Zhang, H., Xin, C., Kim, E., Long, L. R., Xue, Z., Antani, S., and Huang, X. (2017). Multi-feature based benchmark for cervical dysplasia classification evaluation. Pattern Recognition, 63, 468–475.
[18] Kudva, V., Prasad, K., and Guruvare, S. (2018). Android device-based cervical cancer screening for resource-poor settings. Journal of Digital Imaging, 31(5), 646–654.
[19] Zhang, X. and Zhao, S. (2019). Cervical image classification based on image segmentation preprocessing and a CapsNet network model. International Journal of Imaging Systems and Technology, 29, 19–28.
[20] Kudva, V., Prasad, K., and Guruvare, S. (2020). Hybrid transfer learning for classification of uterine cervix images for cervical cancer screening. Journal of Digital Imaging, 33, 619–631.
[21] Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., Van Der Laak, J. A. W. M., Van Ginneken, B., and Sánchez, C. I. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42, 60–88.
[22] Hesam, H. M., Wenjing, J., Xiangjian, H., and Paul, K. (2019). Deep learning techniques for medical image segmentation: Achievements and challenges. Journal of Digital Imaging, 32, 582–596.
[23] Weiming, L., Tong, T., Qinquan, G., Guo, D., Xiaofeng, D., Yonggui, Y., Gang, G., Min, X., Min, D., and Xiaobo, Q., for the Alzheimer's Disease Neuroimaging Initiative (2018). Convolutional neural networks-based MRI image analysis for the Alzheimer's disease prediction from mild cognitive impairment. Frontiers in Neuroscience, 12(13), 777–790. Doi: 10.3389/fnins.2018.00777.
[24] Guangming, Z., Bin, J., Liz, T., Yuan, X., Greg, Z., and Max, W. (2019). Applications of deep learning to neuro-imaging techniques. Frontiers in Neurology, 10(13), 869–882. Doi: 10.3389/fneur.2019.00869.
[25] Juanying, X., Ran, L., Joseph, L., and Chaoyang, Z. (2019). Deep learning based analysis of histopathological images of breast cancer. Frontiers in Genetics, 10. Doi: 10.3389/fgene.2019.00080.
[26] Iizuka, O., Kanavati, F., Kato, K., Rambeau, M., Arihiro, K., and Tsuneki, M. (2020). Deep learning models for histopathological classification of gastric and colonic epithelial tumours. Scientific Reports, 10, 1504. Doi: 10.1038/s41598-020-58467-9.
[27] Alzubaidi, L., Fadhel, M. A., Al-Shamma, O., Zhang, J., and Duan, Y. (2020). Deep learning models for classification of red blood cells in microscopy images to aid in sickle cell anemia diagnosis. Electronics, 9. Doi: 10.3390/electronics9030427.
[28] Ahmed, N., Yigit, A., Isik, Z., and Alpkocak, A. (2019). Identification of leukemia subtypes from microscopic images using convolutional neural network. Diagnostics (Basel), 9(3). Doi: 10.3390/diagnostics9030104.
[29] Shafique, S. and Tehsin, S. (2018). Acute lymphoblastic leukemia detection and classification of its subtypes using pretrained deep convolutional neural networks. Technology in Cancer Research and Treatment, 17. Doi: 10.1177/1533033818802789.
[30] Sahlol, A. T., Kollmannsberger, P., and Ewees, A. A. (2020). Efficient classification of white blood cell leukemia with improved swarm optimization of deep features. Scientific Reports, 10, 2536. Doi: 10.1038/s41598-020-59215-9.
[31] Wang, Q., Bi, S., Sun, M., Wang, Y., Wang, D., and Yang, S. (2019). Deep learning approach to peripheral leukocyte recognition. PLoS ONE, 14(6), e0218808. Doi: 10.1371/journal.pone.0218808.
[32] Alyafeai, Z. and Ghouti, L. (2020). A fully-automated deep learning pipeline for cervical cancer classification. Expert Systems with Applications, 141, 112951.
[33] Hu, L., Bell, D., Antani, S., Xue, Z., Yu, K., Horning, M. P., Gachuhi, N., Wilson, B., Jaiswal, M. S., Befano, B., Long, L. R., Herrero, R., Einstein, M. H., Burk, R. D., Demarco, M., Gage, J. C., Rodriguez, A. C., Wentzensen, N., and Schiffman, M. (2019). An observational study of deep learning and automated evaluation of cervical images for cancer screening. JNCI: Journal of the National Cancer Institute, 111(9), 923–932. Doi: 10.1093/jnci/djy225.
[34] Miao, W., Yan, C., Liu, H., Liu, Q., and Yin, Y. (2018). Automatic classification of cervical cancer from cytological images by using convolutional neural network. Bioscience Reports, 38(6), BSR20181769. Doi: 10.1042/BSR20181769.
[35] Vidya, K., Shyamala, G., Keerthana, P., Kiran, A. K., Premalatha, T. S., Asha, K., Suma, N., and Chythra, R. R. (2019). Interobserver variability among gynecologists in manual cervix image analysis for detection of cervical epithelial abnormalities. Journal of Clinical Epidemiology and Global Health, 7(3), 500–503.
[36] Kudva, V., Prasad, K., and Guruvare, S. (2017). Detection of specular reflection and segmentation of cervix region in uterine cervix images for cervical cancer screening. IRBM, 38, 218–291.
[37] Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.
[38] Ojala, T., Pietikainen, M., and Maenpaa, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971–987.
[39] Haralick, R. M., Shanmugam, K., and Dinstein, I. (1973). Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, SMC-3(6), 610–621.
[40] Amadasun, M. and King, R. (1989). Textural features corresponding to textural properties. IEEE Transactions on Systems, Man, and Cybernetics, 19(5), 1264–1274.
[41] Sun, C. and Wee, W. G. (1982). Neighboring gray level dependence matrix. Computer Vision, Graphics, and Image Processing, 23(3), 341–352.
[42] Robnik-Šikonja, M. and Kononenko, I. (2003). Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 53(1–2), 23–69.
[43] Kudva, V., Prasad, K., and Guruvare, S. (2019). Transfer learning for classification of uterine cervix images for cervical cancer screening. In: Kalya, S., Kulkarni, M., and Shivaprakasha, K. (Eds.) Advances in Communication, Signal Processing, VLSI and Embedded Systems, Lecture Notes in Electrical Engineering, Vol. 614. Springer, Singapore.
https://doi.org/10.1515/9783110702514-012

Index

5G communication 6
Abstract 103
Academic performance 24
ADF 37
Advanced metering integrated 54
Agriculture management 131
AI 35
AI/ML 9
AIC 41
Algorithms 147
Alumni management 25
ANN 41, 45, 135, 137–138, 142, 148
Architecture components 7
ARIMA 32, 37, 39, 41, 46, 49
ARIMA index 39
ARMA 39
ARMA(p,q) 40
artificial neural network (ANN) 32, 115, 162
ASIC 41
Atmospheric parameters 134
autoregressive integrated moving average (ARIMA) 32
BIC 41
Box 37
Bridge health monitoring 9
B-spline 150
Case study 175
Cervical cancer screening 175
Challenges 17
Classification 175
Classification tree 135
CNN 41
CO2 emission 31
Computer-aided diagnosis 175
Conventional machine learning 175, 176
Conventional method 136
Convolutional neural network 60, 105
Crop management 99
Cyclical 33
Data analysis and forecasting 31
Data analytics 130
Data model 115
Data science 162
Data scientist 31, 32, 35, 50, 167
Data transmission and storage 9
Data-driven decisions 130
Dataset 53, 57, 134–135, 137, 141
Decentralized 34
Decision tree 83, 118, 142
Deep learning 53, 142, 175
Diebold Mariano 49
Disease detection 136, 138
Durbin Watson 37
Edge artificial intelligence 12
Education sector 27
Education system 17, 18, 23, 27, 28
Electric vehicles 67
Electricity 36–37, 39, 43
Energy 57
Ensemble classifier 88
Environmental impacts 130, 142
Evapotranspiration 140
Extended Kalman 148
Facial expression analysis 77
Field sensors 131
Fine-tuning 186
Forecasting 33, 35–36, 39–40, 43
Gateway 7
GPU 31–33, 35, 38, 40–41
Healthcare 12
Higher education 17
Image processing 78, 97, 99–100, 102, 104
Image retrieval 78
Internet of nanothings (IoNT) 1
Internet of things (IOT) 1, 34–35, 130, 141
IoT enabled 164
Jenkins 37
KNN 138–139
Leaf image 97
Linear regression 80, 162
Load analysis 53
Load forecasting 53
Load management 53
Load profiling 61
Long short-term memory (LSTM) 32, 41, 44, 49
LSTM RNN 46
MA(q) 39
Machine learning (ML) 35, 41, 53, 148
Machine learning model 97, 101
MAE 46
MAPE 46
ME 46
Microgrid 34, 58
MLP 32, 41, 49
MLP ANN 46
MPE 46
Naïve Bayes classification 82
Network architecture 8
PACF 39
Parametric data 141
Peripheral blood smear 175
Personalizing learning 23
Precision farming 129
Prediction 97, 102, 108
Predictive analytic 116
Preprocessing 97, 100–101, 103, 108
Prior literature 3
Random forest 84
Regression 137, 141–142
ReLU 41
Renewable energy source integration 31
Residual 37, 39
RMSE 46
RNN 32
Seasonal 35
Segmentation 188
Sensor interfacing 149
SLAM 147
SLAM algorithms 147
Smart grid 31, 32, 34, 35, 50, 53–55, 62, 65, 69–71
Smart irrigation 141–142
Smart meter data 53
Smart meters 53
Soft computing 142
Soil farming 133
Soil management 131
Solar photovoltaic panels 32
Standard deviation 188
Stationary 35–36, 38–39, 41
Student recruitment 21
Supervised learning 79, 115
Support vector machine 85, 120, 137
Sustainable agriculture 130, 132, 136, 138
Sustainable farming 130
Sustainable irrigation 140
TensorFlow 162
time series 33, 35–36
TPU 41
Transfer learning 186
trend 32, 36–37
TS 32, 37
validation 97, 102, 105, 107
Water management 139–140
Weather data 164
Weather forecasting 161
Weather prediction 162
White blood cells 178
Yield prediction 131, 136–137
YOLO V4 127
De Gruyter Frontiers in Computational Intelligence

Already published in the series

Volume 8: Internet of Things and Machine Learning in Agriculture. Technological Impacts and Challenges
Jyotir Moy Chatterjee, Abhishek Kumar, Pramod Singh Rathore, Vishal Jain (Eds.)
ISBN 978-3-11-069122-1, e-ISBN (PDF) 978-3-11-069127-6, e-ISBN (EPUB) 978-3-11-069128-3

Volume 7: Deep Learning. Research and Applications
Siddhartha Bhattacharyya, Vaclav Snasel, Aboul Ella Hassanien, Satadal Saha, B. K. Tripathy (Eds.)
ISBN 978-3-11-067079-0, e-ISBN (PDF) 978-3-11-067090-5, e-ISBN (EPUB) 978-3-11-067092-9

Volume 6: Quantum Machine Learning
Siddhartha Bhattacharyya, Indrajit Pan, Ashish Mani, Sourav De, Elizabeth Behrman, Susanta Chakraborti (Eds.)
ISBN 978-3-11-067064-6, e-ISBN (PDF) 978-3-11-067070-7, e-ISBN (EPUB) 978-3-11-067072-1

Volume 5: Machine Learning Applications. Emerging Trends
Rik Das, Siddhartha Bhattacharyya, Sudarshan Nandy (Eds.)
ISBN 978-3-11-060853-3, e-ISBN (PDF) 978-3-11-061098-7, e-ISBN (EPUB) 978-3-11-060866-3

Volume 4: Intelligent Decision Support Systems. Applications in Signal Processing
Surekha Borra, Nilanjan Dey, Siddhartha Bhattacharyya, Mohamed Salim Bouhlel (Eds.)
ISBN 978-3-11-061868-6, e-ISBN (PDF) 978-3-11-062110-5, e-ISBN (EPUB) 978-3-11-061871-6

Volume 3: Big Data Security
Shibakali Gupta, Indradip Banerjee, Siddhartha Bhattacharyya (Eds.)
ISBN 978-3-11-060588-4, e-ISBN (PDF) 978-3-11-060605-8, e-ISBN (EPUB) 978-3-11-060596-9

Volume 2: Intelligent Multimedia Data Analysis
Siddhartha Bhattacharyya, Indrajit Pan, Abhijit Das, Shibakali Gupta (Eds.)
ISBN 978-3-11-055031-3, e-ISBN (PDF) 978-3-11-055207-2, e-ISBN (EPUB) 978-3-11-055033-7

Volume 1: Machine Learning for Big Data Analysis
Siddhartha Bhattacharyya, Hrishikesh Bhaumik, Anirban Mukherjee, Sourav De (Eds.)
ISBN 978-3-11-055032-0, e-ISBN (PDF) 978-3-11-055143-3, e-ISBN (EPUB) 978-3-11-055077-1
www.degruyter.com