English · 252 [243] pages · 2023
Studies in Computational Intelligence 1105
Amina Adadi · Saad Motahhir, Editors
Machine Intelligence for Smart Applications: Opportunities and Risks
Studies in Computational Intelligence Volume 1105
Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.
Editors
Amina Adadi, Moulay Ismail University, Meknes, Morocco
Saad Motahhir, ENSA, Sidi Mohamed Ben Abdellah University, Fès, Morocco
ISSN 1860-949X  ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-3-031-37453-1  ISBN 978-3-031-37454-8 (eBook)
https://doi.org/10.1007/978-3-031-37454-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The intelligence exhibited by machines has evolved dramatically in the last decade. Infrastructure and computing power capabilities are advancing, algorithms and models are more sophisticated, and data is more available and accessible than ever before. Slowly, we are moving from the “early excitement” phase concerning the likely potential of Machine Intelligence (MI) to change society, to the new era of “mainstream adoption” where intelligent machines have expanded beyond research labs and pilot projects to become democratized in our daily lives, and ubiquitous in business settings. The current generation of intelligent machines is supported by a maturing set of technologies including deep learning, natural language processing, computer vision, swarm intelligence, and robotics. These techniques are capable of embedding cognitive functions into machines such as perception, logical reasoning, learning, problem-solving, social intelligence, and even creativity, which in turn provides innovative opportunities to businesses across industries for smarter, more reliable, and environmentally sustainable applications. A broad range of MI applications related to smart cities, smart health care, smart environment, smart manufacturing, and many others are paving the road toward a fully Smart World. Artificial Intelligence (AI) is widely expected to be the next General-Purpose Technology (GPT). Throughout history, GPTs have been characterized by two features, which are the main focus of this book: (i) not only having a pervasive use in a wide range of socio-economic sectors, (ii) but also being associated with spillover effects and risks with socio-economic impacts. Indeed, as usage is growing, so is the awareness of the various risks, challenges, and limitations related to MI adoption.
Key risks related to the absence of regulation, discriminatory biases, ethics, trust, privacy, safety, security, human workforce loss, and other negative outcomes need to be thoroughly discussed and addressed in order to scale the use of MI, especially in safety-critical applications. This book provides insights into the Machine Intelligence field, its underlying technologies, its real-world implementations, and its associated challenges; all these
aspects are explored through the lens of smart applications. The proposed book has threefold objectives: (1) presenting a comprehensive background of the latest advances in MI and related technologies, both existing and budding ones; (2) navigating the landscape of the most recent, prominent, and impactful MI smart applications. The focus is very much on practical application; the book will feature examples of innovative applications and real-world case studies. The broad set of smart applications for MI is partitioned into four themes covering all areas of the economy and social life, namely (i) Smart Environment, (ii) Smart Social Living, (iii) Smart Business and Manufacturing, and (iv) Smart Government.
(3) finally, identifying risks and challenges that are, or could be, slowing down overall MI mainstream adoption and innovation efforts, and discussing potential solutions to address these limitations. Accordingly, the contributions are organized into four topical parts. Chapter “Application of Machine Intelligence in Smart Societies: A Critical Review of the Opportunities and Risks” is an introductory survey on the topic of MI and its application to Smart Societies. This survey discusses the opportunities as well as the risks related to AI mainstream adoption. The chapter also proposes potential solutions to mitigate risks from AI-enabled innovation. The first part, Machine Intelligence for Smart Environment, begins with a systematic review of crop selection solutions based on recommender systems, Chapter “Machine Learning Based Recommender Systems for Crop Selection: A Systematic Literature Review”. Chapter “Convolutional Neural Network for Identification and Classification of Weeds in Buckwheat Crops” proposes a model for the classification of weeds in crops based on ResNet-18. In Chapter “Cluster Analysis as a Tool for the Territorial Categorization of Energy Consumption in Buildings Based on Weather Patterns”, the K-means algorithm is introduced as a relevant and efficient technique for clustering the energy consumption patterns of State Social Housing in Mexico. The second part, Machine Intelligence for Smart Social Living, includes contributions addressing education, health care, and social issues. In Chapter “A Quantum Machine Learning Model for Medical Data Classification”, a quantum support vector classifier is tested on different medical datasets; analyses of prediction accuracy and computational time are presented to demonstrate the performance of this technique in medical settings.
Chapter “Supporting and Shaping Human Decisions Through Internet of Behaviors (IoB): Perspectives and Implications” proposes a comprehensive overview of the impact of the Internet of Behaviors on human decision-making and the ethical implications of such influence. Chapter “A Machine Learning Based Approach for Diagnosing Pneumonia with Boosting Techniques” proposes an
ensemble learning model for pneumonia detection. Chapter “Harnessing the Power of ChatGPT for Mastering the Maltese Language: A Journey of Breaking Barriers and Charting New Paths” conducts and reports on an experiment to assess the capability of AI and machine learning technology, specifically ChatGPT, in helping learn the Maltese language. The third part, Machine Intelligence for Smart Business and Manufacturing, covers industrial applications of AI. In this vein, Chapter “A New Autonomous Navigation System of a Mobile Robot Using Supervised Learning” introduces a navigation model based on an XGBoost classifier for an industrial robot. Chapter “Evolutionary AI-Based Algorithms for the Optimization of the Tensile Strength of Additively Manufactured Specimens” addresses 3D printing technology: AI-based tools for heat-mapping and evolutionary algorithms are applied to additively manufactured specimens to identify optimal input and output parameters for the tensile specimens. Chapter “Securing Data Conveyance for Dynamic Source Routing Protocol by Using SDSR-ANNETG Technique” proposes a neural network-based approach to select a secure and optimized route between nodes in a mobile ad hoc network. The fourth part, Machine Intelligence for Smart Government, illustrates the use of MI in e-government in Chapter “Regional Language Translator and Event Detection Using Natural Language Processing”. In this contribution, a Natural Language Processing (NLP) interpreter is developed to translate Indian government documents from Hindi/English to other local dialects in order to help citizens who are unable to understand these languages. As conceived, this book is aimed at researchers and postgraduate students in applied artificial intelligence and allied technologies. The book is also valuable for practitioners; it serves as a bridge between researchers and practitioners.
It will also connect researchers interested in MI technologies who come from different social and business disciplines and who can benefit from sharing ideas and results. We would like to take the opportunity to express our thanks to the contributing authors for their precious collaboration. Without their contributions, this initiative could not have become a reality. We would also like to thank the Studies in Computational Intelligence book series as a publication partner for their guidance and assistance, with special thanks to Dr. Thomas Ditsinger (Editorial Director, Interdisciplinary Applied Sciences, Springer), Prof. Janusz Kacprzyk (Series Editor-in-Chief), and Mr. Mohammed Ashraf Fareed (Springer Project Coordinator) for the editorial assistance throughout the book production process.

Meknes, Morocco
Fès, Morocco
Prof. Amina Adadi
Prof. Saad Motahhir
Contents
Application of Machine Intelligence in Smart Societies: A Critical Review of the Opportunities and Risks ..... 1
Oluibukun Gbenga Ajayi

Machine Intelligence for Smart Environment

Machine Learning Based Recommender Systems for Crop Selection: A Systematic Literature Review ..... 21
Younes Ommane, Mohamed Amine Rhanbouri, Hicham Chouikh, Mourad Jbene, Ikram Chairi, Mohamed Lachgar, and Saad Benjelloun

Convolutional Neural Network for Identification and Classification of Weeds in Buckwheat Crops ..... 61
V. Riksen and V. Shpak

Cluster Analysis as a Tool for the Territorial Categorization of Energy Consumption in Buildings Based on Weather Patterns ..... 73
O. May Tzuc, M. Jiménez Torres, Carolina M. Rodriguez, F. N. Demesa López, and F. Noh Pat

Machine Intelligence for Smart Social Living

A Quantum Machine Learning Model for Medical Data Classification ..... 95
Hamza Kamel Ahmed, Baraa Tantawi, Malak Magdy, and Gehad Ismail Sayed

Supporting and Shaping Human Decisions Through Internet of Behaviors (IoB): Perspectives and Implications ..... 115
Robertas Damaševičius, Rytis Maskeliūnas, and Sanjay Misra

A Machine Learning Based Approach for Diagnosing Pneumonia with Boosting Techniques ..... 145
A. Beena Godbin and S. Graceline Jasmine

Harnessing the Power of ChatGPT for Mastering the Maltese Language: A Journey of Breaking Barriers and Charting New Paths ..... 161
Jacqueline Żammit

Machine Intelligence for Smart Business and Manufacturing

A New Autonomous Navigation System of a Mobile Robot Using Supervised Learning ..... 181
Jawad Abdouni, Tarik Jarou, Abderrahim Waga, Younes El koudia, Sofia El Idrissi, and Sabah Loumiti

Evolutionary AI-Based Algorithms for the Optimization of the Tensile Strength of Additively Manufactured Specimens ..... 195
Akshansh Mishra, Vijaykumar S. Jatti, and Shivangi Paliwal

Securing Data Conveyance for Dynamic Source Routing Protocol by Using SDSR-ANNETG Technique ..... 213
Ahmed R. Zarzoor and Talib M. J. Abbas

Machine Intelligence for Smart Government

Regional Language Translator and Event Detection Using Natural Language Processing ..... 229
P. Santhi, K. Deepa, M. Sathya Sundaram, and V. Kumararaja
Application of Machine Intelligence in Smart Societies: A Critical Review of the Opportunities and Risks

Oluibukun Gbenga Ajayi
Abstract Machine intelligence (MI), which encompasses techniques such as machine learning (ML), deep learning (DL), and artificial intelligence (AI), is increasingly being used to enable smart societies. Smart societies, which include smart buildings, homes, cities, entire regions, and the people living in them, aim to improve the quality of life of inhabitants by providing more efficient, safer, and more comfortable environments. The use of MI in smart societies has the potential to bring significant benefits, such as improved energy efficiency and traffic management, but it also raises a number of challenges and risks. This chapter provides an overview of the opportunities and risks of using MI in smart societies. It begins by discussing the ways in which MI can be used in smart societies, such as energy management, transportation, and security. Then, it describes the potential risks and challenges of using MI in smart societies, including ethical and legal issues, the potential for perpetuating biases and discrimination, and the need for robust regulation and governance. Finally, the chapter concludes by suggesting potential research directions and recommendations for future work in the field.

Keywords Machine intelligence · Smart societies · Internet of things (IoT) · Artificial intelligence (AI) · Sustainable development · Robotic cities
O. G. Ajayi (B)
Department of Land and Spatial Sciences, Namibia University of Science and Technology, Windhoek, Namibia
e-mail: [email protected]; [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Adadi and S. Motahhir (eds.), Machine Intelligence for Smart Applications, Studies in Computational Intelligence 1105, https://doi.org/10.1007/978-3-031-37454-8_1

1 Introduction

A “smart society” is a term that refers to a community or society that uses technology and data to improve the quality of life for its citizens or inhabitants. It is the integration of technology, information, and communication systems in physical spaces to create more efficient, sustainable, and responsive societies [87]. The goal of smart societies is to improve the quality of life and the well-being of the inhabitants, by
providing them with more efficient, equitable, sustainable, safer, and more comfortable environments. Examples of smart societies include: smart buildings, which use technology to improve energy efficiency, indoor air quality, and comfort [25, 47]; smart cities, which use technology to improve transportation, energy management, and public services; smart homes, which use technology to improve safety, security, and comfort [20]; and smart campuses, which use technology to improve student services, security, and energy management. Smart societies are typically enabled by the integration of several technologies, such as Internet of Things (IoT) devices, sensors, actuators, machine learning (ML), and artificial intelligence (AI). The information and data generated by these systems are then analysed and used to make real-time decisions and to control and manage the society. Machine intelligence (MI), which can be considered a higher evolution of machine learning (ML), has been used for the learning of patterns of living and behavioural patterns [28]. MI is essentially constituted by an integration and assemblage of different digital technologies, including the Internet of Things (IoT) and artificial intelligence (AI), whose collaborative modelling has led to the birth of the Intelligence of Things [64], as well as deep reinforcement learning (DRL), deep learning (DL), digital twins (DT), fuzzy logic systems (FLS), machine learning (ML), and evolutionary algorithms (EA), drawing on non-linear dynamics, computational intelligence, and ideas from physics, physiology, and several other computational frameworks [28]. It helps in the investigation, simulation, modelling, and analysis of very complex phenomena in order to provide solutions to real-world challenges requiring a multidisciplinary approach, and it plays a crucial role in enabling smart societies [4, 77].
It allows for the automated processing and analysis of large amounts of data generated by sensors and IoT devices in smart societies. One key application of MI in smart societies is in the area of energy management [11, 56, 71], which can lead to significant energy savings, as well as improved comfort for the building’s inhabitants [43]. In smart cities, MI can be used for traffic management, especially for the prediction of traffic congestion in the development of intelligent transportation systems [13, 89, 94]. This allows for real-time adjustments to traffic signals and other infrastructure, leading to improved traffic flow and reduced emissions. MI can also be used to improve the safety and security of smart societies, especially with open data [40]. For example, in smart homes, ML algorithms can be used to detect unusual or suspicious activity, allowing for the prompt notification of homeowners or emergency services [30, 92]. In the healthcare field, MI can be used to improve patient monitoring and care in smart hospitals and homes [27, 34, 37, 39, 52, 75]. MI can also be used in smart societies to improve the quality of life of inhabitants by allowing for more personalized and responsive services. It should, however, be noted that while MI has the potential to bring significant benefits to smart societies, the technology also comes with its own set of challenges and risks [85]. For instance, the ability of MI to process and analyse large amounts of data can raise privacy concerns. Additionally, the use of MI in smart societies may perpetuate existing biases and discrimination. It is important that these issues are considered and addressed as MI is integrated into smart societies.

The purpose of this chapter is to explore the potential benefits and challenges of using MI in smart societies in more detail. It provides a comprehensive and up-to-date overview of the current state of the field while also highlighting the opportunities and challenges associated with the integration of MI in smart societies. The first outline/framework of this chapter was drafted with the aid of ChatGPT before the outline was subjected to review and significant modification. Indeed, the author has modified the scope of the outline generated by the chatbot and revised the adapted contents.
2 Opportunities of MI for Smart Societies

MI has the potential to transform our societies in many ways, ranging from the creation of smart cities to the development of autonomous vehicles and the optimization of business processes, thereby promoting smart living. The major applications of MI in smart societies and its socio-economic impacts are discussed in this section.
2.1 MI Applications in Smart Societies

MI has the potential to bring many benefits to smart societies and to considerably improve the quality of life of the inhabitants of such societies. Specifically, it has the potential to enable smart societies to be more efficient [4, 56], sustainable [13, 28, 89], and responsive to the needs of their inhabitants in several ways [25, 27, 75]. A few examples of the opportunities and applications of MI in smart societies are presented in Fig. 1 and discussed as follows:

Fig. 1 Opportunities of MI in smart societies

I. Energy Management: ML algorithms can be used to analyse data from sensors in buildings, such as temperature and occupancy data, to predict energy usage and optimize heating, cooling, and lighting systems [78, 93]. This can lead to significant energy savings, as well as improved comfort for the building’s inhabitants [3, 4, 28, 42, 56]. For example, Amasyali and El-Gohary [5] provided a review of data-driven energy consumption in buildings, while Ntakolia et al. [67] reviewed the application of ML to the operation of district heating and cooling (DHC) systems, which could lead to significant energy savings and improvements in system resilience. ML also offers significant improvements to many aspects of DHC operations [60]. A summary of AI methods used for the design of energy-efficient buildings can be found in Tien et al. [86].

II. Traffic Management: MI can be used for intelligent traffic management in smart cities on the land, in the air, and on the sea [65, 69, 74]. Monitoring traffic sensors and analysing traffic data helps in the prediction of traffic congestion [13, 89]. This allows for real-time adjustments to traffic signals and other infrastructure, leading to improved traffic flow and reduced emissions in an intelligent transportation system [94]. Kistan et al. [48] documented the considerations for certification and the recent developments in the application of ML and cognitive ergonomics in air traffic management.

III. Safety and Security: The safety and security of smart societies can be improved using MI [92]. For example, in smart homes, ML algorithms can be used to quickly detect unusual or suspicious activity in real time, allowing for the prompt notification of homeowners or emergency services [30]. MI can also be used to analyse data on crime patterns, identify areas of concern known as hotspots, and automate the dispatch of emergency services [8, 59]. This can help to reduce response times, improve the safety and accessibility of emergency services, and improve the overall well-being of inhabitants.

IV. Healthcare and Wellness: AI is now deployed for the facilitation of early disease detection and a better understanding of disease progression, while optimizing dosages of medication and treatment [31, 35, 46, 63]. Patient monitoring and care in smart hospitals and homes can be improved with MI [34, 91]. For instance, by analysing data from wearables and other medical devices, ML algorithms can detect early signs of disease or deterioration in a patient’s health, allowing for prompt interventions and improved outcomes [27, 75]. Also, by deploying MI, the health and wellness of inhabitants in smart societies can be improved by analysing data on population health, identifying areas of concern, and automating the delivery of healthcare services [32, 66]. This can help to reduce healthcare costs, improve patient outcomes, and improve the overall well-being of inhabitants.
V. Personalization: MI can also be used in smart societies to improve the quality of life of inhabitants by allowing for more personalized and responsive services, such as personalized transportation, healthcare, and home automation services [62]. For example, ML or DL algorithms can be used to detect patterns of behaviour and adapt the environment to the needs of the inhabitants [25]. Cook et al. [24] provided a robust discussion of the technologies, opportunities, and applications of ambient intelligence, which describes a situation in which humans are surrounded by intelligent objects that make the environment smart enough to recognize the presence of individuals and respond to that presence in an unobtrusive manner.

VI. Building Management: MI can be used for identifying areas for improvement and automating the control of heating, ventilation, lighting, and power management systems, which improves the efficiency and sustainability of building management in smart cities [28, 29, 47, 73]. This can also help in the reduction of energy costs and environmental impacts, and in the improvement of the comfort and safety of inhabitants [7]. MI can also be used in the automation of building designs [23].

VII. Waste Management: MI can be used to improve the efficiency and sustainability of waste management in smart societies [64, 84] by analysing data on waste patterns and automating the collection and processing of waste. This can help to reduce the environmental impact of waste and to improve the cleanliness and safety of the environment for the inhabitants. Abdallah et al. [2] and Ihsanullah et al. [41] provided reviews of the applications of AI in waste management. Abbasi and Hanandeh [1] also deployed AI for the monthly forecasting of municipal solid waste over a medium-term period, using the Logan City Council region in Australia as a case study, while a similar study using a hierarchical structure approach was conducted by Bui et al. [16] in Taiwan. The obtained results showed that AI has good prediction performance and can be deployed successfully for the development of municipal solid waste forecasting and management models. A systematic review of the applications of AI for sustainable solid waste management practices in Australia is also provided by Andeobu et al. [6].

VIII. Inclusiveness: MI can be used to identify areas of social and economic disadvantage in smart societies and to target resources and services to these areas [8, 59]. This can help to ensure that all inhabitants have access to the benefits of smart city services and to reduce the gap between inhabitants of the city irrespective of their social status or class, thus improving the overall well-being of the inhabitants. It aids accessibility to the city’s infrastructure by people living with disability [76] and by the elderly [45].

IX. Emergency Response: The efficiency and responsiveness of emergency response in smart societies can be improved with MI by analysing data on incidents, identifying areas of concern, and automating the dispatch of emergency services [68]. This can help to reduce response times, improve the safety and accessibility of emergency services, and improve the overall well-being of inhabitants. The review provided by Chamola et al. [19] highlighted recent advances in the management of pandemics using ML, while Kyrkou et al. [50] provided a detailed survey of the applications of MI in emergency management.

X. Community Building: By analysing data on population demographics and social interactions, and creating platforms for community engagement and communication, MI can be used to improve community building and social connectedness in smart societies [61, 88]. This can help to foster a sense of community and belonging among inhabitants, and to improve the overall quality of life.

XI. Environmental Monitoring: MI can be used to monitor and predict environmental conditions in smart societies, such as air quality [33, 82], water quality [21, 54, 72, 81, 83, 90], pollution [9, 79], weather patterns [49], and natural disasters [87] like mountain fires [44]. This can help to improve the safety and well-being of inhabitants by providing early warning systems, and to promote sustainable practices by monitoring and managing natural resources.

XII. Entertainment and Leisure: By analysing data on population demographics, preferences, and usage patterns using MI, the entertainment and leisure options available in smart societies can be improved. The application of MI to the improvement of entertainment and leisure in smart societies has also resulted in the advent of smart tourism [36, 38, 53, 80], which uses technology to enhance the travel experience of tourists.

XIII. Education and Learning: MI has the potential to revolutionize education and learning, leading to significant improvements in the quality of life for inhabitants.
Examples of some of these interventions include the creation of personalized learning for students; the design of adaptive testing systems that accurately measure a student’s knowledge and identify areas where support is needed; the creation of intelligent tutoring systems in which real-time feedback and guidance are provided and tutoring is adapted to the student’s learning style and pace; the creation of Virtual and Augmented Reality (VAR) learning; and the implementation of automated grading systems. More recently, an artificial intelligence chatbot known as ChatGPT, a member of the Generative Pre-trained Transformer (GPT) family, has been developed by OpenAI. The chatbot works on the integrated principles of natural language processing (NLP) and transfer learning, and it was launched on 30th November 2022. This chatbot is currently changing the landscape of learning, especially as it concerns student assessment, because of its ability to mimic a real-life person in providing moderately accurate answers to questions in near-real time. GPT-4, a multimodal large language model, is the fourth in OpenAI’s GPT series; it was launched on 14th March 2023.
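To make the flavour of these applications concrete, consider the energy-management pipeline described in item I: sensor readings go in, a model is fitted, and a load prediction comes out. The snippet below is a deliberately minimal, hedged stand-in for the ML models cited there. It fits an ordinary least squares model to synthetic temperature and occupancy data; the feature set, coefficients, and all figures are invented for illustration and are not drawn from the chapter or its references.

```python
# A hedged, minimal sketch of item I's idea: predict a building's hourly
# energy use from temperature and occupancy sensor readings.
# All features, coefficients, and data below are invented for illustration.

def fit_ols(X, y):
    """Solve the normal equations (X^T X) beta = X^T y by Gaussian elimination."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    Xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    A = [row[:] + [b] for row, b in zip(XtX, Xty)]      # augmented matrix
    for col in range(p):                                # forward elimination
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= f * A[col][c]
    beta = [0.0] * p
    for i in reversed(range(p)):                        # back substitution
        beta[i] = (A[i][p] - sum(A[i][j] * beta[j]
                                 for j in range(i + 1, p))) / A[i][i]
    return beta

# Synthetic hourly log: outdoor temperature (deg C) and occupant count.
readings = [(t, occ) for t in range(0, 30, 3) for occ in range(0, 50, 7)]

def true_load(t, occ):
    # Assumed "ground truth" for the demo: base load + per-person load
    # + heating demand below 18 deg C.
    return 5.0 + 0.08 * occ + 1.5 * max(0.0, 18.0 - t)

X = [[1.0, occ, max(0.0, 18.0 - t)] for t, occ in readings]
y = [true_load(t, occ) for t, occ in readings]
beta = fit_ols(X, y)            # recovers roughly [5.0, 0.08, 1.5]

# Predicted load for 30 occupants at 10 deg C.
pred = beta[0] + beta[1] * 30 + beta[2] * max(0.0, 18.0 - 10)
```

In practice the cited studies use far richer models (gradient boosting, neural networks) and real building telemetry; the point here is only the shape of the workflow: assemble features from sensor logs, fit, then predict the load for unseen conditions.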
In summary, the applications of MI in smart societies are virtually inexhaustible. MI can help to improve the quality of life of inhabitants by providing personalized services and information, improving safety and security, optimizing comfort, improving health and wellness, fostering community building and social connectedness, monitoring and managing the environment, providing personalized entertainment and leisure options, and identifying and targeting resources and services to disadvantaged areas. Others include smart waste management, smart water management, smart education and learning, smart farming or agriculture, smart urban planning, and smart retail, which aids the optimization of retail operations, the reduction of costs, and the improvement of customer experiences. MI can also enable smart societies to be more efficient, sustainable, and responsive to the needs of their inhabitants by optimizing energy consumption, improving transportation efficiency, managing buildings efficiently, managing waste effectively, revolutionizing education and learning, and responding to emergencies effectively.
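Several of the smart services summarized above reduce, at their core, to time-series prediction; the municipal-waste forecasting studies discussed under item VII are one example. As a hedged, minimal stand-in for the AI models those studies evaluate, the sketch below applies Holt's linear-trend exponential smoothing to invented monthly tonnage figures. The data, smoothing constants, and the choice of method are illustrative assumptions of this example, not taken from the cited work.

```python
# Illustrative time-series sketch for the waste-forecasting idea in item VII.
# Holt's linear-trend exponential smoothing stands in for the AI models in
# the cited studies; all tonnage figures and constants are invented.

def holt_forecast(series, horizon, alpha=0.5, beta=0.3):
    """Return `horizon` point forecasts from Holt's linear-trend smoothing."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)         # smoothed level
        trend = beta * (level - prev_level) + (1 - beta) * trend  # smoothed trend
    return [level + h * trend for h in range(1, horizon + 1)]

# Synthetic monthly municipal-waste tonnage, growing by ~2 t per month.
tonnage = [100 + 2 * t for t in range(24)]
forecast = holt_forecast(tonnage, horizon=3)   # forecasts for the next 3 months
```

On this perfectly linear toy series the forecasts simply extend the 2 t/month trend; real waste data is seasonal and noisy, which is precisely why the cited studies reach for ML models rather than a two-parameter smoother.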
2.2 Socio-economic Impacts of Machine Intelligence in Smart Societies

The economic, social, and environmental impacts of MI in smart societies can be both positive and negative. Economically, MI in smart societies can increase efficiency and productivity by automating repetitive tasks, resulting in cost savings and improved competitiveness. However, automation can also contribute to income inequality if the benefits of increased productivity are not widely distributed. It can also cause job displacement, particularly in industries where manual labour is prevalent. This can have negative economic consequences for affected workers, who may need retraining or support to transition to new careers. On the other hand, MI can also create new job opportunities in the development and maintenance of these systems [22].

Socially, MI in smart societies can improve quality of life by providing better access to services and information and by increasing safety and security. For example, the use of MI in healthcare can lead to improved patient outcomes and increased access to healthcare services [46]. Additionally, MI in smart homes can lead to improved energy efficiency and increased comfort. However, smart societies can also make it easier for people to isolate themselves from the outside world, which can have negative impacts on mental and physical health.

Environmentally, MI in smart societies, which gives rise to the concept of robotic cities [18], can reduce energy consumption and emissions and improve resource management. For example, the use of MI in buildings and transportation systems can lead to reduced energy consumption and emissions, and the use of MI in agriculture can lead to improved resource management and reduced environmental impact [70]. However, it is also important to note that MI in smart societies can have negative impacts.
For example, the increased use of technology can lead to increased energy consumption and emissions, and it may also have a significant impact on employment, particularly in low-skilled jobs [22], as discussed earlier.
O. G. Ajayi
Additionally, MI can lead to privacy and security concerns and can perpetuate biases and discrimination, as discussed in Sect. 3.
3 Risks and Challenges of Machine Intelligence for Smart Societies

The use of MI in smart societies presents several potential risks and challenges (see Fig. 2). One of the main challenges is the potential for bias and discrimination in the decision-making processes of MI systems [12]. These systems can be trained on biased data and algorithms, which can lead to unfair and discriminatory outcomes, particularly for marginalized and underrepresented groups [17]. If the data used to train an MI system is not diverse and representative of the population it will be used on, the system is likely to perpetuate biases and discrimination against certain groups. One example of bias in MI is the use of facial recognition technology, which has been shown to have higher error rates for people with darker skin tones and for women [17], because the training data used to develop these systems was predominantly composed of images of lighter-skinned individuals and men.

MI systems can also perpetuate biases through the features or variables used in the algorithm. If the algorithm uses variables that are correlated with a sensitive attribute such as race, gender, or sexual orientation, it can make decisions that discriminate against individuals based on these attributes, even if it was not explicitly trained to do so. An example of this is the bias in predictive policing, which uses ML algorithms to predict crime hotspots
Fig. 2 Risks and challenges of the application of MI in smart societies
and allocate police resources. These systems have been found to perpetuate racial bias by disproportionately targeting minority communities [57], because the data used to train them is often based on past police practices, which have themselves been shown to disproportionately target minority communities. Research has shown that MI systems can perpetuate and even amplify existing societal biases, leading to biased decisions in areas such as criminal justice, employment, and healthcare [58].

Another challenge relates to the security and privacy of the personal data used to train and operate MI systems in smart societies. These systems often collect and process large amounts of personal data, including sensitive information such as location, health, and financial data. The collected data can be used for targeted advertising, profiling, and decision-making, which raises concerns about the right to privacy and the protection of personal data [10]. Security is another major concern related to the use of MI in smart societies. Smart devices and systems are vulnerable to hacking and cyber-attacks, which can have serious consequences such as theft of personal data, disruption of services, and even physical harm. Research has shown that many smart devices are poorly secured and lack basic security features such as encryption and authentication.

Additionally, the use of MI in smart societies raises ethical and legal issues related to data ownership and control. Smart systems and devices generate large amounts of data, which can be used for a variety of purposes such as research, analysis, and decision-making. However, there are often questions about who owns this data and who has the right to access and use it [14]. To address these ethical and legal issues, it is important to develop appropriate policies and regulations to protect privacy, security, and data ownership.
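One simple way such bias can be surfaced in practice is to compare the rate of favourable outcomes across groups defined by a sensitive attribute (a "demographic parity" check). The sketch below is illustrative only and is not taken from the systems discussed above; the decision and group arrays are hypothetical.

```python
# Sketch of a demographic-parity check: compare the rate of favourable
# decisions across groups defined by a sensitive attribute.
# The decision and group data below are hypothetical.

def positive_rate(decisions, groups, group):
    """Fraction of favourable (1) decisions received by members of `group`."""
    outcomes = [d for d, g in zip(decisions, groups) if g == group]
    return sum(outcomes) / len(outcomes)

decisions = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]   # 1 = favourable outcome
groups    = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

rate_a = positive_rate(decisions, groups, "A")   # 0.8
rate_b = positive_rate(decisions, groups, "B")   # 0.2
disparity = rate_a - rate_b                      # ~0.6: a gap this large warrants investigation
```

A near-zero disparity does not prove a system is fair (other fairness criteria exist and can conflict), but a large gap is a concrete signal that an audit of the training data and features is needed.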
Such measures can include data minimization, data anonymization, encryption, and secure data storage. Additionally, there should be clear and transparent rules for data access and use, and individuals should have the right to control their own data.

As ML models become more complex, it can be difficult for humans to understand how they work and how decisions are being made. This can lead to a lack of transparency and accountability and can result in decisions that are difficult to justify or understand. Furthermore, smart cities rely on a wide range of sensors, devices, and systems that must be able to communicate and work together effectively. If these systems are not designed with interoperability in mind, it can be difficult to share data and insights across different platforms and systems, which can hamper decision-making.

Another risk is the impact of MI on employment, particularly in low-skilled jobs. Research has shown that the automation of certain tasks and jobs can lead to job displacement and economic disruption [22]. There is also the risk of a digital divide, where not all individuals have access to the benefits of MI in smart societies; this could lead to further marginalization and inequality in society.

In summary, the use of MI in smart societies presents several potential risks and challenges, including the potential for bias and discrimination, security and privacy concerns, ethical and legal issues related to data ownership and control, the impact on employment, and the risk of a digital divide. These challenges must be carefully considered and addressed to enable the responsible deployment of MI in smart societies.
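To make the data-minimization and anonymization measures above concrete, the following sketch pseudonymizes a hypothetical smart-meter record: the direct identifier is replaced with a keyed hash (so records stay linkable for analysis without exposing identity) and the location field is coarsened. The record fields and `SECRET_KEY` are assumptions for illustration, not part of any real system.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be generated randomly and stored
# separately from the data, since anyone holding it can re-link identities.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a stable, non-reversible keyed hash."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical smart-meter record.
record = {"name": "Jane Doe", "postcode": "SW1A 1AA", "energy_kwh": 12.4}

safe_record = {
    "subject_id": pseudonymize(record["name"]),      # token replaces the name
    "postcode_area": record["postcode"].split()[0],  # data minimization: coarsened location
    "energy_kwh": record["energy_kwh"],              # the analytic payload is kept
}
```

Note that pseudonymized data is still personal data under regimes such as the GDPR; true anonymization requires stronger guarantees (e.g. aggregation) that make re-identification infeasible.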
Ensuring transparency and explainability, using data that is truly representative of the population, being conscious of potential biases, enacting strong regulations to protect citizen data, and enforcing strong cybersecurity measures are some of the steps that can be taken to mitigate these risks. Smart societies, which are intended to make people's lives easier, safer, and more comfortable, should be designed around sound ethical principles and tested for bias; otherwise, they could have serious consequences for individuals' lives. Therefore, the importance of having a robust ethical and legal framework in place to guide the development, implementation, and monitoring of AI in smart societies cannot be overemphasized.
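One widely used route to the transparency called for above is permutation feature importance: shuffle one input feature at a time and measure how much a model's accuracy drops, revealing which features actually drive its decisions. The sketch below uses a deliberately trivial "model" and hypothetical data so the mechanism is visible; it is not the chapter's method.

```python
import random

# Toy "model": predicts class 1 when the first feature exceeds the second.
def model(x):
    return 1 if x[0] > x[1] else 0

def accuracy(X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(X, y, feature, seed=0):
    """Accuracy drop when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    col = [x[feature] for x in X]
    rng.shuffle(col)                      # break the feature/label relationship
    X_perm = [list(x) for x in X]
    for row, value in zip(X_perm, col):
        row[feature] = value
    return accuracy(X, y) - accuracy(X_perm, y)

# Hypothetical data; labels are generated by the toy model itself,
# so baseline accuracy is 1.0 and any drop is attributable to the shuffle.
X = [[3, 1], [2, 5], [4, 2], [1, 4], [5, 3], [2, 2]]
y = [model(x) for x in X]

importances = {f: permutation_importance(X, y, f) for f in range(2)}
```

Because the technique treats the model as a black box, it applies to any classifier, which is why it is a common first step when auditing opaque MI systems for the proxy-variable biases discussed in Sect. 3.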
4 Addressing the Challenges and Risks of MI in Smart Societies

While the advantages and benefits of MI in smart societies are enormous, there are also certain challenges and risks associated with introducing MI to enable smart living. Two basic interventions are discussed in this section as ways of addressing some of these challenges, risks, and concerns: the need to enact robust regulation and establish governance frameworks, and the need for continuous research.
4.1 Need for Robust Regulation and Governance Frameworks for Machine Intelligence in Smart Societies

The increasing use of MI in smart societies highlights the need for robust regulation and governance frameworks. Smart societies use technology to connect and integrate various systems and devices to improve quality of life and efficiency. MI is a key component of these systems, as it enables the automation and optimization of various processes and decision-making. However, the use of MI in smart societies also raises important ethical, legal, and societal issues, such as privacy, security, and accountability. One of the key challenges of MI in smart societies is the lack of clear and consistent guidelines for the development, deployment, and use of these systems. This can lead to confusion and inconsistencies in their implementation and can affect the trust and acceptance of these systems by the public. Another challenge is the lack of transparency and accountability in the decision-making processes of MI systems [26, 51, 55]. This can make it difficult to understand and evaluate the outcomes of these systems and to hold the responsible parties accountable for any negative consequences.
To address these challenges, there is a need for robust regulation and governance frameworks that provide clear and consistent guidelines for the development, deployment, and use of MI systems in smart societies. These frameworks should also include mechanisms for transparency and accountability in the decision-making processes of these systems. Regulation and governance frameworks can help address biases by setting standards for the development and deployment of ML models, as well as for the data used to train them. This can include measures such as transparency requirements for ML models, data quality standards, and auditing and oversight mechanisms. For example, in Europe, the General Data Protection Regulation (GDPR) contains provisions to ensure the transparency of automated decision-making. Regulation and governance frameworks can also help ensure that the use of MI in smart societies aligns with broader societal values and goals, such as privacy, security, and social justice. The governance of AI is becoming an important field in its own right, and many organisations, such as the Institute of Electrical and Electronics Engineers (IEEE), the International Organization for Standardization (ISO), and governments, are creating guides and standards to promote the responsible use of AI. It is important to note that creating effective regulation and governance frameworks for MI is a complex and ongoing challenge that will require close collaboration between different stakeholders and experts from a range of fields, including computer science, statistics, economics, law, and public policy.
4.2 Need for Research to Address the Challenges and Risks and Ensure the Responsible Deployment of Machine Intelligence in Smart Societies

MI in smart societies presents complex ethical, legal, and societal issues that must be addressed to ensure that these systems are designed and deployed responsibly. Research is crucial to addressing these challenges and risks. One of the key challenges in the deployment of MI in smart societies is the lack of understanding and knowledge about the potential impacts of these systems on society [15]. Explorative research is needed to better understand these impacts and to develop effective policies and regulations to mitigate any negative effects. Research is needed to develop methods for ensuring transparency and accountability in the decision-making processes of these systems [26, 51, 55], including the use of explainable AI methods. Additionally, research is needed to address the impact of MI on employment, particularly in low-skilled jobs [22]; this can include research on retraining programs and the development of new job opportunities in the field of MI. Research also plays an important role in developing new technologies and fostering innovation in MI. Furthermore, research is necessary to evaluate the effectiveness of
existing policies and regulations and to identify areas for improvement. Research can also play a role in ensuring that MI is developed and deployed in an inclusive and equitable manner, taking into account the needs and perspectives of different groups and communities. Research in the field of MI and smart societies should also be interdisciplinary, involving collaboration between experts from various fields such as geospatial science, computer science, engineering, sociology, ethics, and law. This multidisciplinary approach will allow for a more comprehensive understanding of the challenges and opportunities of MI in smart societies and the development of more effective solutions.
5 Concluding Remarks

MI is a rapidly growing field with the potential to revolutionize the way we live and work in smart societies. It can enable smart societies to be more efficient, sustainable, and responsive to the needs of their inhabitants. It can also be used to optimize energy consumption, improve transportation efficiency, manage buildings efficiently, manage waste effectively, and respond to emergencies effectively. MI can improve the quality of life of inhabitants by providing personalized services and information, improving safety and security, optimizing comfort, and identifying and targeting resources and services to disadvantaged areas. Additionally, MI can improve health and wellness, foster community building and social connectedness, monitor and manage the environment, and provide personalized entertainment and leisure options. However, the implementation of MI in smart societies also poses several risks and challenges, such as data privacy, security, and data ownership, ethical and legal issues, and the potential for MI to perpetuate biases and discrimination. Therefore, it is important to ensure that MI is developed and deployed responsibly, through robust governance and regulation frameworks, and to address these challenges and risks through research. In short, MI has the potential to transform smart societies in many ways, but it is important to approach its development and deployment with caution to ensure that the benefits are realized while the risks are minimized.

Acknowledgements The virtual assistance of OpenAI's GPT-3 model, also known as ChatGPT, in the writing of this chapter is acknowledged.
References

1. Abbasi M, El Hanandeh A (2016) Forecasting municipal solid waste generation using artificial intelligence modelling approaches. Waste Manag 56:13–22. https://doi.org/10.1016/j.wasman.2016.05.018
2. Abdallah M, Abu Talib M, Feroz S, Nasir Q, Abdalla H, Mahfood B (2020) Artificial intelligence applications in solid waste management: a systematic research review. Waste Manag 109:231–246. https://doi.org/10.1016/j.wasman.2020.04.057
3. Abdel-Razek SA, Marie HS, Alshehri A, Elzeki OM (2022) Energy efficiency through the implementation of an AI model to predict room occupancy based on thermal comfort parameters. Sustainability 14:7734. https://doi.org/10.3390/su14137734
4. Alanne K, Sierla S (2022) An overview of machine learning applications for smart buildings. Sustain Cities Soc 76:103445. https://doi.org/10.1016/j.scs.2021.103445
5. Amasyali K, El-Gohary NM (2018) A review of data-driven building energy consumption prediction studies. Renew Sustain Energy Rev 81(Part 1):1192–1205. https://doi.org/10.1016/j.rser.2017.04.095
6. Andeobu L, Wibowo S, Grandhi S (2022) Artificial intelligence applications for sustainable solid waste management practices in Australia: a systematic review. Sci Total Environ 834:155389. https://doi.org/10.1016/j.scitotenv.2022.155389
7. Androjic I, Dolacek-Alduk Z (2018) Artificial neural network model for forecasting energy consumption in hot mix asphalt (HMA) production. Constr Build Mater 170:424–432. https://doi.org/10.1016/j.conbuildmat.2018.03.086
8. Angelidou M, Psaltoglou A, Komninos N, Kakderi C, Tsarchopoulos P, Panori A (2018) Enhancing sustainable urban development through smart city applications. J Sci Technol Policy Manag 9(2):146–169. https://doi.org/10.1108/JSTPM-05-2017-0016
9. Arora J, Pandya U, Shah S, Doshi N (2019) Survey—pollution monitoring using IoT. Procedia Comput Sci 155:710–715
10. Barth S, de Jong MD (2017) The privacy paradox—investigating discrepancies between expressed privacy concerns and actual online behaviour—a systematic literature review. Telematics Inform 34(7):1038–1058. https://doi.org/10.1016/j.tele.2017.04.013
11. Benedetti M, Cesarotti V, Introna V, Serranti J (2016) Energy consumption control automation using artificial neural networks and adaptive algorithms: proposal of a new methodology and case study. Appl Energy 165:60–71. https://doi.org/10.1016/j.apenergy.2015.12.066
12. Bolukbasi T, Chang K-W, Zou J, Saligrama V, Kalai A (2016) Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. https://doi.org/10.48550/arXiv.1607.06520
13. Bommes M, Fazekas A, Volkenhoff T, Oeser M (2016) Video based intelligent transportation systems—state of the art and future development. Transp Res Procedia 14:4495–4504. https://doi.org/10.1016/j.trpro.2016.05.372
14. Boyd D, Crawford K (2012) Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon. Inf Commun Soc 15(5):662–679. https://doi.org/10.1080/1369118X.2012.678878
15. Brundage M, Avin S, Wang J, Belfield H, Krueger G, Hadfield GK, Khlaaf H, Yang J, Toner H, Fong R et al (2020) Toward trustworthy AI development: mechanisms for supporting verifiable claims. arXiv:2004.07213
16. Bui T, Tseng J, Tseng M, Wu K, Lim MK (2023) Municipal solid waste management technological barriers: a hierarchical structure approach in Taiwan. Resour Conserv Recycl 190:106842. https://doi.org/10.1016/j.resconrec.2022.106842
17. Buolamwini J, Gebru T (2018) Gender shades: intersectional accuracy disparities in commercial gender classification. Proc Mach Learn Res 81:1–15
18. Burlacu M, Boboc RG, Butila EV (2022) Smart cities and transportation: reviewing the scientific character of the theories. Sustainability 14:8109. https://doi.org/10.3390/su14138109
19. Chamola V, Hassija V, Gupta S, Goyal A, Guizani M, Sikdar B (2021) Disaster and pandemic management using machine learning: a survey. IEEE Internet Things J 8(21):16047–16071
20. Chan M, Campo E, Estève D, Fourniols JY (2009) Smart homes—current features and future perspectives. Maturitas 64(2):90–97. https://doi.org/10.1016/j.maturitas.2009.07.014
21. Chen Q, Cheng G, Fang Y, Liu Y, Zhang Z, Gao Y, Horn BKP (2018) Real-time learning-based monitoring system for water contamination. In: Proceedings of the 2018 4th international conference on universal village (UV 2018), Boston, MA, USA, 21–24 October 2018, pp 1–5
22. Chui M, Manyika J, Miremadi M (2015) Four fundamentals of workplace automation. McKinsey Q
23. Cirigliano A, Cordone R, Nacci AA, Santambrogio MD (2018) Toward smart building design automation: extensible CAD framework for indoor localization systems deployment. IEEE Trans Comput Aided Des Integr Circuits Syst 37(1):133–145. https://doi.org/10.1109/TCAD.2016.2638448
24. Cook DJ, Augusto JC, Jakkula VR (2009) Ambient intelligence: technologies, applications, and opportunities. Pervasive Mob Comput 5(4):277–298. https://doi.org/10.1016/j.pmcj.2009.04.001
25. Daissaoui A, Boulmakoul A, Karim L, Lbath A (2020) IoT and big data analytics for smart buildings: a survey. Procedia Comput Sci 170:161–168. https://doi.org/10.1016/j.procs.2020.03.021
26. de Laat PB (2018) Algorithmic decision-making based on machine learning from big data: can transparency restore accountability? Philos Technol 31:525–541. https://doi.org/10.1007/s13347-017-0293-z
27. Delmastro F, Martino FD, Dolciotti C (2020) Cognitive training and stress detection in MCI frail older people through wearable sensors and machine learning. IEEE Access 8. https://doi.org/10.1109/access.2020.2985301
28. Dounis AI (2023) Machine intelligence in smart buildings. Energies 16:22. https://doi.org/10.3390/en16010022
29. Fadil ZA (2021) Smart construction companies using internet of things technologies. Periodicals Eng Nat Sci 9(2). https://doi.org/10.21533/pen.v9i2.1858
30. Farooq MS, Khan S, Rehman A, Abbas S, Khan MA, Hwang SO (2022) Blockchain-based smart home networks security empowered with fused machine learning. Sensors 22(12):4522. https://doi.org/10.3390/s22124522
31. Fogel AL, Kvedar JC (2018) Artificial intelligence powers digital medicine. Npj Digit Med 1(1):3–6. https://doi.org/10.1038/s41746-017-0012-2
32. Gaglio S, Re GL, Martorella G, Peri D, Vassallo SD (2014) Development of an IoT environmental monitoring application with a novel middleware for resource constrained devices. In: Proceedings of the 2nd conference on mobile and information technologies in medicine (MobileMed 2014), Prague, Czech Republic, 20–21 October 2014
33. Gallah N, Besbes K (2013) Small satellite and multi-sensor network for real time control and analysis of lakes surface waters. In: Proceedings of RAST 2013: 6th conference on recent advances in space technologies, Istanbul, Turkey, 12–14 June 2013, pp 155–158
34. Gomes MAS, Kovaleski JL, Pagani RN, da Silva VL (2022) Machine learning applied to healthcare: a conceptual review. J Med Eng Technol 46(7):608–616. https://doi.org/10.1080/03091902.2022.2080885
35. Graham S, Depp C, Lee EE, Nebeker C, Kim H-C, Jeste DV (2019) Artificial intelligence for mental health and mental illnesses: an overview. Curr Psychiatry Rep 21:116. https://doi.org/10.1007/s11920-019-1094-0
36. Gretzel U, Werthner H, Koo C, Lamsfus C (2015) Conceptual foundations for understanding smart tourism ecosystems. Comput Hum Behav 50:558–563. https://doi.org/10.1016/j.chb.2015.03.043
37. Hayano J, Yamamoto H, Nonaka I et al (2020) Quantitative detection of sleep apnea with wearable watch device. PLoS ONE 15:e0237279. https://doi.org/10.1371/journal.pone.0237279
38. Hsu CC (2018) Artificial intelligence in smart tourism: a conceptual framework. In: Proceedings of the 18th international conference on electronic business, ICEB, Guilin, China, 2–6 December, pp 124–133
39. Huang J-D, Wang J, Ramsey E, Leavey G, Chico TJA, Condell J (2022) Applying artificial intelligence to wearable sensor data to diagnose and predict cardiovascular disease: a review. Sensors 22(20). https://doi.org/10.3390/s22208002
40. Hurbean L, Danaiata D, Militaru F, Dodea A-M, Negovan A-M (2021) Open data based machine learning applications in smart cities: a systematic literature review. Electronics 10:2997. https://doi.org/10.3390/electronics10232997
41. Ihsanullah I, Alam G, Jamal A, Shaik F (2022) Recent advances in applications of artificial intelligence in solid waste management: a review. Chemosphere 309:136631. https://doi.org/10.1016/j.chemosphere.2022.136631
42. Imran, Iqbal N, Kim DH (2022) IoT task management mechanism based on predictive optimization for efficient energy consumption in smart residential buildings. Energy Build 257. https://doi.org/10.1016/j.enbuild.2021.111762
43. Imran, Ahmad S, Hyeun KD (2019) Design and implementation of thermal comfort system based on tasks allocation mechanism in smart homes. Sustainability 11(20):5849. https://doi.org/10.3390/su11205849
44. Iqbal N, Ahmad S, Kim DH (2021) Towards mountain fire safety using fire spread predictive analytics and mountain fire containment in IoT environment. Sustainability 13(5)
45. Jabla R, Buendía F, Khemaja M, Faiz S (2020) Smartphone devices in smart environments: ambient assisted living approach for elderly people. In: The thirteenth international conference on advances in computer-human interactions, pp 235–241
46. Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, Wang Y, Dong Q, Shen H, Wang Y (2017) Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol 2(4):230–243. https://doi.org/10.1136/svn-2017-000101
47. Khaoula E, Amine B, Mostafa B (2022) Machine learning and the internet of things for smart buildings: a state of the art survey. In: 2nd international conference on innovative research in applied science, engineering and technology (IRASET), Meknes, Morocco, pp 1–10. https://doi.org/10.1109/IRASET52964.2022.9738256
48. Kistan T, Gardi A, Sabatini R (2018) Machine learning and cognitive ergonomics in air traffic management: recent developments and considerations for certification. Aerospace 5:103. https://doi.org/10.3390/aerospace5040103
49. Kulkarni PH, Kute PD (2016) Internet of things based system for remote monitoring of weather parameters and applications. Int J Adv Electron Comput Sci 3:68–73
50. Kyrkou C, Kolios P, Theocharides T, Polycarpou M (2023) Machine learning for emergency management: a survey and future outlook. Proc IEEE 111(1):19–41. https://doi.org/10.1109/JPROC.2022.3223186
51. Lepri B, Oliver N, Letouzé E, Pentland A, Vinck P (2018) Fair, transparent, and accountable algorithmic decision-making processes. Philos Technol 31:611–627. https://doi.org/10.1007/s13347-017-0279-x
52. Li W, Chai Y, Khan F, Jan SRU, Verma S, Menon VG, Kavita, Li X (2021) A comprehensive survey on machine learning-based big data analytics for IoT-enabled smart healthcare system. Mobile Netw Appl 26:234–252. https://doi.org/10.1007/s11036-020-01700-6
53. Li Y, Hu C, Huang C, Duan L (2017) The concept of smart tourism in the context of tourism information services. Tour Manage 58:293–300. https://doi.org/10.1016/j.tourman.2016.03.014
54. Li Y, Wang X, Zhao Z, Han S, Liu Z (2020) Lagoon water quality monitoring based on digital image analysis and machine learning estimators. Water Res 172:115471
55. Lo Piano S (2020) Ethical principles in machine learning and artificial intelligence: cases from the field and possible ways forward. Humanit Soc Sci Commun 7:9. https://doi.org/10.1057/s41599-020-0501-9
56. Lu C, Li S, Lu Z (2022) Building energy prediction using artificial neural networks: a literature survey. Energy Build 262:111718. https://doi.org/10.1016/j.enbuild.2021.111718
57. Lum K (2016) Predictive policing reinforces police bias. Human Rights Data Analysis Group. https://hrdag.org/2016/10/10/predictive-policing-reinforces-police-bias/
58. Lum K, Isaac W (2016) To predict and serve? Significance 13(5):14–19. https://doi.org/10.1111/j.1740-9713.2016.00960.x
59. Makieła ZJ, Stuss MM, Mucha-Kuś K, Kinelski G, Budziński M, Michałek J (2022) Smart city 4.0: sustainable urban development in the metropolis GZM. Sustainability 14(6):3516. https://doi.org/10.3390/su14063516
60. Mbiydzenyuy G, Nowaczyk S, Knutsson H, Vanhoudt D, Brage J, Calikus E (2021) Opportunities for machine learning in district heating. Appl Sci 11:6112. https://doi.org/10.3390/app11136112
61. Méndez JI, Ponce P, Medina A, Meier A, Petter T, McDaniel T, Mollina A (2021) Human-machine interfaces for socially connected devices: from smart households to smart cities. In: McDaniel T, Liu X (eds) Multimedia for accessible human computer interfaces. Springer, Cham. https://doi.org/10.1007/978-3-030-70716-3_9
62. Mihailidis A, Carmichael B, Boger J (2014) The use of computer vision in an intelligent environment to support aging-in-place, safety, and independence in the home. IEEE Trans Inf Technol Biomed 8(3):238–247
63. Miotto R, Wang F, Wang S, Jiang X, Dudley JT (2017) Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform 19(6):1236–1246. https://doi.org/10.1093/bib/bbx044
64. Mishra S, Jena L, Tripathy HK, Gaber T (2022) Prioritized and predictive intelligence of things enabled waste management model in smart and sustainable environment. PLoS ONE 17(8):e0272383. https://doi.org/10.1371/journal.pone.0272383
65. Mondal MA, Rehena Z (2021) An IoT-based congestion control framework for intelligent traffic management system. In: Chiplunkar N, Fukao T (eds) Advances in artificial intelligence and data engineering. Advances in intelligent systems and computing, vol 1133. Springer, Singapore. https://doi.org/10.1007/978-981-15-3514-7_96
66. Mshali H, Lemlouma T, Moloney M, Magoni D (2018) A survey on health monitoring systems for health smart homes. Int J Ind Ergon 66:26–56
67. Ntakolia C, Anagnostis A, Moustakidis S, Karcanias N (2022) Machine learning applied on the district heating and cooling sector: a review. Energy Syst 13:1–30. https://doi.org/10.1007/s12667-020-00405-9
68. Nunavath V, Goodwin M (2019) The use of artificial intelligence in disaster management—a systematic literature review. In: Proceedings of the international conference on information and communication technology for disaster management (ICT-DM), pp 1–8
69. Ouallane AA, Bakali A, Bahnasse A, Broumi S, Talea M (2022) Fusion of engineering insights and emerging trends: intelligent urban traffic management system. Inf Fusion 88:218–248. https://doi.org/10.1016/j.inffus.2022.07.020
70. Patrício DI, Rieder R (2018) Computer vision and artificial intelligence in precision agriculture for grain crops: a systematic review. Comput Electron Agric 153:69–81. https://doi.org/10.1016/j.compag.2018.08.001
71. Shi Q, Liu C, Xiao C (2022) Machine learning in building energy management: a critical review and future directions. Front Eng 9(2):239–256. https://doi.org/10.1007/s42524-021-0181-1
72. Ragi NM, Holla R, Manju G (2019) Predicting water quality parameters using machine learning. In: Proceedings of the 4th IEEE international conference on recent trends on electronics, information & communication technology (RTEICT 2019), Bengaluru, India, 17–18 May 2019, pp 1109–1112
73. Rashid SJ, Alkababji AM, Khidhir AM (2021) Communication and network technologies of IoT in smart building: a survey. NTU J Eng Technol 1(1):1–18. https://www.iasj.net/iasj/download/e4c1d255a9fb9b87
74. Rocha Filho GP, Meneguette RI, Torres Neto JR, Valejo A, Weigang L, Ueyama J, Pessin G, Villas LA (2020) Enhancing intelligence in traffic management systems to aid in vehicle traffic congestion problems in smart cities. Ad Hoc Netw 107:102265. https://doi.org/10.1016/j.adhoc.2020.102265
75. Sabry F, Eltaras E, Labda W, Alzoubi K, Malluhi Q (2022) Machine learning for healthcare wearable devices: the big picture. J Healthc Eng. https://doi.org/10.1155/2022/4653923
76. Salha R, Jawabrah M, Badawy U, Jarada A, Alastal A (2020) Towards smart, sustainable, accessible and inclusive city for persons with disability by taking into account checklists tools. J Geogr Inf Syst 12:348–371. https://doi.org/10.4236/jgis.2020.124022
77. Sanders D (2008) Progress in machine intelligence. Ind Robot 35(6). https://doi.org/10.1108/ir.2008.04935faa.002
78. Seyedzadeh S, Rahimian FP, Glesk I, Roper M (2018) Machine learning for estimation of building energy consumption and performance: a review. Vis Eng 6(1):1–20. https://doi.org/10.1186/s40327-018-0064-7
79. Shaban KB, Kadri A, Rezk E (2016) Urban air pollution monitoring system with forecasting models. IEEE Sens J 16:2598–2606
80. Shafiee S, Rajabzadeh Ghatari A, Hasanzadeh A, Jahanyan S (2019) Developing a model for sustainable smart tourism destinations: a systematic review. Tourism Manag Perspect 31:287–300. https://doi.org/10.1016/j.tmp.2019.06.002
81. Shaikh SF, Hussain MM (2019) Marine IoT: non-invasive wearable multisensory platform for oceanic environment monitoring. In: Proceedings of the IEEE 5th world forum on internet of things (WF-IoT 2019), Limerick, Ireland, 15–18 April 2019, pp 309–312
82. Sharma J, John S (2017) Real time ambient air quality monitoring system using sensor technology. Int J Adv Mech Civ Eng 4:72–73
83. Shelestov A, Kolotii A, Lavreniuk M, Medyanovskyi K, Bulanaya T, Gomilko I (2018) Air quality monitoring in urban areas using in-situ and satellite data within the ERA-PLANET project. In: Proceedings of the international geoscience and remote sensing symposium (IGARSS 2018), Valencia, Spain, 22–27 July 2018, pp 1668–1671
84. Singh K, Arora G, Singh P, Gupta A (2021) IoT-based collection vendor machine (CVM) for E-waste management. J Reliable Intell Environ 7:35–47. https://doi.org/10.1007/s40860-020-00124-z
85. Sirmacek B, Vinuesa R (2022) Remote sensing and AI for building climate adaptation applications. Results Eng 15. https://doi.org/10.1016/j.rineng.2022.100524
86. Tien PW, Wei S, Darkwa J, Wood C, Calautit JK (2022) Machine learning and deep learning methods for enhancing building energy efficiency and indoor environmental quality—a review. Energy AI 10:100198. https://doi.org/10.1016/j.egyai.2022.100198
87. Ullo SL, Sinha GR (2020) Advances in smart environment monitoring systems using IoT and sensors. Sensors 20:3113. https://doi.org/10.3390/s20113113
88. Wang Q, Jing S, Goel AK (2022) Co-designing AI agents to support social connectedness among online learners: functionalities, social characteristics, and ethical challenges. In: Designing interactive systems conference (DIS '22), 13–17 June 2022, virtual event, Australia. ACM, New York, NY, USA, 16 pp. https://doi.org/10.1145/3532106.3533534
89. Won M (2021) Intelligent traffic monitoring systems for vehicle classification: survey. IEEE Access 8:73340–73358. https://doi.org/10.1109/ACCESS.2020.2987634
90. Yan H, Liu Y, Han X, Shi Y (2017) An evaluation model of water quality based on DSA-ELM method. In: Proceedings of the 16th international conference on optical communications and networks (ICOCN 2017), Wuzhen, China, 7–10 August 2017, pp 1–3
91. Yang S, Zhu F, Ling X, Liu Q, Zhao P (2021) Intelligent health care: applications of deep learning in computational medicine. Front Genet 12. https://doi.org/10.3389/fgene.2021.607471
92. Yu J, de Antonio A, Villalba-Mora E (2022) Deep learning (CNN, RNN) applications for smart homes: a systematic review. Computers 11(2):26. https://doi.org/10.3390/computers11020026
93. Zhong Y, Sun L, Ge C (2021) Key technologies and development status of smart city. J Phys Conf Ser 1754. https://doi.org/10.1088/1742-6596/1754/1/012102
94. Zulkarnain, Putri TD (2021) Intelligent transportation systems (ITS): a systematic review using a natural language processing (NLP) approach. Heliyon. https://doi.org/10.1016/j.heliyon.2021.e08615
Machine Intelligence for Smart Environment
Machine Learning Based Recommender Systems for Crop Selection: A Systematic Literature Review Younes Ommane, Mohamed Amine Rhanbouri, Hicham Chouikh, Mourad Jbene, Ikram Chairi, Mohamed Lachgar, and Saad Benjelloun
Abstract This work presents a systematic literature review on the use of Machine Learning based recommender systems for crop selection in agriculture, following the PRISMA protocol for systematic reviews. Agriculture is a vital sector of every country’s economy, and crop selection is a paramount problem for ensuring optimal yield, which depends on economic, technological and environmental factors. Yet it remains, to this day, an underrepresented research topic, although our study shows a significantly increasing trend in the number of studies in recent years. We present the outline of the study, the selection process, the method of content analysis, and a literature review table. A detailed analysis of 40 articles published on the crop recommendation (CR) problem is provided, and the main achievements and current challenges are discussed. One of the main outcomes of this study is that, in future work, much attention should be paid to crop selection using smart and automated solutions, given the strong positive impact they have on agricultural yields, as well as to incorporating more diverse recommender-system algorithms to address this problem. Finally, we shed light on some future perspectives worth pursuing, concerning the whole ML pipeline, including model evaluation.
Y. Ommane (B) · M. Jbene · I. Chairi · S. Benjelloun MSDA (Modeling Simulation and Data Analysis), Mohammed VI Polytechnic University, Lot 660, Hay Moulay Rachid, Ben Guerir, Morocco e-mail: [email protected] Present Address: Y. Ommane UM6P-Faculty of Medical Sciences (UM6P-FMS), Institute of Biological Sciences (ISSB), Mohammed VI Polytechnic University, Ben-Guerir, Morocco M. A. Rhanbouri · H. Chouikh EMINES (School of Industrial Management), Mohammed VI Polytechnic University, Lot 660, Hay Moulay Rachid, Ben Guerir, Morocco M. Lachgar LTI Laboratory, ENSA, Chouaib Doukkali University, El Jadida, Morocco © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Adadi and S. Motahhir (eds.), Machine Intelligence for Smart Applications, Studies in Computational Intelligence 1105, https://doi.org/10.1007/978-3-031-37454-8_2
21
Keywords Recommender systems · Collaborative filtering · Crop recommendation · Crop selection · Machine learning · Agricultural management practices
1 Introduction Crop selection (CS) is one of the most critical factors that directly affect the final yield in agriculture. Hence, selecting an appropriate crop for a given farm is a critical decision that a farmer ought to make, considering environmental factors along with a plethora of other variables influencing the final yield. Experts are frequently consulted to assist farmers with CS, or crop recommendation (CR); but, as this alternative is time consuming and expensive, it is neither available nor affordable for many farmers. The use of recommender systems (RSs) in agricultural management has recently produced some captivating and promising results. RSs have experienced huge growth in the last decade owing to their enormous benefits in supporting users’ needs, by finding the most suitable items based on information extracted from a collection of historical data. These systems also play an important role in decision-making, helping users to maximize profits or minimize risks. Today, RSs are used by many digital companies and are applied in different areas, such as healthcare systems, education, customer segmentation, fraud detection, and financial banking [1]. RSs are being used in CR to provide farmers with better decisions. However, the CR framework does not have a detailed classification scheme for its algorithms and features, mainly due to the diversity of approaches proposed in the literature, as well as the absence of a Systematic Literature Review (SLR) dedicated to this issue. Indeed, currently available studies mainly address the general use of ML in crop yield prediction [2, 3]. Therefore, it is difficult and confusing to choose an RS algorithm and input parameters that fit one’s needs when developing a crop RS. Moreover, researchers may find it challenging to track the use and trends of RS algorithms in agriculture for this specific problem.
For these reasons, this SLR comes to fill this gap as best possible. We decided to present a fair, unbiased, and credible SLR that identifies all relevant and high-quality studies addressing the integration of RSs in CS, with the following main objectives:
– Identify trends of RS algorithms in CR.
– Classify the main techniques that were used in CR.
– Classify the main input features.
– Identify evaluation criteria and evaluation approaches that have been used.
– Specify the current challenges.
This paper is organized as follows: Sect. 2 gives a general overview of RSs. Section 3 presents the SLR protocol. Section 4 provides an analysis and the results obtained for published RS models for CS. Section 5 covers a discussion of the
current achievements and challenges. Finally, we conclude the analysis and present future perspectives and recommended future research directions.
2 Overview of Existing Recommender Systems The objective of a recommender system is to provide the user with relevant recommendations according to their preferences. It drastically reduces the time users need to search for the items that interest them most, and helps them find items that are likely to interest them but to which they might not have paid attention. Recommender systems have been defined in several ways. The most popular and general definition, which we quote here, is that of Burke [4]: “a recommender system is a system capable of providing personalized recommendations or of guiding the user to interesting or useful resources within a large data space”. The information domain for a general recommender system consists of a list of users who have expressed their preferences for various items. A preference expressed by a user for an item is called a rating, and is often represented by a triplet (user, item, rating). These ratings can take different forms; however, most systems use ratings on a scale of 1 to 5, or binary ratings (like/dislike). The set of (user, item, rating) triplets forms what is called the rating matrix. The (user, item) pairs for which the user did not give a rating for the item are unknown values in the matrix. Firstly, there are non-customized recommender systems that do not depend on the user for making recommendations. In non-customized recommendation, the algorithms used are: Top Popular, which recommends the top items (e.g., movies) with the highest ratings, and Product Association, which recommends the best combinations of items that are frequently bought together [5]. Customized approaches are techniques that provide recommendations to users based on ratings or content information. These techniques focus either on characteristics of the individuals (or groups) or on characteristics of the product or service they are buying. In customized RSs, several methods are used (Fig. 1).
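As a concrete illustration of the (user, item, rating) triplets and the rating matrix described above, the following minimal sketch (with hypothetical users, items, and ratings) assembles triplets into a matrix in which NaN marks the unknown (user, item) pairs:

```python
import numpy as np

# Hypothetical (user, item, rating) triplets.
triplets = [("u1", "i1", 5), ("u1", "i3", 3), ("u2", "i2", 4), ("u3", "i1", 1)]
users = sorted({u for u, _, _ in triplets})   # ['u1', 'u2', 'u3']
items = sorted({i for _, i, _ in triplets})   # ['i1', 'i2', 'i3']

# NaN marks (user, item) pairs with no expressed preference.
R = np.full((len(users), len(items)), np.nan)
for u, i, r in triplets:
    R[users.index(u), items.index(i)] = r
# Row 0 of R is now [5., nan, 3.]: u1 rated i1 and i3, but not i2.
```

Real systems store such matrices in sparse form, since most entries are unknown.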
Knowledge-based (KB) filtering is a technique that employs explicit knowledge about items, users, and the matching between both to identify the user’s preferences. It is especially effective in cases with little data on users’ activity history [6]. RSs based on demographic information suggest a list of items that have received good feedback from users who are demographically similar [7]. The advantage of the demographic technique is that it does not require a history of user feedback. Most prior research has adopted the collaborative filtering (CF) approach, which employs a family of algorithms that calculate the utility and the rating of an item for a given user. Two general classes, as illustrated in Fig. 1, were suggested in the literature [8]. First, memory-based algorithms, where the predicted value is estimated as a simple linear combination of ratings and weights, either explicitly or implicitly. The weights can reflect distance, correlation, or similarity between either users or items; this similarity function is the hyper-parameter that affects the prediction
Fig. 1 Classification of recommendation systems techniques
quality. Several authors adopt Pearson, Euclidean and cosine functions as similarity criteria; others use genetic algorithms to find the most suitable combination of weight vectors [9, 10]. Typical examples of this approach are neighborhood-based CF and item-based/user-based top-N recommendations. Memory-based algorithms are not always as fast and scalable as they are meant to be, especially in the context of systems that generate real-time recommendations based on very large datasets. To achieve these goals, model-based RSs are used. Model-based CF involves building a model from the ratings dataset. In other words, information is extracted from the dataset in the form of a “model” to make recommendations, without having to use the complete dataset every time. This approach potentially offers the benefits of both speed and scalability. Two methods are generally used: one deals with the task from a probabilistic perspective, by calculating the expected value of a rating given the user’s historical data; the other uses dimensionality reduction techniques, such as matrix factorization (MF), to model the latent factor space and user/item interactions. In the same context, many existing studies in the literature have examined the possibility of exploiting deep neural network (DNN) architectures [11–14], convolutional neural networks [15, 16], recurrent neural networks [17] and auto-encoders [18, 19] to capture more complex and nonlinear relations in the ratings dataset. Many other model-based CF algorithms exist: Bayesian networks, clustering models, latent semantic models, such as singular value decomposition (SVD), probabilistic latent semantic (PLS) analysis [20], multiple multiplicative factor (MMF) and latent Dirichlet allocation (LDA) [21], as well as Markov decision process (MDP) based models, which include the contextual bandits approach [22, 23] and reinforcement learning (RL) [24].
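The memory-based prediction described above (a similarity-weighted linear combination of other users’ ratings) can be sketched as follows. This is a minimal illustration on a hypothetical toy rating matrix, not any specific system from the reviewed literature; it uses cosine similarity over co-rated items as the weighting function:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity computed over co-rated items only (NaN = unrated)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0
    x, y = a[mask], b[mask]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else 0.0

def predict_rating(R, user, item):
    """Memory-based CF: similarity-weighted average of the ratings
    that other users gave to `item`."""
    sims, ratings = [], []
    for v in range(R.shape[0]):
        if v != user and not np.isnan(R[v, item]):
            sims.append(cosine_sim(R[user], R[v]))
            ratings.append(R[v, item])
    sims, ratings = np.array(sims), np.array(ratings)
    if len(sims) == 0 or sims.sum() == 0:
        return np.nan  # no usable neighbors (a cold-start situation)
    return float(sims @ ratings / sims.sum())

# Toy rating matrix (rows = users, columns = items); NaN = unrated.
R = np.array([[5.0, 4.0, np.nan],
              [4.0, 5.0, 3.0],
              [1.0, 2.0, 1.0]])
pred = predict_rating(R, user=0, item=2)  # user 0's missing rating for item 2
```

Swapping `cosine_sim` for a Pearson or Euclidean-based function changes the similarity criterion, which, as noted above, is the hyper-parameter that drives prediction quality.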
Nonetheless, CF algorithms suffer from three common problems, namely:
– Cold start reflects the inability to suggest recommendations for a new user or item in the absence of initial data.
– The sparsity problem occurs when the available data is insufficient to identify similar users.
– Scalability problems happen when the RS’s performance and latency degrade drastically as the number of users and items in the system increases.
Finally, one way to transcend these hurdles is to combine CF with content-based filtering (CBF), another family of algorithms that make recommendations based on users’ preferences for product features, in so-called hybrid filtering (HF) techniques [25, 26]. Owing to the successful use of RSs in various advertising sectors, they have been applied to solve a variety of problems in the agricultural sector. In [27], the authors proposed an ontology-based RS that helps to identify the pests affecting a crop and their treatments. In [28], a cultivation calendar RS for wheat cultivation in Egypt based on climate data is developed. In another work [29], a hybrid technique for the recommendation of agricultural products to buyers is used. In [30], a CF web-based RS was designed to provide help, such as financial aid, irrigation facilities and insurance, for farmers’ crops. CS is one of the fundamental issues that strongly influence a farmer’s revenue, and the application of recommendation techniques to it has shown significant progress recently. Henceforth, the remainder of this analysis will focus on the scientific literature related to crop recommendation.
3 Research Formulation To present a clear review of the recommendation techniques applied in agriculture, we followed the SLR protocol adopted in PRISMA [31], in its latest version (2020). In the following sections we formulate the Research Questions (RQs) addressed in this SLR. Then, we explain the adopted search strategy for collecting scientific papers, followed by the exclusion criteria that serve as a filter for relevant papers for our review. Finally, in the data extraction phase, the information needed for the analysis of the selected papers is extracted.
3.1 Questions Formulation To accomplish the objective of this SLR and get a full analysis of CR techniques, we defined the following RQs: – – – – –
RQ1. How did research about CR evolved over time? RQ2. What are the main techniques used in literature for CR? RQ3. What are the main input features used for CR? RQ4. Which evaluation parameters and evaluation approaches have been used? RQ5. What are the current challenges in CR?
3.2 Search Strategy To identify relevant studies, we first selected the major scientific databases, such as Google Scholar, ACM, Springer Link, IEEE, Wiley, Emerald, etc. Some synonyms also denote CR systems; in this SLR, we consider terms that replace “recommendation” by “selection” or “suggestion”. To retrieve studies that use new techniques based on agricultural data, we used the terms “Artificial Intelligence”, “Machine Learning” and “Deep Learning”. All these terms were combined in Search Queries (SQ), such as: Query = (Crop AND (Recommendation OR Selection OR Suggestion) AND (“Artificial intelligence” OR “Machine Learning” OR “Deep Learning”)).
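The boolean query above can be assembled programmatically, which makes it easy to reuse verbatim across the different database search interfaces. A small sketch:

```python
# Terms taken from the search strategy described above.
crop_terms = ["Recommendation", "Selection", "Suggestion"]
tech_terms = ['"Artificial Intelligence"', '"Machine Learning"', '"Deep Learning"']

query = (
    "Crop AND (" + " OR ".join(crop_terms) + ") "
    "AND (" + " OR ".join(tech_terms) + ")"
)
# query == 'Crop AND (Recommendation OR Selection OR Suggestion) AND
#           ("Artificial Intelligence" OR "Machine Learning" OR "Deep Learning")'
```

Individual databases may require minor syntactic adjustments (e.g., field prefixes or quoting rules), but the logical structure stays the same.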
3.3 Exclusion Criteria (EC) To strengthen the validity of the SLR, we considered only studies published between 2010 and 2020. We retained studies that have no evaluation section, due to the relative scarcity of research publications in the CR field. We adopted the following EC:
– EC1. Studies must be peer-reviewed articles or proceedings.
– EC2. Studies must be published in a conference, journal, press, etc.
– EC3. Letters, notes, and patents are not included in the review.
– EC4. Graduate reports are not considered.
– EC5. We considered only studies in English.
– EC6. Studies that do not describe their proposed approach in a proper way were not considered in the review.
– EC7. Surveys and systematic literature reviews will not be considered.
3.4 Data Collection Process To answer the RQs, data from the selected articles were collected and structured. The extracted information focused on verifying whether the studies meet the requirements stated in the exclusion criteria. The retrieved information is as follows:
– Paper reference,
– Year and type of publication,
– The indexing database,
– The country of origin of the study,
– The models used to address the problem,
– The inventory and description of crops and features used in each study,
– Performance measures used to evaluate the proposed models.
4 Results of the Study This section presents the outcome of the selection process; we present the information matrix along with the results for each RQ.
4.1 Filtering Process Based on the results from the aforesaid scientific databases, and after excluding duplicates across databases, 89 papers were identified. We followed a filtering process to eliminate the articles that do not match our selection criteria. We first excluded 18 records because they were not indexed in well-known academic indexing services. Afterwards, we scrutinized the articles’ content by reading the title, abstract and keywords, after which we kept 50 articles for further investigation, in addition to 2 more articles found in references. After fully reading the 52 articles, we excluded another 12, as they were either not clearly relevant or out of scope. Thus, we ended up with 40 papers for synthesis and analysis. Figure 2 illustrates a flow chart of the paper filtering process.
Fig. 2 Flow chart of identification process of relevant CR articles
4.2 Literature Review Information Matrix (LRIM) One of the major problems that farmers face at the beginning of every agricultural season is the selection of a suitable crop that would produce a better yield. This process is usually done based on the farmer’s experience, or with the help of an agronomist. Assisting this process has been the objective of several papers over the past few years. Table 1 puts forward the Literature Review Information Matrix (LRIM), presenting the studied articles addressing this problem, based on different approaches and techniques.
4.3 How Did Research About CR Evolve Over Time? The rise of new technologies for solving agricultural problems is an evident fact. Figure 3a shows the distribution of selected studies over time, and we notice a remarkable increase in the number of studies related to CR. Figure 3b illustrates the distribution of papers by source database. Most of the selected papers were published by IEEE or Springer, and fewer papers were found in the Wiley database. Furthermore, Fig. 3c presents the proportion of each type of publication: nearly 62% of the papers were published in eminent conferences, and about 31% come from journal issues, while only 7.14% are book chapters, which reinforces the quality of the publications included in the SLR.
4.4 What Are the Main Techniques That Were Used in the Literature for CR? RSs are generally classified into three types: CBF, CF, or HF. The CF-based model was used the most. As discussed in Sect. 2, model-based CF tries to compress the entire database into a model, then performs its recommendation task by applying inference mechanisms to this model. We identify two common approaches for model-based CF (MBCF): clustering and classification. Clustering CF assumes that users of the same group have the same interests, so they are partitioned into groups called “clusters”. The authors in [61] proposed a K-means clustering (KMC) algorithm, an unsupervised learning algorithm used here to find fertilizers with NPK contents nearest to the requirements of a specific crop. First, the required amount of fertilizer is calculated; then the algorithm forms clusters of similar fertilizers based on the Euclidean distance. Fertilizers in the cluster with minimum distance are therefore recommended to farmers. The recommendation task can also be viewed as a multiclass classification problem, using a supervised learning classifier that maps the input data to a specific output. A variety of such classifiers were tested on agricultural data. In this context, the study [48] carried out a comparative experiment on data
Table 1 Literature review information matrix (LRIM)

[32], 2014
– Contribution: A multi-level linguistic fuzzy decision network (LFDN) method is applied to a real case dataset to decide the cultivated crop among four crops.
– Dataset: Crops: wheat, corn, rice, and faba bean. Features: temperature, water, marketing and soil.
– Models and techniques: Multi-level LFDN.
– Strengths: The method can rank the actions and alternatives to select the appropriate one.
– Weaknesses: The dataset used was not described; the performance metrics are lacking.
– Performance: No performance metric was used.

[33], 2014
– Contribution: This work integrated artificial neural networks (ANN) with a geographical information system (GIS) to assess the suitability of land to cultivate a selected crop.
– Dataset: Crops: rice. Features: rainfall, temperature, elevation and slope.
– Models and techniques: ANN and back-propagation.
– Strengths: High consistency in predicting the crop suitability map.
– Weaknesses: Only 4 parameters were used to assess the suitability of the crop.
– Performance: Mean squared error (MSE): 0.113; accuracy: 83.43%.

[34], 2015
– Contribution: A technique named the CS method to select a sequence of crops based on crop properties, to improve the net yield rates of crops to be planted over a season.
– Dataset: Crops: seasonal crops, whole year crops, short time plantation crops and longtime plantation crops. Features: geography of a region, weather conditions, soil type and soil texture.
– Models and techniques: CSM.
– Strengths: The CSM method retrieves all possible crops that are to be sown at a given time stamp.
– Weaknesses: No evaluation metrics or experiments have been applied to assess the efficiency of the proposed system.
– Performance: No performance metric was used.

[35], 2015
– Contribution: A rule system is developed to help farmers make choices among rice varieties using the crop and its properties.
– Dataset: Crops: 118 rice varieties. Features: 7 features.
– Models and techniques: Rule system.
– Strengths: The set of production rules is computed with KB and the farmer’s land profile to infer suitable rice varieties.
– Weaknesses: The evaluation is very restricted (only 50 queries).
– Performance: Accuracy: 83.4%.

[36], 2016
– Contribution: A hybrid soft decision model has been developed to take decisions on the agricultural crop that can be cultivated in each experimental land.
– Dataset: Crops: paddy, groundnut, sugarcane, cumbu and ragi. Features: twenty-seven input criteria, namely soil, water, season, input (6 sub criteria), support, facilities, and risk.
– Models and techniques: Shannon’s entropy method and the VIKOR method.
– Strengths: The model used deals with incomplete or missing data and inconsistency problems.
– Weaknesses: More agricultural parameters can be identified to be included in the system.
– Performance: Accuracy: 95.2%; precision: 88.66%.

[37], 2016
– Contribution: The proposed system is designed to predict the most suitable crops for a given farm, and to suggest suitable farming strategies, such as mixed cropping, spacing, irrigation, seed treatment, etc., along with fertilizers and pesticides.
– Dataset: Crops: 44 crops have been considered. Features: crop name, rainfall, temperature, cost, soil, and pH.
– Models and techniques: ANN and fuzzy logic (FL).
– Strengths: The extraction of crop growth data using FL.
– Weaknesses: The model is trained (150) and tested (25) on a small dataset.
– Performance: Precision: 34% of crops had a value from 0 to 0.2 and 30% from 0.8 to 1; recall: 39% of crops had a value from 0 to 0.2 and 40% from 0.8 to 1.

[38], 2016
– Contribution: The authors applied the majority voting technique using random tree, CHAID, K-nearest neighbors (KNN) and Naïve Bayes as base learners for CR.
– Dataset: Crops: millet, groundnut, pulses, cotton, vegetables, banana, paddy, sorghum, sugarcane, coriander. Features: depth, texture, pH, soil color, permeability, drainage, water holding and erosion.
– Models and techniques: Ensemble, Naive Bayes, random tree, CHAID and KNN.
– Strengths: A large number of soil attributes are used for the prediction.
– Weaknesses: Fertilization data like the NPK values present in soil are not used.
– Performance: Accuracy: 88%.

[39], 2017
– Contribution: A system that can detect the user’s location, then recommend top-k crops based on the seasonal information and the crop production rate (CPR) of each crop of similar farms.
– Dataset: Crops: not mentioned. Features: crop growing period database, thermal zone database, physiographic database, seasonal crop database and CPR database.
– Models and techniques: Pearson correlation similarity (PCS).
– Strengths: The developed system can recommend appropriate crops in a satisfactory way.
– Weaknesses: The model does not take into consideration the existing nutrients in the farm’s soil.
– Performance: Precision: 72%; recall: 65%.

[40], 2017
– Contribution: This article attempts to predict the crop yield and price that a farmer can obtain from his land, by analyzing patterns in past data.
– Dataset: Crops: not mentioned. Features: crop areas, types of crops cultivated, nature of the soil, yields and the overall crops consumed.
– Models and techniques: Non-linear regression.
– Strengths: The developed system uses the demand as input.
– Weaknesses: The recommendation model is not tested or evaluated on a dataset.
– Performance: No performance metric was used.

[41], 2017
– Contribution: An FL-based expert system is proposed to automate CS for farmers based on parameters such as the climatic and soil conditions.
– Dataset: Crops: 20 crops. Features: 23 features.
– Models and techniques: Fuzzy based expert system.
– Strengths: The study uses an important number of features to select the suitable crop.
– Weaknesses: The proposed system is extremely customizable instead of a more ad hoc system.
– Performance: No performance metric was used.

[42], 2017
– Contribution: A decision-making tool is developed for selecting the suitable crop that can be cultivated in each agricultural land.
– Dataset: Crops: paddy, groundnut, and sugarcane. Features: 26 input variables classified into six main variables, namely soil, water, season, input, support, and infrastructure.
– Models and techniques: Decision matrix, dominance-based rough set approach and Johnson’s classifier.
– Strengths: The validation results showed that the developed tool has sufficient predictive power to help the farmers select suitable crops.
– Weaknesses: The study is only based on one metric to evaluate the model.
– Performance: Accuracy: 92%.

[43], 2017
– Contribution: This paper develops a fuzzy based agricultural decision support system which helps farmers to make wise decisions regarding CS.
– Dataset: Crops: not mentioned. Features: 15 parameters.
– Models and techniques: Mamdani fuzzy inference system.
– Strengths: The system is deployed at many places and results are found to be accurate.
– Weaknesses: No empirical study was conducted to assess the quality of the model.
– Performance: No performance metric was used.

[44], 2017
– Contribution: Proposed two mathematical formulations, the first one for the determination of the crop-mix that maximizes the farmer’s expected profit, and a second model that maximizes the expected profit under a predefined quantile of worst realization.
– Dataset: Crops: corn, wheat, soy, barley. Features: the land available to grow crops, the sequence of operations required for each crop, the corresponding time windows, the availability of tools and tractors, their operating costs and the working speeds.
– Models and techniques: Natural integer programming and maximization of the conditional value-at-risk (CVaR).
– Strengths: The proposed model significantly increases the worst outcomes with respect to the farmer’s solution.
– Weaknesses: Only one farm was used for testing, and the model could be enriched by incorporating explicit decisions about other resources.
– Performance: The model’s expected profit is higher than the farmer’s.

[45], 2018
– Contribution: Suggests using a deep neural network for agricultural CS and yield prediction.
– Dataset: Crops: aus rice, aman rice, boro rice, jute, wheat, and potato. Features: 46 parameters.
– Models and techniques: DNN, logistic regression, support vector machine (SVM) and random forest (RF).
– Strengths: The proposed model has a relatively high accuracy.
– Weaknesses: Lack of details about the parameters of the model and the complete list of input parameters.
– Performance: Accuracy ≥90%.

[46], 2018
– Contribution: Proposes a new system for CR based on an ensemble technique.
– Dataset: Crops: cotton, sugarcane, rice, wheat. Features: soil type, pH value of the soil, NPK content of the soil, porosity of the soil, average rainfall, surface temperature, sowing season.
– Models and techniques: Ensemble model (RF, Naive Bayes and linear SVM with the majority voting technique).
– Strengths: Using three different and independent classifiers enables the system to provide more accurate predictions.
– Weaknesses: Only four crops were used for training and testing.
– Performance: Accuracy: 99.91%.

[47], 2018
– Contribution: An intelligent system, called agro-consultant, which assists farmers in making decisions about which crop to grow.
– Dataset: Crops: 20 crops. Features: soil type, soil pH, precipitation, temperature, location parameters.
– Models and techniques: Decision tree (DT), KNN, RF and ANN.
– Strengths: A map view feature, where farmers can view the sowing decisions made by neighboring farmers using a pop-up marker on the map.
– Weaknesses: Not considering other economic indicators like farm harvest prices and retail prices.
– Performance: Accuracy: 91%.

[48], 2018
– Contribution: Investigation of the predictive performance of different data mining classification algorithms to recommend the best crop for better yield, based on a classification of soil under different ecological zones.
– Dataset: Crops: not mentioned. Features: pH, organic matter, K, EC, Zn, Fe, Mn, Cu and texture.
– Models and techniques: J48, BF tree, OneR and Naive Bayes.
– Strengths: Comparison of the performance of four classification algorithms.
– Weaknesses: Recommending a class of crops instead of recommending a single crop.
– Performance: Accuracy: 97%; precision: 97%; recall: 97%.

[49], 2018
– Contribution: A system that gives the farmer a prior idea regarding the yield of a particular crop by predicting the production rate according to the location of the farmer and the past data of weather conditions.
– Dataset: Crops: rice. Features: temperature, humidity, location, and rainfall.
– Models and techniques: Mamdani fuzzy model and cosine similarity (COS).
– Strengths: Relying only on location and weather parameters for prediction.
– Weaknesses: The study focused only on rice production and has not considered the other climatic conditions.
– Performance: No performance metric was used.

[50], 2019
– Contribution: Proposes to design a KB solution for building an inference engine for recommending suitable crops for a farm.
– Dataset: Crops: not mentioned. Features: elevation, temperature, fertilizer type, rainfall, field type, seed type and soil.
– Models and techniques: Part rule based classifier and expert’s knowledge.
– Strengths: The model developed has the potential to increase the accuracy of the KB system (part rule algorithm).
– Weaknesses: The evaluation in this study is done by unknown experts.
– Performance: Farmers’ accuracy: 82.2%; domain experts’ accuracy: 95.23%; agricultural extension accuracy: 88.5%.

[51], 2019
– Contribution: A new data mining technique was proposed to cluster crops based on their suitability compared to the soil nature of different areas.
– Dataset: Crops: 10 crops. Features: soil, crop, temperature, and rainfall.
– Models and techniques: Data mining and hierarchical clustering.
– Strengths: Various datasets were merged to extract crop requirements.
– Weaknesses: Only ten crops in eight different locations of Coimbatore were used for prediction and evaluation.
– Performance: No performance metric was used.

[52], 2019
– Contribution: A multi-class classification-based decision model is developed to assist the farmer in selecting suitable crops using rough, fuzzy, and soft set approaches.
– Dataset: Crops: paddy, groundnut, sugarcane, cumbu and ragi. Features: 27 features.
– Models and techniques: Dominance-based rough set, grey relational analysis, fuzzy proximity relation, bijective soft set approach, Naive Bayes, SVM and J48.
– Strengths: The validation test outputs were compared to agricultural experts.
– Weaknesses: Only five crops were used; according to the study, the execution time shows that the model is relatively slow.
– Performance: Accuracy: 98.4%; precision: 92%.
2019 A mobile application that allows farmers to predict the region’s production for a specific crop
[55]
An interface was designed to enable access to necessary information for selecting the proper crop
The system ensures a better understanding of the environmental factors behavior and analyses the farmer actions, such as application of fertilizer or pesticide, it also takes global warming into consideration
Strengths
– Crops: not mentioned ARIMA method, An android linear regression application was – Features: soil type, temperature, rainfall (LR), SVR model developed to facilitate the farmers accessibility to the suggested model
2019 Suggests using ANN and – Crops: not mentioned ANN and SVM SVM for crop prediction – Features: rainfall, considering the minimum and environmental parameters maximum temperature, soil type, humidity, and soil pH value
Moving average, autocorrelation, and 3D cluster correlation
Models and techniques
[54]
– Crops: celery, water spinach, green beans, and daikon – Features: temperature, humidity, illumination, atmospheric pressure, soil electrical conductivity (EC), soil moisture content, and soil salinity
Dataset
2019 An intelligent agriculture platform that manages and analyses sensors data to monitor environmental factors, which provides the farmer with a better understanding of crop suitability
Contribution
[53]
Reference Year
Table 1 (continued) Performance
(continued)
The white noise for No performance ARIMA model was metric was used chosen as a random value in the range of 0–10% of the crop yield
Test evaluation was Accuracy done by comparing (ANN): 86.80% the predicted crop with the real ones, which is not accurate since the actual cultivated crop is not necessarily the optimal
The application of No performance the system analysis metric was used result isn’t automotive, and no artificial intelligence model was used
Weaknesses
36 Y. Ommane et al.
2019 A hybrid crop RS based on a combination of CF technique and case-based reasoning
2019 A model that can predict soil series with land type, according to which it can suggest suitable crops
[58]
[59] – Crops: not mentioned Weighted KNN, – Features: soil dataset Gaussian Kernel based SVM, and and crop dataset bagged tree
– Crops: not mentioned ANN and case – Features: temperature based reasoning (CBR) data, rainfall, solar radiation, wind speed, evaporation, relative humidity, and evapotranspiration
FL
2019 A standalone crop – Crops: 30 crops recommending device – Features: pH level, that detects soil quality soil moisture, soil and recommends a list of temperature and soil crops based on FL models fertility
Models and techniques
[57]
Dataset
2019 Application of learning – Crops: rice, corn, and LVQ vector quantization soybeans – Features: altitude, (LVQ), which is part of rainfall, temperature, the ANN method, to and soil pH provide recommendations from three types of plants
Contribution
[56]
Reference Year
Table 1 (continued)
Suggesting crops based only on class of soil series is very interesting
The presented model has a remarkable performance and rational accuracy of prediction
Stand-alone device gives faster and real-time soil property reading and crop suggestion
Comparison of the evaluation metric between expert recommendation and real data
Strengths
More focus on soil classification
Only weather parameters were considered
Less details about the model and its performance
Only three crops were used, rice, corn, soybean
Weaknesses
(continued)
Accuracy (SVM): 94.95%
Precision: 90% Recall: 93%
No performance metric used
Accuracy: 93.54%
Performance
Machine Learning Based Recommender Systems for Crop Selection … 37
2020 Proposes using hierarchical fuzzy model to reduce the classical system complexity with the huge number of generated rules
[63] – Crops: not mentioned Hierarchical fuzzy model – Features: sand, silt, clay, nitrogen, phosphorus, potassium, soil color, soil pH, soil electrical conductivity, rainfall, climate zone, and water resources
The number of generated rules was reduced from 439 to only 152
The model was evaluated using Multiple Performance metrics
2019 A hybrid RS based on two – Crops: 24 crops classification algorithms – Features: 15 features by considering various attributes
[62]
Naive Bayes, J48
The accuracy of the developed system is reasonably high
2019 An ontology-based – Crops: not mentioned RF recommendation system – Features: soil for crop suitability characteristics, recommendation based on weather conditions region and soil type and crop production
Strengths
SVM, DT and The authors logistic regression compared different models
Models and techniques
[61]
– Crops: 15 crops – Features: soil color, pH, average rainfall, and temperature
Dataset
2019 The article addresses the problem of selecting the most suitable crop for a farm, by applying different classification algorithms
Contribution
[60]
Reference Year
Table 1 (continued)
No evaluation metrics are provided
Farmers cannot locate their exact coordinates
Only 4 parameters were considered from CR
Only four parameters were considered as input to the model
Weaknesses
(continued)
No performance metric was used
– Accuracy (J48): 95% – Recall (J48): 96% – F-measure (J48): 86%
Precision: 65%
The best performance is 89.66% and was achieved using the SVM classifier
Performance
38 Y. Ommane et al.
2020 Implementation of a – Crops: 23 crops fuzzy-based rough set – Features: 16 features approach to help farmers in deciding on CS in their agriculture land
2020 A CR system according to – Crops: 24 crops multiple properties of the – Features: soil types, crop and land pH, electric conductivity, organic carbon, nitrogen, phosphorus, sulfur, zinc, boron, iron, manganese and copper
[65]
[66]
Dataset
2020 An application that helps – Crops: peach, pear, selecting the most apricot, and almond convenient type of crops – Features: relative humidity, radiation, in a certain zone wind speed, considering the climate temperature, wind conditions of that zone, direction, cooling the production, and the units, sunlight, needed resources for each rainfall, accumulated crop radiation, and wind run
Contribution
[64]
Reference Year
Table 1 (continued)
The performance is measured using different evaluation metrics
The farmer’s recommendation request is made using internet of things (IoT) devices
Strengths
Property matching Fast and simple algorithm
FL
Fuzzy system
Models and techniques
– Accuracy: 92% – Precision: 93% – Recall: 92% – F-measure: 91%
No performance metric used
Performance
(continued)
Only soil properties – PCS: 4.80% were considered as – COS: 6.45% input to the model
The suggested method can be tested with a wide set of new crops
The recommendation module can be scaled to consider other types of additional information like soil parameters
Weaknesses
Machine Learning Based Recommender Systems for Crop Selection … 39
2020 Treats the integration of – Crops: corn, clover, AHP and POPSIS with sugar beet and wheat GIS to determine the most – Features: 63 land map units and their suitable crops for parcels chemical, physical, for land consolidation topographical, and areas socio-economic features
Analytic hierarchy process (AHP), technique for order preference by similarity to ideal solution (TOPSIS)
RF and SVM
[69]
The data includes 1530 soil samples and 13 types of cultivated land crops
2020 Proposes a clustering center optimized algorithm by SMOTE, then use an ensemble of RF and weighed SVM to predict the recommended crop
Models and techniques
[68]
Dataset
2020 A CS method to – Crops: 10 crops RF maximize crop yield – Features: soil type, based on weather and soil soil nutrients, soil pH parameters value, drainage capacity, weather conditions
Contribution
[67]
Reference Year
Table 1 (continued) Weaknesses
The integration of AHP, TOPSIS and GIS functions provides an effective platform to determine the suitability
Classification of crops based on soil analysis
No performance metric was used
Performance
Several criteria can be added such as meteorological and irrigation
(continued)
No performance was used
The study reference – Accuracy: range is limited 98.7% – Precision: 97.4% – F1-score: 97.8%
The soil and Only Four soil predicted weather parameters were parameters are used considered collectively to choose suitable crops for land
Strengths
40 Y. Ommane et al.
2020 A system for predicting the crop which has maximum yield per unit area in a district
2020 An FL-based CR system to assist farmers in selecting suitable crops
[71]
Contribution
[70]
Reference Year
Table 1 (continued) Models and techniques
– Crops: paddy, jute, potato, tobacco, wheat, sesamum, mustard and green gram – Features: 11 soil parameters, elevation and rainfall
FL
– A dataset published RF by the Government of Maharashtra, India, containing approximately 246 100 data points – Features: 7 parameters on the time span 1997 to 2014
Dataset The performance could be much better when considering more variables
Weaknesses
The validation was No explanation of based on a cultivation how the index (CI) membership functions of the inputs and outputs were derived from the dataset
The algorithm works even when the variables are mostly categorical
Strengths
Accuracy: 92.14%
Normalized root mean squared error (NRMSE): 49% (median value)
Performance
Machine Learning Based Recommender Systems for Crop Selection … 41
Fig. 3 Distribution of selected papers: (a) distribution of selected papers since 2014; (b) distribution of papers by selected database; (c) distribution by type of publication.
instances from Kasur district, Pakistan, for soil classification using J48, BF Tree and OneR, which are variants of DT-based models, the most used technique in our literature survey. Naive Bayes Classifier (NBC) also has significant relevance, mainly because it encodes dependencies among features and thereby connects causal relationships between items. On the other hand, [54] investigated the use of SVM and ANNs; the results indicate that the ANN model captures non-linearities among the dataset features, achieving the best accuracy and prediction rate compared to SVM. Another technique for CS that can improve a model's accuracy is ensemble learning: [46] exploited it to build a model that combines the predictions of multiple ML algorithms and recommends the right crop with high accuracy. The independent base learners used in the ensemble are RF, NBC, and linear SVM. Each classifier provides its own class labels with acceptable accuracy, and the labels of the individual base learners are combined using majority voting. The CR system classifies the input soil dataset into the recommendable crop type: Kharif or Rabi (autumn or spring). One of the most promising models in CF is Fuzzy Logic (FL), which
extracts IF–THEN rules from the provided data using membership functions and linguistic variables that express human knowledge. The authors in [49] proposed a fuzzy-based model that uses 27 rules with 3 modalities: Low, Medium, and High. In such a traditional single-layer fuzzy system, the number of rules grows exponentially with the number of system parameters, and a larger rule base degrades the system's performance and transparency. Therefore, [63] developed a multi-layer system using the fuzzy hierarchical approach. The hierarchical fuzzy model was applied in the same Mamdani fuzzy inference system (see Footnote 1) for a suitable CR system. The CR system has 12 input variables and was decomposed into six fuzzy subsystems, then arranged by priority. The results show that a hierarchical CR system performs better than a traditional fuzzy CR system. CF memory-based (CFMB) models have a low frequency in the reviewed studies, even though they represent the most common approach in RSs. Yield prediction is based on similarity relationships among items (farms or crops) in terms of collected production yield. For instance, [39] proposed a model that calculates the PCS between farms using the information stored in the crop growing period database, the thermal zone database, and the physiographic database; the algorithm then selects the top n similar farms. The seasonal information and CPR of each crop of the similar farms are used to filter the first list appropriate to the context. Finally, the top k crops are recommended to each user. Another study [49] used the similarity approach: the authors developed a system that gives the farmer a prior idea of the yield of a particular crop by predicting the production rate. The COS measure is used to find farmers similar to the querying farmer in terms of location; the obtained list of similar farmers is then used as input data for the fuzzy algorithm.
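As a concrete illustration of the memory-based similarity step, the sketch below computes cosine similarity between farm feature vectors and returns the top-n most similar farms. The farm names, features and values are invented for illustration, and in practice the features should be scaled so that no single unit dominates the similarity:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical farms described by [temperature (C), rainfall (mm), soil pH].
farms = {
    "farm_A": [24.0, 180.0, 6.5],
    "farm_B": [25.0, 190.0, 6.4],
    "farm_C": [12.0, 40.0, 8.1],
}

def top_similar_farms(query, n=2):
    """Return the n farms whose feature vectors are most similar to `query`."""
    scores = {name: cosine(vec, query) for name, vec in farms.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

The crops grown on the returned farms would then be filtered by season and CPR before recommending the top k, along the lines described for [39].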
CBF, one of the most significant model families in RSs, is of high importance for CR as well as for yield estimation. Examples of CBF model applications include [66]. The model in this study uses soil and crop properties and suggests a list of five high-priority crops based on the correspondence between the crop's and the land's properties. The algorithm takes two inputs: the land soil details and the required property value for each crop. First, the algorithm computes the similarity between the land and the crop based on their properties against a predefined range; if the comparison falls into the predefined range, a rank is generated for the crop–land combination. In another study [51], the authors developed a new data mining technique to cluster crops based on their suitability to the soil nature of areas. Features are extracted from the datasets using five feature-extraction metrics: pH distance, NPK (macronutrient) distance, MICRONUT (micronutrient) distance, water requirements, and temperature requirements. The crops are then grouped by hierarchical clustering of these vectors into three classes: most suitable, less suitable, or least suitable.
Footnote 1: First introduced as a method to create a control system by synthesizing a set of linguistic control rules obtained from experienced human operators. In a Mamdani system, the output of each rule is a fuzzy set.
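The property-matching idea behind [66] can be sketched as follows; the crops, properties and required ranges below are invented placeholders, not the study's actual values:

```python
# Hypothetical per-crop required ranges for two soil properties.
crop_requirements = {
    "paddy":  {"pH": (5.5, 7.0), "N": (40.0, 80.0)},
    "cotton": {"pH": (6.0, 8.0), "N": (20.0, 50.0)},
}

def rank_crops(land, requirements):
    """Rank crops by how many of their required property ranges the land satisfies."""
    scores = {
        crop: sum(lo <= land[prop] <= hi for prop, (lo, hi) in req.items())
        for crop, req in requirements.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

land = {"pH": 6.2, "N": 60.0}  # measured soil details of one plot
ranking = rank_crops(land, crop_requirements)  # paddy matches both ranges here
```

A real system would use many more properties and break ties, but the core step (comparing land values against per-crop ranges and ranking by matches) is the same.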
HF is another significant category of models used in CR. In the first study [64], the authors presented a new method integrated within an IoT system developed to advise farmers on which crop type will generate more yield. A fuzzy clustering technique is proposed to form groups characterized by their weather conditions; the extracted knowledge forms the model and the rule engine. Finally, the RS generates a list of suitable crops in descending order. In the second study [37], the authors developed a hybrid CR system that uses FL to choose from 44 crop rules. The system is based on FL and takes input from an ANN-based weather prediction module. An agricultural named entity recognition module, built with conditional random fields, extracts crop condition data, and a cost prediction based on an LR equation helps rank the recommended crops. Table 2 shows how many studies use each of the approaches described earlier in Sect. 2, together with the studies themselves. A significant share of CF approaches is observed when developing RSs: over half of the reviewed studies use CF, the most common approach, with a stronger emphasis on the model-based method. This may be due to the availability of historical datasets for farmers, linked to the marked dominance of CF in recent years. Figure 4 traces the timeline of the publications and confirms that CF with a model-based method shows continual growth. The graph shows a slight increase over the two most recent years, and the number of studies is likely to keep increasing after 2020. Another important conclusion drawn from Table 2 and Fig. 4 is the scarcity of research efforts focused on the other filtering methods, even though some studies showed that CBF and HF give more accurate recommendations overall than all other types of filtering. Throughout the years, however, the research pace on these types of filtering has remained relatively low.
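To make the fuzzy approach concrete, the following minimal Mamdani-style sketch fuzzifies two inputs with triangular membership functions and fires two IF–THEN rules (AND modeled as min). The breakpoints, linguistic terms, rules and crops are invented for illustration; they are not the rule bases of [37] or [64]:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def crop_rule_strengths(temp_c, rain_mm):
    # Fuzzification with invented linguistic terms and breakpoints.
    temp_warm = tri(temp_c, 15.0, 25.0, 35.0)
    rain_high = tri(rain_mm, 100.0, 200.0, 300.0)
    rain_low = tri(rain_mm, -1.0, 50.0, 120.0)
    # IF-THEN rules; AND is min, and each rule supports one crop.
    return {
        "rice": min(temp_warm, rain_high),    # IF warm AND wet THEN rice
        "millet": min(temp_warm, rain_low),   # IF warm AND dry THEN millet
    }

strengths = crop_rule_strengths(25.0, 210.0)
best_crop = max(strengths, key=strengths.get)  # "rice" for this warm, wet input
```

A full Mamdani system would aggregate the fired rules into output fuzzy sets and defuzzify (e.g. by centroid); a hierarchical variant, as in [63], would split the inputs across several smaller rule bases to avoid the exponential rule growth discussed above.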
Fig. 4 Timeline of articles by recommendation approaches
Table 2 Articles by type of recommendation technique
Classification of RS | Number of studies | References
CF/model-based | 28 | [32–34, 38, 40–43, 45–47, 52–57, 59–63, 65, 67–71]
CBF | 5 | [35, 44, 48, 51, 66]
HF | 5 | [36, 37, 50, 58, 64]
CF/memory-based | 2 | [39, 49]
Table 3 Number of articles by type of ML algorithm used
ML algorithm | Number of studies | References
DT | 13 | [38, 45–48, 50, 52, 59–61, 67, 68, 70]
FL | 11 | [32, 37, 41, 43, 49, 52, 57, 63–65, 71]
SVM | 8 | [45, 46, 52, 54, 55, 59, 60, 68]
ANN | 6 | [33, 37, 45, 54, 58, 59]
NBC | 4 | [38, 46, 48, 52]
Regression | 4 | [40, 45, 60, 69]
KNN | 3 | [38, 47, 59]
Ensemble | 2 | [38, 46]
KMC | 1 | [61]
PCS | 1 | [39]
COS | 1 | [49]
LVQ | 1 | [56]
Table 3 shows the distribution of applied ML algorithms in this study; some papers applied more than one ML algorithm. Notably, the most applied algorithms are DT-based, although this SLR does not differentiate between the different DT-based algorithms (J48, PART, RF, etc.) in the analysis. The other widely used algorithms are SVM and FL. Some ML algorithms rank low in this SLR despite their popularity; this is the case for similarity methods and regression algorithms. These algorithms are thus not investigated enough, which opens opportunities for future studies in the CR field to fill this gap. According to our study, the most popular type of RS in agricultural applications is CF. In this type of filtering, users, rather than items, are filtered and associated with similar users. Only users' behavior is considered; profile information and content alone are insufficient. Users who rate products favorably are linked to other users who behave in a similarly favorable manner. Compared to CBF, CF offers a variety of benefits, among them: 1. The item's content is not needed to convey the entire context. 2. Even if an item's information is not readily available, it is still possible to rate the item without having to purchase it.
3. When the focus is on content, the user's preferences and tastes are not considered; CF avoids this limitation. 4. CF relies on the ratings of other customers to determine links between buyers and to make the best proposal based on user similarities; in contrast, the CB technique only examines the user's profile and the items. 5. CF provides suggestions because most of the similar users share the target user's preferences, whereas CB recommends items based on feature lists. 6. New items can be recommended by many users without any item specifications, in contrast to CB. 7. The cold-start problem appears when there are few rating records in the recommendation system; in this situation, CBF is a good solution to the issue. 8. One disadvantage of the CB approach is that the terms used to represent an item may not be representative; another is the difficulty of making flawless recommendations based on exact ratings. The approaches also have several shortcomings, mostly on the CBF side: 1. A CB system considers only the user's current interests, so the recommendations are constrained by those interests or desires. 2. The item feature representation must be hand-engineered, which requires sufficient domain knowledge. 3. If the item's content is inadequate to accurately describe the item, the final suggestion will be erroneous. 4. Since the item and profile attributes must match, the CB approach offers little in the way of novelty, although a great CBF technique should also be able to surprise the user. 5. Strong user profile information must be entered into the system for the algorithm to deliver correct recommendations. 6. Scalability is the main issue with CF: given a growing user base, the system must still offer every user reliable suggestions.
CF is a frequently used filtering method in recommender systems. Compared to a CBF system, a CF system can inherently filter material that the system could not otherwise describe or evaluate, and it can also suggest novel content. The CF strategy is based on gathering and analyzing a large amount of data on users' actions, interests, or conduct in order to predict what users will like based on their connections to other users. Item-to-item CF ("those who buy x also buy y"), a method popularized by Amazon's recommender system, is one of the most popular forms of CF. Each approach, whether CF or CBF, has advantages and disadvantages; as a result, many firms have adopted a hybrid system to combine the benefits of both methods, as previously discussed, and to offer their users more approachable and accurate advice.
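A toy sketch of the item-to-item idea ("those who buy x also buy y") counts co-occurrences across transaction baskets; the baskets and item names below are invented:

```python
from collections import defaultdict
from itertools import combinations

baskets = [  # each basket: items chosen together by one user (invented data)
    ["rice", "jute"],
    ["rice", "jute", "potato"],
    ["wheat", "mustard"],
    ["rice", "potato"],
]

# Symmetric co-occurrence counts between item pairs.
co_counts = defaultdict(int)
for basket in baskets:
    for a, b in combinations(sorted(set(basket)), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def also_chosen(item, k=2):
    """Top-k items most frequently co-occurring with `item`."""
    scores = {b: n for (a, b), n in co_counts.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Production systems normalize these raw counts (e.g. by item popularity) so that ubiquitous items do not dominate every recommendation, but the co-occurrence table is the core of the method.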
4.5 What Are the Main Input Features?

ML models are data-dependent: without high-quality training data, even the most performant algorithms will not give the expected results. Indeed, robust ML models can be useless when trained on inadequate, inaccurate, or irrelevant data. In this context, a wide variety of inputs were suggested in the reviewed articles; Table 4 classifies these parameters into six categories: (1) Geography: This category of inputs indicates the agro-climatic region, a land unit suitable for a certain range of crops and cultivars. Table 4 shows that 19 papers built their RS using geographic data among other variables, which confirms the importance of this type of input, mainly because it acts as an identifier unique to every farm. (2) Weather conditions (WCs): Weather plays a major role in determining the success of agricultural pursuits. For farmers, timing is critical in obtaining resources such as fertilizer and seed, but also in forecasting the likely weather of the upcoming season, which informs how much irrigation is needed, while temperature can affect crop growth. These factors can be determined by recording temperature, rainfall, solar radiation, wind speed, evaporation, relative humidity, and evapotranspiration hourly, daily, or weekly. In this SLR, WCs were used in 75% of the reviewed articles, as Table 4 indicates. (3) Soil properties (SP): All soils contain mineral particles, organic matter, water, and air. The combination of these components determines the soil's quality, which depends both on its physical properties (texture, color, type, porosity, bulk density, etc.) and on its chemical properties (soil pH, soil salinity, nutrient availability, soil electrical conductivity, etc.). Table 4 confirms that soil characteristics are the mandatory inputs on which researchers built crop RSs.
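As a minimal illustration of how such heterogeneous inputs are fed to an ML model, the sketch below combines numeric weather and soil readings with a one-hot-encoded categorical soil type into a single feature vector; the feature names, vocabulary and values are invented:

```python
SOIL_TYPES = ["clay", "loam", "sandy"]  # hypothetical categorical vocabulary

def to_feature_vector(sample):
    """Flatten one farm record into a numeric vector for an ML model."""
    vec = [sample["temperature"], sample["rainfall"], sample["pH"]]
    # One-hot encode the categorical soil type.
    vec += [1.0 if sample["soil_type"] == s else 0.0 for s in SOIL_TYPES]
    return vec

farm = {"temperature": 24.0, "rainfall": 180.0, "pH": 6.5, "soil_type": "loam"}
vector = to_feature_vector(farm)  # -> [24.0, 180.0, 6.5, 0.0, 1.0, 0.0]
```

Every category below (geography, weather, soil, crop, production, market) ultimately contributes entries to a vector of this kind before training.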
Below we present descriptions for the main soil physical and chemical properties encountered in the literature:
Table 4 Input variable categories used for CR in the literature
Geography: location, hill area, river ground, depth, region, elevation, slope
Weather: temperature, rainfall, humidity, evapotranspiration, solar radiation, atmospheric pressure
Soil: pH value, nutrient availability, soil type, soil EC, texture, depth, color, bulk density
Crop: needed nutrients, seasonal information, weather information, crop damage
Production: yield per unit, profitability per unit
Market: demand, market price, cost, benefit
Soil Physical Properties: • Soil texture: Refers to the size of the particles that make up the soil and depends on the proportion of sand, silt and clay-sized particles and organic matter in the soil. It can influence whether soils are free-draining, whether they hold water, and how easily plant roots can grow. • Soil color: Surface soil varies from almost white through shades of brown and grey to black. A light color indicates a low organic matter content, while a dark color indicates a high organic matter content. • Soil type: Describes the way the sand, silt and clay particles are clumped together. Organic matter (decaying plants and animals) and soil organisms such as earthworms and bacteria influence soil structure. It is important for plant growth, regulating the movement of air and water, influencing root development and affecting nutrient availability. • Soil porosity: Refers to the pores within the soil. Porosity influences the movement of air and water. Healthy soils have many pores between and within the aggregates; poor-quality soil has few visible pores, cracks, or holes. • Bulk density: The ratio of the weight of a soil to its volume. Bulk density is an indicator of the amount of pore space available within individual soil horizons, and it reflects the soil's ability to provide structural support, water and solute movement, and soil aeration. Soil Chemical Properties: • Soil pH: Soil reactivity is expressed in terms of pH, a measure of the acidity or alkalinity of the soil. More precisely, it is a measure of the hydrogen ion concentration in an aqueous solution and ranges in soils from 3.5 (very acid) to 9.5 (very alkaline). The effect of pH is to remove certain ions from the soil or to make them available. • Soil salinity: The salt content in the soil; the process of increasing the salt content is known as salinization. Salts occur naturally within soils and water.
Salinization can be caused by natural processes such as mineral weathering or by the gradual withdrawal of an ocean. • Nutrient availability: Sixteen nutrients are essential for plant growth and for living organisms in the soil. They fall into two categories: macro- and micronutrients. The macronutrients include carbon (C), oxygen (O), hydrogen (H), nitrogen (N), phosphorus (P), potassium (K), calcium (Ca), magnesium (Mg) and sulphur (S); they are the most essential nutrients for plant development, and a high quantity of them is needed. Micronutrients are needed in smaller amounts but remain crucial for plant development and growth; they include iron (Fe), zinc (Zn), manganese (Mn), boron (B), copper (Cu), molybdenum (Mo) and chlorine (Cl). Nearly all plant nutrients are taken up in ionic form from the soil solution, as cations or anions. • Soil electrical conductivity (SEC): An indirect measurement that correlates very well with several soil physical and chemical properties. Electrical conductivity is the ability of a material to conduct (transmit) an electrical current. As
measuring soil electrical conductivity is easier, less expensive, and faster than measuring other soil properties, it can be used as a good tool for obtaining useful information about the soil. (4) Crop property (CP): Some crops are very labor-intensive, some require more skill than others, and some are riskier than others (high profit in a good year but a high chance of crop failure if the weather is bad); some farmers are better able to cope with those risks. Each crop has its own suitable amount of nutrients, optimal weather conditions and optimal soil properties. Unfortunately, there is no universal structure or data source for this kind of crop information, so researchers in the reviewed papers use data mining techniques to extract knowledge from raw data, where FL shows high-quality results because of its rule-generating model. (5) CPR: Many crop types are produced on farms, and not all of them are suitable for production in all areas, so considering the CPR of each crop for every farm is very important for recommending crops and predicting crop productivity. Almost 90% of the reviewed papers use supervised learning, with crop yield or crop profitability, in tons/hectare or kg/hectare, as the dependent variable. (6) Market: Even with a high yield, a decision about recommending a crop cannot be taken without knowing its price for the period of sale, as well as its cost. The price of a specific crop is determined by supply and demand in the market; however, it can be predicted using historical data. On the other hand, the cost can only be given by the farmer himself, so it remains difficult to gather such data. Table 5 confirms this claim, with just four papers using market information. Table 6 presents the number of papers for each variable; it indicates that, of all the features cited above, temperature and rainfall are the most widely used parameters.
This finding is coherent with the fact that WCs have an important impact on the CPR and determine the soil's sustainability; nevertheless, it remains necessary to extract other information to build an efficient CR. This information was grouped previously in the soil property category: pH value, soil type and nutrient availability, cited in 14, 10 and 9 papers respectively. Less important variables occur in 1–6 papers each and are a mix of all categories of features.

Table 5 Distribution of papers by feature classes
Feature class | Number of studies | References
WCs | 20 | [32, 33, 35–37, 39–45, 50, 54, 57, 58, 60, 64, 67, 71]
SP | 19 | [32, 35–43, 45, 50, 57, 60, 65, 66, 68, 69, 71]
Geography | 17 | [33, 35–42, 45, 50, 54, 65, 67, 69–71]
CPR | 9 | [32, 35, 39, 40, 44, 45, 50, 64, 70]
CP | 9 | [35, 39, 44, 50, 54, 57, 64, 67, 69]
Market | 4 | [32, 35, 37, 40]
Table 6 Distribution of papers by features
Feature | Number of studies | References
Temperature | 17 | [43, 45–47, 49, 51, 52, 54–58, 60, 61, 64, 67, 70]
Rainfall | 16 | [45–47, 49, 51, 52, 54–56, 60, 61, 63, 64, 67, 70, 71]
pH value | 14 | [43, 46–48, 51, 56, 57, 59, 60, 63, 65–67, 69]
Soil type | 10 | [45–48, 51, 52, 54, 61, 65, 67]
Nutrients | 9 | [43, 45, 46, 51, 59, 61, 65, 66, 68]
Humidity | 6 | [43, 45, 49, 53, 58, 64]
Yield rate | 5 | [49, 55, 61, 64, 70]
Soil EC | 5 | [48, 53, 63, 66, 69]
Salinity | 5 | [43, 53, 59, 65, 69]
Crop type | 4 | [34, 35, 50, 54]
Pressure | 2 | [53, 58]
Soil color | 2 | [60, 63]
Elevation | 2 | [45, 71]
Soil porosity | 1 | [46]
Examples include elevation for geography, salinity for soil characteristics, and humidity for weather conditions. The number of articles included in this SLR gives an indicative order of variable importance for a CR algorithm, although its statistical support is limited by the small number of research papers in precision agriculture dedicated to CR.
4.6 Which Evaluation Metrics and Evaluation Approaches Have Been Used?

Several evaluation metrics have been used; Table 7 summarizes the metrics used in the reviewed studies. This SLR is restricted to the CS task, which makes classification metrics such as accuracy, precision and recall the most popular performance metrics in the studies of this SLR. Accuracy, the proportion of true results among the total number of cases examined, has the highest number of occurrences in our SLR, ahead of precision, the fraction of relevant recommendations among the retrieved crops, and recall, the fraction of retrieved recommendations among all relevant crops. Another important result is the existence of studies evaluating their results using regression error metrics such as RMSE, MSE and the mean absolute error (MAE). The reason is that a study may use CPR as an output of the model developed and then choose the crop with the highest rate. What is striking in Table 7 is the remarkable number of papers that are not evaluated by any performance criterion. The most likely cause of this result is the difficulty
Table 7 Distribution of articles by performance metrics

| Metric | Number of studies | References |
|---|---|---|
| Accuracy | 19 | [35, 36, 38, 42, 45–48, 50, 52, 54, 56, 59, 60, 65, 71] |
| Precision | 10 | [36, 37, 39, 48, 52, 58, 65, 68] |
| Recall | 7 | [37, 39, 48, 52, 58, 65, 68] |
| F-measure | 5 | [37, 52, 65, 68] |
| Sensitivity | 2 | [36, 52] |
| Specificity | 2 | [36, 52] |
| MSE | 2 | [33, 45] |
| RMSE | 2 | [48, 70] |
| MAE | 1 | [48] |
| No metric | 11 | [32, 34, 40, 41, 43, 49, 51, 53, 55, 57, 61, 63, 64, 67] |
in verifying whether the recommended crop is truly the correct one. For this reason, most studies seek the help of experts or farmers to judge the relevance of the suggested crops [50].
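The classification metrics named above can be computed directly from prediction counts. A minimal pure-Python sketch follows; the sample labels are invented for illustration:

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision and recall for one positive class.

    accuracy  = correct predictions / all predictions
    precision = true positives / predicted positives
    recall    = true positives / actual positives
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Invented example: "wheat" is the crop of interest.
truth = ["wheat", "rice", "wheat", "maize", "wheat", "rice"]
preds = ["wheat", "wheat", "wheat", "maize", "rice", "rice"]
acc, prec, rec = classification_metrics(truth, preds, "wheat")
print(acc, prec, rec)  # 4/6 correct; precision 2/3; recall 2/3
```

For the regression-style studies, RMSE/MSE/MAE are computed the same way from predicted versus observed yield values instead of class labels.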
4.7 Current Challenges in CR

This section describes the challenges encountered in the extant research. These challenges are observed through three layers: the proposed algorithm, the data used and the evaluation preferences. Most studies have almost exclusively focused on the exploitation of CF algorithms for classification and clustering, more precisely DTs, SVM, ANNs and NBC. Several gaps and shortcomings were identified in these techniques, notably the cold start problem, where the system cannot draw any inferences for users or items about which it has not yet gathered sufficient information. A closer look at the literature reveals that these proposed adaptations are very classical, since the field of RS has seen significant improvement driven by entertainment companies, and new algorithms have been developed. For instance, the Netflix Prize was an open competition for the best CF algorithm to predict user ratings for movies [72]. On September 21, 2009, the grand prize was awarded to the BellKor's Pragmatic Chaos team, which bested Netflix's own rating-prediction algorithm by 10.06% [73]. During this competition, MF became widely known for its effectiveness, and important steps were taken in later years towards some very successful algorithms, which share the same basis of latent factors and user/item representations. Unfortunately, no article so far has suggested an implementation of these modern techniques for CR. A potential barrier that researchers may face in closing this gap is the available data. Figure 5 shows that more than 50% of the reviewed papers were published by Asian researchers, more precisely Indian researchers, together with researchers all over the world using Indian datasets, where SP parameters, CPR
Fig. 5 Articles distribution by origin countries of the dataset used
parameters and Geography parameters are available on official governmental websites, collected over a very long period and across different states/districts. Notwithstanding, these data are most of the time inaccessible to foreign scientists. Furthermore, the required structure of the data presents another challenge: the vast majority of well-performing RS are fitted on a user/item rating matrix (plus the user's demographic data for hybrid systems), where ratings are either explicit or implicit (number of clicks, number of page visits, the number of times a song was played, etc.), so different complex preprocessing techniques are required. Even then, sparsity remains a potential challenge to deal with. Finally, a lack of clarity was observed in the performance metrics and in the evaluation of the models by anonymous experts, which raises a very important question about the reliability of these results; hence the need for improved model assessment and good expertise to interpret its outcomes.
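The latent-factor idea behind MF, mentioned above, can be sketched in a few lines of pure Python: a sparse user/item rating matrix (farms/crops, in a CR setting) is factorized into two low-rank matrices by stochastic gradient descent over the observed entries only. Everything here (the toy ratings, factor size, learning rate) is an invented assumption for illustration:

```python
import random

random.seed(0)

# Toy ratings: (row, col, value) for a sparse 4x4 matrix; rows could be
# farms and columns crops in a CR setting.  Values are invented.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 3, 5.0), (3, 2, 4.0), (3, 3, 4.0)]
n_rows, n_cols, k = 4, 4, 2  # k latent factors per row/column

P = [[random.uniform(0, 1) for _ in range(k)] for _ in range(n_rows)]
Q = [[random.uniform(0, 1) for _ in range(k)] for _ in range(n_cols)]

def predict(u, i):
    # A predicted rating is the dot product of the two latent vectors.
    return sum(P[u][f] * Q[i][f] for f in range(k))

def sse():
    # Sum of squared errors over the observed entries.
    return sum((r - predict(u, i)) ** 2 for u, i, r in ratings)

lr, reg = 0.01, 0.02  # learning rate and L2 regularization (assumed values)
error_before = sse()
for _ in range(2000):            # SGD over the observed entries only
    for u, i, r in ratings:
        e = r - predict(u, i)
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (e * qi - reg * pu)
            Q[i][f] += lr * (e * pu - reg * qi)
error_after = sse()
print(error_before, "->", error_after)
```

Once trained, the missing entries of the matrix can be filled in with `predict(u, i)`, which is exactly how such a model would rank unseen crops for a farm; the sparsity problem discussed above shows up as few observed `(u, i, r)` triples per row.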
5 Discussion

During the last decade, the use of technology to enhance agricultural processes has been remarkable. For CR, we can see clearly from Table 3 that there has been some success in this direction. Exploration of the techniques used shows that, most of the time, the problem is formulated as a classification problem, where algorithms such as DTs and NBC give remarkable results. Fuzzy systems have been used in other cases to model the uncertainty in input variables; these variables were categorized and analyzed to facilitate the choice for future researchers. This study also reveals some challenges faced when creating a CS method, like the unavailability of data, more specifically benchmarking datasets to
compare the models against; the input variables, which are quite different from one study to another; and the difficulty of measuring the performance of the proposed methods: some papers compare their model predictions to what the farmer has actually cultivated, while others compare them to domain experts' recommendations. Historical evidence shows that great scientific achievements were guided by industrial needs; unfortunately, precision agriculture and CR have not yet gained the attention they deserve from different stakeholders, especially in emerging countries where agriculture is the most valuable resource. Major improvements in the agricultural domain will certainly appear by integrating the successful RS algorithms that were developed for entertainment companies, from which humankind might benefit in a different way. In this paper we have studied 40 well-selected articles from different reliable sources. Nevertheless, this number remains statistically small, and more similar works are needed to illuminate the path for new researchers who are willing to innovate and effectively contribute to the field.
6 Conclusion and Future Work

In this SLR we have presented a detailed analysis of 40 articles published from 2010 to 2020 on the CR problem, as well as the main achievements and current challenges. Although agriculture is the most valuable resource in emerging nations, precision agriculture and CR have not yet attracted enough interest from various stakeholders. By adapting to CR the effective RS algorithms originally created for entertainment enterprises, humanity may benefit in multiple ways, and significant advancements in the agricultural area will undoubtedly follow. From our analysis, we witnessed a noticeable increase in the number of publications on the topic of CR from 2014 to 2020, which shows the growing interest in this hot topic for agriculture and the agricultural industry. The most used ML approach for CR, according to our study, is model-based CF, followed by memory-based CF, then CBF and HF. Since both CF (model-based and memory-based) and CBF have pros and cons, there is a remarkable growing interest in HF, which attempts to combine the advantages of both CF and CBF. Regarding the main input features exploited for CR, the WCs are unequivocally the most used, followed by SPs, Geography, CPR, CP and then Market. Among WC parameters, temperature and rainfall top the list. For SPs, pH-value, soil type and nutrients are the most used. Yet it remains beneficial to collect other information (e.g., nutrient efficiency, yield gap) to build effective and efficient future CR systems. The evaluation metrics and approaches used were predominantly accuracy, precision and recall, which are the main performance metrics for such tasks in the ML research community. Concerning the inventoried challenges, we mentioned the cold start issue, sparsity, and the scarcity of real datasets.
This last issue, which concerns some Asian countries such as India less, can only be overcome if more countries all over the world invest in technology for agriculture and develop an awareness of data sharing.
Finally, we highlighted the need for clearer and improved ML study pipelines for CR, and for clear evaluation using human expertise to interpret their outcomes. This is of paramount importance for the reliability of the studies' results. This SLR was conducted with the aim of providing insights into the kinds of solutions proposed in recent years for the CS task. Such insights are valuable in suggesting new directions for research and in providing a good understanding of recent research trends. As a perspective, our team and other ML practitioners in agriculture are working to propose new recommendation methods that take into consideration all the historical changes that happened to the farm (e.g., crop rotations), and to provide fertilizer recommendations based on nutrient efficiency variation rather than on yield prediction alone. Future work should also explore innovative model-assessment methodologies, including human evaluation by experts.

Ethics Approval and Consent to Participate Not applicable.

Consent for Publication We confirm our consent for the publication of the present paper.

Availability of Data and Materials Not applicable.

Competing Interests We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Funding This research was funded by the business unit AgriEdge of Mohammed VI Polytechnic University (UM6P), in the framework of the Digital Farming Project, a joint project between Mohammed VI Polytechnic University (UM6P), Benguerir, Kingdom of Morocco, and the Massachusetts Institute of Technology (MIT), Boston, USA.

Author's Contribution The authors contributed equally.
Acknowledgements We acknowledge AgriEdge (a Moroccan company specialized in precision agriculture) for their financial support, as well as the Mohammed VI Polytechnic University (UM6P) for the material and administrative support.
References 1. Portugal I, Alencar P, Cowan D (2018) The use of machine learning algorithms in recommender systems: a systematic review. Expert Syst Appl 97:205–227. https://doi.org/10.1016/j.eswa. 2017.12.020 2. Van Klompenburg T, Kassahun A, Catal C (2020) Crop yield prediction using machine learning: a systematic literature review. Comput Electron Agric 177:105709. https://doi.org/10.1016/j. compag.2020.105709 3. Chlingaryan A, Sukkarieh S, Whelan B (2018) Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: a review. Comput Electron Agric 151:61–69. https://doi.org/10.1016/j.compag.2018.05.012 4. Burke R (2002) Hybrid recommender systems: survey and experiments. User Model User-Adap Inter 12:331–370. https://doi.org/10.1023/A:1021240730564
5. Poriya A, Bhagat T, Patel N, Sharma R (2014) Non-personalized recommender systems and user-based collaborative recommender systems. Int J Appl Inf Syst 6(9):22–27. https://doi.org/ 10.1.1.428.6731 6. Burke R (2000) Knowledge-based recommender systems. https://www.cs.odu.edu/~mukka/ cs795sum09dm/Lecturenotes/Day6/burke-elis00.pdf 7. Aïmeur E, Brassard G, Fernandez JM, Onana FSM (2006) Privacy-preserving demographic filtering. In: Proceedings of the 2006 ACM symposium on applied computing, Association for Computing Machinery, New York, NY, USA, pp 872–878. https://doi.org/10.1145/1141277. 1141479 8. Breese JS, Heckerman D, Kadie C (1998) Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the fourteenth conference on uncertainty in artificial intelligence, UAI’98, Morgan Kaufmann Publishers Inc., pp 43–52. https://doi.org/10.5555/ 2074094.2074100 9. Bobadilla J, Ortega F, Hernando A, Alcalá J (2011) Knowledge-Based Systems Improving collaborative filtering recommender system results and performance using genetic algorithms. Know-Based Syst 24(8):1310–1316. https://doi.org/10.1016/j.knosys.2011.06.005 10. Tsapatsoulis N, Georgiou O (2012) Investigating the scalability of algorithms, the role of similarity metric and the list of suggested items construction scheme in recommender systems. Int J Artif Intell Tools 21(4):1–29. https://doi.org/10.1142/S0218213012400180 11. Cheng H-T, Koc L, Harmsen J, Shaked T, Chandra T, Aradhye H, Anderson G, Corrado G, Chai W, Ispir M, Anil R, Haque Z, Hong L, Jain V, Liu X, Shah H (2016) Wide amp; deep learning for recommender systems. In: Proceedings of the 1st workshop on deep learning for recommender systems, DLRS 2016, Association for Computing Machinery, New York, NY, USA, pp 7–10. https://doi.org/10.1145/2988450.2988454 12. Xue H-J, Dai X, Zhang J, Huang S, Chen J (2017) Deep matrix factorization models for recommender systems. 
In: Proceedings of the twenty-sixth international joint conference on artificial intelligence, IJCAI-17, International joint conference on artificial intelligence, pp 3203–3209. https://doi.org/10.24963/ijcai.2017/447 13. Wang X, He X, Wang M, Feng F, Chua T-S (2019) Neural graph collaborative filtering. In: Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval, SIGIR’19, Association for Computing Machinery, pp 165–174. https:// doi.org/10.1145/3331184.3331267 14. Kiran R, Kumar P, Bhasker B (2020) DNNRec: a novel deep learning-based hybrid recommender system. Expert Syst Appl 144. https://doi.org/10.1016/j.eswa.2019.113054 15. Oord AVD, Dieleman S, Schrauwen B (2013) Deep content-based music recommendation. In: Proceedings of the 26th international conference on neural information processing systems, vol 2, NIPS’13, Curran Associates Inc., Red Hook, NY, USA, pp 2643–2651. https://doi.org/ 10.5555/2999792.2999907 16. Kim D, Park C, Oh J, Lee S, Yu H (2016) Convolutional matrix factorization for document context-aware recommendation. In: Proceedings of the 10th ACM conference on recommender systems, RecSys ’16, Association for Computing Machinery, New York, NY, USA, pp 233–240. https://doi.org/10.1145/2959100.2959165 17. Srivastav G, Kant S (2019) Review on e-learning environment development and context aware recommendation systems using Deep Learning. In International conference on recent developments in control, automation and power engineering, RDCAPE. https://doi.org/10.1109/RDC APE47089.2019.8979066 18. Sedhain S, Menon AK, Sanner S, Xie L (2015) Autorec: autoencoders meet collaborative filtering. In: Proceedings of the 24th international conference on world wide web, WWW ’15 Companion, Association for Computing Machinery, New York, NY, USA, pp 111–112. https:/ /doi.org/10.1145/2740908.2742726 19. Liang D, Krishnan RG, Hoffman MD, Jebara T (2018) Variational autoencoders for collaborative filtering. 
In: Proceedings of the 2018 world wide web conference, WWW ’18, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, pp 689–698. https://doi.org/10.1145/3178876.3186150
20. Li L, Wang D, Li T, Knox D, Padmanabhan B (2011), Scene: a scalable two-stage personalized news recommendation system. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval, Association for Computing Machinery, New York, NY, USA, pp 125–134. https://doi.org/10.1145/2009916.2009937 21. Purushotham S, Liu Y, Kuo C-CJ (2012) Collaborative topic regression with social matrix factorization for recommendation systems. In: Proceedings of the 29th international conference on international conference on machine learning, Omnipress, Madison, WI, USA, pp 691–698. https://doi.org/10.5555/3042573.3042664 22. Li L, Chu W, Langford J, Schapire RE (2020) A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th international conference on world wide web. https://doi.org/10.1145/1772690.1772758 23. Song L, Tekin C, van der Schaar M (2016) Online learning in large-scale contextual recommender systems. IEEE Trans Serv Comput 9(3):433–445. https://doi.org/10.1109/TSC.2014. 2365795 24. Zheng G, Zhang F, Zheng Z, Xiang Y, Yuan NJ, Xie X, Li Z (2018) Drn: a deep reinforcement learning framework for news recommendation. In: Proceedings of the world wide web conference, international world wide web conferences steering committee, Republic and Canton of Geneva, CHE, pp 167–176. https://doi.org/10.1145/3178876.3185994 25. De Campos LM, Fernández-Luna JM, Huete JF, Rueda-Morales MA (2010) Combining content-based and collaborative recommendations: a hybrid approach based on Bayesian networks. Int J Approximate Reasoning 51(7):785–799. https://doi.org/10.1016/j.ijar.2010. 04.001 26. Kant V, Bharadwaj KK (2012) Enhancing Recommendation quality of content-based filtering through collaborative predictions and fuzzy similarity measures. Procedia Eng 38:939–944. https://doi.org/10.1016/j.proeng.2012.06.118 27. 
Lacasta J, Lopez-Pellicer FJ, Espejo-García B, Nogueras-Iso J, Zarazaga-Soria FJ (2018) Agricultural recommendation system for crop protection. Comput Electron Agric 152(June):82–89. https://doi.org/10.1016/j.compag.2018.06.049 28. Salam MA, Mahmood MA, Awad YM, Hazman M, El Bendary N, Hassanien AE, Tolba MF, Saleh SM (2014) Climate recommender system for wheat cultivation in North Egyptian Sinai Peninsula. In: Advances in intelligent systems and computing, vol 303, Springer, pp 121–130. https://doi.org/10.1007/978-3-319-08156-413 29. Iorshase A, Charles OI (2015) A well-built hybrid recommender system for agricultural products in Benue State of Nigeria. J Softw Eng Appl 08(11):581–589. https://doi.org/10.4236/ jsea.2015.811055 30. Jaiswal S, Kharade T, Kotambe N, Shinde S (2020) Collaborative recommendation system for agriculture sector. ITM Web Conf 32:03034. https://doi.org/10.1051/itmconf/20203203034 31. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Moher D (2021) Updating guidance for reporting systematic reviews: development of the prisma 2020 statement. J Clin Epidemiol 134:103–112. https://doi.org/10.1016/j.jclinepi. 2021.02.003 32. Elomda BM, Hefny HA, Ashmawy F (2015) A multi-level linguistic fuzzy decision network. In: Advances in intelligent systems and computing. Springer, Cham. https://doi.org/10.1007/ 978-3-319-11310-4 33. Farnood Ahmadi F, Farsad Layegh N (2015) Integration of artificial neural network and geographical information system for intelligent assessment of land suitability for the cultivation of a selected crop. https://doi.org/10.1007/s00521-014-1801-z 34. Kumar R, Singh MP, Kumar P, Singh JP (2015) Crop selection method to maximize crop yield rate using machine learning technique. https://doi.org/10.1109/ICSTM.2015.7225403 35. Kawtrakul A, Amorntarant R, Chanlekha H (2015) Development of an expert system for personalized crop planning. 
In: 7th international ACM conference on management of computational and collective intelligence in digital EcoSystems, MEDES 2015, Association for Computing Machinery, pp 250–257. https://doi.org/10.1145/2857218.2857272
36. Deepa N, Ganesan K (2016) Multi-class classification using hybrid soft decision model for agriculture crop selection. Neural Comput Appl 30(4):1025–1038. https://doi.org/10.1007/s00 521-016-2749-y 37. Sridhar R (2016) Fuzzy logic based hybrid recommender of maximum yield crop using soil, weather and cost. ICTACT J Soft Comput 6(4):1261–1269. https://doi.org/10.21917/ijsc.2016. 0173 38. Pudumalar S, Ramanujam E, Rajashree RH, Kavya C, Kiruthika T, Nisha J (2017) Crop recommendation system for precision agriculture. In: International conference on advanced computing. https://doi.org/10.1109/ICoAC.2017.7951740 39. Mokarrama MJ, Arefin MS (2018) RSF: a recommendation system for farmers, pp 843–850. https://doi.org/10.1109/R10-HTC.2017.8289086 40. Raja SK, Rishi S, Demand based crop recommender system for farmers. In: Proceedings— 2017 IEEE technological innovations in ICT for agriculture and rural development, pp 194–199. https://doi.org/10.1109/TIAR.2017.8273714 41. Kapoor A, Verma AK (2017) Crop selection using fuzzy logic-based expert system. Appl Soft Comput Web. https://doi.org/10.1007/978-981-10-7098-38 42. Deepa N, Ganesan K (2017) Decision-making tool for crop selection for agriculture development. Neural Comput Appl 31(4):1215–1225. https://doi.org/10.1007/s00521-017-3154-x 43. Joshi R, Fadewar H, Bhalchandra P (2017) Fuzzy based intelligent system to predict most suitable crop. In: Proceedings of the international conference on communication and signal processing. Atlantis Press, pp 379–383. https://doi.org/10.2991/iccasp-16.2017.58 44. Filippi C, Mansini R, Stevanato E (2017) Mixed integer linear programming models for optimal crop selection. Comput Oper Res 81:26–39. https://doi.org/10.1016/j.cor.2016.12.004 45. Islam T, Chisty TA, Chakrabarty A (2019) A deep neural network approach for crop selection and yield prediction in Bangladesh. In: IEEE Region 10 humanitarian technology conference. https://doi.org/10.1109/R10-HTC.2018.8629828 46. 
Kulkarni NH, Srinivasan GN, Sagar BM, Cauvery NK (2018) Improving crop productivity through a crop recommendation system using ensembling technique. In: Proceedings 3rd international conference on computational systems and information technology for sustainable solutions. https://doi.org/10.1109/CSITSS.2018.8768790 47. Doshi Z, Nadkarni S, Agrawal R, Shah N (2018) Agro-consultant: intelligent crop recommendation system using machine learning algorithms. In: Fourth international conference on computing communication control and automation (ICCUBEA). IEEE, pp 1–6. https://doi.org/ 10.1109/ICCUBEA.2018.8697349 48. Arooj A, Riaz M, Akram MN (2018) Evaluation of predictive data mining algorithms in soil data classification for optimized crop recommendation. Int Conf Adv Comput Sci. https://doi. org/10.1109/ICACS.2018.8333275 49. Kuanr M, Kesari Rath B, Nandan Mohanty S (2018) Crop recommender system for the farmers using Mamdani fuzzy inference model. Int J Eng Technol 7(4.15). https://doi.org/10.14419/ ijet.v7i4.15.23006 50. Anley MB, Tesema TB (2019) A collaborative approach to build a KBS for crop selection: combining experts knowledge and machine learning knowledge discovery. In: Communications in computer and information science, vol 1026. Springer, pp 80–92. https://doi.org/10.1007/ 978-3-030-26630-18 51. Poongodi S, Rajesh Babu M (2019) Analysis of crop suitability using clustering technique in Coimbatore region of Tamil Nadu. Concurrency Comput 31(14):1–13. https://doi.org/10.1002/ cpe.5294 52. Deepa N, Ganesan K (2019) Hybrid rough fuzzy soft classifier based multi-class classification model for agriculture crop selection. Soft Comput 23(21):10793–10809. https://doi.org/10. 1007/s00500-018-3633-8 53. Tseng F-H, Cho H-H, Wu H-T (2019) Applying big data for intelligent agriculture-based crop selection analysis. IEEE Access 7:116965–116974. https://doi.org/10.1109/access.2019.293 5564
54. Fegade TK, Pawar BV (2020) Network and support vector machine. https://doi.org/10.1007/ 978-981-13-9364-823 55. Meeradevi, Salpekar H, Design and implementation of mobile application for crop yield prediction using machine learning. In: 2019 global conference for advancement in technology (GCAT). IEEE, pp 1–6. https://doi.org/10.1109/GCAT47503.2019.8978315 56. Rizaldi T, Putranto HA, Riskiawan HY, Setyohadi DPS, Riaviandy J, Decision support system for land selection to increase crops productivity in Jember regency use learning vector quantization (LVQ). In: Proceedings—2019 international conference on computer science, information technology, and electrical engineering, vol 1, pp 82–85. https://doi.org/10.1109/ICOMITEE. 2019.8921033 57. Martinez-Ojeda CO, Amado TM, Dela Cruz JC (2019) In field proximal soil sensing for real time crop recommendation using fuzzy logic model. In: International symposium on multimedia and communication technology (IS-MAC). IEEE. https://doi.org/10.1109/ISMAC.2019. 8836160 58. Kamatchi SB, Parvathi R (2019) Improvement of crop production using recommender system by weather forecasts. Procedia Comput Sci 165:724–732. https://doi.org/10.1016/j.procs.2020. 01.023 59. Rahman SAZ, Mitra KC, Islam SM (2019) Soil classification using machine learning methods and crop suggestion based on soil series. In: 21st international conference of computer and information technology. https://doi.org/10.1109/ICCITECHN.2018.8631943 60. Kumar A, Sarkar S, Pradhan C, Recommendation system for crop identification and pest control technique in agriculture. In: Proceedings of the 2019 IEEE international conference on communication and signal processing, pp 185–189. https://doi.org/10.1109/ICCSP.2019.869 8099 61. Chougule VKA, Mukhopadhyay D (2019) Crop suitability and fertilizers recommendation using data mining techniques. In: Advances in intelligent systems and computing, vol 714. Springer, pp 205–213. https://doi.org/10.1007/978-981-13-0224-419 62. 
Viviliya B, Vaidhehi V (2019) The design of hybrid crop recommendation system using machine learning algorithms. Int J Innov Technol Exploring Eng 9(2):4305–4311. https://doi.org/10. 35940/ijitee.b7219.129219 63. Aarthi R, Sivakumar D (2020) Modeling the hierarchical fuzzy system for suitable crop recommendation. In: Lecture notes in electrical engineering, vol 686. Springer Science and Business Media Deutschland GmbH, pp 199–209. https://doi.org/10.1007/978-981-15-7031-519 64. Cadenas RM-EM, Carmen M (2020) Development of an application to make knowledge available to the farmer: detection of the most suitable crops for a more sustainable agriculture. J Ambient Intell Smart Environ 12(5):419–432. https://doi.org/10.3233/AIS-200575 65. Rajeswari AM, Anushiya AS, Fathima KSA, Priya SS, Mathumithaa N (2020) Fuzzy decision support system for recommendation of crop cultivation based on soil type. In: Proceedings of the 4th international conference on trends in electronics and informatics. https://doi.org/10. 1109/ICOEI48184.2020.9142899 66. Patel K, Patel HB (2020) A state-of-the-art survey on recommendation system and prospective extensions. Comput Electron Agric 178. https://doi.org/10.1016/j.compag.2020.105779 67. Jain S, Ramesh D (2020) Machine Learning convergence for weather-based crop selection. In: IEEE international students’ conference on electrical, electronics and computer science. https:/ /doi.org/10.1109/SCEECS48394.2020.75 68. Liu A, Lu T, Wang B, Chen C (2020) Crop recommendation via clustering center optimized algorithm for imbalanced soil data. In 5th international conference on control, robotics and cybernetics (CRC). IEEE, pp 31–35. https://doi.org/10.1109/CRC51253.2020.9253457 69. Sari F, Koyuncu F (2021) Multi criteria decision analysis to determine the suitability of agricultural crops for land consolidation areas. Int J Eng Geosci 6(2):64–73. https://doi.org/10. 26833/ijeg.683754 70. 
Karwande A, Wyawahare M, Kolhe T, Kamble S, Magar R, Maheshwari L (2021) Prediction of the most productive crop in a geographical area using machine learning. Lect Notes Netw Syst 141:433–441. https://doi.org/10.1007/978-981-15-7106-043
71. Banerjee G, Sarkar U, Ghosh I (2021) A fuzzy logic-based crop recommendation system. Springer, Singapore. https://doi.org/10.1007/978-981-15-7834-26 72. Bennett J, Lanning S (2007) The netflix prize. http://www.cs.uic.edu/~liub/KDD-cup-2007/ NetflixPrize-description.pdf 73. Koren Y (2009) The bellkor solution to the netflix grand prize. https://netflixprize.com/assets/ GrandPrize2009_BPC_BellKor.pdf
Convolutional Neural Network for Identification and Classification of Weeds in Buckwheat Crops

V. Riksen and V. Shpak
Abstract Traditional weed management approaches can become more effective when integrated with artificial intelligence (AI) models. The identification and classification of weeds using AI techniques can play an important role in weed control, helping to increase crop yields. The rapid development of deep learning methods based on convolutional neural networks helps in solving this problem. In particular, a trained algorithm is able to automatically extract information from images and to detect and classify weeds. The article discusses the construction of an image classifier using the ResNet-18 architecture. Three types of weeds are present in buckwheat crops with different intensity: wild oat (Avena fatua), field bindweed (Convolvulus arvensis) and barnyard grass (Echinochloa crus-galli). The task of the classifier is to recognize these weeds in a photograph and to determine one of two gradations of weediness of the site: whether or not the number of weeds exceeds the economic injury level (EIL). The effectiveness of the proposed algorithm is confirmed by the rather high quality of the classifier predictions (87% correct classifications on the initial set of 24 images) and by the confusion matrix.

Keywords Weeds · Deep learning · Classifier · ResNet-18 · Buckwheat
V. Riksen (B) · V. Shpak Siberian Federal Research Centre of Agrobiotechnologies, Russian Academy of Sciences, Novosibirsk Reg, Krasnoobsk 630501, Russia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Adadi and S. Motahhir (eds.), Machine Intelligence for Smart Applications, Studies in Computational Intelligence 1105, https://doi.org/10.1007/978-3-031-37454-8_3
1 Introduction

Projected global population growth by 2050 could lead to a significant increase in food demand. As a consequence, an urgent task of the research community in the agricultural sector is the use of intelligent systems to increase crop yields [1–4]. Buckwheat is a valuable food crop. However, buckwheat lags behind all grain crops in terms of yield size and stability. The average yield of buckwheat in the Russian Federation is 0.44 t/ha, although buckwheat can yield 2.5–3.0 t/ha or more [5, 6]. One of the main factors limiting crop yields is weeds. They create intense competition for light, moisture and nutrients and do not allow cultivated plants to realize their potential [7]. Plant protection is an essential element in limiting the impact of negative factors on the formation of crop productivity. A complex of agrotechnical and chemical means is used against weeds. The first includes a system of basic, pre- and post-sowing cultivation, compliance with crop rotations, and ensuring optimal timing, sowing depth, seeding rates and plant density. The second includes the use of herbicides when agrotechnical means turn out to be insufficient and the weediness of crops exceeds the EIL. The use of herbicides allows one to eliminate weeds, increase crop yields, and simplify the care of cultivated crops. For the effective development of weed control measures, it is necessary to have detailed information on the composition of the weed component in the fields [8]. This work requires considerable human resources, so it is necessary to develop automated approaches to monitoring weeds. In addition, process automation is associated with a reduction in the amount of herbicides used, which reduces the environmental load on the territory [9]. The integration of AI methods in the field of agriculture plays an important role in improving production efficiency [10–15].
Deep learning methods applied to digital photographs can help identify weeds through machine image processing. Many architectures exist for image classification, such as AlexNet [16], GoogLeNet [17] and ResNet [18]. The aim of this study is to build a classifier based on a deep learning convolutional neural network (CNN) capable of correctly identifying weeds and classifying the intensity of their presence in areas occupied by buckwheat (Fagopyrum).
2 Materials and Methods

Datasets in the form of photographs (with a resolution of 1440 × 1920) obtained using a mobile camera, together with phytosanitary diagnostics from 24 points, were used as the initial information for training the CNN.
Convolutional Neural Network for Identification and Classification …
Monitoring of weeds was carried out in the budding phase of buckwheat in 4 fields using a quantitative method for field weediness accounting (overlay of a 0.25 m² frame). The number of weed counting sites, according to the methodology, depended on the size of the field. According to the on-farm designation of working areas, the fields are O2/4 (photographs Fag. 1–4), O9/2 (Fag. 5–8), O9/3 (Fag. 9–13) and K8 (Fag. 14–24). In the study area, crops are cultivated according to the traditional (classical, K8) and organic (O2/4, O9/2, O9/3) farming systems. Weeds such as wild oat (Avena fatua), field bindweed (Convolvulus arvensis) and barnyard grass (Echinochloa crus-galli) were present at different intensities at the sites. The task of the classifier is to recognize these weeds in a photograph and determine one of two gradations of site weediness: low or high. In the buckwheat fields, 6 weediness classes were identified: Avenafatua0, Avenafatua1, Convolvulusarvensis0, Convolvulusarvensis1, Echinochloacrus-galli0, Echinochloacrus-galli1. An identifier ending in "0" denotes a class with weediness not exceeding the EIL; "1" denotes a class exceeding the EIL. One photo can contain one weed with the appropriate gradation, or several. Thus, constructing the required classifier is a multi-label image classification problem. To identify weeds and their intensity, we used the ResNet-18 architecture, which offers a good compromise between the computational cost on the computers at our disposal and the time needed to obtain final results. The neural networks presented in this paper are based on a pretrained convolutional neural network available through the PyTorch ecosystem. The architecture of ResNet-18 is shown in Fig. 1. It comprises 4 convolutional blocks, labeled in pink, purple, yellow and orange in Fig. 1, and each block contains 4 convolutional layers.
Together with the first convolutional layer and the last fully connected layer, there are 18 layers in total; hence the model is commonly known as ResNet-18. The penultimate layer, denoted avg pool, produces an output that is flattened and fed to the layer with the softmax activation function, denoted fc, which acts as the final output classifier. In the diagram, fc is specified with the maximum possible number of classes, 1000. Within blocks, convolution on each layer is performed by 3 × 3 filters, denoted conv, n, where n is the number of filters used; if the output feature maps have the same size, the layers have the same number of filters. If the output feature map is halved (denoted /2), the number of filters is doubled and the convolutional layers downsample with a stride of 2. During training and testing, the weight and bias parameters are chosen so as to minimize the learning error, a measure of the difference between the actual class labels and those obtained at the output of the network. Our ResNet-18 setup uses binary cross entropy as this loss function, and optimization of the network parameters (weights and biases) is carried out using stochastic gradient descent (SGD).
Fig. 1 ResNet-18 architecture
Summing up, the classifier based on the ResNet-18 model works as follows: the original photo is processed through alternating convolutional layers, activation functions and pooling layers; at the very end, the final classifier produces probability estimates for each class.
3 Results

Since we have a relatively small number of images at our disposal, clearly not enough to build a neural network with good generalizing characteristics, we increased the amount of initial data using the aug_transforms procedure of the fastai library, which is built on top of PyTorch. The transformations it applies (rotation, scaling, translation, etc.) form different images from the same source samples, creating variations of the input images that look different but depict the same facts. Some of these transformations are random, such as image cropping or changes in brightness or contrast. Adding these images to the original dataset increased the total volume of data used in the neural network model sixfold. The process of generating synthetic training images is shown in Fig. 2. The expected input size for the ResNet-18 neural network, represented as a matrix of pixels, is 224 × 224. To accept input from images of arbitrary size, PyTorch provides appropriate transformation functions. Images with an original resolution of 1440 × 1920 pixels were compressed to 512 × 512 using the "Resize" procedure, and then scaled to the required input format. The network was also evaluated on a held-out test set of images to obtain a more objective and reliable model. Before training, model parameters such as the number of epochs, batch size and learning rate were set. After determining the initial parameters, the model was trained with the Adam optimizer, which is used to achieve better training results. Training runs of 10, 20 and 30 epochs were used; one epoch means one iteration over all the images, which are fed in for training in batches of 8. The overall accuracy of the selected models ranged from 77.5 to 90.8%. The evaluations obtained are shown in Table 1.
Table 2 presents the main parameters of the neural network we built based on the ResNet-18 architecture. The size of the feature vector for the penultimate conv5x layer is 25,088 (7 × 7 × 512), and 512 in the case of the avg pool layer. The sizes of the feature vectors affect the total amount of computation and the time needed to obtain final results. Since the original color images were taken with a standard digital camera, their digital representation has 3 channels (red, green and blue), whereas a normal gray image has only one channel. The output of ResNet-18 is a vector of dimension 6, where each component gives an estimate of the probability of occurrence of the corresponding class. Since we are building a multi-label classifier, each image may have
Fig. 2 The process of generating synthetic images

Table 1 The results of neural network training by epochs

| Epoch | Train_loss | Valid_loss | Accuracy_multi |
|-------|------------|------------|----------------|
| 1     | 0.860313   | 0.662197   | 0.483333       |
| 10    | 0.394793   | 0.395668   | 0.775000       |
| 20    | 0.264712   | 0.285602   | 0.858333       |
| 30    | 0.249835   | 0.266138   | 0.908333       |
Table 2 Main structural parameters of the neural network ResNet-18 for Fagopyrum

| Name of layers in blocks | Output dimensions | Layer filters |
|--------------------------|-------------------|---------------|
| Conv1x   | 112 × 112 × 64 | 7 × 7, 64, stride 2 |
| Conv2x   | 56 × 56 × 64   | [3 × 3, 64] × 2; [3 × 3, 64] × 2 |
| Conv3x   | 28 × 28 × 128  | [3 × 3, 128] × 2; [3 × 3, 128] × 2 |
| Conv4x   | 14 × 14 × 256  | [3 × 3, 256] × 2; [3 × 3, 256] × 2 |
| Conv5x   | 7 × 7 × 512    | [3 × 3, 512] × 2; [3 × 3, 512] × 2 |
| Avg pool | 1 × 1 × 512    | 7 × 7, fully connected feature vector layer |
| Fc       | 6              | Classifier with 512 × 6 connections |
different classes or none at all. A class is predicted for an image if its probability is higher than 50%. Table 3 presents data describing the predictive ability of the constructed neural network, tested on the original sample; erroneous predictions are highlighted (in red in the original) in Table 3. Figure 3 illustrates the operation of the classifier: the upper caption shows the sample with the true weediness labels, and the caption below shows the labels predicted for this sample by the classifier. Table 4 shows the final evaluation of the predictive ability of the classifier obtained on the original dataset from Table 3. The evaluations obtained indicate a sufficiently high quality of the classifier and its suitability for solving practical problems. Another indicator of the quality of a neural network model is the confusion matrix. Since we are dealing with multi-label classification, the matrices are constructed according to the "one against all" principle, which makes it possible to determine which classes the model has trouble predicting. Figure 4 shows the confusion matrices for each weediness class label: 0-Avenafatua0, 1-Avenafatua1, 2-Convolvulusarvensis0, 3-Convolvulusarvensis1, 4-Echinochloacrus-galli0, 5-Echinochloacrus-galli1. The rows of each 2 × 2 matrix represent the number of images with true labels: Y corresponds to the number of images with the selected weediness class label (samples whose true label is this class), and N to the number of images of all other weediness classes. The columns represent the images that the model assigned to the class with the selected label (column Y) and to all other classes (column N).
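The 50% decision rule above can be made concrete with a small sketch. The class names follow the chapter; the function name and the example logits are hypothetical.

```python
import numpy as np

CLASSES = ["Avenafatua0", "Avenafatua1", "Convolvulusarvensis0",
           "Convolvulusarvensis1", "Echinochloacrus-galli0",
           "Echinochloacrus-galli1"]

def predict_labels(logits, threshold=0.5):
    """Return the class names whose sigmoid probability exceeds the threshold."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return [name for name, p in zip(CLASSES, probs) if p > threshold]

# Example: only the second and fourth logits are confidently positive,
# so the image is labeled with those two classes (an empty list is also
# possible when no probability exceeds 50%).
print(predict_labels([-2.0, 3.1, -1.5, 0.8, -3.0, -0.7]))
# -> ['Avenafatua1', 'Convolvulusarvensis1']
```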
Table 3 Comparison of correspondences between actual site weediness observations with Fagopyrum and classifier predictions

| Images | Actual picture by weeds | Matches | Predicting the presence of weeds by classifier based on ResNet-18 |
|--------|-------------------------|---------|-------------------------------------------------------------------|
| Fag 1.jpg  | Avenafatua1 | 1 | ['Avenafatua1'] |
| Fag 2.jpg  | Avenafatua1 | 1 | ['Avenafatua1'] |
| Fag 3.jpg  | Echinochloacrus-galli1 | 1 | ['Echinochloacrus-galli1'] |
| Fag 4.jpg  | Convolvulusarvensis1, Echinochloacrus-galli1 | 1 | ['Convolvulusarvensis1'] |
| Fag 5.jpg  | Convolvulusarvensis1, Avenafatua1 | 0.5 | ['Avenafatua0', 'Avenafatua1', 'Convolvulusarvensis0', 'Convolvulusarvensis1'] |
| Fag 6.jpg  | Avenafatua1 | 1 | ['Avenafatua1'] |
| Fag 7.jpg  | Convolvulusarvensis0, Avenafatua1 | 1 | ['Avenafatua1', 'Convolvulusarvensis0'] |
| Fag 8.jpg  | Avenafatua1 | 1 | ['Avenafatua1'] |
| Fag 9.jpg  | Convolvulusarvensis1, Avenafatua1 | 1 | ['Avenafatua1', 'Convolvulusarvensis1'] |
| Fag 10.jpg | Convolvulusarvensis0, Avenafatua1 | 0.5 | ['Avenafatua1', 'Convolvulusarvensis1'] |
| Fag 11.jpg | Convolvulusarvensis0, Avenafatua1 | 1 | ['Avenafatua1'] |
| Fag 12.jpg | Avenafatua1, Echinochloacrus-galli0 | 1 | ['Avenafatua1', 'Echinochloacrus-galli0'] |
| Fag 13.jpg | Convolvulusarvensis1, Avenafatua1 | 0.5 | ['Avenafatua1', 'Echinochloacrus-galli0'] |
| Fag 14.jpg | Convolvulusarvensis1 | 0.5 | ['Avenafatua0', 'Convolvulusarvensis1'] |
| Fag 15.jpg | Convolvulusarvensis1 | 0.5 | ['Avenafatua0', 'Convolvulusarvensis1'] |
| Fag 16.jpg | Convolvulusarvensis1 | 1 | ['Convolvulusarvensis1'] |
| Fag 17.jpg | Avenafatua1 | 1 | ['Avenafatua1'] |
| Fag 18.jpg | Convolvulusarvensis0, Avenafatua1 | 1 | ['Avenafatua1', 'Convolvulusarvensis0'] |
| Fag 19.jpg | Convolvulusarvensis1 | 1 | ['Convolvulusarvensis1'] |
| Fag 20.jpg | Convolvulusarvensis1 | 1 | ['Convolvulusarvensis1'] |
| Fag 21.jpg | Convolvulusarvensis1 | 1 | ['Convolvulusarvensis1'] |
| Fag 22.jpg | Convolvulusarvensis1 | 0.33 | ['Avenafatua0', 'Avenafatua1', 'Convolvulusarvensis1'] |
| Fag 23.jpg | Convolvulusarvensis1, Avenafatua0 | 1 | ['Avenafatua0', 'Convolvulusarvensis1'] |
| Fag 24.jpg | Convolvulusarvensis1 | 1 | ['Convolvulusarvensis1'] |
Fig. 3 The result of classifying 2 images using the ResNet-18 neural network
Table 4 Final quality ratings of the classifier based on ResNet-18

| Indicator | Value |
|-----------|-------|
| Total number of correct classifications for the original set of 24 images | 20.83 |
| Correct classifications, % | 87 |
| Absolutely accurate classifications, % | 79 |
The main diagonal of each matrix shows the images that were classified correctly (correct predictions); misclassifications appear off the diagonal (wrong predictions). Classes with labels 1, 4 and 5 were predicted most accurately: for each, the sum of false positives and false negatives is 1. For classes 2 and 3, these sums were 3 and 2, respectively. The worst result is for class 0, Avenafatua0, possibly due to the small number of this weed in the test sample.
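The "one against all" construction of per-label confusion matrices described above is available directly in scikit-learn (a tool of our choosing here, not necessarily the chapter's). The multi-hot arrays below are illustrative toy data, not the chapter's results.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# Toy multi-hot ground truth and predictions for the 6 weediness classes.
y_true = np.array([[0, 1, 0, 1, 0, 0],
                   [1, 0, 0, 1, 0, 0],
                   [0, 1, 1, 0, 0, 1]])
y_pred = np.array([[0, 1, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0, 0],
                   [0, 1, 0, 0, 1, 1]])

# One 2x2 matrix per label, following the one-against-all principle.
# scikit-learn's layout: [[tn, fp], [fn, tp]].
cms = multilabel_confusion_matrix(y_true, y_pred)
for label, cm in enumerate(cms):
    tn, fp, fn, tp = cm.ravel()
    print(f"class {label}: tn={tn} fp={fp} fn={fn} tp={tp}")
```

Note that scikit-learn places true negatives in the top-left cell, so the diagonal convention may differ from the figure in the chapter.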
Fig. 4 Confusion matrix for weediness classes, obtained from the initial sample for the site with Fagopyrum
4 Conclusion

According to the constructed classifier, for working plot O2/4 the predicted weeds are wild oat, barnyard grass and field bindweed (in excess of the EIL); for plots O9/2, O9/3 and K8, wild oat and field bindweed. Since buckwheat in fields O2/4, O9/2 and O9/3 is cultivated under the organic farming system, the optimization of the phytosanitary state of crops in such crop rotations is based primarily on a complex of organizational, economic and agrotechnical measures and on mechanical and physical methods of plant protection. Above all, timely and high-quality soil preparation, the use of resistant varieties and hybrids, the development of special crop rotations, the spatial isolation of crops, and the use of microbiological preparations are needed when the economic injury level is exceeded by pests, diseases and weeds. As for the K8 site, which is cultivated under the traditional farming system,
the most effective method of combating suckering, rhizomatous and annual monocotyledonous grass weeds in the budding phase of buckwheat is herbicidal treatment (based on active ingredients such as fenoxaprop-P-ethyl or clethodim + haloxyfop-P-methyl). To prevent the spread of weeds and eliminate weed foci, it is necessary to comply with the regulations of all technological measures in the process of growing crops. As a result of the study, a classifier based on the ResNet-18 deep learning model was built that can recognize weeds, with the corresponding weediness gradations, in photographs of buckwheat fields with high accuracy. An undoubted advantage of our approach is the high accuracy of the input data obtained by specialists in the phytosanitary diagnostics of crops. The proposed method is flexible: the model can continue to train on new datasets and, as a result, become more versatile and accurate. The method presented in this article is an important step towards precision agriculture for solving such a complex problem as the identification of weeds in crops.
References

1. Holzworth DP, Huth NI, de Voil PG et al (2014) APSIM—evolution towards a new generation of agricultural systems simulation. Environ Model Softw 62:327–350. https://doi.org/10.1016/j.envsoft.2014.07.009
2. Jones JW, Antle JM, Basso B, Boote KJ, Conant RT, Foster I et al (2017) Brief history of agricultural systems modeling. Agric Syst 155:240–254. https://doi.org/10.1016/j.agsy.2016.05.014
3. Walter A, Finger R, Huber R et al (2017) Opinion: smart farming is key to developing sustainable agriculture. Proc Natl Acad Sci 114(24):6148–6150. https://doi.org/10.1073/pnas.1707462114
4. Caldera U, Breyer C (2019) Assessing the potential for renewable energy powered desalination for the global irrigation sector. Sci Total Environ 694:133598. https://doi.org/10.1016/j.scitotenv.2019.133598
5. Farooq S, Rehman RU, Pirzadah TB, Malik B, Dar FA et al (2016) Cultivation, agronomic practices, and growth performance of buckwheat. In: Molecular breeding and nutritional aspects of buckwheat. Academic Press, pp 299–319. https://doi.org/10.1016/B978-0-12-803692-1.00023-7
6. Dubenok NN, Zayats OA, Strizhakova EA (2017) Formation of the production potential of buckwheat (Fagopyrum esculentum L.) depending on the level of mineral nutrition and sowing method. Proc Timiryazev Agricult Acad 6:29–41. https://doi.org/10.26897/0021-342X-2017-6-29-41
7. Ahmad A, Saraswat D, Aggarwal V, Etienne A, Hancock B (2021) Performance of deep learning models for classifying and detecting common weeds in corn and soybean production systems. Comput Electron Agric 184:106081. https://doi.org/10.1016/j.compag.2021.106081
8. Hasan AM, Sohel F, Diepeveen D, Laga H, Jones MG (2021) A survey of deep learning techniques for weed detection from images. Comput Electron Agric 184:106067. https://doi.org/10.1016/j.compag.2021.106067
9. Lottes P, Behley J, Chebrolu N, Milioto A, Stachniss C (2020) Robust joint stem detection and crop-weed classification using image sequences for plant-specific treatment in precision farming. J Field Robot 37:20–34. https://doi.org/10.1002/rob.21901
10. Bah MD, Hafiane A, Canals R (2018) Deep learning with unsupervised data labeling for weeds detection on UAV images. Remote Sens 10(11):15–17. https://doi.org/10.3390/rs10111690
11. Priya SJ, Sundar GN, Narmadha D, Ebenezer S (2019) Identification of weeds using HSV color spaces and labelling with machine learning algorithms. IJRTE 8(1):1781–1786
12. Potena C, Nardi D, Pretto A (2019) Fast and accurate crop and weed identification with summarized train sets for precision agriculture. In: Intelligent autonomous systems, Proceedings of the 14th international conference IAS-14, vol 531. Springer, Cham, pp 105–121. https://doi.org/10.1007/978-3-319-48036-7_9
13. Ahmed F, Bari AH, Hossain E, Al-Mamun HA, Kwan P (2011) Performance analysis of support vector machine and Bayesian classifier for crop and weed classification from digital images. World Appl Sci J 12(4):432–440
14. Binguitcha-Fare AA, Sharma P (2019) Crops and weeds classification using convolutional neural networks via optimization of transfer learning parameters. Int J Eng Adv Technol (IJEAT) 8(5):2285–2294
15. Zhang W, Hansen MF, Volonakis TN, Smith M, Smith L, Wilson J et al (2018) Broad-leaf weed detection in pasture. In: IEEE 3rd international conference on image, vision and computing, pp 15–23. https://doi.org/10.1109/ICIVC.2018.8492831
16. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386
17. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594
18. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90
Cluster Analysis as a Tool for the Territorial Categorization of Energy Consumption in Buildings Based on Weather Patterns O. May Tzuc, M. Jiménez Torres, Carolina M. Rodriguez, F. N. Demesa López, and F. Noh Pat
Abstract This book chapter explores the application of k-means, an unsupervised learning technique designed to allow the categorization of patterns and statistical and geographic indicators of energy consumption in various climatic regions of Mexico. It investigates the relationship between energy consumption and climatic and operational patterns in a case study of State Social Housing. The k-means results demonstrate how the distribution of the groups obeys temperature and relative humidity patterns, which can be visualized using Geographic Information Systems software. This methodology has broad implications for future studies on thermal comfort, energy poverty, and pollutant emissions and lays the foundation for replicable research on energy efficiency in housing and other related fields. Keywords Unsupervised machine learning · Energy policy · Net-zero building · Territorial categorization · State social housing O. May Tzuc (B) · F. Noh Pat Facultad de Ingeniería, Universidad Autónoma de Campeche, Humberto Lanz Cárdenas y Unidad Habitacional Ecológica Ambiental, Col. Ex Hacienda Kalá, C.P. 24085 San Franciso de Campeche, Campeche, México e-mail: [email protected] M. Jiménez Torres Facultad de Ingeniería, Universidad Autónoma de Yucatán, Av. Industrias No Contaminantes por Anillo Periférico Norte, Apdo. Postal 150, Cordemex, Mérida, Yucatán, México C. M. Rodriguez Programa de Arquitectura, Universidad Piloto de Colombia, Carrera 9ª # 45ª-44, Sede F, C.P. 110231 Bogotá, Colombia F. N. Demesa López División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Pachuca, Col. Venta Prieta, Carretera México-Pachuca Km. 87.5, C.P. 42083 Pachuca de Soto, Hidalgo, México M. Jiménez Torres Departamento de Ingeniería y Proyectos, Universidad Internacional Iberoamericana, Col. IMI III, C.P. 24560 San Francisco de Campeche, Campeche, México © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Adadi and S. 
Motahhir (eds.), Machine Intelligence for Smart Applications, Studies in Computational Intelligence 1105, https://doi.org/10.1007/978-3-031-37454-8_4
1 Introduction

Reducing pollutants and environmental degradation is a pressing issue for society given the impacts of climate change and global warming [1]. Governments recognize the importance of addressing this challenge and have established objectives to limit the increase in ambient temperature to below 1.5 °C through international policies and agreements such as the Kyoto Protocol and the Paris Agreement [2]. Despite these efforts, the world's annual energy demand continues to rise, with a significant proportion met by fossil fuels, hindering progress towards these goals [3]. The building sector represents approximately 40% of final energy consumption in developed countries and is responsible for more than 30% of pollutant emissions [4]. Therefore, one of the primary global objectives is to develop strategies and policies in the construction sector that promote rational and reduced energy use in building operations while maintaining occupant comfort. The Net-Zero Energy Building (NZEB) concept has emerged as one of the most impactful ideas for the decarbonization of the building sector. First introduced in 2010, NZEBs are defined as buildings with high energy performance that require minimal energy, which can be supplied by renewable sources located on-site [5, 6]. International efforts have been made to promote these constructions through regulations. However, implementing NZEBs presents significant challenges, as construction systems and their operation vary significantly depending on the region and purpose of the building. Moreover, climate plays a significant role in determining energy consumption patterns. The American continent differs from Europe, Africa and part of Asia in its vertical distribution and territorial extensions, which encompass various latitudes and climatic zones, ranging from coastlines to mountainous regions.
As a result, it is essential to develop methodologies that continuously recategorize the energy consumption of homes in each region, based on the evolution of weather patterns and occupant behavior. This is a complex task because of the sparsely distributed variables involved in the process. Climate zoning is a critical component of the energy efficiency policy of buildings in many countries. However, most zoning methods are climate-focused without considering the actual (sensitive) response to the energy performance of buildings regarding cooling, heating, or comfort. To address this issue, various studies have suggested employing two widely used climate-focused approaches, namely degree-days and Köppen-Geiger [7]. These methods have an overlap of 14–37% regarding performance, as measured by a simulated energy performance indicator called “uniqueness”. Misclassifying areas using climate-focused zoning methods can have a detrimental impact on the construction industry, as unsuitable building requirements may be imposed in a particular location [8]. One potential solution to address the challenge of categorizing locations in countries based on energy, operational and environmental patterns is to leverage machine learning. Specifically, unsupervised machine-learning techniques such as clustering can be employed to label locations, group those with similar behavior, and differentiate them from other regions within the same country. Table 1 showcases several
studies where this approach has been used successfully to classify territories based on energy, social and demographic factors that lack clear boundaries or associations that are easily discernible.

Table 1 Analysis of climatic classifications based on the energy performance of buildings

| Study | Location | Research feature |
|-------|----------|------------------|
| Pérez-Fargallo et al. [9]: Poverty Adaptive Degree Hourly Index (PADHI) | Chile | Reclassification of residential buildings based on the risk of energy poverty. Evaluates both the number of people at risk of poverty and the necessary degree-hours of heating and cooling based on adaptive comfort levels |
| Bienvenido-Huertas et al. [10]: climate classification considering climate change | Spain | Analyses the climate classification included in the CTE norm and reclassifies the most common residential buildings in Andalusian towns considering climate change and energy patterns |
| Bienvenido-Huertas et al. [11]: effectiveness of passive design strategies considering climate change | France, Portugal, Spain, Argentina, Brazil, and Chile | Climate change scenarios will affect thermal comfort and significantly reduce comfort hours in warm climates. In addition, passive design strategies could be less effective in the future |
| Xiong et al. [12]: analysis of passive building components using climate data with Hierarchical Agglomerative Clustering (HAC) following Ward's method | China | Climate scenarios applied to the Hot Summer and Cold Winter (HSCW) zone as a showcase, where there are no fine climate zones for energy-efficient building design with diverse climate characteristics |
| Bhatnagar et al. [13]: determination of the base temperature from cooling degree days (CDD) and heating degree days (HDD) | India | Classification of climatic zones using the ASHRAE 169-2013 standard methodology based on meteorological data and the Energy Signature and Performance line method |
| Walsh et al. [8]: climate classification based on the intensive use of archetypes, building performance simulation, and GIS | United States of America | Results were clustered using the k-means method considering 3 different zoning levels. An existing climatic zoning performance indicator, the Mean Percentage of Misclassified Areas (MPMA), was calculated for each alternative |
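Several of the studies above rely on heating and cooling degree days as a climate indicator. The computation can be sketched as follows; the base temperatures (18 °C for heating, 24 °C for cooling) are common conventions used here for illustration, not values prescribed by any of the cited studies.

```python
# Degree days: accumulated daily deviation of the mean outdoor temperature
# from a base temperature, below it (heating) or above it (cooling).

def degree_days(daily_mean_temps, base=18.0, mode="heating"):
    """Sum of daily deviations from the base temperature, in degree-days."""
    if mode == "heating":
        return sum(max(base - t, 0.0) for t in daily_mean_temps)
    return sum(max(t - base, 0.0) for t in daily_mean_temps)

temps = [14, 16, 19, 25, 27, 22, 12]  # one week of daily means, deg C
print(degree_days(temps, base=18, mode="heating"))  # HDD -> 12.0
print(degree_days(temps, base=24, mode="cooling"))  # CDD -> 4.0
```

Locations can then be compared, or clustered, by their annual HDD/CDD totals, which is the essence of the degree-days zoning approach.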
This chapter highlights the effectiveness of clustering analysis for territorial categorization based on weather and the energy consumption patterns of residential buildings, specifically in countries with notable climate diversity such as those in Latin America. It focuses on a case study of state social housing (SSH) in Mexico to demonstrate how this approach can be applied in practice. This offers an opportunity to rethink energy strategies and policies that prioritize the well-being of populations, combat problems such as energy poverty, and promote the efficient use of energy by considering critical territorial and climatic factors.
2 Description of the Benchmark Case

Mexico (23° 38' 4.2" N, 102° 33.167' W) is a country of vast territorial extension, spanning 1.9 million km², with diverse soil morphology and a geographical location divided by the Tropic of Cancer. As a result, Mexico experiences significant climatic variability, with more than ten types of climates according to the Köppen-Geiger classification, ranging from equatorial desert to subtropical (Table 2) [14]. This climate variability is dispersed among the 32 states and over 2,400 municipalities that comprise the country's political division (Fig. 1). At the national level, Mexico's temperature varies from 2 to 28 °C, with maximum peaks reaching over 35 °C. The country's average annual precipitation varies between 5 and 150 mm (Fig. 1), with the highest rainfall occurring between July and September and March being the driest month, with little more than 10 mm [15]. The high temperature variability and its maximum peaks in the summer months have led to an increase in energy demand in the building sector, which is the third-largest energy consumer in the country, accounting for 20% of national demand [16]. Moreover, this climatic behavior combines with the constructive typology of

Table 2 Köppen-Geiger climates found in Mexico
| Köppen-Geiger climate | Climate description |
|-----------------------|---------------------|
| Af  | Equatorial climate |
| Am  | Monsoon climate |
| Aw  | Tropical savanna |
| BWh | Warm desert |
| BSh | Warm semi-arid |
| BWk | Cold desert |
| BSk | Cold semi-arid |
| Csa | Warm Mediterranean |
| Cfa | Warm oceanic/humid subtropical |
| Cfc | Cool oceanic |
| Cwa | Humid subtropical |
| Cwb | Subtropical oceanic highland |
Fig. 1 Average meteorological data of the 32 states in Mexico: a Annual ambient temperature and b Relative ambient humidity
national housing, particularly State Social Housing (SSH), which accounts for over 35 million units in the country. More than 90% of SSH units are built with materials such as bricks, blocks, or cement partitions, and 70% have concrete slab ceilings or joists with vaults [17].
2.1 Case Study: State Social Housing in Mexico

State Social Housing (SSH), the focus of this analysis, accounts for over 30% of residential construction in Mexico [1]. The case study has a constructed area of 69.69 m² and comprises a north-facing hall (dining room), a south-facing kitchen, two bedrooms, and an east-facing bathroom. The building has seven windows in total: two on the north façade, three on the west façade, and two on the south façade. The doors are located on the north and south walls. However, the building's construction materials have high conductivity and heat retention, which provide insufficient protection in many regions of the country, especially against radiation. The SSH building's walls consist of 20 cm blocks covered with internal and external mortar finishes, while the ceiling is made up of a joist and vault with a layer of mortar (Fig. 2). Table 3 shows the breakdown of the materials used in the housing envelope and their thermal properties, as per the current standard NOM-020-ENER-2011 [18].
3 Unsupervised Machine Learning

Machine learning methods can be classified into three categories: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning deals with structured, pre-treated and organized information, while unsupervised learning offers an exploratory approach to data analysis by identifying patterns in large datasets more efficiently than manual observation. Unsupervised machine learning algorithms are designed to analyze unlabeled datasets and discover hidden
Fig. 2 Characterization of the case study: a Front view, layout and dimensions of an SSH unit and b Digital model and envelope layers
Table 3 Thermal properties of the construction materials in Mexican SSH

| Component | Material | Thickness (m) | U (W/m² K) |
|-----------|----------|---------------|------------|
| Roof    | White paint    | 0.010 | 1.45 |
|         | Cast concrete  | 0.060 |      |
|         | Concrete block | 0.170 |      |
|         | Mortar         | 0.020 |      |
| Walls   | White paint    | 0.010 | 2.21 |
|         | Mortar         | 0.015 |      |
|         | Concrete block | 0.180 |      |
| Windows | 3 mm glass     | 0.003 | 3.8  |
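As a rough illustration of how a layer-by-layer U-value like those in Table 3 can be estimated, the sketch below sums thermal resistances for a wall assembly. The thermal conductivities (lambda, W/m K) and surface resistances are assumed typical values, not figures from the chapter or from NOM-020-ENER-2011, so the result is not expected to reproduce the tabulated U exactly.

```python
# U = 1 / (R_si + sum(d_i / lambda_i) + R_se) for a multi-layer assembly.
R_SI, R_SE = 0.13, 0.04  # assumed interior/exterior surface resistances (m2 K/W)

wall_layers = [      # (thickness m, assumed conductivity W/m K)
    (0.010, 0.25),   # paint/finish layer
    (0.015, 0.72),   # mortar
    (0.180, 0.70),   # concrete block
]

R_total = R_SI + R_SE + sum(d / lam for d, lam in wall_layers)
U = 1.0 / R_total
print(f"U = {U:.2f} W/m2 K")
```

With these assumed conductivities the sketch lands in the same range as the wall value in Table 3, but the chapter's values come from the standard, not from this calculation.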
patterns or data groupings without the need for human intervention. This category of machine learning is typically used for three main tasks: dimensionality reduction [19], association [20], and clustering [21]. This chapter focuses on the last of these, specifically so-called k-means clustering, as a tool for territorial categorization based on energy and climate patterns.
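Before the formal treatment below, the idea can be previewed with a minimal scikit-learn sketch: locations described by average temperature and relative humidity are grouped into k clusters. All numeric values are synthetic illustrations, not the chapter's Mexican climate data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (temperature deg C, relative humidity %) averages for 30
# locations drawn from three loosely "climatic" groups.
rng = np.random.default_rng(0)
warm_humid = rng.normal([27, 75], [1.5, 5], size=(10, 2))
temperate  = rng.normal([17, 55], [1.5, 5], size=(10, 2))
arid       = rng.normal([24, 30], [1.5, 5], size=(10, 2))
X = np.vstack([warm_humid, temperate, arid])

# Partition the locations into k = 3 clusters by their weather patterns.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.round(1))  # one (temp, humidity) centroid per group
```

Each location receives a cluster label (`km.labels_`), which is the kind of territorial categorization that can then be mapped with GIS software, as the chapter describes.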
3.1 K-means Clustering Classification

Clustering is widely regarded as the most representative technique of unsupervised learning, which involves discovering hidden structures in data where the correct answer is unknown. Clustering is an essential tool for various data analysis and machine learning-based applications such as regression, prediction, and data mining, and it is critical for grouping objects in multiple disciplines such as engineering, the humanities, medical sciences, health, and economics [22]. The primary goal of clustering is to group unlabeled datasets based on specific patterns and build subsets known as clusters. Clusters are groups of objects that share similarities with each other but also have noticeable differences from objects in other groups. Conventionally, the clustering structure is represented as a dataset M divided into subsets m_i, where each object belongs exclusively to one subset:
Cluster Analysis as a Tool for the Territorial Categorization of Energy …
M = m_1 ∪ m_2 ∪ · · · ∪ m_i ∪ · · · ∪ m_k   (1)
An adequate clustering algorithm must meet the following criteria [23]:

• Scalability.
• Interaction with different types of attributes.
• Discovery of clusters with arbitrary shapes.
• Minimal domain knowledge to determine input parameters.
• Ability to handle high dimensionality.
• Ability to deal with noise and outliers.
• Insensitivity regarding attributes.
Various algorithms designed to perform clustering tasks can be found in the literature. This is because clustering starts from unknown data, making it challenging to define when a cluster has been found. Therefore, diverse approaches based on various inclusion principles have been developed [24]. A detailed description of the clustering taxonomy can be found in [22, 25]. This chapter focuses on implementing k-means clustering as a categorization tool since it is one of the most versatile techniques for handling considerable amounts of data. K-means belongs to the category of prototype-based clustering and is widely used both in academia and industry. It is based on the premise of grouping a dataset according to a predefined number of clusters k. Its advantages include simplicity, computing speed, and the ability to work with high-dimensional data, making it ideal for grouping similar objects and discovering underlying patterns in large datasets [23].
The main idea behind the k-means algorithm is to partition the dataset into k clusters, each represented by a centroid, based on their common features. For this purpose, the method considers the dataset of n objects, where each sample x_i is positioned in an m-dimensional space. The objective is to minimize the distance between a given sample x_i and its nearest centroid c_j:

Minimize J = Σ_{j=1}^{k} Σ_{i=1}^{n} ||x_i^(j) − c_j||²   (2)
This process is known as within-cluster sum of squares (WCSS) minimization, where the squared Euclidean distance determines whether objects belong to a group according to their similarities and differences from the rest of the dataset:

d² = Σ_{i=1}^{m} (x_ri − x_si)²   (3)
The k-means algorithm can be carried out through the following steps [26]:

(1) Organize the dataset in a vector space of m dimensions, where each dimension represents a feature of the set.
Fig. 3 The iterative process used by the k-means algorithm to group datasets according to their features and the square distance from the centroid
(2) Propose k centroids (C = [c_1, c_2, …, c_k]) randomly distributed among the data as initial cluster centers.
(3) Create the k clusters by assigning each data point x_i to the nearest centroid c_j.
(4) Move each centroid to the center of the data points that make up its cluster.
(5) Repeat steps 3 and 4 until the cluster allocation no longer changes, or until some tolerance criterion or a maximum number of iterations is met.

Figure 3 shows an example of the previously described procedure for a dataset with two features (2 dimensions). Here, k = 3 centroids are assigned to the scattered data and Eqs. 2 and 3 are applied. After the first iteration, a first data grouping is obtained, which changes as the centroids are relocated. The result is obtained after a given number of iterations.
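The five steps above can be sketched directly in NumPy. This is an illustrative from-scratch implementation, not the chapter's actual code; the two-blob synthetic dataset only serves to exercise it.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Minimal k-means following steps (1)-(5) above."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k initial centroids at random from the dataset
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid
        # (squared Euclidean distance, Eq. 3)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 4: move each centroid to the center of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop when the centroids no longer move
        if np.allclose(new_centroids, centroids, atol=tol):
            break
        centroids = new_centroids
    return centroids, labels

# Step 1: a dataset in a 2-dimensional feature space (two well-separated blobs)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
centroids, labels = kmeans(X, k=2)
```

With well-separated groups such as these, the iterations converge quickly and each blob ends up in its own cluster.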
3.2 Cluster Evaluation

Like most clustering algorithms, k-means requires the user to specify the number of clusters to execute the grouping process. This is one of the main drawbacks of this unsupervised learning technique, since an incorrect selection of k can result in poor performance when grouping the data. One way to solve this is by trial and error, which is highly subjective and based on perception rather than verifiable metrics. To address this problem, various metrics have been developed to assess how well the data points x_i fit the groups formed by the k centroids. Two commonly used
Fig. 4 Interpretation of two common methods for optimal cluster identification: a Silhouette method and b Elbow method
techniques are the Silhouette and the Elbow methods, owing to their simplicity. The Silhouette method calculates a measure of the cohesion a(i) between a data point and its cluster and of its separation b(i) from the other clusters. The smaller a(i) and the larger b(i), the better the data point fits its cluster. The following equation is used to estimate the measure:

s(i) = (b(i) − a(i)) / max[a(i), b(i)]   (4)
The Silhouette values range from −1 to 1, with high values indicating that the point corresponds well to its own group and does not belong to other groups. If most of the points have a high Silhouette value, then the clustering solution is adequate. If many points have a low or negative Silhouette value, the clustering solution may have too many or too few clusters [27]. Figure 4a illustrates a case where the maximum value appears at three clusters, so this is the optimal number of clusters to group the data. The Elbow method considers the relevant information for cluster size assignment to be contained in the WCSS, and it determines the optimal number of clusters as the point where the WCSS variations become insignificant as the number of groups increases. The easiest way to identify this is through a knee chart, as shown in Fig. 4b. In this figure, no significant variation in the results is observed after four clusters, indicating that four is the optimal number of clusters for this dataset. It is worth noting that this criterion can be applied to both ascending and descending curves, depending on the data to be grouped.
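Both criteria can be computed with scikit-learn, as in the following sketch; the three-blob synthetic dataset is an illustrative assumption chosen so that the optimal number of clusters is known to be three.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three well-separated groups, so the optimal k should be 3
X = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in (0.0, 4.0, 8.0)])

wcss, silhouettes = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss[k] = km.inertia_                             # WCSS for the Elbow chart
    silhouettes[k] = silhouette_score(X, km.labels_)  # mean s(i) over all points (Eq. 4)

best_k = max(silhouettes, key=silhouettes.get)        # k with the highest mean Silhouette
```

Plotting wcss against k gives a knee chart like Fig. 4b, while best_k corresponds to the Silhouette maximum of Fig. 4a.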
3.3 Clustering Workflow

Before conducting a clustering process, it is highly recommended to follow the four-stage workflow below to obtain the best possible data classification [28]:

• Prepare data: Since clustering calculates the similarity between two examples, the attributes of all the data should be expressed as numerical values on comparable scales. Techniques such as normalization, logarithmic transformation, and quantile transformation can be used for this purpose.
• Create a similarity metric: This metric is used to quantify the similarity between data points. Choosing an appropriate similarity metric requires understanding the data and how to derive the similarity of its attributes. The most used similarity metric in clustering algorithms is the Euclidean distance, but alternatives include the Manhattan distance and the Cosine distance (Table 4).
• Run the clustering algorithm: The clustering algorithm uses the similarity metric to group the data.
• Interpret results: Since clustering is an unsupervised learning technique, no ground "truth" is available to verify the results. Therefore, verifying the quality of a clustering is a demanding process, as the absence of label information complicates quality assessment, and it depends on the practitioner's experience with the studied topic. A visual check can be performed to see whether the clusters look as expected and whether similar examples appear in the same cluster. The cluster evaluation techniques presented in Sect. 3.2 can aid in this process.
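The "prepare data" stage can be illustrated with scikit-learn's scalers. The small matrix below is an assumed example with features on very different scales (kWh, hours, %):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Illustrative samples: consumption (kWh), operation hours, relative humidity (%)
X = np.array([
    [6880.0, 4600.0, 57.9],
    [10700.0, 4700.0, 43.3],
    [14150.0, 5500.0, 65.0],
    [18050.0, 6200.0, 77.7],
])

X_std = StandardScaler().fit_transform(X)  # each column: zero mean, unit variance
X_mm = MinMaxScaler().fit_transform(X)     # each column rescaled to [0, 1]
```

Without such rescaling, the kWh column would dominate any Euclidean similarity metric.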
4 Case Study: K-Means-Based Categorization of Energy Consumption of Mexican SSH

To illustrate the algorithm discussed in Sect. 3, this chapter presents a case study in which k-means clustering is used to categorize the electrical consumption of SSH in all Mexican locations.

Table 4 Common similarity metrics used in k-means clustering [29]

Similarity metric    Equation                              Description
Euclidean distance   d(x, c) = (x − c)(x − c)'             Each centroid is the mean of the points in that group
Manhattan distance   d(x, c) = Σ_{j=1}^{p} |x_j − c_j|     Each centroid is the component-wise median of the points in that group
Cosine distance      d(x, c) = 1 − xc'/√((xx')(cc'))       Each centroid is the mean of the points in that group, after normalizing those points to unit Euclidean length
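The three metrics of Table 4 can be written out in NumPy for a sample x and a centroid c (row vectors); this is a minimal illustrative sketch.

```python
import numpy as np

def sq_euclidean(x, c):
    return float((x - c) @ (x - c))                  # (x − c)(x − c)'

def manhattan(x, c):
    return float(np.abs(x - c).sum())                # Σ_j |x_j − c_j|

def cosine_distance(x, c):
    return float(1.0 - (x @ c) / np.sqrt((x @ x) * (c @ c)))  # 1 − xc'/√((xx')(cc'))

x = np.array([3.0, 4.0])
c = np.array([0.0, 4.0])
```

For these two vectors the squared Euclidean distance is 9, the Manhattan distance is 3, and the Cosine distance is 0.2.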
4.1 Impact of Operational and Environmental Patterns

In Mexico, SSH is built in extensive complexes known as housing developments. Although the construction of a single house may not have a significant impact, its massive replication throughout the country leads to high energy consumption and greenhouse gas emissions. Moreover, the construction sets (described in Sect. 2) are often unsuitable for the climatic conditions where they are built, requiring artificial climate control to reach the comfort zone. This makes SSH an important consideration in developing policies towards net-zero buildings.
As shown in Fig. 5, there is no dominant type of climate throughout the country, so the climatic patterns of each location vary, affecting comfort needs and energy use. Similarly, air conditioning operating patterns vary depending on the region, time zone, sunrise, sunset, climate zone, and construction set [30]. This combination of effects produced by climatic and air conditioning operating patterns makes it difficult to develop general strategies to achieve comfort levels while reducing energy consumption and CO2 emissions in SSH. Therefore, the application of a k-means algorithm as a categorization tool is an alternative for identifying similarities and allows the development of regional strategies to reduce the use of energy and polluting agents.
The k-means-based categorization of SSH's territorial energy was conducted using a database with 9,760 samples. The working database comprises the patterns of energy use, hours of operation of air conditioning, ambient temperature, and relative humidity. Figure 6 summarizes the process for database generation, which includes a conventional Mexican SSH with a set based on masonry construction (Table 3),
Fig. 5 Relevance of operational and climatic patterns to meet net-zero building policies for Mexican State Social Housing (air conditioning operating patterns: more than 9 h, 5–9 h, 2–5 h, and less than 2 h; equipment: minisplits (inverter and on/off), central and window units, coolers and others)
Fig. 6 Generation of the work database required for the categorization of Mexican State Social Housing
operation setting and air conditioning patterns following national surveys [31], and a thermal comfort temperature of 24 °C according to ASHRAE [32]. Subsequently, the SSH was energetically analyzed under the different types of climate in the country using the climatic files of the 2,400 locations (cities) obtained through the Meteonorm software [33]. The building information and climate files were loaded into the DesignBuilder software to ease the energy calculation [34], allowing the energy consumption for air conditioning throughout the country to be simulated quickly and reliably. The results were integrated into a spreadsheet as shown in Fig. 6.
To interpret and understand the interaction of the four variables stored in the spreadsheet, as well as their dispersion, a scatter matrix was used (Fig. 7). Each point in the figure refers to one of the 2,400 locations in the country. From this figure, interpretations can be obtained that help to reduce the features of interest and to identify direct or inverse relationships between variables. The figure shows that the interaction between power consumption and ambient temperature is almost linear. This direct relationship between both variables is a crucial point in achieving thermal comfort. The other patterns do not show any correlation or grouping at first glance. However, the impact of operating hours and relative humidity on electrical energy consumption rates for indoor air conditioning is well documented [35]. Based on this information, it is possible to discard the ambient temperature, since its information would be replicated by the energy consumption. In addition, the dispersion results make the hours of operation and the relative humidity suitable candidates for the clustering study.
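A dispersion (scatter) matrix like the one in Fig. 7 can be produced with pandas. The column names and random data below are illustrative stand-ins for the 2,400-location database; only the near-linear consumption–temperature relationship is built in on purpose.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temp = rng.uniform(18.0, 35.0, 200)  # ambient temperature (°C)
df = pd.DataFrame({
    "consumption_kWh": 500.0 * temp + rng.normal(0.0, 800.0, 200),  # ~linear in temperature
    "temperature_C": temp,
    "operation_hours": rng.uniform(1000.0, 8000.0, 200),
    "relative_humidity": rng.uniform(30.0, 90.0, 200),
})

# 4 x 4 grid of pairwise scatter plots with histograms on the diagonal
axes = pd.plotting.scatter_matrix(df, figsize=(8, 8), diagonal="hist")
# The almost-linear pattern shows up as a high consumption-temperature correlation
corr = df["consumption_kWh"].corr(df["temperature_C"])
```

In the real database, a correlation this strong is what justifies dropping ambient temperature from the feature set.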
Fig. 7 Analysis of the variables of interest using the dispersion matrix
4.2 Computational Approach by K-means Clustering

Figure 8 illustrates the computational approach used to classify the 2,400 Mexican locations from the working database, whose 9,760 samples were loaded into a Python script. Since the categorization was based on three features (relative humidity, hours of operation, and electrical consumption for air conditioning), the results could be represented in a spatial distribution and provide a visual interpretation. The Elbow criterion was used to preselect the optimal number of clusters.
The Python script to identify the optimal number of clusters was assisted by the libraries Pandas [36], Numpy [37], Sklearn [38], and Matplotlib [39]. Pandas is designed to facilitate the manipulation of databases in Python, while Numpy handles matrices and arrays natively. Sklearn is a well-known machine learning library with algorithms for classification, regression, clustering, and dimensionality reduction. The working database for this example (Fig. 8) consisted of three columns representing the features of interest: the first column is the electrical consumption for air conditioning, the second is the hours of operation of the air conditioning equipment, and the last is the average relative humidity of the location.
To obtain the optimal number of clusters, the features of interest were placed in a matrix (Matrix X) and evaluated with the number of clusters ranging from 1 to 20. The assessment involved calculating each case's score, plotting it against the number of clusters of interest, and generating an elbow graph. After identifying the number k of clusters, the script obtained information on the sets, such as the centroids of each cluster and the labels associated with each
Fig. 8 Computational methodology for the categorization of State Social Housing using k-means clustering
data point. This enabled the interpretation of results in a 3-D virtual environment or territorial maps based on the identified patterns. The centroids’ data was essential to understand why the locations were grouped in a particular way, while the labels allowed for precise identification of the assigned clusters. For this example, it was found that the clusters of significant interest were between k = 3 and k = 5. Although the Elbow criterion helped to narrow down the space of interest, further analysis was conducted to identify the best option, as presented in Sect. 4.3.
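The elbow scan and the subsequent retrieval of centroids and labels can be sketched as follows. The DataFrame here is a small synthetic stand-in for the 9,760-sample working database; the column names, group means, and the final choice k = 4 are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in: four separated groups of
# (consumption, operation hours, relative humidity) samples
rng = np.random.default_rng(0)
means = [(6.9, 46.0, 58.0), (10.7, 47.0, 43.0), (14.2, 55.0, 65.0), (18.1, 62.0, 78.0)]
rows = np.vstack([rng.normal(m, (0.3, 1.0, 1.0), (50, 3)) for m in means])
df = pd.DataFrame(rows, columns=["consumption", "operation_hours", "humidity"])

X = df.to_numpy()  # Matrix X with the three features of interest

# Elbow scan: WCSS (inertia) for k = 1 ... 20
wcss = []
for k in range(1, 21):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)
# plotting wcss against range(1, 21) yields the elbow graph

# After choosing k from the elbow, retrieve centroids and per-location labels
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
centroids = km.cluster_centers_  # one row per cluster (cf. Table 5)
labels = km.labels_              # cluster assigned to each location
counts = np.bincount(labels)     # number of locations per cluster
```

The centroids rows play the role of Table 5, and the labels array is what is mapped territorially in Fig. 10.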
4.3 Energy Consumption Categorization Analysis in SSH

Figure 9 presents the three selected cluster numbers, with the samples distributed in a three-dimensional space. The X-axis represents the annual operating hours (in hundreds) of the air conditioning equipment, the Y-axis represents the annual energy consumption of the equipment (kWh), and the Z-axis shows the average relative humidity (%) of each location in the country.
Figure 9.1a shows an interesting data division for k = 3, where the most significant element is the scaling of the groups with respect to the average relative humidity of the locations. The first group covers relative humidities up to 55%, the second group corresponds to sites between 55 and 70% relative humidity, and the last group covers 70% onwards. Additionally, the vertical segmentation persists when rotating the image (Fig. 9.3a). However, it is noteworthy that large agglomerates of data points (Group 2 in red) could contain unanalyzed patterns. To verify this,
Fig. 9 Distribution of the samples in the three-dimensional feature space for 3, 4, and 5 clusters (panels 1-a to 3-c; axes: operation hours (hundreds), air conditioning consumption (kWh), and relative humidity (%))
Figs. 9.1b and 9.3b include an additional cluster (k = 4), where the intermediate group is divided into two. According to Fig. 9.3b, although the effect of relative humidity is important, the appearance of the new cluster may be due to the intensive use of energy (or to the temperature, to which it is strongly linked). This implies that regions with similar relative humidity but different temperatures do not consume similarly. Finally, Figs. 9.1c and 9.3c show the scenario of adding another group (k = 5). A pile-up of data is observed in the images of the relative humidity profile with respect to hours of operation. However, for electrical power (Fig. 9.3c) this becomes relevant, since intermediate consumption can be observed at high relative humidity. It is essential to clarify that, in an unsupervised approach, the correct choice of cluster number depends on the analyst's knowledge of the subject and the information provided by the machine learning tool.
Figure 10 presents the information for the most interesting cluster numbers from a territorial point of view (k = 4 and k = 5). This helps to understand the distribution of the country's 2,400 locations not just as data in a 3-D space but also by providing physical meaning for the clusters. In the case of k = 4 (Fig. 10a), the territorial division is well defined. The first detail to highlight is the difference between the electricity consumption patterns in the country's north. In the northeast, c = 1 and c = 3 predominate, while in the northwest the locations are classified as c = 2 and c = 4. In the center and south of the country, c = 1 predominates (with few exceptions). In the southeastern peninsular region, c = 4 reappears, indicating patterns similar to the northwest of Mexico. On the other hand, for the case of k = 5 (Fig. 10b),
Fig. 10 The territorial distribution of State Social Housing based on the k-means clustering results: a k = 4 (clusters C1–C4) and b k = 5 (clusters C1–C5)
it is observed that the spatial distribution throughout Mexico is highly fragmented, forming many isolated points like islands. However, the patterns previously seen in the northwest of the country disappear. Therefore, although mathematically k = 5 could be interesting, territorially it does not fit the distribution of Mexican cities as well as k = 4.
After identifying the best grouping of the Mexican territory mathematically and spatially, which turned out to be k = 4, it is important to examine the centroid data (Table 5). As previously mentioned, relative humidity has a strong influence on the grouping of locations. In the same way, Table 5 highlights that as relative humidity increases, so does the energy consumption of low-income housing in Mexico. Additionally, the number of cities belonging to each cluster was computed: the first cluster occupies the largest share of the national territory, while the fourth cluster (with the highest relative humidity) is the second largest. This shows that regions with hot-humid and hot-dry climates play an important role when deciding on electrical energy issues for indoor air conditioning.
The results indicate that, despite being a simple machine learning tool, k-means clustering provides a useful starting point for focusing the strategies and energy policies of net-zero buildings in Mexican social housing. This method helps to identify that the energy behavior varies throughout the country, despite a pre-classification according to climatic and operational factors for social housing.

Table 5 Information contained in the clusters' centroids

Cluster     Energy consumption (MWh)   Operation hours (h)   Relative humidity (%)   Number of locations
Cluster 1   6.88                       4600                  57.9                    1086
Cluster 2   10.70                      4700                  43.3                    283
Cluster 3   14.15                      5500                  65.0                    374
Cluster 4   18.05                      6200                  77.7                    679
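The reading of Table 5 can also be reproduced programmatically; the values below are copied from the table, and the column names are illustrative.

```python
import pandas as pd

# Centroid information from Table 5
centroids = pd.DataFrame(
    {
        "energy_MWh": [6.88, 10.70, 14.15, 18.05],
        "operation_h": [4600, 4700, 5500, 6200],
        "humidity_pct": [57.9, 43.3, 65.0, 77.7],
        "n_locations": [1086, 283, 374, 679],
    },
    index=["Cluster 1", "Cluster 2", "Cluster 3", "Cluster 4"],
)

largest = centroids["n_locations"].idxmax()        # cluster covering the most locations
most_demanding = centroids["energy_MWh"].idxmax()  # cluster with the highest consumption
by_humidity = centroids.sort_values("humidity_pct")
```

Sorting by humidity makes it easy to see that the most humid cluster is also the most energy-demanding, while the hot-dry cluster (Cluster 2) breaks a strictly monotonic trend.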
5 Conclusions

This chapter presented the application of the unsupervised machine learning technique k-means clustering to the territorial categorization of energy consumption in State Social Housing in Mexico. Using thermal building simulation software, the energy consumption of houses throughout the country was identified considering a set temperature of 24 °C for comfort conditions. The categorization considered relative humidity, air conditioning usage, and electricity consumption to group 2,400 locations in the country. In addition, it was identified that relative humidity has great relevance in the formation of the groups.
The results demonstrate that clustering algorithms are an effective tool for providing a first perspective on categorizing a country from easy-to-obtain environmental and operational pattern indicators. The presented approach is a viable alternative for designing plans and strategies to reduce the energy consumption and CO2 emissions of the large subdivisions made up of state social housing that are common across Latin America. Additionally, the presented methodology can easily be applied to the territorial categorization of more complex phenomena, such as socio-energetic, energetic-environmental, and health-and-energy studies.

Acknowledgements Coauthor Mario Jiménez Torres is thankful for the financial support granted by CONAHCYT (CVU No. 930301, scholarship no. 785382) to pursue a doctoral degree at the Universidad Autónoma de Yucatán, México. This work is part of project 053/UAC/2022 and is a derivative product of Thematic Network 722RT0135 "Red Iberoamericana de Pobreza Energética y Bienestar Ambiental" (RIPEBA), which provided financial support through the CYTED Program's 2021 call for Thematic Networks.
References 1. Jiménez Torres M, Bienvenido-Huertas D, May Tzuc O, Bassam A, Ricalde Castellanos LJ, Flota-Bañuelos M (2023) Assessment of climate change’s impact on energy demand in Mexican buildings: projection in single-family houses based on representative concentration pathways. Energy Sustain Dev 72:185–201. https://doi.org/10.1016/j.esd.2022.12.012 2. Vázquez-Torres CE, Bienvenido-Huertas D, Beizaee A, Bassam A, Torres MJ (2023) Thermal performance of historic buildings in Mexico: an analysis of passive systems under the influence of climate change. Energy Sustain Dev 72:100–113. https://doi.org/10.1016/j.esd.2022.12.002 ˙ 3. Rashad M, Zabnie´ nska-Góra A, Norman L, Jouhara H (2022) Analysis of energy demand in a residential building using TRNSYS. Energy 254:124357. https://doi.org/10.1016/j.energy. 2022.124357 4. González-Torres M, Pérez-Lombard L, Coronel JF, Maestre IR, Yan D (2022) A review on buildings energy information: trends, end-uses, fuels and drivers. Energy Rep 8:626–637. https://doi.org/10.1016/j.egyr.2021.11.280 5. Magrini A, Marenco L, Bodrato A (2022) Energy smart management and performance monitoring of a NZEB: analysis of an application. Energy Rep 8:8896–8906. https://doi.org/10. 1016/j.egyr.2022.07.010
6. Hawila AAW, Pernetti R, Pozza C, Belleri A (2022) Plus energy building: operational definition and assessment. Energy Build 265:112069. https://doi.org/10.1016/j.enbuild.2022.112069 7. Rubel F, Brugger K, Haslinger K, Auer I (2017) The climate of the European Alps: Shift of very high resolution Köppen-Geiger climate zones 1800–2100. Meteorol Zeitschrift 26:115–125. https://doi.org/10.1127/metz/2016/0816 8. Walsh A, Cóstola D, Labaki LC (2022) Performance-based climatic zoning method for building energy efficiency applications using cluster analysis. Energy 255:124477. https://doi.org/10. 1016/j.energy.2022.124477 9. Pérez-Fargallo A, Bienvenido-Huertas D, Rubio-Bellido C, Trebilcock M (2020) Energy poverty risk mapping methodology considering the user’s thermal adaptability: the case of Chile. Energy Sustain Dev 58:63–77. https://doi.org/10.1016/j.esd.2020.07.009 10. Bienvenido-Huertas D, Marín-García D, Carretero-Ayuso MJ, Rodríguez-Jiménez CE (2021) Climate classification for new and restored buildings in Andalusia: analysing the current regulation and a new approach based on k-means. J Build Eng 43:102829. https://doi.org/10.1016/ j.jobe.2021.102829 11. Bienvenido-Huertas D, Rubio-Bellido C, Marín-García D, Canivell J (2021) Influence of the Representative Concentration Pathways (RCP) scenarios on the bioclimatic design strategies of the built environment. Sustain Cities Soc 72:103042. https://doi.org/10.1016/j.scs.2021. 103042 12. Xiong J, Yao R, Grimmond S, Zhang Q, Li B (2019) A hierarchical climatic zoning method for energy efficient building design applied in the region with diverse climate characteristics. Energy Build 186:355–367. https://doi.org/10.1016/j.enbuild.2019.01.005 13. Bhatnagar M, Mathur J, Garg V (2018) Determining base temperature for heating and cooling degree-days for India. J Build Eng 18:270–280. https://doi.org/10.1016/j.jobe.2018.03.020 14. 
Álvarez-Alvarado JM, Ríos-Moreno JG, Ventura-Ramos EJ, Ronquillo-Lomeli G, Trejo-Perea M (2020) An alternative methodology to evaluate sites using climatology criteria for hosting wind, solar, and hybrid plants. Energy Sources, Part A Recover Util Environ Eff 1–18. https://doi.org/10.1080/15567036.2020.1772911 15. CONAGUA CN del A (2022) Precipitación 16. Energía S de (2019) Balance Nacional de Energía 2020. 145 17. Encuesta Nacional de Vivienda (2021) Comunicado de Prensa. Encuesta Nacional de Vivienda (ENVI), 2020. Principales resultados. Comun Prensa 493/21 1:1–30 18. SENER S de E (2011) Norma Oficial Mexicana NOM-020-ENER-2011. 47 19. Mirkin B (2011) Principal component analysis and SVD. In: Mirkin B (ed) Springer London, London, pp 173–219 20. Subasi A (2020) Machine learning techniques. In: Subasi A (ed) Practical machine learning for data analysis using Python. Elsevier, pp 91–202 21. De S, Dey S, Bhatia S, Bhattacharyya S (2022) An introduction to data mining in social networks. In: De S, Dey S, Bhattacharyya S, Bhatia S (eds) Advanced data mining tools and methods for social computing. Elsevier, pp 1–25 22. Saxena A, Prasad M, Gupta A, Bharill N, Patel OP, Tiwari A, Er MJ, Ding W, Lin C-T (2017) A review of clustering techniques and developments. Neurocomputing 267:664–681. https://doi.org/10.1016/j.neucom.2017.06.053 23. Kononenko I, Kukar M (2007) Cluster analysis. In: Kononenko I, Kukar M (eds) Machine learning and data mining. Elsevier, pp 321–358 24. Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31:651–666. https://doi.org/10.1016/j.patrec.2009.09.011 25. Xu D, Tian Y (2015) A comprehensive survey of clustering algorithms. Ann Data Sci 2:165–193. https://doi.org/10.1007/s40745-015-0040-1 26. Raschka S (2019) Python machine learning: unlock deeper insights into machine learning with this vital guide to cutting-edge predictive analytics, 3rd edn.
Packt Publishing Ltd., Birmingham, UK 27. MathWorks (2015) Statistics and Machine Learning toolbox release notes. The MathWorks, Inc.
28. Google Developers (2022) Machine Learning—clustering workflow. https://developers.google.com/machine-learning/clustering?hl=es-419. Accessed 15 Feb 2023 29. MathWorks (2017) Statistics and Machine Learning Toolbox user's guide R2017a 30. Jimenez Torres M, Bienvenido-Huertas D, May Tzuc O, Ricalde Castellanos L, Flota Banuelos M, Bassam A (2022) Projection of the current and future panorama of thermal comfort in Mexico: an approach from CDH to face the climate change. In: 2022 7th international conference on smart and sustainable technologies (SpliTech). IEEE, pp 1–6 31. INEGI (2018) Primera encuesta nacional sobre consumo de energéticos en viviendas particulares (ENCEVI) 32. de Dear R (2004) Thermal comfort in practice. Indoor Air 14:32–39. https://doi.org/10.1111/j.1600-0668.2004.00270.x 33. Remund J, Müller S, Schmutz M, Barsotti D, Graf P, Cattin R (2022) Meteonorm 8.1 manual (software). 63 34. DesignBuilder Software Ltd (2022) DesignBuilder. https://designbuilder.co.uk/. Accessed 15 Feb 2023 35. Özbalta TG, Sezer A, Yildiz Y (2012) Models for prediction of daily mean indoor temperature and relative humidity: education building in Izmir, Turkey. Indoor Built Environ 21:772–781. https://doi.org/10.1177/1420326X11422163 36. McKinney W (2010) Data structures for statistical computing in Python. In: van der Walt S, Millman J (eds) Proceedings of the 9th Python in Science Conference, pp 56–61 37. Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, Wieser E, Taylor J, Berg S, Smith NJ, Kern R, Picus M, Hoyer S, van Kerkwijk MH, Brett M, Haldane A, del Río JF, Wiebe M, Peterson P, Gérard-Marchant P, Sheppard K, Reddy T, Weckesser W, Abbasi H, Gohlke C, Oliphant TE (2020) Array programming with NumPy. Nature 585:357–362. https://doi.org/10.1038/s41586-020-2649-2 38.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É (2012) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830. https://doi.org/10.1007/s13398-014-0173-7.2 39. Hunter JD (2007) Matplotlib: A 2D graphics environment. Comput Sci Eng 9:90–95. https:// doi.org/10.1109/MCSE.2007.55
Machine Intelligence for Smart Social Living
A Quantum Machine Learning Model for Medical Data Classification Hamza Kamel Ahmed, Baraa Tantawi, Malak Magdy, and Gehad Ismail Sayed
Abstract The use of healthcare systems has a big impact on people's well-being. To create efficient healthcare systems, new models are constantly being developed. The rapid growth of the use of such models in the medical disciplines has created great possibilities for the development of new applications. However, the development of fast, precise, and effective models suitable for medical applications still faces significant hurdles. In this chapter, the feasibility of applying the quantum support vector classifier (QSVC) algorithm is evaluated and tested on medical datasets. Ten datasets obtained from the UCI machine learning repository were adopted for this study. The experimental results revealed that the proposed intelligent model based on the QSVC obtained very promising results, showing high classification performance compared with state-of-the-art models. These high classification results can offer technical assistance for the enhancement of medical data classification.

Keywords Intelligent model · Quantum computing · QSVC · Artificial intelligence · Medical data
1 Introduction

The creation of intelligent systems and services for the healthcare industry has evolved into a multidisciplinary endeavor that draws on several important areas, including machine learning, data mining, human–computer interaction, data analysis, and deep learning, among others. Improved diagnostic models, in which disease classification plays a key role, are used in smart health care to better treat patients and track their progress. The quality of life has recently improved as a result of the recent advancements in machine learning algorithms.

H. K. Ahmed · B. Tantawi · M. Magdy · G. I. Sayed (B) School of Computer Science, Canadian International College (CIC), New Cairo, Egypt e-mail: [email protected] H. K. Ahmed Computer Science Department, School of Science and Technology, Troy University, Troy, USA © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Adadi and S. Motahhir (eds.), Machine Intelligence for Smart Applications, Studies in Computational Intelligence 1105, https://doi.org/10.1007/978-3-031-37454-8_5
Machine learning has emerged as a powerful tool in a wide range of applications, including image and speech recognition, natural language processing, recommendation systems, and healthcare. The goal of machine learning is to develop algorithms that can learn patterns and relationships from data and use these patterns to make predictions or decisions on new, unseen data. Machine learning has several advantages, including the ability to process and analyze large volumes of data and to automatically extract complex patterns that may be difficult or impossible for humans to identify [1]. With the development of machine learning algorithms, medical data classification has recently become a major topic of research. These algorithms create learning models based on obtained medical information, enhancing the accuracy of medical diagnosis. Medical data comprise clinical measurements together with details on the patient's social, environmental, and behavioral circumstances. Classifying medical data for error-free judgment can be hard [2], because most medical data present several challenges. One of the main challenges is the high dimensionality of the data, which can make it difficult to analyze and visualize. Medical datasets can have thousands or even millions of features, which can cause the "curse of dimensionality." This curse can lead to overfitting and reduce the generalization performance of the classifier. Another challenge with medical numerical data is the presence of noise, outliers, and missing values [2]. These issues can be particularly problematic in medical applications, where the data may be noisy or incomplete due to measurement errors or other factors. For example, medical imaging data may contain noise due to image acquisition or processing artifacts. Additionally, missing data can arise due to incomplete patient records or incomplete medical tests.
Class imbalance is also a common problem in medical data, where the number of samples in each class may be unbalanced [3]. For example, in disease diagnosis tasks, the number of healthy patients may greatly outnumber the number of patients with the disease of interest. This imbalance can lead to biased classifiers that favor the majority class and perform poorly on the minority class. To address these challenges, various preprocessing techniques and feature selection methods can be applied to reduce the dimensionality of the data and remove noise and outliers. For instance, normalization, scaling, and feature selection can be used to reduce the dimensionality of the data and identify the most informative features for the classification task [4]. Additionally, various machine learning algorithms can be used to build robust and accurate classifiers for high-dimensional numerical data in medical applications [5]. The choice of algorithm and parameter tuning may depend on the specific characteristics of the dataset, and careful evaluation and validation are essential to ensure reliable and meaningful results [6]. Hence, the concept of building an intelligent model was born. The support vector classifier (SVC) is one such machine learning algorithm. It has gained a lot of attention due to its ability to handle high-dimensional data and its strong generalization performance. SVC is a binary classification algorithm that finds the best hyperplane to separate the input data into two classes. The hyperplane is defined as the decision boundary that maximizes the margin between the two classes. The margin is the distance between the hyperplane and the closest data points from
each class. This means that the SVC algorithm can classify new data points by determining which side of the decision boundary they fall on. SVC has been used in a variety of applications, including spam filtering [7], text classification [8], stock price prediction [9], and medical diagnosis [10]. In medical diagnosis, SVC has been used for the classification of various diseases, such as breast cancer [11], Alzheimer’s disease [12], heart disease [13], and skin lesions [14]. Despite its great performance, SVC has limitations. One of the limitations of the classical SVC is that it requires the dataset to be linearly separable, and it may not work well with datasets that have a lot of noise. To overcome these limitations, researchers have introduced new versions of SVC, and one of the adopted methodologies is using the principles of quantum mechanics. Quantum mechanics is a fundamental theory that explains the behavior of matter and energy at the atomic and subatomic levels. It is a probabilistic theory that describes the probability of a particle’s position, momentum, and other properties [15]. The three main principles of quantum mechanics are superposition, entanglement, and uncertainty. Superposition is the idea that a quantum system can exist in multiple states at the same time, entanglement is the concept that the state of one particle can depend on the state of another particle, and uncertainty is the principle that certain properties of a particle cannot be measured with complete precision. Quantum computing is a relatively new field that aims to utilize the principles of quantum mechanics to perform calculations that are infeasible on classical computers. One of the most promising applications of quantum computing is in the field of medicine, where it could have a significant impact on drug discovery and personalized medicine.
By simulating the interactions between drugs and proteins at a quantum level, researchers could develop more effective drugs with fewer side effects [16]. Quantum machine learning is an emerging field that combines quantum computing and machine learning to develop new algorithms for processing large datasets. Quantum machine learning is expected to revolutionize the field of artificial intelligence and enable machines to perform more complex tasks that were previously impossible with classical computing. One of the main advantages of quantum machine learning is its ability to process data more efficiently than classical computing. This is due to the inherent parallelism of quantum computing, which allows for the simultaneous processing of multiple inputs. Additionally, quantum computers can perform certain computations much faster than classical computers, such as factorization (an exponential speedup) and unstructured database searching (a quadratic speedup) [17]. There are several applications of quantum machine learning in various fields such as chemistry, finance, and computer vision. For instance, quantum machine learning has been used to predict molecular properties, optimize financial portfolios, and classify images more accurately than classical methods. Quantum machine learning can have many applications in the field of medicine, particularly in analyzing large sets of numerical data. Medical data can often be very complex and high-dimensional, and processing this data efficiently can be a challenging task for classical computers. Quantum machine learning has the potential to address this challenge and provide new insights and opportunities for medical research [18].
Quantum Support Vector Classification (QSVC) is one of the well-known quantum machine learning algorithms. It has become an increasingly popular algorithm for classifying data into two or more classes and can deal with linearly separable, non-linearly separable, and high-dimensional data. QSVC uses a quantum kernel function to map input data to a high-dimensional feature space, where the classes can be separated by a hyperplane [19]. Compared to classical support vector machine (SVM) algorithms, QSVC can provide better accuracy [20]. In this chapter, an intelligent model based on a QSVC is proposed for medical data classification, which has the potential to improve classification accuracy and decrease computational time compared to the classical SVC. The model includes five phases: data preprocessing, feature selection, data encoding, classification, and performance evaluation. In the data preprocessing phase, the random undersampling technique is adopted to reduce the impact of imbalanced datasets on machine learning algorithms. Furthermore, in the feature selection phase, the Analysis of Variance (ANOVA) feature selection technique is applied to reduce the dimensionality of the dataset and remove irrelevant features. By employing a feature selection technique, the computational time required for training and testing the model is significantly decreased. Then, the processed data is used as input for the data encoding phase, through which classical data is transformed into a quantum format. The output is then used as input to the QSVC in the classification phase. In this phase, the QSVC algorithm with a quantum kernel is used to map the input data into a higher-dimensional feature space, which can potentially improve the separability of the data.
Finally, the performance of the proposed model is evaluated using several measurements, including accuracy, recall, precision, F1-score, and computational time, and the results are compared with classical machine learning algorithms. The proposed model has the potential to provide a more accurate and efficient approach to medical data classification, which can have important implications for medical diagnosis and treatment. The main contributions of this chapter are as follows:

(a) An intelligent model based on QSVC for medical data classification is proposed.
(b) The performance of the classical SVC is evaluated and compared with that of its quantum version.
(c) A comparative experiment with state-of-the-art models is conducted.
(d) Several evaluation metrics are adopted.

The structure of this chapter is as follows. Section 2 provides a brief introduction to quantum mechanics, the support vector classifier, and the quantum support vector classifier. The overall proposed intelligent model for medical data classification is presented in Sect. 5. Section 12 presents the experimental investigation, findings, and analysis utilizing ten benchmark datasets. Finally, Sect. 13 presents the main findings of this chapter and suggestions for possible future work.
2 Background and Methodology 2.1 Quantum Mechanics Quantum mechanics is a fundamental theory in physics that describes the behavior of matter and energy on the atomic and subatomic scale. The principles of quantum mechanics are fundamentally different from classical mechanics, which governs the behavior of larger objects. In the quantum world, particles can exist in multiple states simultaneously, and their properties are described by probability distributions rather than definite values [21]. The field of quantum computing takes advantage of these unique properties to perform certain types of calculations more efficiently than classical computers. Quantum computers use quantum bits, or qubits, instead of classical bits, which can only exist in a single state (either 0 or 1). Qubits can exist in multiple states simultaneously, allowing quantum computers to perform certain calculations exponentially faster than classical computers. For example, the famous Shor’s algorithm, which was proposed in 1994, can factor large numbers exponentially faster than classical algorithms. This has important implications for cryptography, where large numbers are used for encryption [22]. While the potential of quantum computing is great, there are still many challenges to overcome to build practical quantum computers. These include the difficulty of building and maintaining stable qubits, the challenge of correcting errors that arise due to the delicate nature of quantum states, and the need for large-scale integration of qubits [23]. Recent advances in quantum computing have shown promise in overcoming some of these challenges. For example, researchers have developed new qubit designs that are more stable and resistant to errors [24]. One of the main obstacles to building practical quantum computers is the problem of decoherence. 
Decoherence is the process by which a quantum system interacts with its environment, causing it to lose its quantum properties and become a classical system. To perform calculations on a quantum computer, the quantum states of the qubits must be carefully controlled to prevent decoherence. This requires the use of specialized hardware and software to monitor and control the qubits [25]. Despite these challenges, there has been significant progress in the development of practical quantum computing in recent years. In 2019, a team of researchers at Google claimed to have achieved quantum supremacy, which is the point at which a quantum computer can perform a calculation that is beyond the reach of any classical computer. The calculation they performed was to generate random numbers, which may seem trivial, but it demonstrated that the quantum computer could perform a calculation that is difficult for a classical computer to simulate [26]. Beyond quantum computing, quantum mechanics has had many other important applications, for example in the development of new materials such as superconductors and quantum dots [27]. The principles of quantum mechanics have also been applied to the field of quantum cryptography, which is a method for securely transmitting information over a public channel [28].
3 Support Vector Classifier Support Vector Classifier (SVC) is a popular supervised learning algorithm used for classification tasks. It belongs to the Support Vector Machines (SVMs) family of algorithms, first introduced by Vapnik and colleagues in the 1990s [29]. The basic idea behind SVMs is to map input data to a high-dimensional feature space, where a hyperplane can be used to separate the data into different classes. The hyperplane is chosen such that it maximizes the margin between the classes, which in turn helps to improve the algorithm’s generalization performance. SVC works by finding a hyperplane that separates the training data into two classes. The hyperplane is chosen such that it maximizes the margin between the classes, just like in SVMs. At its core, SVC is a linear classifier that can nonetheless handle non-linearly separable data by applying a non-linear transformation to the input data. This is achieved by using a kernel function, which maps the input data to a high-dimensional feature space where the data becomes separable. SVC has been widely used in various applications such as text classification [30]. One of the advantages of SVC is that it is less prone to overfitting than other classifiers, such as decision trees or neural networks. Additionally, SVC can handle high-dimensional data efficiently, which makes it well-suited for tasks with a large number of features. Over the years, several variants of SVC have been developed, such as Nu-Support Vector Classification and One-Class Support Vector Classification [31]. These variants provide additional flexibility in modeling complex data distributions and can be used for tasks such as anomaly detection or one-class classification. Recent research has focused on improving the performance of SVC by incorporating new techniques and methods.
For example, some studies have proposed using ensembles of SVCs to achieve better accuracy and robustness in classification tasks [32].
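As a concrete illustration, the sketch below trains an RBF-kernel SVC on scikit-learn's bundled copy of the Breast Cancer Wisconsin (Diagnostic) dataset, one of the benchmarks used later in this chapter. The dataset choice, split, and hyperparameters here are illustrative, not those of the original study:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Breast Cancer Wisconsin (Diagnostic): 569 records, 30 features, 2 classes.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scaling matters: the margin is measured as a distance in feature space.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

The RBF kernel performs the non-linear mapping implicitly, so the maximal-margin hyperplane is found in the induced feature space without ever materializing it.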
4 Quantum Support Vector Classifier Quantum Support Vector Classifier (QSVC) is a quantum machine learning algorithm that extends the classical support vector machine (SVM) algorithm. It is designed to run on quantum computers and is being explored as a potential application of quantum computing in machine learning. QSVC is based on the quantum version of the kernel function used in classical SVM. The kernel function is a measure of similarity between pairs of data points, and it is used to transform the input data into a higher-dimensional feature space where it becomes linearly separable. In the quantum version of the kernel function, the similarity measure is calculated using quantum mechanical operations, which allows for potentially more efficient computation on a quantum computer. One of the advantages of QSVC over classical SVM is the potential for speedups in solving certain optimization problems [33]. This is because the
kernel evaluation in classical SVM becomes expensive in very high-dimensional feature spaces, while the quantum formulation operates in the complexity class BQP (bounded-error quantum polynomial time), which is believed to contain problems intractable for classical computers [34]. However, there are several challenges associated with developing QSVC. One of the challenges is the noise and errors inherent in current quantum computers, which can affect the accuracy of the algorithm. Another challenge is the need for efficient quantum algorithms for kernel computation and quantum data preparation [35]. Despite these challenges, there has been significant progress in the development of QSVC in recent years. For example, some studies have proposed hybrid quantum–classical approaches, where classical computers are used to process the data and optimize the parameters of the quantum circuit [36]. Others have explored the use of quantum error mitigation techniques to improve the accuracy of QSVC on noisy quantum hardware [37].
5 The Proposed Intelligent Medical Classification Model In this chapter, an intelligent model based on a quantum machine learning algorithm is used for medical data classification. The model has five main phases: the data preprocessing phase, the feature selection phase, the data encoding phase, the classification phase, and the evaluation phase. Figure 1 shows the block diagram of the proposed model. Additionally, Fig. 2 shows the flowchart of the proposed intelligent model. As can be observed, the proposed model first checks whether the input dataset suffers from an imbalance problem, and then checks whether the dimensionality of the input dataset exceeds eight features. The processed dataset is then used to feed the QSVC. Next, a detailed description of each phase is introduced.
6 Dataset Description In this chapter, to evaluate the robustness of the proposed intelligent model, six medical datasets and four other non-medical datasets obtained from the UCI machine learning repository are used [38]. Table 1 shows the description of the used datasets in terms of the number of records, number of features, and number of classes.
7 Data Preprocessing Phase In the data preprocessing phase, each dataset is first tested for an imbalance problem. An imbalanced dataset means that one or more classes have significantly fewer examples than the others. This chapter uses the Shannon entropy equation to check the imbalance problem of each adopted dataset. Equation (1) shows the balance B, where M is the number of samples, k is the number of classes in the dataset, and L_j
Fig. 1 Block diagram of the proposed model
Fig. 2 Flowchart of the proposed intelligent model
is the number of samples in the j-th class. From this equation, when B tends to zero, the adopted dataset is unbalanced and must be handled, while when B is equal or close to one, the dataset is balanced. In this chapter, if B ≤ 0.5, the random undersampling method (RUS) is applied to the dataset.
Table 1 The datasets description

Dataset name               No. records   No. features   No. classes
Breast cancer 1995         569           30             2
Parkinsons                 197           22             2
E Coli                     336           7              8
Heart disease              304           13             2
Blood transfusion          748           4              2
Diabetes dataset           769           6              2
Iris                       150           4              3
Seeds                      211           7              3
Malware analysis dataset   43,876        15             2
Credit card fraud          285,299       5              2
B = − [ Σ_{j=1}^{k} (L_j / M) log(L_j / M) ] / log k   (1)
RUS is a data preprocessing technique used to address the imbalance problem [38]. It involves randomly removing examples from the majority class until the number of examples in that class is equal to the number of examples in the minority class. By applying this technique, the model can learn to make accurate predictions for both classes.
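Equation (1) and the RUS step can be sketched in a few lines of NumPy. The function names and the driver at the bottom are illustrative, not the chapter's original implementation; they follow the B ≤ 0.5 rule described above:

```python
import numpy as np

def balance(y):
    """Balance B of Eq. (1): Shannon entropy of class frequencies over log k."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()                      # L_j / M for each class j
    return -np.sum(p * np.log(p)) / np.log(len(counts))

def random_undersample(X, y, seed=0):
    """Keep only n_min randomly chosen samples per class (n_min = minority size)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), n_min, replace=False) for c in classes])
    return X[keep], y[keep]

# Illustrative 90/10 imbalanced dataset: B ≈ 0.47 ≤ 0.5, so RUS is applied.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)
if balance(y) <= 0.5:
    X, y = random_undersample(X, y)
```

After undersampling, both classes contribute the same number of samples, so the classifier is no longer rewarded for always predicting the majority class.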
8 Feature Selection Phase The preprocessed data is used as input for the next phase, namely the feature selection phase. Feature selection involves selecting a subset of the original features that are most relevant. The main objective of this phase is to reduce the dimensionality of the dataset and remove irrelevant features, which in turn decreases the computational time. In this chapter, the Analysis of Variance (ANOVA) with the F-statistics test is applied to select the relevant features by assigning a rank to each feature based on its correlation with the target variable. Then, only the top eight features are considered. This number is selected because only eight qubits are considered in the proposed model, and each selected feature is assigned to a qubit. Additionally, it should be noted that some of the adopted datasets have fewer than eight features; the feature selection approach is not applied to these datasets.
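A minimal scikit-learn sketch of this ranking step, using the Breast Cancer Wisconsin dataset as an illustrative input with more than eight features:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)   # 569 records, 30 features

# Rank every feature by its ANOVA F-statistic against the class label,
# then keep the top 8 so that each selected feature maps to one qubit.
selector = SelectKBest(score_func=f_classif, k=8)
X_top8 = selector.fit_transform(X, y)
```

`selector.get_support()` would then expose which of the original 30 columns survived, which is useful for interpreting the reduced dataset.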
9 Data Encoding Phase In the encoding phase of the proposed model, classical data (the processed data) is transformed into a quantum format that can be used as input to quantum algorithms. One of the most common approaches to achieve this is through the use of feature maps, and the ZFeatureMap is a popular method for encoding classical data into a quantum state. The ZFeatureMap circuit applies a Hadamard gate followed by a Z-axis phase rotation to each qubit, with the rotation angle determined by the value of the corresponding input feature. The resulting quantum state can then be used as input to the quantum kernel in the QSVC, allowing the model to operate on the data in a quantum space. By encoding classical data into a quantum state using the ZFeatureMap circuit, the QSVC can potentially benefit from the advantages of quantum computing when performing classification tasks. These advantages include the ability to efficiently classify high-dimensional data, as well as the potential for improved classification performance through the use of quantum interference in feature space.
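For small feature counts, the action of a one-repetition ZFeatureMap (a Hadamard followed by a feature-dependent Z-phase rotation on each qubit) can be simulated classically. The NumPy sketch below builds the encoded state and evaluates the fidelity kernel k(x, y) = |⟨φ(y)|φ(x)⟩|² that feeds the QSVC; it is an illustrative simulation, not the hardware circuit:

```python
import numpy as np

def z_feature_map_state(x):
    """State of a 1-rep ZFeatureMap: H then a phase rotation P(2*x_i) per qubit."""
    state = np.array([1.0 + 0j])
    for xi in x:
        # H|0> = (|0> + |1>)/sqrt(2); the phase gate multiplies |1> by e^{2i x_i}.
        qubit = np.array([1.0, np.exp(2j * xi)]) / np.sqrt(2)
        state = np.kron(state, qubit)            # product state over all qubits
    return state

def fidelity_kernel(x, y):
    """Quantum kernel entry k(x, y) = |<phi(y)|phi(x)>|^2."""
    return abs(np.vdot(z_feature_map_state(y), z_feature_map_state(x))) ** 2
```

Because this feature map applies no entangling gates, the kernel factorizes as a product of cos²(xᵢ − yᵢ) over the features; entangling maps such as the ZZFeatureMap are often chosen when interactions between features should influence the kernel.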
10 Classification Phase After pre-processing the dataset using Random Under-Sampling (RUS) and feature selection techniques, the next step in building a machine learning model is the classification phase. This is where the pre-processed data is used to train a model that can accurately classify new instances. In this phase, the pre-processed dataset is divided into a training set (70%) and a testing set (30%). The training set is used to train the QSVC algorithm with a quantum kernel that maps the input numerical data into a higher-dimensional feature space. The ZFeatureMap with one repetition and a GPU backend is adopted to make the mapping process more efficient. Once the QSVC model is trained, it can be used to predict the class labels of new, unseen data. The goal of the classification phase is to build a robust and accurate model that can classify new data with high accuracy. Figure 3 shows the quantum part of the proposed model, where reps indicates the number of circuit repetitions. From this figure, it can be observed that the cleaned (processed) data is used as input to the data encoder and the QSVC circuit. Additionally, the figure shows that each feature is represented by a qubit, with a maximum of eight qubits.
Fig. 3 The quantum part of the proposed model
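The classification phase can be sketched end to end with scikit-learn by plugging a classically simulated quantum kernel into SVC as a callable. The Iris dataset, the scaling range, and the closed-form kernel below are illustrative stand-ins for the chapter's quantum backend:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def simulated_quantum_kernel(A, B):
    """Gram matrix of the entanglement-free fidelity kernel prod_i cos^2(a_i - b_i)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.prod(np.cos(diff) ** 2, axis=-1)

X, y = load_iris(return_X_y=True)
X = MinMaxScaler((0, np.pi / 2)).fit_transform(X)   # keep rotation angles distinct

# 70% / 30% train-test split, mirroring the split used in this phase.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

qsvc = SVC(kernel=simulated_quantum_kernel)   # SVC accepts a callable kernel
qsvc.fit(X_train, y_train)
accuracy = qsvc.score(X_test, y_test)
```

On real hardware, each Gram matrix entry would instead be estimated by executing the fidelity circuit for the corresponding pair of encoded samples.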
11 Evaluation Phase In this phase, the performance of the overall proposed intelligent model based on the QSVC algorithm is evaluated using various metrics. The first metric is accuracy, which is the proportion of correctly classified samples among all the samples. A high accuracy indicates that the model is performing well. The second metric, namely recall, is the proportion of true positive samples that are correctly identified by the model. Recall is an important metric when the cost of false negatives is high, and it indicates how well the model can identify positive cases. The third metric is precision, which is the proportion of true positive samples among all the samples that are classified as positive by the model. Precision is important when the cost of false positives is high, and it indicates how well the model can identify true positive cases. The fourth metric is the F1-score, which is the harmonic mean of precision and recall. It provides a balanced measure of precision and recall and is often used when both are important. Equations (2)-(5) show the mathematical definitions of accuracy, recall, precision, and F1-score, respectively. The last metric is computational time, which is the time taken by the model to make predictions on new data. This metric is important when the model needs to make predictions in real time or when there are constraints on the available computational resources. The adopted evaluation metrics provide a quantitative measure of the performance of the proposed model. By evaluating the proposed model using these metrics, the strengths and weaknesses of the model can be identified, and improvements can be made to enhance its performance.
Accuracy = (TP + TN) / (TP + TN + FP + FN)   (2)

Recall = TP / (TP + FN)   (3)

Precision (PS) = TP / (TP + FP)   (4)

F1-score = 2 · PS · Recall / (PS + Recall)   (5)
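Equations (2)-(5) translate directly into code. The NumPy sketch below computes all four metrics from the confusion counts, assuming binary 0/1 labels:

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, recall, precision (PS), and F1-score per Eqs. (2)-(5)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))   # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))   # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))   # false negatives
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f1
```

The same counts populate the confusion matrices reported later in Fig. 5, so the four metrics can be recomputed from those matrices directly.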
12 Experimental Results and Discussion In this section, the performance of the overall proposed intelligent medical classification model is evaluated and tested on ten benchmark datasets: six medical datasets and four non-medical datasets. Additionally, the importance of each phase of the proposed intelligent model is shown and tested. Moreover, the proposed intelligent model is compared with similar previously proposed models in the literature. The first experiment in Table 2 shows the decision output of the balance equation in Eq. (1). In this table, the imbalance problem is tested for each dataset using this equation, and the result indicates whether the dataset suffers from an imbalanced class distribution problem or not. If so, the RUS technique is applied to overcome this problem. To evaluate the importance of the ANOVA F-Statistics feature selection technique, another experiment is conducted. Table 3 compares the accuracy results before and after applying the ANOVA F-Statistics feature selection technique for the datasets with more than eight features. As can be observed, the overall performance of the proposed model improved by almost 2-5%. For further evaluation of the significance of applying the ANOVA F-Statistics feature selection approach, the computational time in seconds is calculated before and after applying it, as shown in Table 4. As can be observed, the computational time is significantly reduced. From these two tables, the importance of applying a feature selection technique to a dataset can be concluded: it can significantly decrease the complexity of the data and thus increase accuracy while decreasing computational time. Figure 4 shows the class distribution of one of the adopted benchmark datasets, namely the Malware Analysis Dataset, before and after the RUS technique is applied.
As can be seen, this dataset suffers from the class imbalance problem which can significantly influence the classification results. Table 5 shows a comparison between the proposed QSVC and the classical SVC in terms of accuracy. The results demonstrate that the QSVC outperforms the classical
Table 2 The decision output of the balance equation

Dataset name                           Needs balancing (yes/no)
Breast cancer Wisconsin (diagnostic)
Parkinsons
E Coli
Heart disease
Diabetes dataset
Blood transfusion dataset
Iris
Seeds
Malware analysis dataset
Credit card fraud
Table 3 The performance of the proposed model before and after applying the ANOVA F-Statistics feature selection technique in terms of accuracy

Dataset name                           Accuracy before (%)   Accuracy after (%)
Breast cancer Wisconsin (diagnostic)   95                    97
Parkinsons                             90                    95
Blood transfusion dataset              80                    82
Malware analysis dataset               81                    85

Table 4 The performance of the proposed model before and after applying the ANOVA F-Statistics feature selection technique in terms of computational time in seconds

Dataset name                           Computational time before (s)   Computational time after (s)
Breast cancer Wisconsin (diagnostic)   296                             21
Parkinsons                             236                             8
Malware analysis dataset               70                              23
Heart disease                          15                              10
Fig. 4 Class distribution before and after RUS on malware analysis dataset
SVC on all ten benchmark datasets. Additionally, it can be observed that utilizing the quantum kernel with SVC can significantly improve its performance. For further evaluation of the performance of the overall proposed intelligent model, several evaluation metrics are considered. Table 6 displays the performance evaluation of the proposed QSVC in terms of accuracy, precision, recall, and F1-score. The results demonstrate the effectiveness of the proposed QSVC, indicating
Table 5 QSVC versus SVC in terms of accuracy

Dataset name                           QSVC   SVC
Breast cancer Wisconsin (diagnostic)   94.7   93.7
Parkinsons                             94.9   87.3
E Coli                                 89.6   87.9
Diabetes                               80.6   79.1
Blood transfusion                      82.1   77.7
Heart disease                          88.5   82.4
Iris                                   100    100
Seeds                                  100    95.6
Malware analysis                       84.8   83.2
Credit card fraud                      98.3   97.8
that the proposed intelligent classification model is capable of producing accurate predictions for the medical datasets. Additionally, Fig. 5 shows the confusion matrices for samples of the used datasets: the Breast Cancer Wisconsin (Diagnostic), Parkinson's, E Coli, and Iris datasets. As can be observed, the proposed model can significantly increase the true positive and true negative samples and decrease the false positive and false negative samples. The experiment in Table 7 compares the performance of the model with state-of-the-art models on samples of the used datasets. As can be observed, the proposed model obtained the best results for all the used benchmark datasets. From all the obtained results, the robustness of the proposed intelligent model can be observed. These results can be further employed as a launching point to integrate expert knowledge with knowledge obtained from other data sources to propose an electronic smart healthcare system.

Table 6 The performance of the proposed model using QSVC in terms of accuracy, precision, recall, and F1-score

Dataset name                           Accuracy (%)   Precision (%)   Recall (%)   F1-score (%)
Breast cancer Wisconsin (diagnostic)   94.7           94.7            94.7         94.7
Parkinsons                             94.9           95.9            94.9         95.1
E Coli                                 89.6           94.3            89.6         91.3
Diabetes                               80.6           82.4            80.5         81.1
Blood transfusion                      82.1           91.7            82.6         84
Heart disease                          88.5           89.1            88.5         88.6
Iris                                   100            100             100          100
Seeds                                  100            100             100          100
Malware analysis                       84.8           84.8            84.8         84.4
Credit card fraud                      98.3           98.4            98.3         98.3
Fig. 5 Confusion matrix of (a) Breast cancer Wisconsin (diagnostic), (b) Parkinsons, (c) E Coli, and (d) Iris data

Table 7 The proposed model using QSVC versus the state-of-the-art models

Dataset name        The proposed model   [39]   [40]   [41]   [42]    [43]
Malware analysis    94.7                 85     –      –      –       –
Credit card fraud   98.3                 –      –      –      –       –
Diabetes            80.6                 –      –      74.5   –       –
Heart disease       88.5                 –      –      –      85.7    –
Iris                100                  –      –      –      92.54   92.65
H. K. Ahmed et al.
13 Conclusion and Future Work
The effectiveness of medical therapies and decision-making depends on accurate diagnosis. Therefore, cost-effective solutions are urgently needed to control disease processes and lower mortality and morbidity in poor nations. This research aims to design an intelligent medical classification model. The proposed model consists of five phases: data preprocessing, feature selection, data encoding, classification based on QSVC, and evaluation. The experimental results, obtained on ten benchmark datasets, revealed the effectiveness of the proposed model and demonstrated that it can improve the classification of medical data. It obtained an accuracy of 94.7% for Breast Cancer Wisconsin (Diagnostic), 94.9% for Parkinson's, 89.6% for E Coli, 80.6% for Diabetes, 82.1% for Blood Transfusion, 88.5% for Heart Disease, 100% for Iris, 100% for Seeds, 84.8% for Malware Analysis, and 98.3% for Credit Card Fraud. Meanwhile, the promising results of the proposed model may also be employed as a launching point to integrate expert knowledge with knowledge obtained from other data sources to propose an electronic smart healthcare system. In the future, the proposed intelligent model can be applied to more complex datasets, and more quantum machine learning algorithms will be considered.
Acknowledgements We extend our sincere gratitude to Artivay.INC's Quantum R&D department for their generous sponsorship and invaluable support. Their contribution has been instrumental in the success of our project, and we are truly grateful for their partnership.
Data Availability All the adopted datasets are benchmark datasets collected from the UCI machine learning repository [38].
Conflicts of Interest The authors declare that there is no conflict of interest.
References
1. Rudin C, Chen C, Chen Z, Huang H, Semenova L, Zhong C (2022) Interpretable machine learning: fundamental principles and 10 grand challenges. Stat Surv 16:1–85. https://doi.org/10.1214/21-SS133
2. Juddoo S, George C (2020) A qualitative assessment of machine learning support for detecting data completeness and accuracy issues to improve data analytics in big data for the healthcare industry. In: 2020 3rd international conference on emerging trends in electrical, electronic and communications engineering (ELECOM). IEEE, Balaclava, Mauritius, pp 58–66. https://doi.org/10.1109/ELECOM49001.2020.9297009
3. Picek S, Heuser A, Jovic A, Bhasin S, Regazzoni F (2019) The curse of class imbalance and conflicting metrics with machine learning for side-channel evaluations. IACR Trans Cryptogr Hardw Embed Syst 2019(1):209–237. https://doi.org/10.13154/tches.v2019.i1.209-237
4. Cai J, Luo J, Wang S, Yang S (2018) Feature selection in machine learning: a new perspective. Neurocomputing 300:70–79. https://doi.org/10.1016/j.neucom.2017.11.077
5. Wang M, Heidari AA, Chen H (2023) A multi-objective evolutionary algorithm with decomposition and the information feedback for high-dimensional medical data. Appl Soft Comput 110102. https://doi.org/10.1016/j.asoc.2023.110102
6. Chen RC, Dewi C, Huang SW et al (2020) Selecting critical features for data classification based on machine learning methods. J Big Data 7(52):1–26. https://doi.org/10.1186/s40537-020-00327-4
7. Bharathi PP, Pavani G, Krishna Varshitha K, Radhesyam V (2021) Spam SMS filtering using support vector machines. In: Hemanth J, Bestak R, Chen JIZ (eds) Intelligent data communication technologies and internet of things, lecture notes on data engineering and communications technologies, vol 57. Springer, pp 637–647. https://doi.org/10.1007/978-981-15-9509-7_53
8. Luo X (2021) Efficient English text classification using selected machine learning techniques. Alex Eng J 60(3):3401–3409. https://doi.org/10.1016/j.aej.2021.02.009
9. Nti I, Adekoya A, Weyori B (2020) Efficient stock-market prediction using ensemble support vector machine. Open Comput Sci 10(1):153–163. https://doi.org/10.1515/comp-2020-0199
10. Saqlain SM, Sher M, Shah FA et al (2019) Fisher score and Matthews correlation coefficient-based feature subset selection for heart disease diagnosis using support vector machines. Knowl Inf Syst 58:139–167. https://doi.org/10.1007/s10115-018-1185-y
11. Ed-daoudy A, Maalmi K (2020) Breast cancer classification with reduced feature set using association rules and support vector machine. Netw Model Anal Health Inform Bioinforma 9(34):1–10. https://doi.org/10.1007/s13721-020-00237-8
12. de Mendonça LJC, Ferrari RJ (2023) Alzheimer's disease classification based on graph kernel SVMs constructed with 3D texture features extracted from MR images. Expert Syst Appl 211:118633. https://doi.org/10.1016/j.eswa.2022.118633
13. Shah SMS, Shah FA, Hussain SA, Batool S (2020) Support vector machines-based heart disease diagnosis using feature subset, wrapping selection and extraction methods. Comput Electr Eng 84:106628. https://doi.org/10.1016/j.compeleceng.2020.106628
14. Kumar NV, Kumar PV, Pramodh K, Karuna Y (2019) Classification of skin diseases using image processing and SVM. In: 2019 international conference on vision towards emerging trends in communication and networking (ViTECoN). IEEE, Vellore, India, pp 1–5. https://doi.org/10.1109/ViTECoN.2019.8899449
15. Biamonte J, Wittek P, Pancotti N, Rebentrost P, Wiebe N, Lloyd S (2017) Quantum machine learning. Nature 549(7671):195–202. https://doi.org/10.1038/nature23474
16. Cao Y, Romero J, Aspuru-Guzik A (2018) Potential of quantum computing for drug discovery. IBM J Res Dev 62(6):6:1–6:20. https://doi.org/10.1147/JRD.2018.2888987
17. Raj CV, Phaneendra HD, Shivakumar MS (2006) Quantum algorithms and hard problems. In: 2006 5th IEEE international conference on cognitive informatics. IEEE, Beijing, China, pp 783–787. https://doi.org/10.1109/COGINF.2006.365589
18. Peters E, Caldeira J, Ho A et al (2021) Machine learning of high dimensional data on a noisy quantum processor. npj Quantum Inf 7(161):1–5. https://doi.org/10.1038/s41534-021-00498-9
19. Sergioli G, Militello C, Rundo L et al (2021) A quantum-inspired classifier for clonogenic assay evaluations. Sci Rep 11:1–10. https://doi.org/10.1038/s41598-021-82085-8
20. Cattan GH, Quemy A (2023) Case-based and quantum classification for ERP-based brain-computer interfaces. Brain Sci 13(2):303. https://doi.org/10.3390/brainsci13020303
21. Petri J, Niedderer H (1998) A learning pathway in high-school level quantum atomic physics. Int J Sci Educ 20(9):1075–1088. https://doi.org/10.1080/0950069980200905
22. Shor PW (1994) Algorithms for quantum computation: discrete logarithms and factoring. In: Proceedings 35th annual symposium on foundations of computer science. Santa Fe, NM, USA, pp 124–134. https://doi.org/10.1109/SFCS.1994.365700
23. Korotkov AN (2009) Special issue on quantum computing with superconducting qubits. Quantum Inf Process 8(1–2):51–54. https://doi.org/10.1007/s11128-009-0104-2
24. Egan L, Debroy DM, Noel C et al (2021) Fault-tolerant control of an error-corrected qubit. Nature 598:281–286. https://doi.org/10.1038/s41586-021-03928-y
25. Macchiavello C, Huelga SF, Cirac JI, Ekert AK, Plenio MB (2002) Decoherence and quantum error correction in frequency standards. In: Kumar P, D'Ariano GM, Hirota O (eds) Quantum communication, computing, and measurement, vol 2. Springer, pp 455–464. https://doi.org/10.1007/0-306-47097-7_45
26. Gibney E (2019) Hello quantum world! Google publishes landmark quantum supremacy claim. Nature 574(7779):461–462. https://doi.org/10.1038/d41586-019-03213-z
27. De Franceschi S, Kouwenhoven L, Schönenberger C et al (2010) Hybrid superconductor–quantum dot devices. Nature Nanotech 5:703–711. https://doi.org/10.1038/nnano.2010.173
28. Bozzio M, Vyvlecka M, Cosacchi M et al (2022) Enhancing quantum cryptography with quantum dot single-photon sources. npj Quantum Inf 8:104. https://doi.org/10.1038/s41534-022-00626-z
29. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297. https://doi.org/10.1007/BF00994018
30. Joshi S, Abdelfattah E (2021) Multi-class text classification using machine learning models for online drug reviews. In: 2021 IEEE World AI IoT Congress (AIIoT). IEEE, Seattle, WA, USA, pp 0262–0267. https://doi.org/10.1109/AIIoT52608.2021.9454250
31. Hassannataj Joloudari J, Azizi F, Nematollahi MA, Alizadehsani R, Hassannatajjeloudari E, Nodehi I, Mosavi A (2022) GSVMA: a genetic support vector machine ANOVA method for CAD diagnosis. Front Cardiovasc Med 8:1–14. https://doi.org/10.3389/fcvm.2021.760178
32. Abdar M, Acharya U, Sarrafzadegan N, Makarenkov V (2019) NE-nu-SVC: a new nested ensemble clinical decision support system for effective diagnosis of coronary artery disease. IEEE Access 7:167605–167620. https://doi.org/10.1109/ACCESS.2019.2953920
33. Liu Y, Arunachalam S, Temme K (2021) A rigorous and robust quantum speed-up in supervised machine learning. Nat Phys 17:1013–1017. https://doi.org/10.1038/s41567-021-01287-z
34. Hidary JD (2019) Complexity theory. In: Quantum computing: an applied approach. pp 43–50. https://doi.org/10.1007/978-3-030-23922-0_4
35. Blank C, Park DK, Rhee JKK et al (2020) Quantum classifier with tailored quantum kernel. npj Quantum Inf 6(41):1–7. https://doi.org/10.1038/s41534-020-0272-6
36. Benedetti M, Lloyd E, Sack S, Fiorentini M (2019) Parameterized quantum circuits as machine learning models. Quantum Sci Technol 4(4):1–20. https://doi.org/10.1088/2058-9565/ab4eb5
37. Lowe A, Gordon MH, Czarnik P, Arrasmith A, Coles PJ, Cincio L (2021) Unified approach to data-driven quantum error mitigation. Phys Rev Res 3(3):033098. https://doi.org/10.1103/PhysRevResearch.3.033098
38. Dheeru D, Taniskidou K (2017) UCI machine learning repository. School of Information and Computer Sciences, University of California, Irvine. http://archive.ics.uci.edu/ml
39. Saini S, Khosla P, Kaur M et al (2020) Quantum driven machine learning. Int J Theor Phys 59(12):4013–4024. https://doi.org/10.1007/s10773-020-04656-1
40. Mancilla J, Pere C (2022) A preprocessing perspective for quantum machine learning classification advantage in finance using NISQ algorithms. Entropy 24(11):1656. https://doi.org/10.3390/e24111656
41. Maheshwari D, Sierra-Sosa D, Garcia-Zapirain B (2022) Variational quantum classifier for binary classification: real versus synthetic dataset. IEEE Access 10:3705–3715. https://doi.org/10.1109/ACCESS.2021.3139323
42. Tomono T, Natsubori S (2022) Performance of quantum kernel on initial learning process. EPJ Quantum Technol 9(35):1–12. https://doi.org/10.1140/epjqt/s40507-022-00157-8
43. Ma W, Hou X (2022) Big data value calculation method based on particle swarm optimization algorithm. Comput Intell Neurosci 2022:1–8. https://doi.org/10.1155/2022/5356164
Supporting and Shaping Human Decisions Through Internet of Behaviors (IoB): Perspectives and Implications Robertas Damaševičius, Rytis Maskeliūnas, and Sanjay Misra
Abstract The rise of the Internet of Behaviors (IoB) has paved the way for new opportunities to support and shape human decision-making. IoB refers to the collection, analysis, and use of digital data generated by human activities to inform decision-making processes. This technology has the potential to significantly impact various aspects of our lives, including healthcare, education, finance, and transportation. This chapter explores the perspectives and implications of using IoB for supporting and shaping human decision-making. The chapter examines the extent to which IoB systems can influence human behavior and decision-making, and the ethical implications of such influence. It considers the benefits and challenges of using IoB in various domains, such as healthcare and education. The chapter also analyzes real-world examples of IoB systems in use, and their impact on human decision-making and behavior. The findings of this chapter will have important implications for both researchers and practitioners in the field of IoB. For researchers, the chapter provides a comprehensive overview of the current state of knowledge on the impact of IoB on human decision-making. For practitioners, the chapter offers insights into the ethical and practical considerations of using IoB for shaping human decisions and behavior, and offers recommendations for future research and development in this field. Keywords Internet of Behaviors · Decision-making · Decision support · Artificial intelligence · Human-computer interaction
R. Damaševičius (B) Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania e-mail: [email protected] R. Maskeliūnas Faculty of Informatics, Kaunas University of Technology, Kaunas, Lithuania S. Misra Institute for Energy Technology, Halden, Norway © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Adadi and S. Motahhir (eds.), Machine Intelligence for Smart Applications, Studies in Computational Intelligence 1105, https://doi.org/10.1007/978-3-031-37454-8_6
1 Introduction
The Internet of Behaviors (IoB) is a relatively new and evolving field of study that seeks to understand and leverage the impact of technology on human behavior. It refers to the collection, analysis, and use of data on individual and collective behaviors, with the goal of shaping and supporting human decision-making [1]. The IoB is seen as a natural evolution of the Internet of Things (IoT) [2, 3], which focuses on the interconnectivity of physical devices, and the Internet of People (IoP) [4], which seeks to understand the interconnectedness of people through technology. The IoB builds upon the concepts of behavioral science, including psychology, sociology, and economics, and leverages advances in data analytics, machine learning (ML), and artificial intelligence (AI) to enable real-time and context-aware understanding of human behavior [5]. By capturing and analyzing data on individuals' actions and decisions, the IoB has the potential to support and shape human decision-making in a wide range of domains, including healthcare [6], digital marketing [7], customer services [8], construction [9], and education [10–12]. The IoB originates in the vision of the Fourth Industrial Revolution, which involves a fusion of technologies that blur the lines between the physical, digital, and biological worlds [13]. The integration of technologies such as the IoT, AI, and biotechnology is transforming the way we live, work, and relate to one another [14]. The IoT and related technologies can generate vast amounts of data, which can be used to gain insights into human behavior and preferences. This is the basis of the concept of the IoB, which aims to use data analytics to influence and shape human behavior [15]. In this chapter, we aim to explore the impact of IoB systems on human decision-making and behavior.
We will provide an overview of the key concepts and technologies involved in the IoB, and discuss the benefits and challenges of these systems for supporting and shaping human decision-making. We will also consider the ethical and social implications of the IoB, and identify opportunities for research to address the key questions and concerns that emerge from the study of this field. The objectives of this study are stated as follows:
– To explore the extent to which IoB systems can influence or nudge human behavior towards certain outcomes.
– To investigate the benefits and drawbacks of IoB systems on human decision-making.
– To analyze the ethical and privacy implications of IoB systems and their impact on individual autonomy.
– To evaluate the impact of IoB systems on organizations and society, including issues related to data privacy, security, and bias.
– To provide recommendations for the design and implementation of IoB systems that support human decision-making and decision-shaping.
To guide this study, we formulate the following research questions:
– How does IoB impact human decision-making processes and behavior?
– What are the factors that influence the extent to which IoB systems can shape human behavior and decision-making?
– How does the use of IoB systems affect individual autonomy and free will in decision-making?
– How can the unintended consequences of IoB be mitigated or avoided?
– What future developments can be expected in the field of IoB and how will they impact human decision-making processes and behavior?
Firstly, the chapter will provide insights into the extent to which IoB systems can shape human behavior and the implications of such influence. Secondly, the study will contribute to the ongoing debate about the ethics of using technology to influence human decision-making. Thirdly, it will offer recommendations for how to design IoB systems that balance the benefits of shaping behavior against ethical and privacy concerns. Finally, the study will highlight areas for further study. The findings of this study will be of interest to researchers, practitioners, and policymakers working in the field of IoB and human decision-making.
2 Literature Review
IoB is a relatively new term that refers to the use of data collected from IoT devices, along with other digital sources such as social media, to track and influence human behavior. As such, the state of the art in IoB research is constantly evolving. Díaz and Boj [16] propose two projects, Data Biography and Machine Biography, to critically examine the present and future of social transformations produced by Big Data and AI. These projects analyze human behavior to generate books that explore the future and question social control and loss of freedom. Bzai et al. [17] provide a review of around 300 sources and discuss various cutting-edge methods and applications that combine ML and IoT. They also classify challenges to IoT into four categories and suggest that exploiting IoT opportunities can help make societies more prosperous and sustainable. Srimaharaj and Chaisricharoen [18] present an ML model for identifying cognitive performance using the principles and characteristics of IoB and human brainwaves, which can enhance classification. Embarak [10] presents a paradigm for smart educational systems that integrates explainable AI (XAI) and IoB technologies. This system monitors students' actions to personalize educational systems to meet their cognitive demands and assist them when face-to-face instruction is not available. The study shows the major influence of IoB on learner assistance and system adaptations for higher achievement. Embarak [10] uses neutrosophic theory and the Analytic Hierarchy Process (AHP) to create a decision-support system based on rules for assessing the impactful elements of IoT in smart education. They suggest that a sustainable IoT ecosystem requires strong IoT security measures, and that
IoT ethics should also be considered. Elayan et al. [5] propose a decentralized IoB framework for achieving energy sustainability by tracking, analyzing, and optimizing human behavior using various technologies and approaches, such as 6G networks and decentralized system structures. Zhang et al. [19] studied the impact of air pollution on people’s activities of daily living (ADLs) based on an IoB. They proposed a methodology to quantify the impact of environmental events on citizens’ ADLs and found that air pollution can significantly affect people’s ADLs. Stary [20] proposed a conceptual design of agent-based Digital twins for Cyber-Physical Systems (CPS) based on the IoB. Digital twins mirror the physical CPS part and integrate it with the digital part, facilitating dynamic adaptation and (re-)configuration. Subject-oriented models can be executed to increase the transparency of design and runtime, and subject-oriented runtime support enables dynamic adaptation and the federated use of CPS components. Javaid et al. [8] discussed how the IoB can be used to develop an in-depth understanding of clients that every company needs, connecting all cell phones in the app to see their errors and get visual recommendations. IoB is used to collect information from customers through sharing between connected devices monitored through a single computer in real life. They identified and discussed IoB applications for better customer services. Halgekar et al. [21] presented a survey of the current state of IoB technology, focusing on its evolution from existing IoT technologies, identifying application scenarios, and exploring challenges and open issues that the technology faces. They studied the technology from a research perspective and explored possible research directions. Kuftinova et al. 
[9] discussed the main trends in the development of enterprise management technologies as the basis for the model of hyperautomation of business processes, considering the use of the IoB, and the ability to carry out the management process from anywhere in space. The study highlights the need for improved privacy and cybersecurity, and suggests that while the cost of introducing innovations may be high, it will decrease over time. Summarizing, these papers discuss various aspects of IoB, and highlight how the integration of ML and IoT is helping to make our environments smarter and how IoB can be applied in various domains such as education, healthcare, and sustainability. The researchers propose several models and frameworks that integrate ML and IoT to analyze human behavior and achieve energy sustainability. The papers also discuss the challenges associated with IoB, including privacy, security, and ethics. Finally, the papers emphasize the importance of using critical thinking and analysis to assess the benefits and risks of IoB.
3 Conceptual Framework of IoB 3.1 Key Concepts and Relationships IoB requires the collection, analysis, and use of data generated from various sources, including wearable devices, social media platforms, and other digital technologies.
Fig. 1 Mindmap of concepts in IoB
IoB aims to use this data to gain insights into people's behavior, preferences, and needs, and to develop personalized experiences and services based on these insights. IoB relies heavily on the collection of data from various sources, including wearable devices, social media platforms, and other digital technologies [17]. This data can be used to gain insights into people's behavior, preferences, and needs. The data collected through IoB must be analyzed to make sense of it. This involves using AI and ML algorithms to identify patterns and trends in the data [22]. IoB has the potential to be used to influence people's behavior, and to generate significant business value for organizations [23]. By understanding people's behavior and preferences, organizations can develop more effective marketing campaigns, product offerings, and customer experiences [8]. These concepts (summarized as a mindmap in Fig. 1) are interconnected and form the foundation of IoB research and development. The relations between them are complex and dynamic, requiring interdisciplinary collaboration and a holistic approach to address the opportunities and challenges of IoB. Such a mindmap can serve as a starting point for an IoB ontology [24].
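As a minimal illustration, such a mindmap could be encoded as an adjacency map from which an ontology might later be derived; the concept names below are examples only, not a fixed vocabulary from the chapter:

```python
# Illustrative adjacency map of IoB concepts (names are examples only)
iob_concepts = {
    "IoB": ["data collection", "data analysis", "behavior influence", "business value"],
    "data collection": ["wearables", "social media", "IoT devices"],
    "data analysis": ["machine learning", "pattern detection"],
    "behavior influence": ["nudges", "personalization"],
}

def related(concept, graph=iob_concepts):
    """Return directly linked sub-concepts, or [] for a leaf concept."""
    return graph.get(concept, [])

print(related("data collection"))  # ['wearables', 'social media', 'IoT devices']
```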
3.2 IoB-Human Interaction Model
The IoB-Human interaction model describes the way in which IoB systems and humans interact and shape one another's behavior and decision-making processes. This model consists of three main components:
– IoB systems are technology-driven systems that gather data about human behavior, analyze it, and use it to influence or nudge human behavior in specific ways. These systems can be based on various technologies, such as wearable devices, sensors, and other forms of Internet-connected technologies.
– Human behaviors are the actions and decisions that are shaped by various factors, including personal preferences, motivations, emotions, and external cues and feedback. Note that in IoB systems, humans are not active users, but occupants or visitors who do not directly interact with the system and are generally unaware that they are a part of it, but who can benefit from how it works [25].
– The interaction dynamics describe the way in which IoB systems and human behavior interact to shape one another's decision-making and behavior. These dynamics can include positive feedback loops, in which positive outcomes reinforce certain behaviors, and negative feedback loops, in which negative outcomes discourage certain behaviors.
In this model, the IoB systems gather data about human behavior and use it to influence or nudge human decision-making in specific ways, e.g., an IoB system could use data about a person's physical activity to suggest a more active lifestyle, or use data about a person's eating habits to suggest healthier eating options. At the same time, human behavior and decision-making also shape the way in which IoB systems operate, e.g., a person's choices and decisions may impact the accuracy of the data that is gathered by IoB systems, and may also affect the way in which the data is analyzed and used to influence behavior.
Emotion recognition plays a crucial role in IoB as it enables the collection, analysis, and understanding of the emotional states of individuals [26]. IoB is concerned with analyzing data collected from a variety of sources to influence human behavior, and emotions are a key factor that affects human behavior. Emotion recognition technology uses machine learning algorithms and computer vision techniques to analyze facial expressions, tone of voice, and other physical cues to identify and classify emotions. This technology can be used in a variety of applications, including monitoring customer satisfaction, analyzing employee engagement, and identifying potential health issues [27]. In IoB, emotion recognition can be used to track the emotional responses of individuals to various stimuli, including advertising, marketing campaigns, and even political messages. This data can then be used to shape future messaging and content in order to elicit the desired emotional response from the target audience. The IoB-Human interaction model (see Fig. 2) provides a framework for understanding how IoB systems and human behavior interact to shape one another's decision-making and behavior, helping to ensure that IoB systems are used in ways that promote positive outcomes for individuals and society.
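As a toy illustration of such a nudging feedback loop (the step-count goal, data, and messages below are invented for the sketch, not taken from any cited system):

```python
def nudge(step_counts, daily_goal=8000):
    """Toy IoB-style feedback: positive reinforcement when the average
    meets the goal, a gentle nudge otherwise. Threshold and wording
    are hypothetical."""
    avg = sum(step_counts) / len(step_counts)
    if avg >= daily_goal:
        return "positive", f"Great job: {avg:.0f} steps/day on average."
    return "nudge", f"Try a short walk: {daily_goal - avg:.0f} steps below your goal."

kind, message = nudge([5200, 6400, 7100])
print(kind)  # nudge
```

A real IoB system would close the loop by observing whether behavior changes after the nudge and adjusting its strategy accordingly.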
Fig. 2 Entity relationship diagram showing IoB-human relationship model
3.3 Reference Model of IoB
The IoB should support the following processes:
– Data Collection: collection of data from various sources, including wearable devices, smart homes, and other connected devices, that monitor and track human behavior.
– Data Processing: extraction of meaningful insights about human behavior and decision-making patterns. This component could include data cleaning, normalization, aggregation, and analysis.
– Decision Support: provide decision-making support to individuals based on the insights extracted from the processed data. It could include personalizing recommendations, providing nudges, or making automated decisions based on a predefined set of rules.
– Feedback: provide feedback to individuals about the impact of their behavior on various outcomes. This could include information about energy consumption, physical activity, or other relevant metrics. – Evaluation: evaluate the effectiveness of the IoB system in shaping human decision-making and behavior by measuring changes in behavior, monitoring the impact of nudges, and evaluating the effectiveness of decision support. – Governance: follow the policies, guidelines, and regulations to govern the use of IoB systems. This could include data protection, privacy, and security policies, as well as ethical and legal considerations. A reference model for an IoB system typically includes several layers and components that work together to implement the abovementioned processes. Here is a suggested reference model for an IoB system: – Sensing layer includes sensors, devices, and networks that collect data from the physical world. It could include things like wearables, IoT devices, cameras, and other sensors. – Edge layer processes the data collected by the sensing layer and performs initial analysis and filtering of data. It could include gateways, edge servers, and other devices that can perform data processing and filtering. – Cloud layer stores and processes data in the cloud, providing scalability, flexibility, and advanced analytics capabilities. It could include cloud storage, cloud computing, and AI/ML services, and their composition [28]. – Decision-making layer is responsible for analyzing the data stored in the database and making decisions based on the insights and predictions generated by the data analysis layer. – Actuation layer is responsible for implementing the decisions made by the decision-making layer, by sending commands to devices, such as wearable devices or smart home devices, to take specific actions. – Application layer provides end-user applications, interfaces, and services that utilize the data and insights generated by the IoB system. 
It could include dashboards, alerts, and other tools that help users to monitor and take actions based on the insights generated by the system. – Security and privacy layer provides security and privacy features to ensure that the data collected, stored, and processed by the system is protected from unauthorized access and misuse. It could include authentication, encryption, and other security and privacy features [29]. – Governance and management layer provides tools and features to manage and govern the IoB system, including data management, access control, and compliance management.
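As an illustrative end-to-end pass through these layers (all function names, readings, and thresholds below are invented for the sketch, not part of the reference model itself):

```python
from statistics import mean

def sensing_layer():
    # simulated heart-rate readings from a wearable device
    return [72, 75, 190, 74, 73]

def edge_layer(readings, lo=30, hi=220):
    # filter obviously invalid samples close to the device
    return [r for r in readings if lo <= r <= hi]

def cloud_layer(readings):
    # aggregate cleaned data for analytics in the cloud
    return {"mean_hr": mean(readings), "max_hr": max(readings)}

def decision_layer(features, alert_above=150):
    # decide based on the aggregated insights
    return "alert" if features["max_hr"] > alert_above else "ok"

def actuation_layer(decision):
    # e.g. push a notification to the user's phone
    return {"alert": "notify_user", "ok": "no_action"}[decision]

decision = decision_layer(cloud_layer(edge_layer(sensing_layer())))
print(actuation_layer(decision))  # notify_user
```

The application, security/privacy, and governance layers would wrap this pipeline with dashboards, access control, and policy enforcement rather than sit in the data path.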
3.4 Reference Architecture The reference architecture should be designed to be flexible and adaptable, allowing organizations to easily incorporate new technologies and innovations as they emerge [25]. A reference architecture for IoB systems could have several components and be organized in a hierarchical manner as follows: – Data Collection and Management: This includes data collection from various sources such as IoT devices, sensors, and human input. The data collected is stored in a secure and scalable database, where it can be accessed, processed, and analyzed. – Data Analytics and Insights: The application of various analytics and data processing techniques to derive insights from the collected data. This includes data warehousing, big data processing, ML, and deep learning algorithms, among others. The insights derived from this component are then used to inform decision-making processes. – Behaviour Prediction and Shaping: A decision-making engine that can make recommendations or automate certain processes based on the insights derived from the data analytics component. This component should be designed to be scalable, flexible, and customizable to meet the needs of different organizations and industries [30]. – User Interface and Feedback: The intuitive and user-friendly interface that allows users to interact with the IoB system. It should be designed to be responsive, accessible, and easy to use, regardless of the device being used. – Interoperability: The integration of the IoB system with other existing systems and technologies, such as smart homes, smart cities, health systems, and retail systems, among others. This component should be designed to ensure seamless integration and interoperability between systems, allowing for seamless data sharing, visualization and presentation. 
Figure 3 shows a high-level overview of the IoB system architecture and helps to understand the relationships between the different components and how they work together to support human decision-making. The main components of the IoB system include: Data Collection and Management, Data Processing and Analysis, Data Visualization and Presentation, Behavior Prediction and Shaping, and User Interaction and Feedback. The Data Collection and Management component is responsible for collecting and storing data from various sources such as IoT devices, sensors, and user input. The Data Processing and Analysis component processes the data and generates insights and recommendations. The Data Visualization and Presentation component is responsible for visualizing the data and presenting it to the users through dashboards and reports. The Behavior Prediction and Shaping component predicts user behavior based on the data and provides recommendations to shape the behavior. The User Interaction and Feedback component is responsible for the user interface and receiving user feedback.
Fig. 3 Component diagram of IoB system
R. Damaševičius et al.
3.5 Deployment

The deployment of the IoB platform can be described as follows (Fig. 4): The IoB Cloud Server contains the main components of the platform, including the IoB Platform API, Data Management Module, and Analytics Engine. The IoB Gateway is responsible for collecting data from the Smart Devices and aggregating it. The Smart Devices are the connected devices in the IoT ecosystem, such as a Smart Thermostat, Smart Lock, and Smart Light. Finally, the Mobile App serves as the primary interface for the end-user and includes the User Interface and Notification Module. The arrows represent the flow of data and communication between the different components.
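The gateway's aggregation role can be illustrated with a short sketch: raw readings from several smart devices are reduced to one latest value per device before being forwarded to the cloud server. The device names and payload shape are assumptions for illustration:

```python
def aggregate(readings):
    """Keep only the most recent (device, value) pair per device.

    `readings` is a list of (device_name, timestamp, value) tuples as a
    gateway might buffer them between uploads.
    """
    latest = {}
    for device, timestamp, value in readings:
        if device not in latest or timestamp > latest[device][0]:
            latest[device] = (timestamp, value)
    return {device: value for device, (_, value) in latest.items()}


readings = [
    ("thermostat", 1, 20.5),
    ("lock", 1, "locked"),
    ("thermostat", 2, 21.0),  # newer reading supersedes the first one
]
payload = aggregate(readings)
print(payload)  # {'thermostat': 21.0, 'lock': 'locked'}
```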
4 Scenarios of Human Decision-Shaping Using IoB

Scenarios of human decision-shaping using IoB can provide valuable insights into how these systems may influence human behavior in various contexts. In this section, we will discuss a few hypothetical scenarios that highlight different aspects of the impact of IoB on human decision-making processes.
Fig. 4 Deployment diagram of an exemplar IoB platform
4.1 Physical Activity Tracking

Consider an IoB wearable device that tracks the user's daily physical activity and provides personalized health recommendations. The device uses algorithms to analyze the data collected on the user's activity levels, sleep patterns, and dietary habits and to make suggestions on how to improve their health, e.g., the device may suggest that the user take a 10-min walk after lunch to boost their energy levels, or remind them to drink more water throughout the day. In this scenario, the IoB system can influence the user's behavior by nudging them towards making healthier decisions. However, this can also raise questions about the extent of the device's influence on the user's autonomy and free will in decision-making. Figure 5 shows the interactions between the different components of the IoB system and how the device can influence the user's behavior by providing personalized recommendations. The User is represented by a participant, who wears the wearable device. The Wearable device collects data on the user's activity, sleep, and dietary habits and sends it to the Algorithm component. The Algorithm component analyzes the data and generates recommendations, which are then sent to the User via the Recommendation component.
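A rule-based sketch of the wearable's recommendation logic described above. The thresholds (30 active minutes, 1,500 ml of water) and field names are assumptions chosen for illustration, not values from the chapter:

```python
def recommend(activity_minutes, water_ml, hours_since_meal):
    """Return the list of nudges the device would issue for one check-in."""
    tips = []
    # Suggest a post-meal walk when the user has been mostly inactive.
    if hours_since_meal >= 1 and activity_minutes < 30:
        tips.append("Take a 10 min walk to boost your energy levels.")
    # Remind the user to hydrate if intake is below the daily pace.
    if water_ml < 1500:
        tips.append("Drink more water throughout the day.")
    return tips


print(recommend(activity_minutes=12, water_ml=900, hours_since_meal=2))
```

A production device would derive such rules from learned models rather than fixed thresholds, but the nudging structure, observe then suggest, is the same.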
4.2 Home Security

Consider an IoB home security system that uses ML algorithms to learn the user's behavior and predict potential security threats. The system can then take preventive actions, such as locking doors and windows or activating the alarm, to protect the user's home and property. In this scenario, the IoB system can have a significant impact on the
Fig. 5 Sequence diagram of an IoB system for human activity tracking
Fig. 6 Sequence diagram of an IoB system for home security
user’s decision-making by providing a higher level of security and peace of mind. However, the system’s predictive capabilities may also raise concerns about privacy and the potential for false alarms. Figure 6 shows the interaction between the user, the IoB system, the door, the window, and the alarm in this scenario. The system starts by collecting data on the user’s behavior, including their patterns of movement and interactions with doors and windows. This data is then processed by the ML algorithms to identify potential security threats. When a potential threat is detected, the IoB system takes preventive action, locking the doors and windows and activating the alarm to protect the user’s home and property.
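The detect-then-act loop of this scenario can be sketched with a deliberately simple baseline model: learn the hours at which household activity normally occurs, flag activity far outside them, and trigger the preventive actions. The tolerance value and action names are illustrative assumptions; a real system would use a learned ML model rather than this heuristic:

```python
def is_anomalous(event_hour, usual_hours, tolerance=2):
    """Flag an event whose hour is more than `tolerance` hours away from
    every hour previously observed for this household."""
    return all(abs(event_hour - h) > tolerance for h in usual_hours)


def respond(event_hour, usual_hours):
    """Return the preventive actions to take, or an empty list."""
    if is_anomalous(event_hour, usual_hours):
        return ["lock_doors", "lock_windows", "activate_alarm"]
    return []


usual = [7, 8, 18, 19, 22]   # hours with routine door/window activity
print(respond(3, usual))     # 3 a.m. is unusual -> preventive actions fire
print(respond(18, usual))    # routine evening activity -> no action
```

Note how easily such a baseline produces the false alarms the text warns about: any legitimate but unusual activity (an early flight, a night shift) would trip it.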
4.3 Traffic Management

Consider an IoB-powered traffic management system that uses data from connected vehicles and other sources to optimize traffic flow and reduce congestion. The system can provide real-time traffic updates, suggest alternative routes, and even control traffic signals to reduce wait times. In this scenario, the IoB system can have a positive impact on the user’s decision-making by helping them avoid traffic and reach their destinations more quickly and efficiently. However, the system’s control over traffic signals and its ability to manipulate traffic flow may also raise questions about the potential unintended consequences of its use. Figure 7 shows the interaction between the User, IoB System, Connected Vehicle, and Traffic Signals. The User requests real-time traffic updates from the IoB System, which collects traffic data from connected vehicles and analyzes it to provide updated
Fig. 7 Sequence diagram of an IoB system for traffic management
information. If the User requests an alternative route, the IoB System suggests one and provides an updated estimated time of arrival. The IoB System can also control traffic signals to reduce wait times and optimize traffic flow.
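The route-suggestion step can be sketched as follows: given current travel-time estimates per route (as might be derived from connected-vehicle data), pick the fastest alternative and report its ETA. The route names and times are invented for illustration:

```python
def fastest_route(travel_times):
    """Return (route_name, eta_minutes) for the currently fastest route.

    `travel_times` maps route names to estimated travel times in minutes.
    """
    best = min(travel_times, key=travel_times.get)
    return best, travel_times[best]


# Hypothetical estimates aggregated from connected vehicles.
times = {"main_street": 25, "ring_road": 18, "old_highway": 31}
route, eta = fastest_route(times)
print(route, eta)  # ring_road 18
```

The interesting engineering in a real system is in producing `travel_times` from noisy vehicle data and in re-running the choice as conditions change; the selection step itself stays this simple.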
5 IoB Systems and Applications

5.1 Smart Homes and Cities

IoB systems and applications for smart homes and smart cities can have a major impact on how people live and work in these environments, as follows:
– Smart Home Energy Management System to monitor and manage energy usage, e.g., a smart thermostat can learn a household’s energy usage patterns and automatically adjust the temperature to optimize energy efficiency. The system could also be programmed to automatically turn off lights and appliances when not in use [31].
– Predictive Maintenance for Smart Homes to predict and prevent equipment failures in smart homes, e.g., a smart washing machine could send data on its performance
and usage to the manufacturer, allowing the company to predict when it will need maintenance or replacement.
– Smart City Traffic Management System to optimize traffic flow and reduce congestion, e.g., a smart traffic light system could monitor traffic patterns in real-time and adjust the timing of traffic lights to optimize flow. The system could also use data from connected vehicles to reroute traffic in real-time in response to accidents or road closures.
– Predictive Emergency Response System in smart cities to predict and respond to emergencies, e.g., a smart fire alarm system could use data from sensors to predict the likelihood of a fire and automatically dispatch fire trucks to the scene before the fire has a chance to spread.
– Personalized Healthcare System to provide personalized healthcare, e.g., a smart wearable device could monitor a person’s vital signs and send data to a healthcare provider for analysis. The provider could use this data to predict and prevent potential health problems before they occur [32].
These are just a few examples of the many potential applications of IoB systems in smart homes and smart cities. As the technology continues to develop, it is likely that even more innovative systems and applications will emerge, such as for smart city security [33].
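The smart-thermostat example can be made concrete with a short sketch: the device records the setpoints a household chooses at each hour and later applies the learned average automatically. The default temperature, the averaging rule, and the hour-based keying are assumptions made for this sketch:

```python
from collections import defaultdict


class LearningThermostat:
    """Learns a per-hour setpoint from past manual adjustments."""

    def __init__(self, default=20.0):
        self.default = default
        self.history = defaultdict(list)   # hour of day -> observed setpoints

    def observe(self, hour, setpoint):
        """Record a manual adjustment made by the household."""
        self.history[hour].append(setpoint)

    def setpoint_for(self, hour):
        """Apply the learned average, or the default if nothing is known."""
        observations = self.history.get(hour)
        if not observations:
            return self.default
        return sum(observations) / len(observations)


t = LearningThermostat()
for temp in (21.0, 21.5, 21.5):   # the user repeatedly warms the evening
    t.observe(19, temp)
print(t.setpoint_for(19))         # learned evening setpoint, ~21.33
print(t.setpoint_for(3))          # no data at 3 a.m. -> default 20.0
```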
5.2 Health and Wellness

The IoB has the potential to revolutionize the way we approach health and wellbeing by providing new insights into human behavior and decision-making processes. Here are a few hypothetical examples of future IoB systems and applications that could support health and wellbeing:
– Personalized Wellness Coaching: IoB systems could provide personalized wellness coaching based on an individual’s behavior patterns and health data, e.g., an IoB system could track physical activity levels, sleep patterns, food intake, and stress levels, and then provide tailored recommendations for maintaining a healthy lifestyle [34].
– Predictive Healthcare: IoB systems could be used to predict health outcomes and provide early interventions, e.g., an IoB system that monitors heart rate, sleep patterns, and other health data could predict the onset of certain conditions such as sleep apnea or heart disease, and provide recommendations for preventative care [35].
– Smart Medicine Management: IoB systems could be used to manage medicine regimens and ensure that patients take their medication as prescribed, e.g., an IoB system could send reminders and notifications to patients when it’s time to take their medication, and monitor their behavior to ensure they are taking their medicine as directed [36].
– Mental Health Support: IoB systems could be used to monitor and support mental health, e.g., an IoB system could track stress levels, sleep patterns, and physical activity levels, and provide tailored recommendations for managing stress and improving mental wellbeing [37].
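The smart-medicine-management idea above reduces to a small scheduling check: which scheduled doses are due and not yet confirmed as taken? The dose names and times below are hypothetical, and a real system would also track adherence over time and escalate missed doses:

```python
def due_reminders(schedule, taken, now):
    """Return doses scheduled at or before `now` (hour of day) that the
    patient has not yet confirmed as taken."""
    return [dose for dose, hour in schedule.items()
            if hour <= now and dose not in taken]


# Hypothetical twice-daily regimen (dose name -> scheduled hour).
schedule = {"morning_dose": 8, "evening_dose": 20}

# At 9 p.m., the morning dose is confirmed but the evening one is not.
print(due_reminders(schedule, taken={"morning_dose"}, now=21))
# ['evening_dose']
```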
5.3 Retail and Marketing

The retail and marketing sector could greatly benefit from the IoB and its ability to gather data on consumer behavior and preferences. With the help of IoB technologies, retailers can collect data on consumer behavior, such as browsing and purchasing habits and preferences. This information can then be used to personalize and optimize the shopping experience for each individual customer, providing them with more relevant and appealing advertisements, offers, and recommendations. One example of a future IoB system and application in the retail and marketing sector is the use of smart mirrors in dressing rooms. These mirrors use IoB technologies such as sensors, cameras, and AI to gather data on consumer behavior and preferences, e.g., a customer trying on a dress in the dressing room could receive recommendations for matching shoes or accessories based on their body type, skin tone, and previous shopping behavior. This personalized experience would not only improve the customer’s shopping experience, but also increase the likelihood of a purchase. Another example is the use of IoB-powered in-store displays. Retailers can use these displays to provide customers with personalized advertisements, recommendations, and offers based on their individual preferences and behavior, e.g., a customer browsing the electronics section of a store could receive a notification on their smartphone offering a discount on the latest laptop model that they have been researching.
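One common way to produce the "matching accessories" recommendations described above is co-purchase counting: suggest items most often bought together with the item currently being tried on. The purchase data and item names below are invented for illustration, and real retail systems use far richer models:

```python
from collections import Counter


def recommend_with(item, baskets, k=2):
    """Rank items by how often they co-occur with `item` in past baskets."""
    co_counts = Counter()
    for basket in baskets:
        if item in basket:
            co_counts.update(i for i in basket if i != item)
    return [i for i, _ in co_counts.most_common(k)]


# Hypothetical historical purchase baskets.
baskets = [
    ["summer_dress", "sandals", "sun_hat"],
    ["summer_dress", "sandals"],
    ["summer_dress", "handbag"],
    ["jeans", "sneakers"],
]
print(recommend_with("summer_dress", baskets))  # sandals ranked first
```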
5.4 Public Services and Administration

Public services and administration could benefit from IoB applications. The following are some hypothetical examples of future IoB systems and applications that could transform public services and administration:
– Predictive Maintenance of Public Infrastructure: IoB could use data from various sources to predict when public infrastructure such as roads, bridges, and buildings will need maintenance. This would enable public administration to schedule maintenance work and minimize downtime [38].
– Intelligent Traffic Management: IoB could be used to optimize traffic flow in cities, reducing congestion and improving air quality, e.g., real-time traffic data
could be used to adjust traffic signals and reroute vehicles, reducing travel time and increasing road safety [39].
– Citizen Engagement: IoB could be used to enhance communication between citizens and public administration, e.g., a smart city platform could allow citizens to report issues, suggest ideas for improvement, and track the progress of their requests [40].
– Public Safety and Emergency Response: IoB could be used to improve public safety and emergency response, e.g., smart cameras could be used to detect and respond to public safety incidents [41, 42], while real-time data could be used to optimize the deployment of emergency services.
– Environmental Monitoring: IoB could be used to monitor environmental factors, such as air quality and water quality [43]. This information could be used to inform public administration decisions, such as the implementation of green initiatives and the development of new infrastructure projects.
IoB systems also have the potential to transform the education sector by providing personalized and adaptive learning experiences for students, in line with 21st-century education [44]. Here are some potential use cases for IoB systems in education:
– Personalized learning: IoB systems can use data from sensors and other devices to create personalized learning paths for students, e.g., an IoB system could use data on a student’s learning style, preferences, and progress to recommend specific resources and activities.
– Early intervention: IoB systems can detect early warning signs of academic and behavioral problems in students, e.g., an IoB system could use data from sensors to detect signs of stress or fatigue in a student, which could indicate that they are struggling with a particular subject or task.
– Adaptive assessment: IoB systems can use data from sensors and other devices to create adaptive assessments that adjust in real-time based on a student’s performance, e.g., an IoB system could adjust the difficulty of questions based on a student’s previous responses, or provide additional support if the student is struggling.
– Monitoring and support: IoB systems can monitor students’ progress and provide support and feedback, e.g., an IoB system could use data on a student’s engagement and attention to provide feedback on their performance, or send alerts to teachers or parents if a student is struggling or disengaged.
– Remote learning: IoB can support remote learning by providing real-time feedback and interaction between students and teachers, e.g., an IoB system could use data on a student’s attention and engagement to provide feedback to the teacher, or facilitate interaction between students and teachers.
These are just a few examples of the benefits of IoB systems and applications in public services and administration. By leveraging the data generated by these systems, public administration could make informed decisions that improve the quality of life for citizens and create more sustainable, resilient communities.
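The adaptive-assessment bullet can be illustrated with the simplest possible difficulty controller: raise the level after a correct answer, lower it after a mistake, within fixed bounds. The step size and the 1–5 range are assumptions for this sketch; real adaptive tests typically use item-response-theory models instead:

```python
def next_difficulty(current, correct, lo=1, hi=5):
    """Move one difficulty level up or down, clamped to [lo, hi]."""
    step = 1 if correct else -1
    return max(lo, min(hi, current + step))


# A short session: two correct answers, then one mistake.
level = 3
for correct in (True, True, False):
    level = next_difficulty(level, correct)
print(level)  # 3 -> 4 -> 5 -> 4
```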
6 Results and Analysis

6.1 Impact of IoB on Human Decision-Making

IoB refers to a growing network of digital systems that track, monitor, and influence human behavior. The impact of IoB on human decision-making is a complex and multifaceted issue that is currently being explored by researchers in a number of fields, including psychology, sociology, and computer science. One key impact of IoB on human decision-making is the ability of these systems to shape behavior through nudging. Nudging refers to subtle, implicit cues or suggestions that influence people’s decisions without restricting their freedom of choice, e.g., IoB systems can track people’s patterns of behavior and use this information to make recommendations or provide feedback that encourages healthier or more sustainable decision-making. This type of nudging has been shown to be effective in changing behaviors such as physical activity, energy use, and even financial decision-making. The use of IoB to shape human behavior raises important ethical and legal questions, particularly with regard to individual autonomy and free will. Some argue that the use of IoB to nudge behavior is a form of manipulation that undermines people’s ability to make their own choices [45]. Others argue that nudging is an acceptable form of influence as long as it is transparent, fair, and respects people’s right to choose [46]. Another impact of IoB on human decision-making is the way that these systems can affect the way people perceive their own behavior and the consequences of their actions, e.g., IoB systems can provide people with detailed information about the environmental impact of their actions, such as the carbon footprint of their daily activities. This information can help people to understand the impact of their decisions and to make more informed and sustainable choices. The use of IoB to gather and analyze data about human behavior also raises important privacy and security concerns.
People may feel uncomfortable with the idea of having their behavior monitored and analyzed, and there is a risk that this data could be used for malicious purposes, such as targeted advertising or political manipulation. Finally, the impact of IoB on human decision-making is likely to change as the technology continues to evolve and become more sophisticated, e.g., the advances in AI and ML may allow IoB systems to more effectively influence human behavior, but also raise new ethical questions about the role of technology in shaping human decision-making.
6.2 Ethical and Privacy Concerns of IoB

The ethical and privacy concerns of the IoB are significant, as these systems can potentially compromise the privacy and autonomy of individuals:
– Data Collection and Use: IoB systems collect a vast amount of data on individuals, including their habits, behaviors, and personal information. This data can be used
for marketing purposes, or it can be shared with third-party organizations without the consent of individuals.
– Bias and Discrimination: IoB systems can perpetuate existing biases and lead to discrimination, e.g., algorithms used in IoB systems may be biased towards certain groups, leading to unequal treatment and outcomes.
– Autonomy and Free Will: IoB systems can influence human decision-making processes, which can lead to a loss of autonomy and free will, e.g., by delegating it to a digital twin [47]. This can lead to individuals feeling that their choices are no longer their own and that their decisions are being made for them by outside entities.
– Control and Power Imbalance: IoB systems can create a power imbalance, with those who have control over the data and algorithms having significant control over individuals and society.
To address these concerns, it is important to develop and implement ethical and privacy-by-design principles for IoB systems. This includes ensuring that data collection is transparent, secure, and controlled by the individuals themselves. Algorithmic bias should be actively monitored and prevented, and individuals should have control over their own data and how it is used. Individuals should have access to information about how IoB systems work and how they impact their lives. This can be achieved through education and awareness-raising initiatives, as well as through the provision of accessible and understandable information on the data collection and use practices of IoB systems.
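One concrete privacy-by-design pattern implied above is consent-gated collection: data is recorded only for purposes the individual has explicitly opted into, and consent can be revoked at any time. The purpose names and API shape below are illustrative assumptions, not a prescribed design:

```python
class ConsentGatedStore:
    """Stores behavioral data only for purposes the user has consented to."""

    def __init__(self):
        self.consents = set()   # purposes the user has opted into
        self.records = []

    def grant(self, purpose):
        self.consents.add(purpose)

    def revoke(self, purpose):
        self.consents.discard(purpose)

    def collect(self, purpose, data):
        """Record `data` only if consent for `purpose` is on record.
        Returns True if stored, False if refused."""
        if purpose not in self.consents:
            return False
        self.records.append((purpose, data))
        return True


store = ConsentGatedStore()
store.grant("health_coaching")
print(store.collect("health_coaching", {"steps": 5200}))  # True: consented
print(store.collect("marketing", {"steps": 5200}))        # False: no consent
```

The key property is that the consent check sits inside the storage path itself, so no component can collect data for an unconsented purpose by accident.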
7 Answers to Research Questions

7.1 How Does IoB Impact Human Decision-Making Processes and Behavior?

The impact of the IoB on human decision-making processes and behavior is a complex and multifaceted issue. IoB refers to a system of technologies, including wearable devices, sensors, and data analytics, that track and monitor human behavior. The primary goal of IoB is to collect data on human behavior and use it to shape and influence decision-making and behavior. One way in which IoB impacts human decision-making is through the provision of personalized feedback and information. For instance, IoB systems can track physical activity levels, eating habits, and sleep patterns, and use this information to provide individuals with recommendations and suggestions for improving their health and well-being. By providing individuals with personalized information and feedback, IoB systems can help to nudge individuals towards making healthier and more informed decisions. Another way in which IoB impacts human decision-making is through the use of algorithms and machine learning models. IoB systems can use data on individual behavior to create predictive models that anticipate future behavior and make recommendations for decisions,
e.g., IoB systems may use data on an individual’s past purchasing habits to suggest products that they may be interested in purchasing in the future. The impact of IoB on human decision-making also extends to the realm of privacy and security. The collection and analysis of personal data by IoB systems raises important questions about the protection of individual privacy and autonomy. The potential for IoB systems to be manipulated or used for malicious purposes also raises important questions about the responsibility of technology companies.
7.2 What Are the Factors that Influence the Extent to Which IoB Systems Can Shape Human Behavior and Decision-Making?

The factors that can influence the extent to which IoB systems can shape human behavior and decision-making include:
– Data availability and quality: The extent to which IoB systems can shape human behavior and decision-making is influenced by the amount and quality of data available. The more data available, the better the insights generated and the more precise the recommendations.
– Algorithmic accuracy: Algorithmic accuracy in IoB systems is crucial in shaping human behavior and decision-making. If the algorithms are not accurate, they may provide incorrect recommendations, leading to wrong decisions.
– Trust and transparency: Users must trust the IoB and understand the underlying factors that led to a particular recommendation. A lack of transparency and trust in the system can cause users to ignore or reject recommendations, resulting in a lower level of influence on their behavior [48].
– Personalization: Personalization is key to shaping human behavior and decision-making. The ability of an IoB system to provide personalized recommendations based on individual preferences, behavior, and context can significantly influence the extent to which it can shape human behavior.
– Social norms and cultural factors: Social norms and cultural factors can also impact the extent to which IoB systems can shape human behavior. Users are more likely to follow recommendations that align with their cultural norms and social expectations.
– Legal and ethical considerations: Legal and ethical considerations play an important role in shaping the extent to which IoB systems can shape human behavior. Compliance with regulations and ethical standards is necessary to ensure that the system operates within acceptable boundaries and respects user privacy and autonomy.
These factors, among others, play a critical role in determining the effectiveness of IoB systems in shaping human behavior and decision-making.
A well-designed and well-implemented IoB system that considers these factors can have a significant impact on influencing behavior and decision-making.
7.3 How Does the Use of IoB Systems Affect Individual Autonomy and Free Will in Decision-Making?

The use of IoB systems has the potential to affect individual autonomy and free will in decision-making by influencing or nudging human behavior towards certain outcomes. IoB systems collect vast amounts of data on human behavior, which can then be used to tailor experiences and information in ways that may lead individuals to make certain decisions. This is often done through techniques such as persuasive technology or gamification. While these techniques can be effective in influencing human behavior, they also raise questions about the extent to which individuals are freely making decisions and the impact of such influence on their autonomy, e.g., if individuals are repeatedly exposed to information or experiences that are designed to nudge them towards certain decisions, they may begin to internalize these biases and make decisions based on them without realizing it. Furthermore, the use of IoB systems may also result in a loss of privacy, as the data collected can be used to infer sensitive information about individuals, such as their political views, health status, and personal relationships. This loss of privacy can have a negative impact on individuals’ autonomy, as they may feel that their decisions are being shaped by others without their consent.
7.4 How Can the Unintended Consequences of IoB Be Mitigated or Avoided?

The unintended consequences of IoB systems can be mitigated or avoided through a combination of technological solutions and ethical considerations. On the technological side, it is important to design IoB systems that respect user privacy and autonomy. This can be achieved through the use of secure and transparent data management practices, as well as through the use of machine learning algorithms that are transparent and interpretable, allowing users to understand and control the factors that are shaping their behavior. In addition to these technological solutions, it is also important to consider the ethical implications of IoB systems. This involves considering the implications of nudging people towards certain behaviors and ensuring that these nudges are aligned with the interests and values of individuals. It is also important to ensure that the use of IoB systems does not perpetuate or amplify existing inequalities or biases. Mitigating the unintended consequences of IoB systems requires a holistic approach that considers both the technical and ethical aspects of these systems. This may involve collaboration between researchers, designers, policymakers, and users to develop solutions that are effective, equitable, and respect individual autonomy and free will in decision-making.
7.5 What Future Developments Can Be Expected in the Field of IoB and How Will They Impact Human Decision-Making Processes and Behavior?

The field of IoB is rapidly evolving, and it is important to consider the potential future developments that may impact human decision-making processes and behavior. Some of the possible future developments in the field of IoB include:
– Advancements in machine learning and AI algorithms, which could result in more sophisticated and effective IoB systems. These systems would be able to process and analyze vast amounts of data in real-time, enabling them to make more accurate predictions about human behavior and provide more tailored recommendations to individuals [49].
– Increased integration of IoB systems with wearable technologies, such as fitness trackers, smartwatches, and smart glasses. This integration could enable IoB systems to collect more comprehensive and continuous data about human behavior, further increasing their ability to shape decision-making processes and behavior.
– The development of IoB systems designed to address specific decision-making scenarios, such as those related to health and wellness, financial management, and environmental sustainability. These systems could be designed to promote behavior change in specific domains, and could have a significant impact on human decision-making processes and behavior.
– The increasing use of IoB systems in corporate and government settings, where they could be used to monitor and manage employee behavior and promote compliance with certain policies and regulations.
7.6 Implications for Researchers, Policymakers, and Practitioners

The implications of IoB for researchers, policymakers, and practitioners are significant. Understanding and addressing these implications will be critical for realizing the full potential of IoB for shaping human decisions and behavior.
– For researchers, IoB presents an opportunity to study the impact of technology on human decision-making and behavior in a new and evolving area. This can provide valuable insights into the complex relationship between humans and technology and help to inform the development of new technologies that better support and shape human decision-making.
– For policymakers, IoB raises important questions about the regulation and governance of technology that impacts human behavior. Policymakers must consider the potential unintended consequences of IoB and take steps to mitigate these risks while also promoting the positive benefits of IoB.
– For practitioners, IoB presents both challenges and opportunities. On the one hand, practitioners must ensure that IoB systems are designed and implemented in a way that is ethical, privacy-sensitive, and protects individual autonomy. On the other hand, IoB offers opportunities for practitioners to develop new and innovative applications that can positively impact human decision-making and behavior.
7.7 Limitations and Suggestions for Future Research

The study of IoB is still in its early stages, and much work remains to be done in order to fully understand its impact on human decision-making, behavior, and autonomy. As such, there are several limitations and suggestions for future research that can help further our understanding of this important topic.
– Long-term Impact: While the impact of IoB on human decision-making and behavior is currently being studied, it is important to consider the long-term implications of these systems on individuals, society, and the economy. Future research should examine the potential long-term consequences of IoB, including how it may change our sense of self, autonomy, and free will.
– Interactions between Systems: The impact of IoB systems on decision-making and behavior is not isolated to a single system or application. Future research should consider the complex interactions between different IoB systems, and how these systems interact with other forms of technology and data systems.
– Contextual Factors: The impact of IoB on decision-making and behavior is not universal. Contextual factors such as age, culture, education, and socio-economic status can all play a role in shaping the impact of these systems [50]. Future research should consider the influence of these contextual factors in shaping the impact of IoB.
– Regulation and Policy: As IoB continues to grow and evolve, it will be important for policymakers and regulators to consider its implications for privacy, autonomy, and the rights of individuals. Future research should examine the role of regulation and policy in shaping the development and deployment of IoB systems.
– Human-Centered Design: As IoB systems are developed and deployed, it will be important to ensure that they are designed with the needs and preferences of users in mind [51].
Future research should focus on developing human-centered design principles that can be used to guide the development of IoB systems. There is much work to be done in order to fully understand the impact of IoB on human decision-making, behavior, and autonomy. By addressing these limitations and suggesting future research directions, we can continue to build a more complete understanding of this important topic.
8 Future of IoB

8.1 Emerging Trends and Innovations

IoB and related technologies are rapidly evolving, and several new trends and innovations are emerging [52]. These developments are shaping the way IoB systems are designed and deployed, and are driving their increasing adoption across various domains, such as healthcare, retail, marketing, and public services.
– AI is playing a crucial role in the development of IoB systems. AI-based IoB systems can analyze large amounts of behavioral data and provide real-time insights into human behavior. This enables these systems to personalize experiences, make more informed decisions, and drive better outcomes [53].
– Edge and fog computing is an emerging trend in the IoB. With edge computing, data processing and analysis can be done locally, reducing the latency and increasing the speed of IoB systems [54]. This is important in applications where real-time decision-making is critical, such as in healthcare [55].
– Wearables and IoT devices are increasingly being used to capture behavioral data and provide real-time feedback to users, e.g., fitness trackers can monitor physical activity, heart rate, and sleep patterns, and provide personalized coaching to users based on their behavior [56].
– Blockchain is also being integrated into IoB systems to ensure the privacy and security of user data. With blockchain, data can be securely stored, tracked, and analyzed, providing users with greater control over their data [57, 58].
– Virtual and Augmented Reality technologies, which are evolving into a Metaverse [59], are being explored for use in IoB systems to provide more immersive and interactive experiences for users [60]. These technologies can be used to provide users with a more intuitive and interactive way to engage with their behavior and make decisions.
These are just some of the emerging trends and innovations in the field of IoB.
As IoB systems become more advanced and sophisticated, it is likely that new trends and innovations will emerge that will further shape the field and drive its impact on human decision-making.
8.2 Predicted Impact on Society and Economy

The predicted impact of IoB on society and the economy is a topic of ongoing discussion and speculation among experts and researchers. On one hand, IoB has the potential to revolutionize various industries and bring about significant improvements in efficiency, convenience, and quality of life. For example, in the retail and marketing industry, IoB systems could help businesses personalize their offerings based on individual preferences and behaviors, leading to increased customer satisfaction
Supporting and Shaping Human Decisions Through Internet of Behaviors (IoB) …
and sales [7, 8]. In the healthcare industry, IoB systems could be used to monitor patients and deliver personalized treatment plans, leading to improved health outcomes and reduced healthcare costs [34]. IoB systems also have the potential to transform the education sector by providing personalized, adaptive, and data-driven learning experiences for students [10]. In the public services and administration sector, IoB systems could be used to streamline processes, improve citizen engagement and service delivery, and increase transparency and accountability. In smart homes and smart cities, IoB systems could be used to optimize energy consumption, reduce waste, and improve safety and security [61]. On the other hand, the widespread use of IoB systems raises a number of concerns, including privacy and security risks, the potential for unintended consequences, and the impact on employment and job displacement. It is important for policymakers, industry leaders, and researchers to work together to address these challenges and ensure that the benefits of IoB are realized while minimizing any negative impacts on society and the economy.
8.3 Challenges and Opportunities

The IoB is a rapidly growing field with the potential to shape and improve many aspects of human life. However, as with any new technology, there are also challenges and opportunities that must be considered. In this section, we discuss some of the most significant challenges and opportunities of IoB. The main challenges of IoB are:
– Privacy and security: The use of IoB systems requires large amounts of personal data to be collected and analyzed. This raises concerns about privacy and security, especially with the potential for sensitive personal information to be misused or stolen [62, 63].
– Bias and discrimination: The algorithms and models used in IoB systems have the potential to perpetuate and amplify biases and discrimination, especially if the data used to train these models is biased [64].
– Technical limitations: IoB systems require the integration of multiple technologies and platforms, which can be technically challenging and costly. Scalability of these systems remains an issue, as they may not be able to handle the large amounts of data generated by billions of connected devices [65].
– Regulation and ethical considerations: The regulation of IoB systems is still in its early stages, and there are many ethical considerations that need to be taken into account, such as the potential for IoB systems to influence human behavior in ways that are harmful or unethical [66].
The opportunities provided by IoB systems are:
– Improved decision-making by providing real-time data and insights, as well as nudging people towards more desirable behaviors and outcomes.
– Streamlining and automation of processes [67], leading to increased efficiency and reduced costs in various industries, such as healthcare and retail.
– Better health and wellbeing, e.g., by tracking physical activity and dietary habits, or by monitoring environmental conditions to reduce exposure to pollutants [19].
As the field of IoB continues to evolve, it will be important to carefully consider these challenges and opportunities, and to ensure that IoB systems are developed and used in ways that benefit society as a whole.
8.4 Roadmap for Development and Deployment

The development and deployment of IoB systems are complex and multi-disciplinary projects that require the collaboration of various stakeholders from academia, industry, and government. The following is a roadmap for the development and deployment of IoB systems.

Phase 1: Research and Development (R&D)
– Conduct fundamental research on the psychological and behavioral aspects of human decision-making and the potential impact of IoB on these processes.
– Develop proof-of-concept prototypes of IoB systems and applications in relevant domains such as health and wellness, retail and marketing, public services and administration, and smart homes and cities.
– Evaluate the performance and impact of prototypes using rigorous empirical methods and conduct user studies to gather feedback from stakeholders.

Phase 2: Standards Development and Regulation
– Establish technical standards for IoB systems and applications to ensure interoperability, security, and privacy.
– Develop ethical and legal frameworks to govern the deployment and use of IoB systems and to protect the rights and autonomy of individuals.
– Engage with relevant stakeholders such as regulatory bodies, industry organizations, and advocacy groups to obtain support and buy-in for the deployment of IoB systems.

Phase 3: Deployment and Commercialization
– Conduct pilot deployments of IoB systems and applications in select locations and domains to validate their performance, scalability, and impact.
– Partner with industry and government organizations to commercialize IoB systems and applications and bring them to market.
– Continuously monitor and evaluate the performance and impact of IoB systems and applications and make necessary improvements and modifications.

Phase 4: Mainstream Adoption and Deployment
– Establish IoB systems and applications as mainstream technologies and deploy them widely across different domains and locations.
– Develop and offer training and support programs for stakeholders to ensure effective deployment and use of IoB systems.
– Continuously monitor and evaluate the performance and impact of IoB systems and applications and make necessary improvements and modifications.
The roadmap presented here provides a high-level overview of the development and deployment of IoB systems and is meant to serve as a general guide. In practice, the actual timeline and specifics of the roadmap may vary depending on various factors such as the level of investment, the availability of technology, and the support of stakeholders.
9 Conclusion

The IoB has the potential to greatly impact human decision-making and behavior in many ways. This chapter has explored the current state of IoB research, as well as its potential implications for individuals and society as a whole. The IoB has been shown to be capable of influencing human decision-making processes through various techniques, such as nudging and behavioral targeting. While these techniques have the potential to produce positive outcomes, they also raise important ethical and privacy concerns. It is therefore important for researchers, policymakers, and practitioners to continue exploring the benefits and drawbacks of IoB systems, to ensure that these systems are developed and deployed in a manner that is responsible, ethical, and effective. The future of IoB is highly uncertain, but it is likely to have a profound impact on the way people live, work, and interact with each other. In conclusion, the IoB holds great promise for shaping human decision-making and behavior in many domains. While the ethical and privacy concerns must be taken seriously, the potential benefits of IoB systems are substantial, and further research is needed to ensure that their development and deployment is carried out responsibly, ethically, and effectively.
References

1. Rahaman T (2022) Smart things are getting smarter: an introduction to the Internet of Behavior. Med Ref Serv Q 41(1):110–116
2. Sutikno T, Thalmann D (2022) Insights on the Internet of Things: past, present, and future directions. Telkomnika (Telecommun Comput Electron Control) 20(6):1399–1420
3. Yunana K, Alfa AA, Misra S, Damasevicius R, Maskeliunas R, Oluranti J (2021) Internet of Things: applications, adoptions and components—a conceptual overview. Advances in intelligent systems and computing, vol 1375
4. Conti M, Passarella A (2018) The Internet of people: a human and data-centric paradigm for the next generation internet. Comput Commun 131:51–65
5. Elayan H, Aloqaily M, Karray F, Guizani M (2022) Decentralized IoB for influencing IoT-based systems behavior. In: IEEE international conference on communications, May 2022, pp 3340–3345
6. Adewoyin O, Wesson J, Vogts D (2022) The PBC model: supporting positive behaviours in smart environments. Sensors 22(24):9626
7. Afor ME, Sahana S (2022) The Internet of Behaviour (IOB) and its significant impact on digital marketing. In: 2022 international conference on computing, communication, and intelligent systems (ICCCIS)
8. Javaid M, Haleem A, Singh RP, Rab S, Suman R (2021) Internet of Behaviours (IoB) and its role in customer services. Sens Int 2:100122
9. Kuftinova NG, Ostroukh AV, Maksimychev OI, Odinokova IV (2021) Road construction enterprise management model based on hyperautomation technologies. In: Intelligent technologies and electronic devices in vehicle and road transport complex, TIRVED 2021
10. Embarak OH (2022) Internet of Behaviour (IoB)-based AI models for personalized smart education systems. Procedia Comput Sci 203:103–110
11. Embarak OH, Aldarmaki FR, Almesmari MJ (2022) Towards smart education in IoT and IoB environment using the neutrosophic approach. Int J Neutrosophic Sci 19(1):82–98
12. Embarak OH, Almesmari MJ, Aldarmaki FR (2022) Apply neutrosophic AHP analysis of the Internet of Things (IoT) and the Internet of Behavior (IoB) in smart education. Int J Neutrosophic Sci 19(1):200–211
13. Schwab K (2016) The fourth industrial revolution. Foreign Aff 96:44–52
14. Schwab K (2018) Shaping the fourth industrial revolution. Foreign Aff 97:43–50
15. Schwab K (2021) The fourth industrial revolution. Currency
16. Díaz D, Boj C (2023) A critical approach to machine learning forecast capabilities: creating a predictive biography in the age of the Internet of Behaviour (IoB). Artnodes 31:2023
17. Bzai J, Alam F, Dhafer A, Bojović M, Altowaijri SM, Niazi IK, Mehmood R (2022) Machine learning-enabled Internet of Things (IoT): data, applications, and industry perspective. Electronics 11(17)
18. Srimaharaj W, Chaisricharoen R (2022) Internet of Behavior and brain response identification for cognitive performance analysis. In: 2022 Asia-Pacific signal and information processing association annual summit and conference, APSIPA ASC 2022, pp 1308–1311
19. Zhang G, Poslad S, Rui X, Yu G, Fan Y, Song X, Li R (2021) Using an Internet of Behaviours to study how air pollution can affect people's activities of daily living: a case study of Beijing, China. Sensors 21(16)
20. Stary C (2021) Digital twin generation: re-conceptualizing agent systems for behavior-centered cyber-physical system development. Sensors 21(4):1–24
21. Halgekar A, Chouhan A, Khetan I, Bhatia J, Shah N, Srivastava K (2021) Internet of Behavior (IoB): a survey. In: 2021 5th international conference on information systems and computer networks, ISCON 2021
22. Embarak O (2022) An adaptive paradigm for smart education systems in smart cities using the Internet of Behaviour (IoB) and explainable artificial intelligence (XAI). In: 8th international conference on information technology trends: industry 4.0: technology trends and solutions, ITT 2022, pp 74–79
23. Stary C (2020) The Internet-of-Behavior as organizational transformation space with choreographic intelligence. Communications in computer and information science, vol 1278
24. Damaševičius R (2009) Ontology of domain analysis concepts in software system design domain. In: Information systems development: towards a service provision society, pp 319–327
25. Moghaddam MT, Muccini H, Dugdale J, Kjagaard MB (2022) Designing Internet of Behaviors systems. In: IEEE 19th international conference on software architecture, ICSA 2022, pp 124–134
26. Pal S, Mukhopadhyay S, Suryadevara N (2021) Development and progress in sensors and technologies for human emotion recognition. Sensors 21(16)
27. Kaklauskas A, Abraham A, Ubarte I, Kliukas R, Luksaite V, Binkyte-Veliene A, Vetloviene I, Kaklauskiene L (2022) A review of AI cloud and edge sensors, methods, and applications for the recognition of emotional, affective and physiological states. Sensors 22(20):7824
28. Barati M (2020) A formal technique for composing cloud services. Inform Technol Control 49(1):5–27
29. Patel S, Doshi N (2022) Internet of Behavior in cybersecurity: opportunities and challenges. Lecture notes in electrical engineering, vol 936
30. Elayan H, Aloqaily M, Karray F, Guizani M (2022) Internet of Behavior (IoB) and explainable AI systems for influencing IoT behavior. IEEE Netw 1–8
31. Ngerem E, Misra S, Oluranti J, Castillo-Beltran H, Ahuja R, Damasevicius R (2021) A home automation system based on bluetooth technology using an android smartphone. Lecture notes in electrical engineering, vol 694
32. Karthick GS, Pankajavalli PB (2020) A review on human healthcare Internet of Things: a technical perspective. SN Comput Sci 1(4)
33. Rehman A, Saba T, Khan MZ, Damaševičius R, Bahaj SA (2022) Internet-of-Things-based suspicious activity recognition using multimodalities of computer vision for smart city security. Secur Commun Netw 2022, Article ID 8383461. https://doi.org/10.1155/2022/8383461
34. Javaid M, Haleem A, Singh RP, Khan S, Suman R (2022) An extensive study on Internet of Behavior (IoB) enabled healthcare-systems: features, facilitators, and challenges. BenchCouncil Trans Benchmarks Stand Eval 2(4):100085
35. Srivastava J, Routray S, Ahmad S, Waris MM (2022) Internet of Medical Things (IoMT)-based smart healthcare system: trends and progress. Comput Intell Neurosci 2022:1–17
36. Wagan SA, Koo J, Siddiqui IF, Attique M, Shin DR, Qureshi NMF (2022) Internet of Medical Things and trending converged technologies: a comprehensive review on real-time applications. J King Saud Univ—Comput Inform Sci 34(10):9228–9251
37. Gutierrez LJ, Rabbani K, Ajayi OJ, Gebresilassie SK, Rafferty J, Castro LA, Banos O (2021) Internet of Things for mental health: open issues in data acquisition, self-organization, service level agreement, and identity management. Int J Environ Res Pub Health 18(3):1327
38. Mahmoodian M, Shahrivar F, Setunge S, Mazaheri S (2022) Development of digital twin for intelligent maintenance of civil infrastructure. Sustainability 14(14):8664
39. Damadam S, Zourbakhsh M, Javidan R, Faroughi A (2022) An intelligent IoT based traffic light management system: deep reinforcement learning. Smart Cities 5(4):1293–1311
40. Kummitha RKR, Crutzen N (2019) Smart cities and the citizen-driven Internet of Things: a qualitative inquiry into an emerging smart city. Technol Forecast Soc Change 140:44–53
41. Mu H, Sun R, Yuan G, Wang Y (2021) Abnormal human behavior detection in videos: a review. Inform Technol Control 50(3):522–545
42. Yu D, He Z (2022) Digital twin-driven intelligence disaster prevention and mitigation for infrastructure: advances, challenges, and opportunities. Nat Haz 112(1):1–36
43. Pantelic J, Nazarian N, Miller C, Meggers F, Lee JKW, Licina D (2022) Transformational IoT sensing for air pollution and thermal exposures. Front Built Environ 8. https://doi.org/10.3389/fbuil.2022.971523
44. Wogu IAP, Misra S, Assibong PA, Olu-Owolabi EF, Maskeliūnas R, Damasevicius R (2019) Artificial intelligence, smart classrooms and online education in the 21st century: implications for human development. J Cases Inform Technol 21(3):66–79
45. Congiu L, Moscati I (2021) A review of nudges: definitions, justifications, effectiveness. J Econ Surv 36(1):188–213
46. de Ridder D, Kroese F, van Gestel L (2021) Nudgeability: mapping conditions of susceptibility to nudge influence. Perspect Psychol Sci 17(2):346–359
47. Mariani S, Picone M, Ricci A (2022) About digital twins, agents, and multiagent systems: a cross-fertilisation journey. Lecture notes in computer science (LNAI), vol 13441
48. Sharma V, You I, Andersson K, Palmieri F, Rehmani MH, Lim J (2020) Security, privacy and trust for smart Mobile-Internet of Things (M-IoT): a survey. IEEE Access 8:167123–167163
49. Zhang J, Tao D (2021) Empowering things with intelligence: a survey of the progress, challenges, and opportunities in artificial intelligence of things. IEEE Internet Things J 8(10):7789–7817
50. El-Haddadeh R, Weerakkody V, Osmani M, Thakker D, Kapoor KK (2019) Examining citizens' perceived value of Internet of Things technologies in facilitating public sector services engagement. Govern Inform Quar 36(2):310–320
51. Quentin C, Bellmunt J, Lespinet-Najib V, Mokhtari M (2018) Human centered design conception applied to the Internet of Things: contribution and interest. In: Lecture notes in computer science, pp 11–22
52. Amin F, Abbasi R, Mateen A, Ali Abid M, Khan S (2022) A step toward next-generation advancements in the Internet of Things technologies. Sensors 22(20)
53. Elayan H, Aloqaily M, Karray F, Guizani M (2021) Internet of Behavior (IoB) and explainable AI systems for influencing IoT behavior
54. Salis A (2021) Towards the Internet of Behaviors in smart cities through a fog-to-cloud approach. HighTech Innov J 2(4):273–284
55. Laroui M, Nour B, Moungla H, Cherif MA, Afifi H, Guizani M (2021) Edge and fog computing for IoT: a survey on current research activities & future directions. Comput Commun 180:210–231
56. Passos J, Ivan Lopes S, Clemente FM, Moreira PM, Rico-González M, Bezerra P, Rodrigues LP (2021) Wearables and Internet of Things (IoT) technologies for fitness assessment: a systematic review. Sensors 21(16):5418
57. Chanson M, Bogner A, Bilgeri D, Fleisch E, Wortmann F (2019) Blockchain for the IoT: privacy-preserving protection of sensor data. J Assoc Inform Syst 20(9):1271–1307
58. Uddin MA, Stranieri A, Gondal I, Balasubramanian V (2021) A survey on the adoption of blockchain in IoT: challenges and solutions. Blockchain: Res Appl 2(2). https://doi.org/10.1016/j.bcra.2021.100006
59. Dwivedi YK, Hughes L, Baabdullah AM et al (2022) Metaverse beyond the hype: multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy. Int J Inform Manag 66:102542
60. Chin J, Callaghan V, Allouch SB (2019) The Internet-of-Things: reflections on the past, present and future from a user-centered and smart environment perspective. J Amb Intell Smart Environ 11(1):45–69
61. Bellini P, Nesi P, Pantaleo G (2022) IoT-enabled smart cities: a review of concepts, frameworks and key technologies. Appl Sci 12(3):1607
62. Yaacoub JA, Salman O, Noura HN, Kaaniche N, Chehab A, Malli M (2020) Cyber-physical systems security: limitations, issues and future trends. Microprocess Microsyst 77. https://doi.org/10.1016/j.micpro.2020.103201
63. Malhotra P, Singh Y, Anand P, Bangotra DK, Singh PK, Hong W (2021) Internet of Things: evolution, concerns and security challenges. Sensors 21(5):1–35
64. Varona D, Suárez JL (2022) Discrimination, bias, fairness, and trustworthy AI. Appl Sci 12(12):5826
65. Sun J, Gan W, Chao H-C, Yu PS, Ding W (2022) Internet of Behaviors: a survey. https://doi.org/10.48550/arXiv.2211.15588
66. Baldini G, Botterman M, Neisse R, Tallacchini M (2018) Ethical design in the Internet of Things. Sci Eng Ethics 24(3):905–925
67. Brous P, Janssen M, Herder P (2020) The dual effects of the Internet of Things (IoT): a systematic review of the benefits and risks of IoT adoption by organizations. Int J Inform Manag 51:101952
A Machine Learning Based Approach for Diagnosing Pneumonia with Boosting Techniques

A. Beena Godbin and S. Graceline Jasmine
Abstract Detecting pneumonia early can improve lung patients' survival rates. A chest X-ray (CXR) is the most common method for locating and diagnosing pneumonia. However, detecting the disease in CXR images is a challenge even for competent radiologists. Pneumonia poses a serious threat to millions of people, especially in emerging nations with pollution-intensive energy sources and energy poverty. Even though effective tools for preventing, detecting, and treating pneumonia are available, deaths from pneumonia remain common in most nations. In this study, boosting approaches, a family of machine learning algorithms, are used to identify pneumonia: these algorithms can quickly and easily recognize, identify, and forecast the condition, which reduces the workload of doctors and radiologists. The study uses the light gradient boosting machine (LightGBM), the gradient boosting machine (GBM), adaptive boosting (AdaBoost), and extreme gradient boosting (XGBoost). The proposed machine learning method achieved accuracy rates of 98.77% for the XGBoost and GBM classifiers. The performance of a model is evaluated by its accuracy, precision, and sensitivity.

Keywords Ada boosting · Feature extraction · Gradient boosting · Machine learning · Light gradient boosting · Pneumonia
1 Introduction

Humans breathe in oxygen and exhale carbon dioxide via the respiratory system, which is composed of many organs. The lungs, which exchange gases, are the main component of the respiratory system. The American Lung Association states that during inhalation, red blood cells collect oxygen from the lungs and send it to the parts of the body that need it. During exhalation, they collect carbon dioxide and carry it back to the lungs, from which it leaves the body in the exhaled air. Air must travel through the bronchi and bronchioles to reach the alveoli of the lungs. The lungs are divided into a right lung and a left lung, and the pulmonary circulation supplies them with blood. Lung disease arises from a variety of disorders that prevent the lungs from functioning normally. Pneumonia is a pathogenic illness of the lung parenchyma, more commonly caused by bacteria and viruses than by microbes such as fungi [2]. Pneumonia is especially dangerous for the elderly and for children with weak immune systems. In 2015, there were 920,000 pneumonia-related deaths in the world [3]. According to a World Health Organization report, pneumonia is one of the leading causes of death, and most victims are young children, the elderly, and those with compromised immune systems [4]. Reference [5] reports on pediatric infectious disease fatalities, highlighting pneumonia and child deaths; according to this report, global cooperation could save 5.3 million lives by 2030 [5]. Around 1.4 million children under five die from pneumonia every year, making up nearly 18% of all deaths of children under five worldwide. World Pneumonia Day is observed annually on November 12th to raise awareness of the disease; the theme of this year's World Pneumonia Day [1] is "Invest in Child Health: Stop Pneumonia". Imaging is essential for identifying and treating pneumonia. CT scanning, chest X-rays, and ultrasound are among the imaging methods for lung pneumonia; X-ray imaging is accurate, noninvasive, and painless compared with CT scanning. Pneumonia is airborne: when an infected person coughs or sneezes, anyone who breathes in the contaminated air can contract the disease.

A. B. Godbin · S. G. Jasmine (B) School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Chennai, India. e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Adadi and S. Motahhir (eds.), Machine Intelligence for Smart Applications, Studies in Computational Intelligence 1105, https://doi.org/10.1007/978-3-031-37454-8_7
A thorough examination of chest X-rays is necessary to find pneumonia. In less developed areas without specialized knowledge, interpreting chest X-rays can be difficult and time-consuming. Diagnosing pneumonia requires experienced and knowledgeable radiologists, which makes the task difficult [4]. Computer-aided diagnosis systems are therefore needed. Section 2 discusses the extraction of features, while Sect. 3 presents the experimental results. Section 4 concludes with a discussion and conclusion.
1.1 Related Works

As computer-aided diagnosis has advanced rapidly over the past ten years, it helps radiologists analyze medical images by using data from the computer; the workload of experts is reduced, and diagnostic accuracy is improved [3]. Yu et al. employed convolutional neural networks to detect pneumonia in their study. Numerous classification models based on complex neural networks have been described in references [6, 7]. Li et al. investigated deep learning to identify pneumonia in radiographic images. A deep neural network was used by Ge et al. to predict post-stroke pneumonia [9]. AI was used by Chassagnon et al. to quantify, stage, and predict COVID-19 pneumonia [10]. In references [11, 12], COVID-19 was detected using wavelet Renyi entropy and three-segment biogeography, whereas Wang et al. studied COVID-19 classification using deep fusion and transfer learning. A Naive Bayes approach was used by Postalcioglu and Kesli to diagnose pneumonia [13]. During the decision-making process, physicians and machines work together in a partnership, according to Ramezanpour et al. [14]. Khan et al. applied deep learning to the classification of brain tumors [15]. Using computer-assisted diagnosis, Galván Tejada et al. investigated the classification of benign and malignant tumors based on a multivariate approach [16]. In this study, a data set of normal and diseased chest X-rays is used to diagnose pneumonia with boosting techniques, a family of machine learning approaches. The study's objective is to analyze X-ray images to diagnose pneumonia quickly and precisely, and thereby to minimize hospital concentrations, particularly during the COVID-19 era; pediatric pneumonia fatalities have also been reported. In the absence of a specialist physician or radiologist, a diagnosis can be made swiftly and accurately using this type of technology, so treatment can begin right away.
2 Materials and Methods

There are various supervised learning classifiers in machine learning; Fig. 1 illustrates the main types. In machine learning, an ensemble method trains weak learners to solve a problem and then combines their efforts to produce better results [17]; weak models can be merged to produce more precise models [18]. Two types of ensemble approaches exist: bagging and boosting. Bagging is a straightforward aggregation technique in which model averaging techniques are used to merge the weak learners [19]. Boosting is a technique that takes homogeneous weak learners, trains them sequentially, and then combines them [18]. Boosting algorithms are advantageous because they automate the selection of variables and models and are adaptable and stable under high-dimensional conditions; they therefore offer biological researchers an appealing alternative [20]. A classifier, in turn, is an algorithm, sometimes described as a collection of rules, that categorizes data; it is used to train a classification model, which then categorizes the data. A classifier can be supervised or unsupervised. An unsupervised machine learning classifier receives only unlabeled data sets and classifies the data based on patterns, structures, and anomalies. Supervised and semi-supervised classifiers are given training data sets that teach them how to divide data into particular groups. In sentiment analysis, for example, classifiers are trained to look for opinion polarity in text and categorize it as positive, neutral, or negative; companies use such classifiers to automatically assess comments from social media, emails, online reviews, etc., in order to find out what customers think about them.

Fig. 1 Various types of machine learning algorithms (types of classifiers: support vector machine (Linear SVC), decision tree (decision tree classifiers), neural networks (MLP classifiers), nearest neighbour (KNN), and the ensemble method (boosting: AdaBoost, gradient boosting, LightGBM, XGBoost))

Figure 2 illustrates the suggested process flow for diagnosing pneumonia based on X-ray images. The boosting methods in this study are AdaBoost, LightGBM, XGBoost, and GBM. Two steps are involved in the suggested strategy. First, GLCM features are extracted from CXR images, and significant features are selected from them. Then, machine learning classifiers are used to determine whether the image contains pneumonia. The ensemble model is employed to provide the model with the best accuracy.
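The boosting idea described above, training weak learners sequentially and combining them, can be illustrated with a minimal AdaBoost sketch on one-dimensional data. This is an illustrative implementation using threshold stumps as the weak learners, not the chapter's experimental code; all function and variable names are our own.

```python
import numpy as np

def stump_predict(x, t, sign):
    """Weak learner: predict +sign for x <= t and -sign otherwise."""
    return sign * np.where(x <= t, 1.0, -1.0)

def adaboost(x, y, n_rounds=10):
    """AdaBoost for labels y in {-1, +1} with one-feature threshold stumps."""
    w = np.full(len(x), 1.0 / len(x))          # uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for t in np.unique(x):                 # pick the lowest weighted-error stump
            for sign in (+1.0, -1.0):
                err = w[stump_predict(x, t, sign) != y].sum()
                if best is None or err < best[0]:
                    best = (err, t, sign)
        err, t, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this stump in the vote
        w = w * np.exp(-alpha * y * stump_predict(x, t, sign))
        w = w / w.sum()                        # up-weights misclassified samples
        ensemble.append((alpha, t, sign))
    return lambda xs: np.sign(sum(a * stump_predict(xs, t, s)
                                  for a, t, s in ensemble))
```

Each round re-weights the training samples so that the next stump focuses on the points the current ensemble gets wrong; the final prediction is a weighted vote over all stumps.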
2.1 Data Set

The chest X-ray data set was obtained from the Kaggle website [21]. Examining the data reveals an uneven distribution between the classes, which is corrected later by oversampling. Figure 3 shows some representative images from the data collection. Approximately 20% of the data set is used for testing and 80% for training. The data set contains 1341 normal images and 1310 pneumonia images.
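As an illustrative sketch (not the authors' code) of how such an 80/20 split and the oversampling used later in the chapter might be prepared, assuming the features X and labels y are already NumPy arrays; the fixed random seed and names are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (an assumption)

def oversample(X, y):
    """Randomly repeat minority-class rows until both classes are equal in size."""
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[counts.argmin()]
    deficit = counts.max() - counts.min()
    extra = rng.choice(np.flatnonzero(y == minority), size=deficit, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

def train_test_split(X, y, test_frac=0.2):
    """Shuffle, then hold out test_frac of the samples (20% test, 80% train)."""
    order = rng.permutation(len(y))
    n_test = int(len(y) * test_frac)
    test, train = order[:n_test], order[n_test:]
    return X[train], y[train], X[test], y[test]
```

In practice one would oversample only the training portion after splitting, so that repeated copies of a sample never appear in both training and test sets.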
2.2 Traditional Machine Learning Approach

The two main processes in applying machine learning to image processing are feature extraction and classification; additional steps may be inserted between them depending on the application. Figure 2 illustrates how the model works. The first and most important step is to extract features from the grayscale images. The next step is to adapt a mathematical technique to reduce the number of features to be learned. Classification algorithms come next.

Fig. 2 Workflow of the proposed model (input acquisition of the CXR image, GLCM feature extraction, selection of the best features, boosting classifiers (XGBoost, LightGBM, AdaBoost, GBM), and classification of the result as pneumonia or normal)

Fig. 3 Sample images from the data set: a pneumonia images, b normal images
2.2.1 Feature Extraction
Using OpenCV and Python, the Gray Level Co-Occurrence Matrix (GLCM), also called the Gray Tone Spatial Dependency Matrix, has been used to extract the majority of texture information from chest X-rays to predict pneumonia. Figure 4 shows the GLCM computation. Godbin et al. introduced GLCM feature extraction for recognizing COVID-19 images [34]. An imbalanced data problem arises when one class label outnumbers the other; in that case, the algorithm is trained to predict only the majority class label, leaving out the minority class. One corrective measure is oversampling, which repeats data points so that records with both class labels are evenly distributed; oversampling has been applied here to correct the unbalanced data set. The following features were extracted for testing and training: Blurriness, Contrast, Correlation, Brightness, Energy, Peak Color, Cluster Prominence, Homogeneity, Entropy, Variance, Standard Deviation, Smoothness, Kurtosis, IDM, Cluster Shade, Skewness, Maximal Probability, Difference Variance, Sum Average, Sum Variance, Sum Entropy, Auto Correlation, and Difference Entropy. The key characteristics are explained below:
Contrast: An image with a high contrast value likely contains a wide range of different elements.
Correlation: The correlation feature measures how linearly the gray levels of neighboring points depend on one another; greater values occur in areas with similar gray levels.
Blurriness: Blurring modifies pixel values, changing the sharpness and smoothness of edges by letting low frequencies pass and removing high frequencies. The blurred image is obtained by convolving the image with a Gaussian function.
Fig. 4 Example of calculating GLCM: a gray level image with three levels (black, dark gray, light gray); b representation of gray levels in the image (1 = black, 2 = dark gray, 3 = light gray); c for 0°, one-pixel separation, two occurrences of the pair 1–2
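The Fig. 4 computation can be sketched in pure Python. The toy image below is hypothetical, chosen so that at 0° with one-pixel separation the level pair 1–2 co-occurs twice, as in the figure; the `glcm` helper and the contrast computation are illustrative, not the exact code used in the study.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray-level pairs (levels numbered 1..levels)
    at the given pixel offset; dx=1, dy=0 is 0 degrees, one-pixel separation."""
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[image[r][c] - 1][image[r2][c2] - 1] += 1
    return m

# Hypothetical 3-level image (1 = black, 2 = dark gray, 3 = light gray).
img = [[1, 2, 3],
       [3, 1, 2],
       [2, 3, 1]]
m = glcm(img, levels=3)
print(m[0][1])  # 2 -> the pair 1-2 co-occurs twice, as in Fig. 4c

# Texture features are then computed from the normalized matrix, e.g. contrast:
total = sum(sum(row) for row in m)
contrast = sum((i - j) ** 2 * m[i][j] / total
               for i in range(3) for j in range(3))
```

In practice a library routine (for example scikit-image's `graycomatrix`/`graycoprops`) would be used instead of hand-rolled loops, and the matrix would be computed for several angles and distances.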
A Machine Learning Based Approach for Diagnosing Pneumonia …
Homogeneity: Contrast is high when homogeneity is low, and vice versa. The homogeneity of an image measures how uniformly the various shades of gray are dispersed; it is roughly inversely proportional to contrast.
Peak colour: Peak colour is computed from the image's histogram. In an image, the peak colour range contains the dominant colour. Histograms show the frequency of pixel intensity values; clusters are formed from the histogram values, and the peak colour is taken from the centroid of the largest cluster.
Cluster Prominence: Cluster prominence measures asymmetry in the image. A higher cluster prominence value denotes less symmetry in the image; for lower values, on the other hand, the GLCM matrix has a peak close to the mean values.
Energy: Energy is a metric of an image's uniformity, computed from the transitions between gray levels; greater values denote textural homogeneity, i.e., a more homogeneous image. Energy is also known as the Angular Second Moment.
2.2.2 Gradient Boosting Machine
Boosting is a family of machine learning techniques for classification problems [33]. A gradient boosting model builds an ensemble sequentially: at each iteration a new weak base-learner model is trained with respect to the error of the complete ensemble learned so far [25], so that each new model reduces the loss function [22]. Minimizing the loss function increases overall accuracy. However, boosting must eventually stop to prevent the model from overfitting; the stopping condition can be a threshold on the model or a maximum number of models to construct [23]. The goal is to estimate the regression function f() that connects the predictor variables X with the outcome Y [24].
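The sequential scheme described above can be illustrated with a minimal, self-contained sketch for least-squares regression using one-split "stumps" as weak learners. The data and function names are hypothetical, and real implementations (e.g. scikit-learn's gradient boosting estimators) add many refinements:

```python
# Minimal gradient-boosting sketch: each round fits a new weak learner to the
# current residuals (the negative gradient of the squared loss).

def fit_stump(x, residual):
    """Fit a two-leaf stump minimising squared error on the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residual) if xi <= t]
        right = [r for xi, r in zip(x, residual) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi, t=t, lm=lm, rm=rm: lm if xi <= t else rm

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    base = sum(y) / len(y)                  # start from the mean prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residual)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + lr * sum(s(xi) for s in stumps)

predict = gradient_boost([1, 2, 3, 4], [0.0, 0.0, 1.0, 1.0])
# predictions move toward 0 for the first group and toward 1 for the second
```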
2.2.3 Light Gradient Boosting Machine
LightGBM uses a histogram-based method: by discretizing variables with continuous values, it lowers computation costs. Decision tree training time grows with the number of calculations and splits [26], so this approach uses fewer resources and cuts down on training time [27]. Learning in decision trees can proceed leaf-wise or depth-wise. In the level-based (depth-wise) technique, the balance of the tree is preserved as it expands; in the leaf-oriented strategy, splitting continues from the leaf that most lowers the loss. This characteristic sets LightGBM apart from other boosting algorithms: with a leaf-oriented method, the model learns more quickly and with a lower error rate. When the amount of data is small, however, the leaf-focused growth strategy makes the model vulnerable to overfitting, so the technique is better suited to large data sets. Additionally, parameters such as depth and leaf count can be tuned to avoid overfitting. These algorithms' benefits include quick training times and low memory requirements. The key factor contributing to improved accuracy is the leaf-wise split technique, which can produce significantly more complex trees [28]. LightGBM is a gradient-boosted decision tree method. It uses a learning rate of 0.1 and 100 boosted trees; each tree is assumed to have at most 31 leaves; the maximum tree depth has no upper bound; the minimum sum of instance weights required in a leaf is 0.001; and at least 20 data points are required in a leaf. Figure 5 shows the difference between LightGBM and XGBoost.

Fig. 5 LightGBM versus XGBoost
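The settings listed above correspond to LightGBM's documented defaults and can be collected in a parameter dictionary. This is a hypothetical configuration sketch (one would pass it as `lightgbm.LGBMClassifier(**lgbm_params)`), not code from the study:

```python
# Hypothetical LightGBM configuration mirroring the settings stated above.
lgbm_params = {
    "boosting_type": "gbdt",    # gradient-boosted decision trees
    "learning_rate": 0.1,
    "n_estimators": 100,        # number of boosted trees
    "num_leaves": 31,           # leaf-wise growth: cap on leaves per tree
    "max_depth": -1,            # -1 = no upper bound on tree depth
    "min_child_weight": 1e-3,   # minimum sum of instance weights in a leaf
    "min_child_samples": 20,    # minimum number of data points in a leaf
}
```

Capping `num_leaves` rather than `max_depth` is what distinguishes the leaf-wise growth strategy from depth-wise growth.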
2.2.4 Extreme Gradient Boosting
XGBoost stands for Extreme Gradient Boosting [29]. It is a machine-learning method that employs gradient boosting and decision trees. In our studies, XGBoost has low computing complexity, fast processing speed, and high accuracy. The boosting algorithm combines several weak classifiers into one strong classifier, and booster parameters determine the characteristics of the learners. An effective classifier model is created using the gbtree booster, which uses weak regression-tree learners. XGBoost, a boosted-tree method, combines different tree models to create an effective classifier [30]. In XGBoost there are two types of boosted tree models: regression trees and classification trees. Based on n labeled samples with M characteristics, the tree ensemble technique [31] forecasts labels using additive functions.
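In standard notation (the symbols below follow the usual XGBoost formulation and are supplied here for clarity, not taken from the source), the additive prediction over K regression trees is:

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F},
```

where $x_i$ is the feature vector of the $i$-th of the $n$ labeled samples (each with $M$ characteristics) and $\mathcal{F}$ is the space of regression trees.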
2.2.5 Ada Boosting
In 1996, Yoav Freund and Robert Schapire proposed AdaBoost, an ensemble boosting classifier. It combines weak classifiers to build a powerful classifier with high accuracy. To achieve precise predictions of infrequent events, AdaBoost trains on a data sample at every iteration and weights the classifiers appropriately. AdaBoost must satisfy two conditions: the classifier should be trained interactively on a variety of weighted training cases so as to take the training-set weights into account, and the base classifier can be any machine learning algorithm that accepts weights on the training set. At each iteration it reduces the training error so as to fit these instances as accurately as possible. Godbin [35] surveys general pneumonia detection methods using machine learning models.
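The reweighting loop can be sketched in pure Python with one-dimensional threshold stumps as base classifiers. This is a toy illustration with hypothetical data and names, not the study's implementation:

```python
import math

# Toy AdaBoost with 1-D threshold stumps as weak classifiers (labels are +/-1),
# illustrating the sample-reweighting scheme described above.

def adaboost(x, y, rounds=10):
    n = len(x)
    w = [1.0 / n] * n                       # start with uniform sample weights
    learners = []                           # (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        for t in sorted(set(x)):
            for pol in (1, -1):             # stump: pol if xi > t else -pol
                err = sum(wi for xi, yi, wi in zip(x, y, w)
                          if (pol if xi > t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = max(err, 1e-10)               # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        learners.append((alpha, t, pol))
        # Re-weight: misclassified samples gain weight, then renormalise.
        w = [wi * math.exp(-alpha * yi * (pol if xi > t else -pol))
             for xi, yi, wi in zip(x, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda xi: 1 if sum(a * (p if xi > t else -p)
                               for a, t, p in learners) > 0 else -1

predict = adaboost([1, 2, 3, 4, 5, 6], [-1, -1, -1, 1, 1, 1])
```

The weight update is exactly the "interactive training on weighted cases" condition: samples the current stump gets wrong receive larger weights, so the next stump concentrates on them.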
3 Implementation and Result Analysis

A machine running Windows 10 was used to compile the training and test results of the pneumonia classification based on CXR pictures, with an Intel Core i5 8th generation processor and 8 GB of RAM. Python 3.7.10 and Scikit-learn 0.23.1 were used. XGBoost, LGBM, AdaBoost, and GBM are the classifiers. Hyperparameters are tuning parameters used to control classifier learning, and we controlled several variables for each classification method. To set the hyperparameters for each classification model, grid search with a tenfold CV was used. XGB and GBM classifiers produce similar results in some cases; because of this, particular values were chosen from the allowed interval for these models. In order to generate more precise and realistic results in this study, the tenfold Cross Validation method was employed. Sensitivity, precision, accuracy, and F1 score were used to evaluate the model's correctness. Accuracy is defined as the proportion of correct predictions among all predictions. Precision is the ratio of correct positive class predictions to all positive predictions made. Sensitivity is the ratio of correct positive class predictions to all actual positive samples. Precision and Recall are combined to determine the F1 score. A correctly classified positive class label is a "true positive" (TP); TN is the number of correctly predicted negative class examples; the total number of negative class samples mispredicted as positive is the "false positive" count (FP); in false negatives (FN), positive cases are mistakenly labeled as negative.

Accuracy = (TP + TN) / (TP + FP + TN + FN)

Precision = TP / (TP + FP)

F1-Score = 2 * (Precision * Sensitivity) / (Precision + Sensitivity)

Sensitivity = TP / (TP + FN)

Specificity = TN / (TN + FP)
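As a quick check, these definitions can be applied to the XGB confusion-matrix counts reported later in this section (TP = 518, TN = 530, FP = 0, FN = 13); the helper function is an illustrative sketch, not code from the study:

```python
def metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics defined above from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)            # a.k.a. recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, specificity, f1

# XGB confusion-matrix counts reported in this section.
acc, prec, sens, spec, f1 = metrics(tp=518, tn=530, fp=0, fn=13)
print(f"accuracy={acc:.4f}")  # 0.9877, matching the 98.77% reported for XGB
```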
The performance results of the proposed method, which examined the effects of several feature sets, are displayed in the tables; note that several feature combinations were explored. We extracted the features using a Python package [32, 33]. The value of each GLCM characteristic is the mean of the values computed separately for each angle (0°, 45°, 90°, 135°). We also experimented, by trial and error, with varying the distance parameter of the feature extraction; these experiments had minimal bearing on the results. Table 1 displays the classification results for GLCM features. The greatest accuracy, 98.77%, was obtained with the GBM and XGB classifiers, and the best sensitivity score, 100%, with the LGBM and GBM classifiers. The best accuracy of the LGBM is 98.58%, while the best F1 scores, 99%, are obtained with the LGBM, GBM, and XGB models. When performance data are reviewed, a strong association can be said to exist if the MCC (Matthews correlation coefficient) is close to 1. Sensitivity is the capacity of a test to precisely identify patients with an illness; specificity refers to a test's capacity to correctly identify those who do not have the disease. For the diagnosis to be accurate, it is crucial that it is both specific and sensitive. Results for specificity are marginally nearer 1 than those for sensitivity. We can conclude that the F1-Score, the harmonic mean of the sensitivity and precision values, provided more balanced results. The error rate is the total number of incorrect results divided by the size of the data set; the ideal error rate is close to zero. The performance metrics of the LGBM, GBM, ABM, and XGB classifiers for the classes pneumonia and normal are shown in Figs. 6 and 7. Figure 8 displays confusion matrices for the XGB and LGBM classifiers.
For the XGB classifier, there are 518 TP, 530 TN, 0 FP, and 13 FN.

Table 1 Classification results for GLCM (Gray Level Co-occurrence Matrix) features

Classifier   Accuracy (%)   Precision (%)   Sensitivity (%)   F1 score (%)
LGBM         98.58          97              100               99
GBM          98.77          98              100               99
ABM          96.22          95              97                96
XGB          98.77          98              100               99
Fig. 6 Performance of classifiers (accuracy, precision, sensitivity, and F1 score of LGBM, GBM, ABM, and XGB) for the Normal class

Fig. 7 Performance of classifiers (accuracy, precision, sensitivity, and F1 score of LGBM, GBM, ABM, and XGB) for the Pneumonia class
Figure 8b displays the confusion matrix for the LGBM classifier: 518 True Positives, 528 True Negatives, 0 False Positives, and 15 False Negatives. Confusion matrices for the AdaBoost and GBM classifiers are shown in Fig. 9. The AdaBoost classifier has TP, TN, FP, and FN counts of 505, 516, 13, and 27, respectively; the GBM classifier (Fig. 9b) has 518 True Positives, 530 True Negatives, 0 False Positives, and 13 False Negatives. Figure 10 shows the model's ROC and precision–recall curves. The accuracy value is calculated as the ratio of correctly predicted samples to the entire data set. TN, TP, FN, and FP denote true negatives, true positives, false negatives, and false positives, respectively: true positives are event values that were correctly predicted; false positives are event values that were incorrectly predicted; true negatives are correctly predicted no-event values; and false negatives are incorrectly predicted non-event values. All confusion-matrix-based performance expressions are defined from these counts. Performance evaluations include accuracy, precision, F1-Score, sensitivity, specificity, error rate, and log-loss.

Fig. 8 Confusion matrix a combined b XGB c LGBM

Fig. 9 Confusion matrix a AdaBoost b GBM

Fig. 10 a ROC curve b precision–recall curve
4 Conclusion

In this work, feature extraction based on machine learning is used to automatically distinguish between images with pneumonia and unaffected images. The models were trained on two classes and assessed using Accuracy, Sensitivity, and Specificity. The main objective of this study is to detect pneumonia from CXR pictures; a key objective is to assess the impact of feature extraction methods on classification accuracy. Older systems, built using sophisticated approaches and characteristics, relied on gray level intensity in the medical images; intensity-based features make it possible to investigate important aspects of images, and with this approach the diagnosis can be made quickly. For the suggested technique, the data set was assembled from numerous articles and collected using different modalities. With GLCM features, accuracy reached 96.22% for the AdaBoost classifier, 98.77% for the XGB and GBM classifiers, and 98.58% for LGBM. Each GLCM or GLRLM characteristic is calculated separately for each of the four directions of 0, 45, 90, and 135 degrees, so no angle-based experiment is needed. This study's key finding is that gray-level features are more practical. For GLCM features, we obtain good accuracy with the XGB, GBM, and LGBM classifiers; the AdaBoost accuracy score of 96.22% was lower compared to the other models. These results were obtained using the reliable and robust tenfold CV. Another important finding concerns the choice of classifier: AdaBoost performs poorly compared with the XGB, GBM, and LGBM classifiers. The GBM and XGB classifiers deliver the best outcomes for the majority of feature vectors, and the results of LGBM are also strong. The GBM, XGB, and LGBM classifiers are therefore recommended for classifying pneumonia.
Future research will test our technique against a variety of different data sets and improve its accuracy, so as to diagnose pneumonia more quickly and precisely using CXR and CT images. On the basis of this research, studies using various models can be conducted to identify other disorders.
References

1. Singh N, Sharma R, Kukker A (2019) Wavelet transform based pneumonia classification of chest X-ray images. In: International Conference on Computing, Power and Communication Technologies (GUCON), New Delhi, India, pp 540–545
2. Irfan A, Adivishnu AL, Sze-To A, Dehkharghanian T, Rahnamayan S, Tizhoosh HR (2020) Classifying pneumonia among chest X-rays using transfer learning. In: 42nd International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, Canada, pp 2186–2189
3. Mubarok AF, Dominique JAM, This AH (2019) Pneumonia detection with deep convolutional architecture. In: International Conference of Artificial Intelligence and Information Technology (ICAIIT), Yogyakarta, Indonesia, pp 486–489
4. Sharma H, Jain JS, Bansal P, Gupta S (2020) Feature extraction and classification of chest X-ray images using CNN to detect pneumonia. In: 10th International Conference on Cloud Computing, Data Science & Engineering, Noida, India, pp 227–231
5. Save the Children (2017) Fighting for breath—a call to action on childhood pneumonia. Save the Children, 1 St John's Lane [cited 05.06.2021], p 83. Available from: https://www.savethechildren.org.uk/content/dam/global/reports/health-and-nutrition/fighting-for-breath-low-res.pdf
6. Jain R, Nagrath P, Kataria G, Kaushik VS, Hemanth DJ (2020) Pneumonia detection in chest X-ray images using convolutional neural networks and transfer learning. Measurement 165:1–10
7. Yu X, Wang SH, Zhang YD (2021) CGNet: a graph-knowledge embedded convolutional neural network for detection of pneumonia. Inf Process Manag 58(1):1–25
8. Li Y, Zhang Z, Dai C, Dong Q, Badrigilan S (2020) Accuracy of deep learning for automated detection of pneumonia using chest X-ray images: a systematic review and meta-analysis. Comput Biol Med 123:1–8
9. Ge Y, Wang Q, Wang L, Wu H, Peng C, Wang J, Xu Y, Xiong G, Zhang Y, Yi Y (2019) Predicting post-stroke pneumonia using deep neural network approaches. Int J Med Inform 132:1–8
10. Chassagnon G, Vakalopoulou M, Battistella E, Christodoulidis S, Hoang-Thi TN, Dangeard S et al (2021) AI-driven quantification, staging and outcome prediction of COVID-19 pneumonia. Med Image Anal 67:1–16
11. Wang SH, Nayak DR, Guttery DS, Zhang X, Zhang YD (2021) COVID-19 classification by CCSHNet with deep fusion using transfer learning and discriminant correlation analysis. Inf Fusion 68:131–148
12. Shuihua W, Xiaosheng W, Yu-Dong Z, Chaosheng T, Xin Z (2020) Diagnosis of COVID-19 by wavelet renyi entropy and three-segment biogeography-based optimization. Int J Comput Intell 13(1):1332–1344
13. Postalcıoğlu S, Keşli A (2020) Diagnosis of pneumonia by naive Bayes method. In: 3rd International Conference on Data Science and Applications (ICONDATA'20), 25–28, Istanbul, Turkey, pp 208–211
14. Ramezanpour A, Beam AL, Chen JH, Mashaghi A (2020) Statistical physics for medical diagnostics: learning, inference, and optimization algorithms. Diagnostics 10(11):1–16
15. Khan MA, Ashraf I, Alhaisoni M, Damaševičius R, Scherer R, Rehman A, Bukhari SAC (2020) Multimodal brain tumor classification using deep learning and robust feature selection: a machine learning application for radiologists. Diagnostics 10(8):1–19
16. Galván-Tejada CE, Zanella-Calzada LA, Galván-Tejada JI, Celaya-Padilla JM, Gamboa-Rosales H, Garza-Veloz I, Martinez-Fierro ML (2017) Multivariate feature selection of image descriptors data for breast cancer with computer-assisted diagnosis. Diagnostics 7(1):1–17
17. Rahman S, Irfan M, Raza M, Moyeezullah KG, Yaqoob S, Awais M (2020) Performance analysis of boosting classifiers in recognizing activities of daily living. Int J Environ Res Public Health 17(3):1–15
18. Rocca J. Ensemble methods: bagging, boosting and stacking [Internet]. [cited 05.06.2021]. Available from: https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205
19. Devhunter. Gradient boosting [Internet]. Available from: https://devhunteryz.wordpress.com/2018/07/11/gradyan-arttirmagradient-boosting
20. Binder H, Gefeller O, Schmid M, Mayr A (2014) The evolution of boosting algorithms. Methods Inf Med 53(6):419–427
21. Chest X-ray pneumonia (2020). Available from: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
22. Rorasa. l0 norm, l1 norm, l2 norm, l infinity norm [Internet]. [cited 01.11.2020]. Available from: https://rorasa.wordpress.com/2012/05/13/l0-norm-l1norm-l2-norm-l-infinity-norm
23. Abdullahi A, Raheem L, Muhammed M, Rabiat OM, Saheed AG (2020) Comparison of the catboost classifier with other machine learning methods. Int J Adv Comput Sci Appl (IJACSA) 11(11):738–748
24. Reif D, Alison M, Mckinney B, Crowe J, Moore J (2006) Feature selection using a random forests classifier for the integrated analysis of multiple data types. In: IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, 28–29, Canada, pp 1–8. https://doi.org/10.1109/CIBCB.2006.330987
25. Alexey N, Knoll A (2013) Gradient boosting machines, a tutorial. Frontiers in Neurorobotics 7:1–21
26. Logistic regression: loss and regularization [Internet]. [cited 05.06.2021]. Available from: https://developers.google.com/machine-learning/crash-course/logistic-regression/model-training
27. Muratlar ER. LightGBM [Internet]. [cited 06.12.2020]. Available from: https://www.veribilimiokulu.com/lightgbm/
28. Minastireanu E, Mesnita G (2019) Light GBM machine learning algorithm to online click fraud detection. Journal of Information Assurance & Cybersecurity 2019:1–12
29. Gumus M, Kiran MS (2017) Crude oil price forecasting using XGBoost. In: International Conference on Computer Science and Engineering (UBMK), Antalya, pp 1100–1103. https://doi.org/10.1109/UBMK.2017.8093500
30. Wang Y, Guo Y (2020) Forecasting method of stock market volatility in time series data based on mixed model of ARIMA and XGBoost. China Communications 17(3):205–221
31. Long J, Yan Z, Shen Y, Liu W, Wei Q (2018) Detection of epilepsy using MFCC-based feature and XGBoost. In: 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Beijing, China, pp 1–4. https://doi.org/10.1109/CISP-BMEI.2018.8633051
32. Liao X, Cao N, Li M, Kang X (2019) Research on short-term load forecasting using XGBoost based on similar days. In: International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), Changsha, China, pp 675–678. https://doi.org/10.1109/ICITBS.2019.00167
33. Dorogush A, Ershov V, Gulin A (2017) CatBoost: gradient boosting with categorical features support. In: Proc Workshop ML Systems, Neural Information Processing Systems (NIPS) [cited 01.12.2020], pp 1–7. Available from: https://arxiv.org/pdf/1810.11363.pdf
34. Godbin AB, Jasmine SG (2023) Screening of COVID-19 based on GLCM features from CT images using machine learning classifiers. SN Computer Science 4(2):1–11
35. Godbin AB, Jasmine SG (2022) Analysis of pneumonia detection systems using deep learning-based approach. In: 2022 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), IEEE
Harnessing the Power of ChatGPT for Mastering the Maltese Language: A Journey of Breaking Barriers and Charting New Paths

Jacqueline Zammit
Abstract The language model, Chat Generative Pretrained Transformer (ChatGPT), which is a cutting-edge artificial intelligence (AI) technology developed by OpenAI, employs deep learning algorithms to generate human-like text. Despite its recent popularity, ChatGPT is expected to play an increasingly important role in shaping the future of technology and AI. This study used a mixed methods approach to assess the effectiveness of ChatGPT in facilitating the learning of Maltese for 41 international multilingual adult students and to explore their experiences and perceptions of using ChatGPT to learn Maltese. The participants used ChatGPT for a period of two weeks to support their Maltese learning and completed a survey to evaluate their experience and the usefulness of the tool in the Maltese learning process. An online focus group approach was employed, in which the participants discussed their use of ChatGPT for Maltese learning. According to the findings, ChatGPT had some limitations when it came to providing accurate information related to Maltese grammar and vocabulary, and sometimes produced non-existent words or incorrect answers. Additionally, the tool struggled to comprehend and respond to Maltese questions and statements. Participants also noted that ChatGPT had limitations in supporting grammar, vocabulary, creating dialogues or stories, translation, summarization, proofreading, and conversational practice, especially when compared to its effectiveness in assisting with the learning of the English language. Despite 98% of the participants indicating that ChatGPT lacked cultural context in the survey, a Palestinian participant was able to persuade others during the focus group session by demonstrating how the tool was helpful in understanding Malta’s cultural context. Furthermore, ChatGPT does not provide immediate feedback to allow Maltese students to quickly correct errors and improve their language skills. 
The study highlights the need for improvement in the training of ChatGPT on the Maltese language, as well as collaboration with Maltese language and AI experts, to better meet the needs of Maltese students. The findings have important implications for the design and implementation of language learning programs and for the development and deployment of language learning tools for less widely spoken languages like Maltese.

Keywords ChatGPT · Maltese language learning · Artificial intelligence (AI) · Computer-assisted language learning (CALL) · Student experience and perception · Natural language processing (NLP)

J. Zammit (B)
University of Malta, Msida, Malta
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Adadi and S. Motahhir (eds.), Machine Intelligence for Smart Applications, Studies in Computational Intelligence 1105, https://doi.org/10.1007/978-3-031-37454-8_8
1 Introduction

Chat Generative Pretrained Transformer (ChatGPT) is an example of artificial intelligence (AI) and machine learning technology that is rapidly changing the way we interact with technology. It is being utilized to improve natural language processing (NLP) systems, allowing them to understand human speech and interact with humans. AI is also being used to create intelligent personal assistants and deep learning models with the potential to improve the accuracy of machine learning systems. In the field of education, AI has the potential to revolutionize the teaching and learning process. However, it is important to consider the ethical implications of AI use as well. Overall, AI and machine learning technologies such as ChatGPT are becoming increasingly popular and have the potential to transform various aspects of our lives, including the way we learn foreign languages.
1.1 Background Information

Maltese is a language spoken by approximately 500,000 people on the Mediterranean island of Malta. This language has a long history of being used by the Maltese people to communicate with each other. Despite its rich linguistic heritage and wide usage on the small island of Malta, Maltese is a minority language, and its usage among the younger generation is slowly declining. Furthermore, there is a lack of digital resources for teaching and learning the Maltese language [2]. Current advancements in AI and NLP are playing a crucial role in the development of language learning applications. NLP enables computers to understand and manipulate human language through techniques such as machine translation, cognitive dialogues, information retrieval, and natural language generation. Bidirectional Encoder Representations from Transformers (BERT) and Generative Pretrained Transformer (GPT), which are based on the Transformer architecture, are key methods employed in NLP and have the ability to improve the accuracy of NLP tasks. ChatGPT, a deep learning-based NLP system, has been touted as a potential tool for language acquisition. As a powerful tool for understanding and translating natural languages, ChatGPT has been employed in various applications, including language translation and language learning [2]. ChatGPT is a chatbot powered by the GPT-3 language model and can provide conversational practice for language
students, assess their ability to use the language, and provide targeted feedback. AI models can also analyze large amounts of text data in a target language to help language learners build their proficiency. As a result, AI provides a personalized, interactive, and efficient language learning experience that can lead to improved language acquisition outcomes.
1.2 Statement of the Problem

In Malta, the popularity of distance education for teaching Maltese to international adults is growing due to its convenience and cost-effectiveness [20]. However, educators face a significant challenge in maintaining the same level of instructional quality as in a traditional classroom setting when teaching Maltese remotely [19]. This is because distance education requires different materials and techniques, which can be challenging to manage and evaluate [13]. To overcome this challenge, the integration of AI and machine learning, such as ChatGPT, into the Maltese distance education curriculum is necessary to improve teaching and learning quality. The objective of this study is to determine if and how ChatGPT can effectively teach Maltese to international adults through distance education. This study seeks to answer the following research question: “Can ChatGPT improve the learning of Maltese as a foreign language?”
1.3 The Aims of the Study

The primary objective of this study is to assess the capability of AI and machine learning technology, specifically ChatGPT, in fostering the acquisition of the Maltese language. The research will evaluate the potential benefits and challenges of incorporating ChatGPT into Maltese language learning, and its impact on the overall language-learning experience of adults. Additionally, the study will outline a framework for developing a ChatGPT-based Maltese language-learning system and analyze the text generated by ChatGPT. The ultimate goal of this study is to determine the role that ChatGPT can play in advancing Maltese language learning and breaking down language barriers.
2 Literature Review

AI and machine learning have seen a surge of growth in their applications for language processing. In particular, the development of large language models like ChatGPT has opened new avenues for mastering foreign languages. However, while ChatGPT is a cutting-edge tool for learning languages, there has been a significant amount of
debate among scholars about the benefits and disadvantages of using it for language learning, despite its recent development.

AI and machine learning can aid in comprehending the structure of a language and enhancing the precision of natural language processing (NLP) tasks. The usage of AI-powered models can lead to accurate identification of sentence boundaries, spelling mistakes, and words with multiple meanings [5]. This can improve the performance of applications such as text-to-speech and machine translation [10]. AI can also refine the accuracy of automatic text summarization, generating a brief summary of the text through NLP techniques [12]. AI can identify crucial keywords and phrases and provide a deeper understanding of the text's composition. In the future, machine learning and AI could lead to the development of advanced applications like sentiment analysis, which categorizes a text's sentiment automatically [5]. This could be used to determine the text's tone, like positive or negative, and to detect hate speech and abuse in social media comments.

Shen et al. [17] aimed to evaluate the potential advantages and drawbacks of using large language models, such as ChatGPT, for NLP applications. The researchers conducted a systematic review of the available literature, focusing on the most recent studies, to understand the impact of these models. The review showed that while ChatGPT and other large language models offer a higher level of accuracy compared to traditional methods, they also come with several risks. These include the possibility of bias in the models, difficulty in interpreting their outputs, and susceptibility to adversarial attacks, which may limit their practical usefulness. The researchers concluded that large language models are a double-edged sword because they have both benefits and drawbacks, and therefore it is crucial to weigh the risks before implementing them for NLP applications.
Moore [12] explores Kaplan's discussion on AI, covering its history, current state, and changes over time. The research findings demonstrate that AI has advanced significantly since its inception, with an increase in its capabilities and applications. AI has transformed from a novel concept to a tool that can assist in decision-making and even perform tasks such as translation, proofreading, and summarizing texts, with the ability to surpass human effort. This research provides a comprehensive overview of AI's development and its impact on society. AI has the potential to revolutionize various fields, from predictive analysis to automation. However, it is crucial to consider the ethical implications of AI to ensure its responsible use. As AI continues to evolve, it is important to stay informed about its advancements to make the most of its potential.

Baidoo-Anu and Owusu Ansah [3] showed that ChatGPT could improve the quality of education by providing more personalized learning experiences. It could also increase student engagement, reduce the workload of educators, and enable the delivery of personalized educational materials. ChatGPT can also provide language learners with access to a large and diverse corpus of language data. This data can help learners build a comprehensive understanding of the language they are learning and also improve their writing skills. ChatGPT can also provide personalized feedback to learners, enabling them to receive instant feedback on their language use, which can improve their learning efficiency [14].
Harnessing the Power of ChatGPT for Mastering the Maltese Language …
ChatGPT has the potential to enhance the learning experience by providing an engaging and interactive platform for learners. This can motivate and encourage language learners to continue practicing their language skills [10]. It can help learners develop their language skills in a more natural and intuitive way, which can be especially beneficial for those who are learning a language for the first time [10]. Biswas’ [4] study found that using ChatGPT helped respondents save time and increase productivity, leading to improved writing quality. Study participants believed that ChatGPT could be used to generate ideas and create initial drafts, although they still needed to edit and revise to produce high-quality documents. The study concluded that most medical writers feel that ChatGPT is not yet capable of replacing traditional medical writing methods. Jiao et al. [8] conducted a preliminary evaluation of the quality of ChatGPT as a neural machine translation system. They evaluated its performance using a Chinese–English translation task and found that ChatGPT performed well in comparison to other translation systems, with high translation accuracy. Topsakal and Topsakal [18] conducted a study on the development of foreign language teaching software for children using AR, Voicebots, and ChatGPT. They compared the performance of students using their proposed framework versus a traditional language teaching method in an experimental study that included two groups of 20 students each. The students were assessed on their language proficiency through listening, reading, writing, and speaking tasks. The results showed that students using the proposed framework performed significantly better than those using the traditional method. The research also included a qualitative analysis of the students’ experience, which revealed that they felt more engaged, motivated, and confident, and believed the framework was effective in helping them learn.
Despite its advantages, there are also some disadvantages to using ChatGPT for language learning. Hwang and Chang [7] pointed out that the language generated by ChatGPT may not always be accurate or appropriate for language learners, resulting in learners receiving incorrect information or being exposed to language that is not suited to their proficiency level. Additionally, ChatGPT may not provide learners with enough context or support to fully understand the language they are learning [7], making it difficult for learners to integrate the language into their own experiences and contexts. Another disadvantage of using ChatGPT for language learning is the difficulty in assessing the quality of the generated language. According to Frances and Zimotti [6], the language generated by ChatGPT may be less sophisticated compared to that generated by human teachers or language tutors. This could lead to difficulties for learners in understanding and applying the language they are learning, potentially resulting in misunderstandings or incorrect usage [6]. The use of ChatGPT for language learning has been widely discussed among scholars [1]. Despite the advantages, such as access to a vast language data corpus, personalized feedback, and an interactive and engaging platform, there are also drawbacks to consider, including the potential for inaccuracies in the generated language, a lack of cultural context and support, and challenges in assessing the quality of the generated language [1]. Hence, language learners must carefully evaluate the
advantages and disadvantages of using ChatGPT for language learning and make an informed decision based on their specific needs and objectives. In summary, recent research has explored the potential of ChatGPT for improving language mastery. Several studies, such as those by Jiao et al. [8], Kasneci et al. [10], and Li et al. [11], have examined both the benefits and challenges of using ChatGPT for language learning and translation. Other studies urge caution: Shen et al. [17] found that large language models can be used to spread misinformation; Jiao et al. [8] found that ChatGPT is less effective than human translators; and Baidoo-Anu and Owusu Ansah [3] pointed out that there are ethical considerations to keep in mind when utilizing ChatGPT for teaching and learning. Therefore, it is important to carefully consider the benefits and challenges of using ChatGPT in language learning before deciding to utilize it, particularly in learning the Maltese language.
3 Methodology

The current study employed purposive sampling to analyze the perspectives and experiences of 41 international students who were studying Maltese in individual or group evening courses at an intermediate level (Level B1) and used ChatGPT for two weeks. All participants were adults between the ages of 21 and 76, multilingual, and from diverse countries such as Morocco (4 participants), India (4 participants), Pakistan (5 participants), Serbia (4 participants), Russia (4 participants), Ukraine (3 participants), Somalia (4 participants), Kenya (3 participants), Libya (5 participants), Venezuela (3 participants), and Colombia (2 participants). After the study was approved by the University Research Ethics Committee (UREC) at the University of Malta, the participants were selected and were asked to use ChatGPT for two weeks before data collection. A mixed methods approach was employed for analysis of the data. This approach offered the benefits of both qualitative and quantitative methods, resulting in a deeper and more nuanced understanding of the research problem [15]. The first stage of the study involved collecting quantitative data. After the two-week period of using ChatGPT, the participants were asked to evaluate ChatGPT’s effectiveness in helping them learn Maltese through a survey. The survey consisted of structured questions designed to gauge the participants’ overall satisfaction with ChatGPT, its comprehension and response capabilities in Maltese, and the quality of its output. This data provided a systematic and standardized way of measuring the participants’ attitudes, experiences, and opinions, and identifying patterns and trends. In the second stage, qualitative data was collected through an online focus group discussion via Zoom with the participants to obtain more in-depth feedback and insights. This data provided rich and detailed information about the participants’ experiences and perspectives on using ChatGPT to learn Maltese.
Both sets of data were analyzed and compared using SPSS statistical tests and NVivo, allowing for triangulation of the results and a more comprehensive understanding of the research problem [15]. The mixed methods approach provided a thorough examination of the complex relationships between the variables of interest, experience, and the use of ChatGPT in learning Maltese, and offered a more nuanced understanding than either qualitative or quantitative methods alone could have provided.
4 Analysis of Results and Discussion

The results of the analysis revealed that all participants had negative experiences with ChatGPT and did not have a positive attitude towards using it as a tool for learning Maltese, compared to their experiences learning other languages, especially English.
4.1 ChatGPT’s Ineffectiveness for Maltese Learning

To assess the effectiveness of ChatGPT in helping participants learn Maltese, a two-tailed t-test was conducted, as presented in Table 1. The t-test compared the participants’ average ratings on the survey questions related to the helpfulness of ChatGPT in learning Maltese and the frequency of their use of ChatGPT for learning Maltese. The results of the t-test showed a statistically significant difference in the mean ratings, with a t-score of −3.83 and a p-value of approximately 0.

• Subsample: the fraction of training instances sampled before growing each tree; values must be greater than 0 and less than or equal to 1.
• Colsample_bytree: the ratio of columns subsampled when building each tree. The default value is 1, with a range of (0, 1].
3 Data Collection Methodology

3.1 Data Collection Process

An efficient and sufficient data set is the key to a successful machine learning problem, and for this purpose we followed a careful procedure for collecting training data. This procedure can be illustrated in the following four steps (Fig. 4).
J. Abdouni et al.
Fig. 4 Data collection process
3.1.1 Solving the Continuous Problem with the A-RRT* Method
This step involves finding a path, i.e., a series of points between the start point and the goal point. Several research works have used traditional algorithms such as Dijkstra [24] and A* [25] to train their robots; however, these algorithms carry a high computational cost, which makes execution very slow, whereas sampling-based algorithms are known for their capacity to solve complex, high-dimensional problems. In our study, we chose the A-RRT* method for its speed and efficiency.
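The chapter does not reproduce the A-RRT* algorithm itself, but the family it belongs to is easy to sketch. The following minimal plain-RRT sketch (not the authors' A-RRT*; the `is_free` collision check, the step size, and all other parameter values are illustrative assumptions) shows how a sampling-based planner grows a tree from the start point toward the goal:

```python
import math
import random

def rrt(start, goal, is_free, bounds, step=10.0, iters=5000, goal_tol=15.0):
    """Minimal 2-D RRT: grow a tree from start and return a path to goal."""
    nodes, parent = [start], {0: None}
    for _ in range(iters):
        # Sample a random point, occasionally biasing toward the goal
        q = goal if random.random() < 0.05 else (
            random.uniform(*bounds[0]), random.uniform(*bounds[1]))
        # Steer one step from the nearest tree node toward the sample
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], q))
        x, y = nodes[i]
        d = math.dist((x, y), q)
        if d == 0:
            continue
        new = (x + step * (q[0] - x) / d, y + step * (q[1] - y) / d)
        if not is_free(new):
            continue
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:
            path, k = [], len(nodes) - 1  # walk parents back to the start
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None

random.seed(0)
path = rrt((10, 10), (500, 500), lambda p: True, ((0, 530), (0, 530)))
```

RRT* and A-RRT* additionally rewire the tree toward lower-cost paths; this sketch only illustrates the sampling-and-steering core they share.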
3.1.2 Discretization of Space and Continuous Path
The path planning algorithm used treats problems in continuous environments. To adapt the path to the configurations of our robot, we must therefore discretize the space; a grid map is the representation best suited to this type of situation.
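As a rough sketch of this step (the cell size of 10 is an assumed value, not taken from the chapter), mapping continuous path points onto grid-map cells can look like:

```python
def discretize_path(path, cell=10):
    """Map continuous (x, y) points to grid cells, dropping consecutive
    duplicates so each cell appears once per traversal."""
    cells = []
    for x, y in path:
        c = (int(x // cell), int(y // cell))
        if not cells or cells[-1] != c:
            cells.append(c)
    return cells

cells = discretize_path([(12.4, 3.1), (14.9, 4.0), (21.7, 9.8)], cell=10)
# cells == [(1, 0), (2, 0)]
```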
3.1.3 Optimization of the Path
After discretizing the path, the result obtained is not always optimal and includes unnecessary actions for the robot to perform, as represented in Fig. 5. An optimization step is therefore necessary to eliminate these unnecessary actions and allow the robot to imitate an efficient movement.
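One simple way to realize this step, sketched here under the assumption that "unnecessary actions" means revisited cells (loops and backtracking) in the discretized path, is:

```python
def optimize_path(cells):
    """Remove loops from a grid path: if a cell reappears, cut everything
    between the two visits so the robot never backtracks."""
    index, out = {}, []
    for c in cells:
        if c in index:
            keep = index[c] + 1
            for p in out[keep:]:  # forget the cells inside the loop
                del index[p]
            out = out[:keep]
        else:
            index[c] = len(out)
            out.append(c)
    return out

# The detour (1, 0) -> (1, 1) -> (1, 0) is collapsed to a single visit
slim = optimize_path([(0, 0), (1, 0), (1, 1), (1, 0), (2, 0)])
```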
3.1.4 Data Extraction
In this step, we process each position to extract the distances between the robot and the obstacles, together with the coordinates of the goal point relative to the robot, as explained above. From the next position, we deduce the action associated with the current state. The objective is to form a state–action pair that is used to train the robot.
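A minimal sketch of forming state–action pairs from consecutive positions (the `sense` sensor model and the exact state layout are hypothetical, not the chapter's precise encoding):

```python
def extract_pairs(cells, goal, sense):
    """Build (state, action) training pairs from consecutive grid cells.

    `sense(cell)` is a stand-in sensor model returning obstacle distances
    around the cell; the action is the unit move to the next cell."""
    pairs = []
    for cur, nxt in zip(cells, cells[1:]):
        state = (*sense(cur),                         # obstacle distances
                 goal[0] - cur[0], goal[1] - cur[1])  # goal relative to robot
        action = (nxt[0] - cur[0], nxt[1] - cur[1])   # one of the 9 moves
        pairs.append((state, action))
    return pairs

pairs = extract_pairs([(0, 0), (1, 0), (1, 1)], goal=(5, 5),
                      sense=lambda c: (3, 3, 3, 3))
```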
A New Autonomous Navigation System of a Mobile Robot …
Fig. 5 Optimization of the path
3.2 Data-Set Generation

For the generation of the data set, we built three types of environments of dimension (530, 530), types A, B, and C, classified according to their degree of difficulty (Fig. 6).
3.2.1 Environment Type A

Contains 70 square obstacles of size 20, generated randomly.
3.2.2 Environment Type B

Contains 70 rectangular obstacles, whose length and width are randomly chosen from {20, 30}, and whose position is randomly generated at each environment’s construction.
Fig. 6 Simulation environments
3.2.3 Environment Type C

Contains 70 rectangular obstacles, whose length and width are randomly chosen from {20, 30, 40}, and whose position is randomly generated at each environment’s construction. In each environment, we followed the data collection process described previously; the coordinates of the starting point and the ending point were changed each time to cover all directions. This procedure was repeated 200 times in each type of environment, which allowed us to collect about 265,194 state–action vectors.
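A sketch of generating such environments (the `(x, y, w, h)` rectangle representation and the `make_environment` helper are illustrative assumptions, not the chapter's code):

```python
import random

def make_environment(kind, n=70, size=530, seed=None):
    """Generate n random axis-aligned obstacles as (x, y, w, h) rectangles."""
    rng = random.Random(seed)
    sides = {"A": [20], "B": [20, 30], "C": [20, 30, 40]}[kind]
    obstacles = []
    for _ in range(n):
        w = rng.choice(sides)
        h = w if kind == "A" else rng.choice(sides)  # type A: squares only
        obstacles.append((rng.uniform(0, size - w), rng.uniform(0, size - h),
                          w, h))
    return obstacles

env = make_environment("C", seed=42)
```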
4 System Design

4.1 Preprocessing and Model Selection

As in any classification problem, pre-processing and model selection are essential steps for obtaining a good prediction score. To this end, we tested several classification models to choose the most appropriate one for our data; the results are presented in Table 1. All models were tested with their default parameters, and to evaluate performance properly, we opted for k-fold cross-validation with cv = 5. We decided to process the data separately to improve the performance of our navigation system, and to work with the XGBoost classifier (XGBC), as it is the most adequate model for our collected data. The proposed system is presented in Fig. 7.
Table 1 Accuracy score of classifiers

Classifier   Accuracy   Classifier   Accuracy
LR           0.689      SGDC         0.460
KNN          0.713      DT           0.698
SVC          0.729      RF           0.751
LSVC         0.340      XGBC         0.796
MLPC         0.772      AdaBoost     0.603

Fig. 7 Prediction model architecture
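The comparison in Table 1 can be reproduced in outline with scikit-learn's `cross_val_score` (cv = 5, as in the chapter). Here synthetic data stands in for the collected state–action vectors, and only a few of the listed classifiers are shown:

```python
# Compare candidate classifiers with 5-fold cross-validation on their
# default parameters; the data is a synthetic stand-in for the real set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
models = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
best = max(scores, key=scores.get)
```

On the real data the chapter reports XGBC as the winner; with synthetic data the ranking will differ, which is exactly why the comparison must be run on the actual collected vectors.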
And to maximize performance, we tested different combinations of parameters using the GridSearchCV method; this allowed us to achieve 82% correct predictions with the following parameters:
• 'xgbclassifier__subsample': 0.8
• 'xgbclassifier__min_child_weight': 3
• 'xgbclassifier__max_depth': 7
• 'xgbclassifier__gamma': 2.5
• 'xgbclassifier__colsample_bytree': 0.4
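A hedged sketch of this tuning step with scikit-learn's `GridSearchCV`. Here `GradientBoostingClassifier` stands in for the XGBoost classifier (so the parameter names lack the `xgbclassifier__` pipeline prefix used above), and the data and grid values are illustrative only:

```python
# Exhaustive search over a small hyperparameter grid with cross-validation;
# the fitted `search` object exposes the best combination found.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
grid = {"subsample": [0.8, 1.0], "max_depth": [3, 7]}
search = GridSearchCV(GradientBoostingClassifier(random_state=0), grid, cv=3)
search.fit(X, y)
```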
4.2 Model Corrector

Although our prediction model has a score of 82%, this is still not enough for our navigation system, because the remaining 18% of wrong predictions lead to actions that present a risk of collision, or to repeated actions that trap the robot in infinite loops so that it never reaches the goal. To address this problem, we added an action correction model, presented in Fig. 8. This model takes as inputs the probabilities of the 9 possible actions extracted from the prediction model. The principle of this corrector is to check whether the action with the highest probability presents no risk of collision and does not send the robot back to a position already visited. If the action satisfies these conditions, the model sends it as output; otherwise, it moves on to the next action (the action with the next-highest probability). If no action satisfies these two conditions, the corrector repeats the verification with only the non-collision condition. The corrector model flowchart is shown in Fig. 9.
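The corrector logic just described can be sketched as follows (the encoding of the 9 actions as unit moves including "stay", and the `collides`/`visited` interfaces, are assumptions about the implementation):

```python
def correct_action(probs, actions, state, collides, visited):
    """Return the highest-probability action that is collision-free and leads
    to an unvisited cell; fall back to collision-free only, else None."""
    ranked = sorted(range(len(actions)), key=lambda i: probs[i], reverse=True)
    for require_unvisited in (True, False):
        for i in ranked:
            nxt = (state[0] + actions[i][0], state[1] + actions[i][1])
            if collides(nxt):
                continue
            if require_unvisited and nxt in visited:
                continue
            return actions[i]
    return None  # no collision-free action exists

# 9 possible moves on the grid (including staying in place)
ACTIONS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
```

For example, if the top-ranked action would collide, the corrector falls through to the next most probable action that is safe and unvisited.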
Fig. 8 Model corrector
Fig. 9 Action corrector flowchart
5 Simulation and Results

To verify the effectiveness of the proposed system, several simulations were performed in the three types of environments (400 runs in each type). The simulation was executed on a PC with an AMD Ryzen 5 5600G processor at 3.9 GHz and 16 GB of RAM, and the program code was written in Python. Some examples of navigation simulation using our approach are shown in Figs. 10, 11, and 12. From these figures, we can see that the proposed system successfully achieved the objective in most cases. The test results are presented in Tables 2 and 3, which show that the action correction model achieved a success rate exceeding 95%, proving the effectiveness of the proposed navigation system.
Fig. 10 An overview of simulation in environment A
Fig. 11 An overview of simulation in environment B
Fig. 12 An overview of simulation in environment C

Table 2 The simulation results of the navigation system without the action corrector

                          Type A    Type B    Type C
Number of environments    400       400       400
Successful environments   353       317       286
Failed environments       47        83        114
Success rate              88.25%    79.25%    71.50%

Table 3 The simulation results of the navigation system with the action corrector

                          Type A    Type B    Type C
Number of environments    400       400       400
Successful environments   400       396       383
Failed environments       0         4         17
Success rate              100%      99.00%    95.75%
6 Conclusion

In this article, we propose an autonomous navigation system composed of a prediction model and an action corrector. The system was implemented on an omnidirectional robot, and to assess performance, the robot was trained and tested in three different types of environments. The results showed high efficiency and allowed the robot to successfully complete more than 95% of navigation missions and reach the target. In the future, we will focus more on the implementation of other types of sensors, such as cameras, and test the principle of this system on other types of robots, such as autonomous cars.

Author Contributions All authors contributed to the conception and design of the study. Data collection and analysis were conducted by Jawad Abdouni. The first draft of the manuscript was written by Jawad Abdouni and all authors commented on earlier versions of the manuscript. The final manuscript was read and approved by all authors.

Data Availability The robot training data collected by the procedure described in this article can be viewed at: https://doi.org/10.6084/m9.figshare.21621618.v1.

Code Availability The code is available at https://github.com/jawadabdouni/imitationlearning-navigation-.git.

Declarations

Conflict of Interests The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
References

1. Chen G, Pan L, Xu P, Wang Z, Wu P, Ji J, Chen X (2020) Robot navigation with map-based deep reinforcement learning. In: 2020 IEEE international conference on networking, sensing and control (ICNSC). IEEE, pp 1–6
2. Rosique F, Navarro PJ, Fernández C, Padilla A (2019) A systematic review of perception system and simulators for autonomous vehicles research. Sensors 19(3):648. https://doi.org/10.3390/s19030648
3. Sugihara K (1988) Some location problems for robot navigation using a single camera. Comput Vis Graph Image Process 112–129
4. Lluvia I, Lazkano E, Ansuategi A (2021) Active mapping and robot exploration: a survey. Sensors 21(7):2445. https://doi.org/10.3390/s21072445
5. Ying Y, Li Z, Ruihong G, Yisa H, Haiyan T, Junxi M (2019) Path planning of mobile robot based on improved RRT algorithm. In: 2019 Chinese automation congress (CAC). IEEE, pp 4741–4746
6. González D, Pérez J, Milanés V, Nashashibi F (2016) A review of motion planning techniques for automated vehicles. IEEE Trans Intell Transp Syst 17(4):1135–1145. https://doi.org/10.1109/TITS.2015.2498841
7. Paden B, Čáp M, Yong SZ, Yershov D, Frazzoli E (2016) A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Trans Intell Veh 1(1):33–55. https://doi.org/10.1109/TIV.2016.2578706
8. Waga A, Lamini C, Benhlima S, Bekri A (2021) Fuzzy logic obstacle avoidance by a NAO robot in unknown environment. In: Fifth international conference on intelligent computing in data sciences (ICDS). Fez, Morocco, pp 1–7. https://doi.org/10.1109/ICDS53782.2021.9626718
9. Xiao X, Liu B, Warnell G et al (2022) Motion planning and control for mobile robot navigation using machine learning: a survey. Auton Robot 46:569–597. https://doi.org/10.1007/s10514-022-10039-8
10. Demidova K, Logichev M, Zhilenkova E, Dang B (2020) Autonomous navigation algorithms based on cognitive technologies and machine learning. In: IEEE conference of Russian young researchers in electrical and electronic engineering (EIConRus), pp 280–283. https://doi.org/10.1109/EIConRus49466.2020.9039465
11. Yonetani R, Taniai T, Barekatain M, Nishimura M, Kanezaki A (2021) Path planning using neural A* search. In: Proceedings of the 38th international conference on machine learning (ICML), PMLR 139, pp 12029–12039
12. Cèsar-Tondreau B, Warnell G, Stump E, Kochersberger K, Waytowich NR (2021) Improving autonomous robotic navigation using imitation learning. Front Robot AI 8:627730. https://doi.org/10.3389/frobt.2021.627730
13. Kishore A, Choe TE, Kwon J, Park M, Hao P, Mittel A (2021) Synthetic data generation using imitation training. In: IEEE/CVF international conference on computer vision workshops (ICCVW), pp 3071–3079. https://doi.org/10.1109/ICCVW54120.2021.00342
14. Liu B, Xiao X, Stone P (2021) A lifelong learning approach to mobile robot navigation. IEEE Robot Autom Lett 6(2):1090–1096. https://doi.org/10.1109/LRA.2021.3056373
15. Tsai C-Y, Nisar H, Hu Y-C (2021) Mapless LiDAR navigation control of wheeled mobile robots based on deep imitation learning. IEEE Access 9:117527–117541. https://doi.org/10.1109/ACCESS.2021.3107041
16. Hussein A, Gaber MM, Elyan E, Jayne C (2017) Imitation learning: a survey of learning methods. ACM Comput Surv 50(2):1–35
17. Qiang L, Nanxun D, Huican L, Heng W (2018) A model-free mapless navigation method for mobile robot using reinforcement learning. In: 2018 Chinese control and decision conference (CCDC). IEEE, pp 3410–3415
18. Zuo B, Chen J, Wang L, Wang Y (2014) A reinforcement learning based robotic navigation system. In: 2014 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, pp 3452–3457
19. Guo S, Zhang X, Zheng Y, Du Y (2020) An autonomous path planning model for unmanned ships based on deep reinforcement learning. Sensors 20(2):426. https://doi.org/10.3390/s20020426
20. Jawad A, Tarik J, Abderrahim W, Idrissi SE, Meryem EM, Ihssane S (2022) A new sampling strategy to improve the performance of mobile robot path planning algorithms. In: International conference on intelligent systems and computer vision (ISCV), pp 1–7. https://doi.org/10.1109/ISCV54655.2022.9806128
21. Moravec H, Elfes AE (1985) High resolution maps from wide angle sonar. In: Proceedings of the 1985 IEEE international conference on robotics and automation. IEEE Computer Society Press, Silver Spring, MD, pp 116–121. https://doi.org/10.1109/ROBOT.1985.1087316
22. Zhang J, Wang X, Xu L, Zhang X (2022) An occupancy information grid model for path planning of intelligent robots. ISPRS Int J Geo-Inform 11(4):231. https://doi.org/10.3390/ijgi11040231
23. Noreen I, Khan A, Habib Z (2016) Optimal path planning using RRT* based approaches: a survey and future directions. Int J Adv Comput Sci Appl 7(11)
24. Huijuan W, Yu Y, Quanbo Y (2011) Application of Dijkstra algorithm in robot path-planning. In: Second international conference on mechanic automation and control engineering, pp 1067–1069. https://doi.org/10.1109/MACE.2011.5987118
25. Guruji AK, Agarwal H, Parsediya DK (2016) Time-efficient A* algorithm for robot path planning. Procedia Technol 23:144–149
Jawad Abdouni is a Ph.D. student at the Advanced Systems Engineering Laboratory of the National School of Applied Sciences of Ibn Tofail University. He obtained his engineering degree in electromechanics from the Ecole Nationale Supérieure des Mines in Rabat in 2017. His research mainly focuses on trajectory planning algorithms for autonomous navigation systems.

Tarik Jarou is a University Professor at the National School of Applied Sciences of Ibn Tofail University, Kenitra, Morocco. He received his Doctorate degree (2008) in Electrical Engineering from the Mohammadia School of Engineering of Mohammed V University, Rabat, Morocco. He is a member of the Advanced Systems Engineering Laboratory and a former member of LEECMS (Laboratory of Electrical Engineering, Computing and Mathematical Sciences) of Ibn Tofail University, Kenitra, Morocco. His main research areas include the modelling and control of electronic and embedded systems for smart electric and cyber-physical systems, and their applications in the automotive and aeronautical industries.

Abderrahim Waga obtained his baccalaureate in mathematical sciences and then enrolled at the Faculty of Sciences of Ibn Tofail University in Kenitra for a bachelor's degree in mathematics and computer science. He then moved to the Moulay Ismaïl Faculty of Sciences in Meknes for a master's degree in computer networks and embedded systems. He subsequently enrolled in doctoral training at the same faculty to study the problem of mobile robot navigation using deep learning techniques.

Sofia El Idrissi is a Ph.D. student in the Advanced Systems Engineering Laboratory at the National School of Applied Sciences within Ibn Tofail University. She received her engineering degree in embedded systems from the National School of Applied Sciences in 2019. Her research is mainly focused on motion planning and control systems for autonomous vehicles in urban environments.

Younes El Koudia is a Ph.D.
student in the Electrical Department at the National School of Applied Sciences in Kenitra, Morocco. He obtained his degree in Mechanical and Automated Systems Engineering from the National School of Applied Sciences in Fez, Morocco, in 2021. In the same year, he started his Ph.D. within the Advanced Systems Engineering Laboratory, directed by Prof. Tarik Jarou, researching self-driving cars.

Sabah Loumiti holds a master's degree in electronics from Ibn Tofail University. She is currently pursuing her Ph.D. degree on IIoT and AI-based predictive maintenance systems at the National School of Applied Sciences of Ibn Tofail University, Kenitra, Morocco. Since December 2019, she has held the position of plant maintenance engineer in the automotive sector.
Evolutionary AI-Based Algorithms for the Optimization of the Tensile Strength of Additively Manufactured Specimens

Akshansh Mishra, Vijaykumar S. Jatti, and Shivangi Paliwal
Abstract In this research work, we investigated the optimization of input parameters for additively manufactured specimens fabricated by Fused Deposition Modeling (FDM) to maximize tensile strength. We employed two metaheuristic optimization algorithms, Particle Swarm Optimization (PSO) and Differential Evolution (DE), to determine the optimal input parameters, including infill percentage, layer height, print speed, and extrusion temperature. Additionally, we coupled both PSO and DE algorithms with the XGBoost algorithm, a powerful gradient boosting framework, to assess their performance based on Mean Squared Error (MSE) and Mean Absolute Error (MAE) values. Our study revealed that the MSE and MAE values of the coupled PSO-XGBoost algorithm were lower compared to the DE-XGBoost algorithm, indicating superior performance in finding optimal input parameters. The results suggest that the integration of PSO with XGBoost can provide an effective approach for optimizing FDM-fabricated specimens, leading to improved tensile strength and overall mechanical properties. This research offers valuable insights into the applicability of metaheuristic optimization algorithms in additive manufacturing and highlights the potential benefits of coupling these algorithms with machine learning models for enhanced parameter optimization. The findings contribute to the ongoing development of optimization techniques in additive manufacturing, providing a foundation for future work in this area. Keywords Evolutionary algorithms · Additive manufacturing · Artificial intelligence · Fused deposition modeling
A. Mishra (B) Politecnico Di Milano, Milan, Italy e-mail: [email protected] V. S. Jatti Symbiosis Institute of Technology, Pune, India S. Paliwal University of Kentucky, Kentucky, USA © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Adadi and S. Motahhir (eds.), Machine Intelligence for Smart Applications, Studies in Computational Intelligence 1105, https://doi.org/10.1007/978-3-031-37454-8_10
1 Introduction

Additive manufacturing, also known as 3D printing, is a process of building parts layer by layer from a digital model. This technology has the potential to revolutionize the manufacturing industry by enabling the production of complex and customized parts at a lower cost and with less waste. However, the process of 3D printing is complex and involves multiple parameters that need to be optimized in order to produce high-quality parts. One of the major areas in which optimization is needed in additive manufacturing is the selection of process parameters. These parameters include the type and size of the build envelope, the layer thickness, and the print speed, among others. The selection of these parameters can greatly affect the quality of the final printed part, and optimization algorithms have been used to determine the optimal parameter settings. For example, studies have used evolutionary algorithms to optimize the build parameters of 3D-printed parts in order to improve mechanical properties such as strength and toughness [1], and a genetic algorithm to optimize the build parameters for reducing the internal stress in metal parts [2]; other studies have applied optimization algorithms such as Artificial Bee Colony and particle swarm optimization to 3D printing process parameter optimization [3]. Another important area where optimization is needed in additive manufacturing is the design of the parts themselves. Due to the complexity of the 3D printing process, parts that are designed for traditional manufacturing methods may not be suitable for 3D printing. Optimization algorithms have been used to redesign parts to make them more suitable for 3D printing. This can include the optimization of the geometry of the part, the orientation of the part in the build envelope, and the support structure of the part [4].
Additionally, optimization is also needed in the post-processing of the printed parts, particularly in the area of surface finishing. Surface roughness and accuracy can be affected by post-processing parameters such as polishing time and pressure, the use of abrasives, and temperature. Optimization algorithms have been used to optimize these parameters to improve the surface finish of printed parts [5]. In conclusion, optimization is a crucial aspect of additive manufacturing, as the selection and control of process parameters, the design of the parts, and post-processing can greatly affect the quality and performance of the final printed parts. Various optimization algorithms, such as evolutionary algorithms, genetic algorithms, Artificial Bee Colony, and particle swarm optimization, have been used to optimize these parameters and improve the quality of the final parts. In this research work, we have made significant contributions to the field of additive manufacturing, specifically in optimizing the input parameters for Fused Deposition Modeling (FDM) fabricated specimens to maximize tensile strength. Our approach leveraged two metaheuristic optimization algorithms, Particle Swarm Optimization (PSO) and Differential Evolution (DE), to identify the optimal input
parameters, namely infill percentage, layer height, print speed, and extrusion temperature. Furthermore, we coupled both PSO and DE algorithms with the XGBoost algorithm to provide an in-depth analysis and comparison of their performance based on Mean Squared Error (MSE) and Mean Absolute Error (MAE) values.
2 Background

2.1 Additive Manufacturing in Industrial Sectors

Additive manufacturing (AM), also known as 3D printing, is a rapidly developing technology that has the potential to significantly impact various industrial sectors. By utilizing AM, companies can create complex geometries and customize products, leading to increased efficiency and cost savings. In the aerospace industry, AM is being utilized to create lightweight parts for aircraft, resulting in fuel savings and improved performance. Boeing, for example, has been using AM to produce over 20,000 parts for their 737 and 777 aircraft, including flight deck components and cabin interior parts [6]. This not only reduces the weight of the aircraft but also leads to cost savings by reducing the need for traditional manufacturing methods. In the medical field, AM has become a valuable tool for creating customized prosthetics and implants. It allows for the production of complex geometries that would be difficult or impossible to manufacture using traditional methods [7]. Additionally, AM can also be used to produce surgical instruments and customized implants for patients, leading to improved patient outcomes [8]. In the automotive industry, AM is being used to create lightweight and strong parts for vehicles, resulting in improved fuel efficiency and performance. Local Motors, for example, has used AM to produce an electric vehicle with a body made entirely from 3D printed parts [9]. In the construction industry, AM has the potential to revolutionize the way buildings and infrastructure are constructed. It allows for the creation of complex shapes, which can lead to more efficient use of materials and faster construction times [10].
2.2 Evolutionary Based Algorithms in Manufacturing

Evolutionary algorithms are a class of optimization and search algorithms inspired by the process of natural evolution. These algorithms are well suited to solving complex optimization problems that are typically characterized by a large number of parameters and multiple local minima. The basic mechanism of evolutionary-based algorithms involves the generation of an initial population of solutions, known as individuals or chromosomes. Each individual is then evaluated according to a fitness function, which measures the
A. Mishra et al.
quality of the solution. The individuals with the highest fitness values are selected to form the next generation through the process of reproduction, which combines genetic information from the selected individuals. The genetic information can be recombined in various ways, such as crossover, in which the genetic information from two individuals is combined to form a new individual, or mutation, in which small random changes are made to the genetic information of an individual. The process of reproduction and selection is then repeated to form each subsequent generation. Over time, the individuals in the population evolve towards better solutions, and the process continues until a satisfactory solution is found or a stopping criterion is met. Evolutionary-based algorithms have been applied to a wide range of optimization problems, such as function optimization, feature selection, and neural network training. They are particularly effective on problems with a large search space and no gradient information available. One of the most popular evolutionary-based algorithms is the genetic algorithm (GA), first proposed by John Holland in the 1970s. Genetic algorithms are based on the principles of natural selection and genetic inheritance and have been shown to be effective on a wide range of optimization problems. Another popular evolutionary-based algorithm is Particle Swarm Optimization (PSO), first proposed by Kennedy and Eberhart in 1995. PSO is a population-based optimization algorithm inspired by the social behavior of birds and fish, and it too has proven effective across many optimization problems.
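The generate, evaluate, select, and reproduce loop described above can be sketched as a minimal genetic algorithm (illustrative only; function names, operators, and parameter values are our assumptions, not the study's implementation):

```python
import random

def genetic_algorithm(fitness, bounds, pop_size=30, generations=60,
                      crossover_rate=0.9, mutation_rate=0.1, seed=0):
    """Minimal GA sketch: evaluate, select, crossover, mutate (minimization)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # initial population of candidate solutions ("chromosomes")
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(pop, key=fitness)               # best individual seen so far
    for _ in range(generations):
        parents = sorted(pop, key=fitness)[: pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            if rng.random() < crossover_rate:  # uniform crossover of two parents
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                child = list(a)
            if rng.random() < mutation_rate:   # random-reset mutation of one gene
                i = rng.randrange(dim)
                child[i] = rng.uniform(*bounds[i])
            children.append(child)
        pop = children
        gen_best = min(pop, key=fitness)
        if fitness(gen_best) < fitness(best):
            best = gen_best
    return best

# Usage: minimize the sphere function f(x) = x1^2 + x2^2
best_ga = genetic_algorithm(lambda x: sum(v * v for v in x), [(-5, 5)] * 2)
```

Real implementations differ mainly in the choice of selection scheme (tournament, roulette wheel) and crossover operator, but the loop structure is the same.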
In recent years, evolutionary algorithms have been increasingly applied in the field of manufacturing to solve a wide range of problems, including scheduling, resource allocation, and design optimization. One of the most widely used evolutionary algorithms in manufacturing is the genetic algorithm (GA). GAs are based on the principles of natural selection and genetics and are used to search for the optimal solution to a given problem. For example, GAs have been used to optimize scheduling in manufacturing systems [11] and to design optimal control policies for robotic systems [12]. PSO is a population-based optimization algorithm inspired by the behavior of bird flocks, and it has been applied to a wide range of manufacturing problems, such as machining parameter optimization [13] and robotic path planning [14]. Another family of evolutionary algorithms is Differential Evolution (DE), a simple and efficient stochastic optimization algorithm for solving global optimization problems. DE has been applied in areas such as scheduling and control in production systems, the optimization of cutting parameters in CNC (Computer Numerical Control) machines [15], and the evaluation of optimal production-line designs.
Evolutionary AI-Based Algorithms for the Optimization of the Tensile …
3 Materials and Methods

The FDM process involves several key steps. A CAD model is first created and then converted into an STL (stereolithography) file. These files represent three-dimensional surfaces as an assembly of planar triangles, with a greater number of triangles resulting in increased accuracy. The next step is slicing, which includes describing the 3D part, dividing it into slices, and determining the necessary support material and tool path, as well as the angle of the tool. Different parameters are set in the STL file to control how the machines will operate in each layer. A Creality Ender 3 machine with a bed size of 220 × 220 × 250 mm³ is used to create the FDM samples. CATIA software is utilized for the design of the parts, which are then converted into an STL file and sliced into machine-readable g-code files using the Cura engine of the Repetier software. The dimensions of the tensile specimen are 63.5 × 9.53 × 3.2 mm³, as per the ASTM D638 requirements; the impact specimen dimensions are 63 × 12.7 × 3.2 mm³, following the ASTM D256 requirements; and the dimensions of the flexural specimen are 125 × 12.7 × 3.2 mm³, as per the ASTM D790 requirements. The dimensions are shown in Fig. 1. The material used to create the specimens is polylactic acid (PLA), a common material for FDM-processed parts. The tensile strength of the specimens is determined using uniaxial tensile tests; the obtained experimental results are shown in Table 1. The experimental dataset is stored as a CSV file and imported into the Google Colaboratory environment for further evaluation. Figure 2 shows the framework implemented in the present work. The current study focuses on finding the optimal input parameters for obtaining the best output parameter by implementing two evolutionary AI-based algorithms, i.e., the Differential Evolution and Particle Swarm Optimization algorithms.

Fig. 1 Dimensional sketch of tensile specimen
Table 1 Experimental results

Infill percentage   Layer height (mm)   Print speed (mm/s)   Extrusion temp (°C)   Tensile strength (MPa)
78                  0.32                35                   220                   46.17
10.5                0.24                50                   210                   42.78
33                  0.16                35                   220                   45.87
33                  0.32                35                   200                   41.18
33                  0.16                65                   200                   43.59
100.5               0.24                50                   210                   54.2
78                  0.16                35                   200                   51.88
33                  0.32                65                   200                   43.19
78                  0.32                65                   200                   50.34
33                  0.16                65                   220                   45.72
78                  0.16                35                   220                   53.35
55.5                0.24                50                   210                   49.67
33                  0.32                35                   220                   45.08
55.5                0.24                50                   190                   47.56
55.5                0.24                50                   210                   48.39
78                  0.32                65                   220                   46.49
55.5                0.24                50                   210                   47.21
55.5                0.24                50                   210                   48.3
55.5                0.24                50                   230                   50.15
33                  0.32                65                   220                   43.35
55.5                0.24                50                   210                   45.33
55.5                0.24                80                   210                   45.56
78                  0.16                65                   200                   49.84
55.5                0.24                20                   210                   48.51
55.5                0.08                50                   210                   42.63
55.5                0.4                 50                   210                   42.87
55.5                0.24                50                   210                   47.14
78                  0.32                35                   200                   45.17
55.5                0.24                50                   210                   47.07
78                  0.16                65                   220                   50.99
33                  0.16                35                   200                   200
Fig. 2 Framework implemented in the present work
4 Results and Discussion

4.1 Data Analysis and Visualization

Figure 3 shows the heatmap of the experimental dataset. Heatmaps are a powerful data-analysis tool that can provide valuable insights into complex datasets. They are a graphical representation that uses color to encode the density of data points in a given area, making patterns and trends visible that may not be obvious from the raw data. One of the most important benefits of heatmaps is their ability to reveal clusters and outliers: areas of high and low density are easy to identify, which helps locate groups of similar data points as well as outliers that do not fit the overall pattern, flagging potential problems or areas for further investigation. Heatmaps are also useful for identifying correlations and relationships between variables, for example the relationship between Layer height and Print speed. Finally, by using color to represent density, heatmaps can effectively compress large amounts of data into a small and easily understandable format. This makes it easy
to quickly identify patterns and trends in the data, even when dealing with large and complex datasets. Figure 4 shows the plot of the feature importance. In machine learning, feature importance is the process of determining the relative importance of each feature in a dataset for a specific task or model. This is an important step in understanding the data and building accurate and effective models. Tree-based models, such as decision trees and random forests, are popular choices for feature importance. These models create a tree-like structure that separates the data into smaller subsets, with each split in the tree corresponding to a feature. The feature that results in the most informative split is considered the most important. The results show that the Infill percentage input parameter has the highest feature importance value in comparison to the other input parameters.
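The heatmap in Fig. 3 visualizes pairwise relationships between variables; the underlying quantity is typically a Pearson correlation matrix, which can be computed as follows (a sketch assuming plain Python lists of column values; the study's exact plotting code is not shown):

```python
def pearson(x, y):
    # Pearson correlation coefficient between two equal-length samples
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def correlation_matrix(columns):
    # matrix of pairwise correlations: the values a heatmap colours cell by cell
    return [[pearson(a, b) for b in columns] for a in columns]
```

Each cell of the heatmap then colours one entry of this matrix, with the diagonal always equal to 1.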
Fig. 3 Plot of heatmap

Fig. 4 Plot of feature importance
4.2 Particle Swarm Optimization (PSO) Algorithm for Finding Optimal Parameters

Figure 5 shows the flowchart implemented in the present work for finding the optimal input and output parameters. First, the position (x_i) and velocity (v_i) of each particle are initialized as shown in Eqs. (1) and (2); the initial positions of the particles are shown in Fig. 6. Second, the fitness (objective function value) of each particle is evaluated using the objective function to be optimized, Eq. (3). For each particle i, its personal best position p_i is updated based on its current fitness using Eq. (4); the best positions of the particles are shown in Fig. 7. Next, the global best position g is determined from the personal best positions of all particles following Eq. (5); the global best position obtained is shown in Fig. 8. The velocity of each particle is then updated using Eq. (6), and its position using Eq. (7). Steps 2–6 are repeated for a set number of iterations or until a stopping criterion is met.

Fig. 5 Flowchart of the PSO algorithm used in the present work

x_i = Min(x_i) + rand() × (Max(x_i) − Min(x_i))   (1)

v_i = Min(v_i) + rand() × (Max(v_i) − Min(v_i))   (2)

f_i = f(x_i)   (3)

p_i = x_i if f_i < f(p_i)   (4)

g = p_i if f_i < f(g) for all i   (5)

v_i = w × v_i + c1 × rand() × (p_i − x_i) + c2 × rand() × (g − x_i)   (6)

where w, c1, and c2 are weighting factors that control the influence of the current velocity (inertia), the personal best, and the global best, respectively, on the update.

x_i = x_i + v_i   (7)

Fig. 6 Plot of the initial positions of the particles

Fig. 7 Plot of the best positions of the particles

Fig. 8 Plot of the best global position of the particle
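Equations (1)–(7) can be assembled into a runnable sketch (a minimal PSO for minimization; parameter values and names are illustrative, not those used in the study):

```python
import random

def pso(f, bounds, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Sketch of the PSO update rules in Eqs. (1)-(7); minimizes f."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Eqs. (1) and (2): random initial positions and velocities
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[rng.uniform(lo - hi, hi - lo) for lo, hi in bounds] for _ in range(n_particles)]
    p = [xi[:] for xi in x]            # personal bests, Eq. (4)
    g = min(p, key=f)                  # global best, Eq. (5)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Eq. (6): inertia + cognitive + social terms
                v[i][d] = (w * v[i][d]
                           + c1 * rng.random() * (p[i][d] - x[i][d])
                           + c2 * rng.random() * (g[d] - x[i][d]))
                x[i][d] += v[i][d]     # Eq. (7): position update
            if f(x[i]) < f(p[i]):      # Eqs. (3)-(4): fitness and personal best
                p[i] = x[i][:]
        g = min(p, key=f)              # Eq. (5): refresh global best
    return g

# Usage: minimize the sphere function
best_pso = pso(lambda z: sum(t * t for t in z), [(-5, 5)] * 2)
```

In the study, f would be a model of tensile strength as a function of the four print parameters, with `bounds` set to the ranges in Table 1.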
Table 2 shows the obtained result of optimal input parameters and output parameter. The PSO algorithm is further coupled with the XGBoost algorithm to determine the Mean Square Error (MSE) and Mean Absolute Error (MAE), as depicted in Table 3.

Table 2 Optimal input and output parameters obtained from the PSO Algorithm

Infill percentage   Layer height   Print speed   Extrusion temperature   Tensile strength
14.52               0.29           48.19         190.67                  78.00
Table 3 Obtained metrics features of the PSO-XGBoost Algorithm

Algorithm     Mean Square Error (MSE)   Mean Absolute Error (MAE)
PSO-XGBoost   1.6806                    1.1848
Fig. 9 MSE and MAE plot for each fold obtained for PSO-XG Boost Algorithm
Figure 9 shows the MSE and MAE plot for each fold. The Mean Squared Error (MSE) and Mean Absolute Error (MAE) plot for each fold provides insights into the performance of the model across different subsets of the data. If the MSE and MAE values for each fold are relatively similar, it suggests that the model performs consistently across different data subsets. This indicates good generalization ability, meaning the model is likely to perform well on new, unseen data. Lower MSE and MAE values indicate better model performance, as these metrics measure the average squared and absolute differences, respectively, between the predicted and true target values. Smaller error values suggest that the model’s predictions are closer to the actual target values. If there is a significant difference in MSE or MAE values for some folds compared to the others, it may imply that the dataset contains outliers or the model is sensitive to certain data variations. Further investigation would be required to determine the cause of the discrepancy and whether additional data pre-processing or model adjustments are necessary. By comparing the MSE and MAE values, you can gain insight into the model’s robustness to outliers. A larger difference between MSE and MAE values might indicate that the model is more sensitive to outliers, as the MSE penalizes larger errors more heavily due to the squaring operation. Conversely, smaller differences between MSE and MAE may suggest that the model is more robust to outliers.
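As an illustration of how fold-wise MSE and MAE values like those in Fig. 9 are gathered, a minimal k-fold sketch follows (a trivial mean predictor stands in for the PSO/DE-tuned XGBoost model, which is not reproduced here; names are ours):

```python
def kfold_indices(n, k):
    # contiguous k-fold split of indices 0..n-1 (shuffling omitted for brevity)
    fold = n // k
    return [list(range(i * fold, (i + 1) * fold if i < k - 1 else n))
            for i in range(k)]

def per_fold_metrics(X, y, fit, k=5):
    """Return one (MSE, MAE) pair per fold; fit(X_train, y_train) must return
    a predict(x) callable (a placeholder for the tuned regressor)."""
    results = []
    for test_idx in kfold_indices(len(X), k):
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(X[i]) for i in test_idx]
        truth = [y[i] for i in test_idx]
        mse = sum((t - p) ** 2 for t, p in zip(truth, preds)) / len(truth)
        mae = sum(abs(t - p) for t, p in zip(truth, preds)) / len(truth)
        results.append((mse, mae))
    return results

# A trivial mean predictor standing in for the tuned model
fit_mean = lambda X_tr, y_tr: (lambda x: sum(y_tr) / len(y_tr))
```

Plotting the resulting per-fold pairs yields exactly the kind of chart shown in Fig. 9; similar per-fold values across folds indicate consistent generalization.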
4.3 Differential Evolution (DE) Algorithm for Finding the Optimal Parameters

Differential Evolution (DE) is a stochastic optimization algorithm widely used for finding the optimal parameters of a system or model. The mechanism of the DE algorithm is based on the principles of natural selection and genetic algorithms, where a population of candidate solutions is evolved over time to find the best solution. The DE algorithm starts by initializing a population of candidate solutions, where each solution is represented by a set of parameter values. The DE framework used in the present study is shown in Fig. 10. The DE algorithm is known for its ability to efficiently explore the search space and find the global optimum. The combination of mutation and crossover allows for a balance between exploration and exploitation of the search space, which helps prevent the algorithm from getting stuck in local optima. The DE algorithm can also handle continuous and discrete variables, as well as large numbers of parameters. Table 4 shows the obtained optimal input parameters and output parameter. It is observed from Tables 2 and 4 that the PSO algorithm results in the higher optimal value of the tensile strength of the additively manufactured specimen. One of the main advantages of PSO over DE is its ability to handle high-dimensional problems more efficiently. PSO has been found to converge faster than DE for problems
Fig. 10 DE Framework used in the present work
Table 4 Optimal input and output parameters obtained from the DE Algorithm

Infill percentage   Layer height   Print speed   Extrusion temperature   Tensile strength
90.81               0.13           46.23         228.57                  53.34
Table 5 Obtained metrics features for the Differential Evolution-XGBoost Algorithm

Algorithm                        Mean Square Error (MSE)   Mean Absolute Error (MAE)
Differential Evolution-XGBoost   2.7475                    1.4331
with a high number of variables, which makes it more suitable for such cases. Additionally, PSO has a relatively small number of parameters that need to be adjusted, making it easier to implement and tune, and more suitable for situations where computational resources are limited. The Differential Evolution algorithm is further coupled with the XGBoost algorithm to determine the Mean Square Error (MSE) and Mean Absolute Error (MAE), as depicted in Table 5. Figure 11 shows the MSE and MAE plot for each fold. When comparing the performance of two optimization algorithms like Particle Swarm Optimization (PSO) and Differential Evolution (DE) coupled with XGBoost, there can be several reasons for the differences in the Mean Squared Error (MSE) and Mean Absolute Error (MAE) values. PSO and DE have different mechanisms for exploring and exploiting the search space. PSO is based on the social behavior of birds flocking or fish schooling and uses the best positions found by the swarm to guide the search. DE, on the other hand, generates new candidate solutions by combining the differences of randomly selected individuals in the population. As a result, PSO tends to converge faster, while DE may explore the search space more thoroughly. This difference in the balance between exploration and exploitation can lead to differences in the performance of the two algorithms. Depending on the shape and characteristics of the search space for the specific problem, one algorithm may perform better than the other; for example, PSO might perform better in search spaces with a unimodal fitness landscape, while DE might excel in multimodal search spaces. Both PSO and DE have several parameters that need to be tuned, such as the inertia weight and the cognitive and social coefficients for PSO, and the mutation and crossover rates for DE. The performance of the algorithms can be highly sensitive to these parameter settings.
It’s possible that the parameter settings used in the PSO-XGBoost implementation were better suited to the problem than those used in the DE-XGBoost implementation. Both PSO and DE are stochastic algorithms, meaning that they incorporate random elements in their search process. As a result, their performance can vary across different runs. It’s possible that the specific run of PSO-XGBoost resulted in better solutions than the DE-XGBoost run due to random factors. The initial population of solutions for both algorithms can have a significant impact on their performance. If the initial population for PSO was closer to the optimal solution than the initial population for DE, PSO may converge faster and achieve better results. The stopping criteria for both algorithms, such as the maximum number of iterations or the convergence threshold, can influence the performance. If one algorithm is allowed to run for more iterations
Fig. 11 MSE and MAE plot for each fold obtained for Differential Evolution-XG Boost Algorithm
or has a stricter convergence criterion, it may have more opportunities to find better solutions. Furthermore, the PSO algorithm is better suited to handling constraints, which are often present in practical optimization problems. In PSO, constraints are handled by modifying the velocity update equations, which allows the algorithm to find solutions that satisfy the constraints, whereas the DE algorithm has been found to struggle with constraints and often requires additional methods to handle them. On the other hand, the DE algorithm is generally more robust and can handle a wide range of optimization problems, including problems where the objective function is noisy or has multiple local minima. The DE algorithm is also easy to understand and implement, which is beneficial for researchers who are new to the field of optimization.
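The DE mechanism discussed in this section (difference-based mutation, crossover, greedy selection) can be sketched as follows (a minimal DE/rand/1/bin variant; parameter values are illustrative, not the study's):

```python
import random

def differential_evolution(f, bounds, pop_size=20, iters=100, F=0.8, CR=0.9, seed=0):
    """DE/rand/1/bin sketch: mutation, binomial crossover, greedy selection."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(iters):
        for i in range(pop_size):
            # pick three distinct individuals other than the target
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            # mutation: donor vector from a scaled difference of two individuals
            donor = [a[d] + F * (b[d] - c[d]) for d in range(dim)]
            jrand = rng.randrange(dim)   # guarantee at least one donor gene
            # binomial crossover between target and donor
            trial = [donor[d] if (rng.random() < CR or d == jrand) else pop[i][d]
                     for d in range(dim)]
            # clip the trial vector back into the search bounds
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            if f(trial) <= f(pop[i]):    # greedy selection
                pop[i] = trial
    return min(pop, key=f)

# Usage: minimize the sphere function
best_de = differential_evolution(lambda z: sum(t * t for t in z), [(-5, 5)] * 2)
```

The scaling factor F and crossover rate CR are the two parameters whose tuning, as noted above, strongly affects DE's results.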
5 Conclusions

Despite the advantages that evolutionary algorithms offer, there are some challenges associated with their use. One of the most significant challenges is the complexity of the algorithms themselves, which can be difficult for engineers to understand and implement. Additionally, the algorithms can require significant computing resources, and can be time-consuming to run. Finally, the algorithms can be sensitive to the initial conditions of the problem, meaning that the same problem can yield different results
depending on the initial parameters. The following conclusions can be made from the present work:

• Additive manufacturing has the potential to revolutionize various industrial sectors. It allows for the creation of complex geometries, customization of products and cost savings. However, more research is needed to fully understand the implications of AM and how it can be used to its full potential in different industries.

• Evolutionary algorithms have been widely used in manufacturing to solve a wide range of problems, including scheduling, resource allocation, and design optimization. These algorithms have been shown to be highly effective in finding near-optimal solutions to these problems, and they have been applied to a wide range of manufacturing systems, including scheduling systems, robotic systems, and CNC machines.

• Both PSO and DE are powerful optimization algorithms, and the choice of which algorithm to use will depend on the specific problem being solved. The PSO algorithm is more efficient and effective in finding optimal parameters in high-dimensional problems and problems with constraints, while the DE algorithm is more robust and can handle a wide range of optimization problems. Both algorithms have their own strengths and weaknesses, and it is essential to carefully evaluate the problem at hand and choose the algorithm that best suits it.

• Evolutionary algorithms are a powerful tool for optimizing complex problems in the fields of manufacturing and materials science. They offer many advantages, including increased efficiency, cost savings, and improved performance. However, they can be complex and can require significant computing resources, so there are some challenges associated with their use.
References

1. Asadi-Eydivand M, Solati-Hashjin M, Fathi A, Padashi M, Osman NAA (2016) Optimal design of a 3D-printed scaffold using intelligent evolutionary algorithms. Appl Soft Comput 39:36–47
2. Deswal S, Narang R, Chhabra D (2019) Modeling and parametric optimization of FDM 3D printing process using hybrid techniques for enhancing dimensional preciseness. Int J Interact Des Manuf (IJIDeM) 13:1197–1214
3. Han Y, Jia G (2017) Optimizing product manufacturability in 3D printing. Front Comp Sci 11:347–357
4. Ferretti P, Leon-Cardenas C, Santi GM, Sali M, Ciotti E, Frizziero L, Donnici G, Liverani A (2021) Relationship between FDM 3D printing parameters study: parameter optimization for lower defects. Polymers 13(13):2190
5. Radhwan H, Shayfull Z, Nasir SM, Irfan AR (2020) Optimization parameter effects on the quality surface finish of 3D-printing process using Taguchi method. IOP Conf Ser: Mater Sci Eng 864(1):012143. IOP Publishing
6. Najmon JC, Raeisi S, Tovar A (2019) Review of additive manufacturing technologies and applications in the aerospace industry. Additive manufacturing for the aerospace industry, pp 7–31
7. Vignesh M, Ranjith Kumar G, Sathishkumar M, Manikandan M, Rajyalakshmi G, Ramanujam R, Arivazhagan N (2021) Development of biomedical implants through additive manufacturing: a review. J Mater Eng Perform 30(7):4735–4744
8. Sing SL, An J, Yeong WY, Wiria FE (2016) Laser and electron-beam powder-bed additive manufacturing of metallic implants: a review on processes, materials and designs. J Orthop Res 34(3):369–385
9. https://www.localmotors.com/3d-printed-cars/
10. Ghaffar SH, Corker J, Fan M (2018) Additive manufacturing technology and its implementation in construction as an eco-innovative solution. Autom Constr 93:1–11
11. Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Longman Publishing Co., Inc, Boston, MA, USA
12. Sehgal A, Ward N, La H, Louis S (2022) Automatic parameter optimization using genetic algorithm in deep reinforcement learning for robotic manipulation tasks. arXiv preprint arXiv:2204.03656
13. Prakash SO, Jeyakumar M, Gandhi BS (2022) Parametric optimization on electro chemical machining process using PSO algorithm. Materials Today: Proceedings 62:2332–2338
14. Gómez N, Peña N, Rincón S, Amaya S, Calderon J (2022) Leader-follower behavior in multi-agent systems for search and rescue based on PSO approach. In: SoutheastCon, pp 413–420. IEEE
15. Storn R, Price K (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim 11(4):341–359
Securing Data Conveyance for Dynamic Source Routing Protocol by Using SDSR-ANNETG Technique

Ahmed R. Zarzoor and Talib M. J. Abbas
Abstract Security of data conveyance in the Mobile Ad hoc Network (MANET) is still an open research area. Due to the nature of its framework, where any group of mobile nodes can freely interconnect with each other without requiring any central control such as a router or access point, it is vulnerable to many attacks, such as the black hole attack and the forward attack. Therefore, in this study a new technique called SDSR-ANNETG is proposed to select a secure, high-quality routing path for data conveyance between a source node (SN) and a destination node (DN) for the Dynamic Source Routing (DSR) protocol. The proposed technique consists of two phases: Entropy Trust Gain information (ETG) and Artificial Neural Network (ANN). In the ETG phase, a malicious node is identified based on the nodes' trust values, while the ANN method utilizes the path information (nodes' trust values, total route energy usage, and dropped packet count) to select the most secure and highest quality route for data transmission between SN and DN. The simulation results show that the SDSR-ANNETG method gives high accuracy in the detection of black hole and forward attacks in comparison with the S-DSR method. It also elects the path that gives the highest performance according to Quality of Service (QoS) metrics (latency, throughput, dropped packets and nodes' energy usage) in contrast with the S-DSR technique.

Keywords Mobile Ad hoc Network (MANET) · Dynamic Source Routing (DSR) · Black Hole Attack (BHA) · Forward Attack (FA) · Machine Learning (ML)
A. R. Zarzoor (B) Ministry of Health, Baghdad, Iraq e-mail: [email protected] T. M. J. Abbas Ashur University College, Baghdad, Iraq e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Adadi and S. Motahhir (eds.), Machine Intelligence for Smart Applications, Studies in Computational Intelligence 1105, https://doi.org/10.1007/978-3-031-37454-8_11
1 Introduction

A "Mobile Ad hoc Network (MANET)" [1] is a decentralized network in which each node can communicate with any other node and move from one group of nodes to another without requiring any central control device such as a router or sink node. The routing protocols of MANETs can be classified into three categories: table-driven, on-demand, and hybrid. In a table-driven protocol, paths are formed between every pair of nodes in the network, and path information is saved in the nodes' routing tables even if it is not used [2, 3]. In an on-demand routing protocol, a path is formed only when there is a need to send a message between SN and DN. A hybrid protocol utilizes both table-driven and on-demand techniques. The nature of these three routing approaches makes MANETs vulnerable to attacks, since any node can join the network and freely communicate with other nodes without any central control to secure the communication. Moreover, the limited capabilities of the nodes in the network, such as tiny battery power, small memory capacity and limited processor speed, make traditional security methods unsuitable for MANETs. Security in MANETs can therefore be classified into two types: data security and path routing security [4–6]. Data security is used to protect exchanged data packets from modification or snooping by unauthorized nodes; this topic is out of the scope of this study. Path security is used to secure the routing path against routing attacks such as black hole and forward attacks [7]. In a black hole attack, the attacker uses a malicious node that behaves as a normal node and offers an ideal, short path to its neighbors for conveying data from SN to DN, in order to make the neighbors join the path so that it receives data packets from them to use or drop [8]. In the forward attack, the same technique as in the black hole attack is utilized, but instead of dropping or using the data packet, the malicious node forwards it to a different destination [5]. The common techniques for handling routing attacks monitor the behavior of all the nodes that form the routing path between SN and DN, in order to identify node misbehavior (such as a high dropped packet count or taking a long time to reply) and add the misbehaving node to a blacklist or isolate it from the network [9]. In this paper, a new technique is proposed to select the most secure, high-quality route for the DSR protocol based on two phases: ETG and ANN. In the ETG phase, the trust value of each node that forms the path between SN and DN is computed to detect malicious nodes in the routing path, while the ANN method uses the trust value together with other metrics, such as total route energy consumption, dropped packets and throughput, to choose the most secure and highest quality route between SN and DN. The SDSR-ANNETG method is proposed as part of developing our previous work [10] to secure the DSR routing protocol and enhance its performance. DSR is an on-demand routing protocol that consists of two phases: route exploration and route maintenance. In the first phase, the SN sends a request message called a "Route Request" (RREQ) message, which contains the SN ID, DN ID and path ID, to all of its neighbors. Each neighbor checks its cache (i.e., routing table information) to find the
path. If it finds one, it replies immediately to the SN; otherwise, it adds its address to the RREQ message and forwards the message to its neighbors, and so on, until the RREQ message reaches the DN [10–12]. Subsequently, the DN utilizes the path details stored in the RREQ message to send a "route reply" (RREP) message to the SN. If more than one path to the SN is found, the path with the minimum number of hops is selected and its details are sent in the RREP message to the SN. The route detail in the RREP message is then utilized by the SN to send data packets to the DN. The maintenance phase is used to manage link failures, which occur when a node loses its energy or moves from the path to another path. In this stage, a "route error" (RERR) message is sent to the SN to remove the broken path details from its cache and restart the discovery phase again. The rest of the paper is organized as follows: Sect. 2 explores related methods used to secure the routing path between SN and DN in MANETs, Sect. 3 presents the SDSR-ANNETG method, and Sect. 4 discusses the results of implementing the proposed method. Finally, Sect. 5 includes conclusions.
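The RREQ flooding and minimum-hop path selection described above can be sketched as a breadth-first search over a hypothetical topology (this illustrates the route-discovery logic only, not DSR's message formats; the topology and function names are ours):

```python
from collections import deque

def dsr_discover(neighbors, src, dst):
    """Sketch of DSR route discovery: flood RREQs and return the minimum-hop
    path, as the DN would pick when several RREQs arrive. `neighbors` maps
    each node to the list of nodes it can reach directly."""
    queue = deque([[src]])            # each queue entry is the path an RREQ carries
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path               # first BFS arrival = fewest hops
        for nxt in neighbors.get(node, []):
            if nxt not in visited:    # a node forwards a given RREQ only once
                visited.add(nxt)
                queue.append(path + [nxt])
    return None                       # no route: a RERR would be raised

# Hypothetical topology: S reaches D in 2 hops via A, or 3 hops via B and C
topology = {"S": ["A", "B"], "A": ["D"], "B": ["C"], "C": ["D"], "D": []}
route = dsr_discover(topology, "S", "D")   # -> ["S", "A", "D"]
```

The SDSR-ANNETG technique replaces this pure hop-count choice with a selection that also weighs trust values, energy usage and dropped packets.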
2 Related Works

The security of routing protocols for MANETs is still an open research area due to the nature of their structure. In study [13], the authors proposed a technique called GDSR to protect the network against Denial of Service (DoS) attacks. The GDSR method is based on monitoring neighbors via calculating a G value, where G is a counter of messages transmitted from the neighbor to the node; if the G value is large, the node is detected as a malicious node on the routing path. In [14], researchers utilized the "Integrated Cross Interior" (ICI) algorithm with an "Intrusion Detection System" (IDS) to detect a black hole attack in MANETs. A malicious node in ICI is identified based on two parameters: reply time and routing cost. The algorithm checks each node's history for the reply time and the number of forwarded messages (i.e., path cost); the node with a higher response time and maximum path cost is detected as a malicious node. In [15], researchers used a swarm-inspired algorithm together with an "Artificial Neural Network" (ANN) to alleviate gray hole and black hole attacks. In that algorithm, a malicious node is detected according to a fitness function (FF) value, computed from each node's energy usage and packet delivery ratio (PDR); if the FF value is large, the node is identified as malicious. In study [16], an ANN and a "Support Vector Machine" (SVM) are used to identify black hole attacks on the routing path for MANETs, where a malicious node is detected based on the number of dropped packets. Study [17] utilized supervised learning to detect malicious nodes in MANETs using the history of the shared routing information about a cluster (group of nodes) or host (one node). Study [18] used a "Restricted Boltzmann Machine" (RBM) to detect denial-of-resource (DoS) attacks in DSR.
The RBM learns from the network criteria node reputation, where a malicious node is assigned based on the calculation of the trust value for
216
A. R. Zarzoor and T. M. J. Abbas
each node on the routing path. In [19], the authors proposed S-DSR, a secure method for electing the routing path in DSR. S-DSR detects black hole nodes by adding a new field, called "Check", to the RREP message, which carries a weight value; from the weight value of each node on the routing path, the SN can elect the most secure path for conveying data packets to the DN. In [20], a fuzzy system is used to detect malicious nodes on the routing path based on a trust value; the node trust value and the hop count are the inputs to the fuzzy system, and the output is the secure path between the SN and the DN. In [21], a fuzzy-rule-based method called the "Trust Based Secure Routing Protocol" (TBSRP) is used to secure the routing path in MANETs; TBSRP uses two parameters, the trust value and the grade of trust, to select a trusted routing path between the SN and the DN. The study in [22] proposed a trust model called "Fuzzy Petri Net" (FPN) to enhance the performance of the OLSR protocol, using an entropy algorithm to secure the routing path between the SN and the DN; the entropy algorithm selects the trusted path by computing trust values, and the path with the highest trust value is chosen from the OLSR routing table. The present study also uses an entropy algorithm, but for the reactive protocol DSR rather than the proactive OLSR, together with the "Select Optimal Link" (SOL-DSR) technique [10], which elects the ideal path to enhance network performance.
3 Study Method

The SDSR-ANNETG technique consists of two steps: ETG and ANN. In the first step, all the paths between the SN and the DN that achieve high performance (i.e., minimum dropped-packet count, high throughput and low node energy consumption) are elected using SOL-DSR. In SOL-DSR, each node on the routing path between the SN and the DN is selected according to four metrics: battery capacity (BC), number of neighbours, node velocity and the distance between any two nodes on the path. The ideal path is therefore the one formed from nodes that have the maximum BC, a large number of neighbours, low velocity and small distances between any pair of nodes on the path. However, SOL-DSR does not consider the problem of detecting malicious nodes while selecting the routing path between the SN and the DN. Therefore, in this step the ETG method is used to compute a total trust value for each path formed by SOL-DSR, based on the entropy information and the trust-gain information. Entropy is a measure of the level of uncertainty in information, used to extract the certain part of it [23]. In the ETG method, the entropy of the uncertain information is computed from the Trust Value (TV) of each node on the routing path between the SN and the DN. The TV is calculated using Eq. (1).

TV_(i,j)(t) = ( S_(i,j)(t) / packet_sent ) × 100%     (1)
Securing Data Conveyance for Dynamic Source Routing Protocol …
217
where TV_(i,j)(t) is the trust value between nodes i and j at time t (i.e., the gained trust information), and S_(i,j)(t) is the number of packets sent from node i to node j at time t divided by the total number of packets sent on the route between the SN and the DN. Thus, TV = 0 means the node is a malicious node. From the ratio P of the TV values, the entropy value (EN) is calculated using Eq. (2) [24, 25]. EN ranges between 0 and 1: EN = 1 is the highest entropy value, meaning all the nodes on the routing path between the SN and the DN are detected as malicious (an unsecure path), while EN = 0 is the lowest entropy value, meaning no malicious node is detected on the path.

EN = P log(P) − (P + 1) log(P + 1)     (2)
To illustrate, consider Fig. 1, where four routes (R1, R2, R3 and R4) have been formed between the SN and the DN using the SOL-DSR method. The routes' details are shown in Table 1, where P is the ratio of TV. Route R2 gives a low entropy value (EN = 0.2) and contains one malicious node, compared with R3, which contains two malicious nodes (EN = 0.3), and R4, which contains three malicious nodes (EN = 1). The DN uses the EN value to elect the trusted route, which is R1 in Fig. 1, and the details of R1 are sent to the SN in the RREP message. In Fig. 1 all the paths have the same length (i.e., number of hops); when paths have different lengths, the path with the lowest EN value is elected as the secure path, even if it contains malicious nodes. In other words, the ETG technique can be used to mitigate routing attacks on DSR.
Fig. 1 Illustration of the DSR-ENG method
Table 1 Route details

Route   P (Normal)   P (Malicious)   EN
R1      1            0               0
R2      0.75         0.25            0.2
R3      0.5          0.5             0.3
R4      0            1               1
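To make the trust bookkeeping concrete, the sketch below (an illustration using invented per-node packet counters, not the authors' code) applies Eq. (1) to each node, flags a node as malicious when its TV is zero, and recovers the P (Malicious) column of Table 1:

```python
def trust_value(sent_ij, total_sent):
    """Eq. (1): share of the route's traffic that node i actually forwarded to j."""
    if total_sent == 0:
        return 0.0
    return sent_ij / total_sent * 100.0

def malicious_ratio(route):
    """Fraction P(Malicious) of nodes on the route whose trust value is zero."""
    flags = [trust_value(s, t) == 0.0 for s, t in route]
    return sum(flags) / len(flags)

# Hypothetical per-node (sent, total) counters for routes R1 and R4 of Table 1.
r1 = [(10, 40), (10, 40), (10, 40), (10, 40)]   # every node forwarding
r4 = [(0, 40), (0, 40), (0, 40), (0, 40)]       # every node dropping
assert malicious_ratio(r1) == 0.0   # matches P(Malicious) = 0 for R1
assert malicious_ratio(r4) == 1.0   # matches P(Malicious) = 1 for R4
```

Ranking routes by this ratio reproduces the ordering R1 < R2 < R3 < R4 that the entropy column in Table 1 encodes.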
In the second step, the ANN method is used to elect the secure and optimal route between the SN and the DN. Each route's information is used to determine the most secure and ideal path. An ANN is an ML method designed to act like a human brain: it operates and learns from experience in a way analogous to the brain. The ANN in this study is composed of three layers: an input layer, a single hidden layer and an output layer (see Fig. 2). The input layer takes the path information, such as the TV from the ETG method, together with two other parameters computed in SOL-DSR: packet delay and the path's total energy consumption. The hidden layer processes the path information to identify the relationships between the features taken from the input layer, and the output layer produces the final secure and optimal path value, i.e., the path with minimum energy consumption, low delay and the highest TV. The discrepancy between the input data values and the output result values is captured in an error value; according to this error value, the neurons' weights in the hidden layer are updated, a procedure called "backpropagation". A neuron's state can be either 0 or 1 (deactivated or activated) according to the sigmoid function. Each neuron's input value is calculated using Eq. (3), where z_j is the input value and R_ij is the resistance of the connection weight between neuron i and neuron j. The output value of neuron j can be either 0 or 1, and the output of input i can be expressed as in Eq. (4). The energy varies because of changes in the states of the neurons i [26]; thus, this variation in energy
Fig. 2 Illustrates the SDSR-ANNETG structure
Fig. 3 Detection accuracy for the black hole attack under the S-DSR and SDSR-ANNETG methods
can be expressed by Eq. (5).

z_j = Σ_{j≠i} z_j / R_ij + In_i     (3)

z_j = z_j^0   if   Σ_{j≠i} z_j / R_ij + In_i < z_j     (4)

∂E/∂z_i = − Σ_{j=1}^{N} z_j / R_ij − In_i     (5)
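As a generic illustration of the forward pass just described (the weights, layer sizes and scoring below are invented; this is not the authors' trained model), a three-input network with one hidden layer and sigmoid units can score a candidate path:

```python
import math

def sigmoid(x):
    """Activation used to map a neuron's net input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def score_path(tv, delay, energy, w_hidden, w_out):
    """Forward pass: 3 inputs (TV, delay, energy) -> hidden layer -> one score."""
    inputs = [tv, delay, energy]
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

# Invented weights favouring high trust and penalising delay and energy use.
w_hidden = [[2.0, -1.0, -1.0], [1.5, -0.5, -1.5]]
w_out = [1.0, 1.0]
secure = score_path(1.0, 0.1, 0.1, w_hidden, w_out)   # trusted, cheap path
risky = score_path(0.0, 0.9, 0.9, w_hidden, w_out)    # untrusted, costly path
assert secure > risky
```

In the actual method the weights would be adjusted by backpropagation from the error value, and the highest-scoring path would be the one elected between the SN and the DN.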
4 Result and Discussions

The proposed method is implemented using the NS-2.35 network simulator. Four scenarios are created: SDSR-ANNETG with the forward attack, S-DSR with the forward attack, SDSR-ANNETG with the black hole attack and S-DSR with the black hole attack. In each scenario, 60 mobile nodes are deployed in a 200 × 200 m area with a simulation time of 60 min (see Table 2).

Table 2 Simulator parameters

Parameter          Values
Area size          200 × 200 m
Simulation time    60 min
Routing protocol   SDSR-ANNETG, S-DSR
Number of nodes    60
MAC layer          802.11b
Mobility           Random waypoint
Channel capacity   2 Mbps
Transmit power     20 mW
Receiver power     20 mW

The attack-detection accuracy in each scenario is evaluated using five parameters: the False Negative Rate (FNR), calculated using Eq. (6); the True Negative Rate (TNR), computed using Eq. (7); the False Positive Rate (FPR), computed using Eq. (8); the True Positive Rate (TPR), calculated using Eq. (9); and the detection accuracy, computed using Eq. (10) [27, 28]. The routing paths produced in the four scenarios are also evaluated using Quality of Service (QoS) metrics: end-to-end delay (ETED), throughput (TH), dropped packets (DP) and energy usage (EU). ETED is measured using Eq. (11), DP and TH are calculated using Eqs. (12) and (13), and EU is computed using Eqs. (14)–(16) [29], where k is the number of bits needed to convey the data, EnergyTx is the energy consumed to transmit k bits, EnergyRx is the energy consumed to receive k bits, and d is the distance between two nodes. The simulation results show that SDSR-ANNETG achieves higher detection accuracy for the black hole and forward attacks than S-DSR (see Figs. 3 and 4). For network performance, the routes formed by SDSR-ANNETG give higher throughput (TH = 1200 Kbps) than S-DSR (TH = 800 Kbps), and the path selected by SDSR-ANNETG drops fewer packets (DP = 120 packets) than S-DSR (DP = 330 packets). However, the S-DSR method selects a path that delivers packets to the DN in less time, with minimum latency ETED = 770 ms compared with ETED = 900 ms for SDSR-ANNETG (see Fig. 5). The reason is that S-DSR only checks the weight value of each node on the routing path between the SN and the DN, while SDSR-ANNETG performs two checks: one for the TV and one for the whole path (i.e., EN), see Fig. 6. Regarding node energy usage, the nodes forming the path elected by S-DSR consume more energy than the nodes forming the path selected by the SDSR-ANNETG method.
FNR = FN / (FN + TP) × 100     (6)

TNR = TN / (TN + FP) × 100     (7)
Fig. 4 Detection accuracy for the forward attack under the S-DSR and SDSR-ANNETG methods
Fig. 5 Path quality for the S-DSR and SDSR-ANNETG method
Fig. 6 Network energy usage for the SDSR-ANNETG and S-DSR methods
FPR = FP / (FP + TN) × 100     (8)

TPR = TP / (TP + FN) × 100     (9)

Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100     (10)

where TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives, respectively.
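Rendered as code from the standard confusion-matrix definitions (the counts below are hypothetical, not simulation output), Eqs. (6)–(10) become:

```python
def detection_metrics(tp, tn, fp, fn):
    """Eqs. (6)-(10): detection rates (in percent) from confusion-matrix counts."""
    fnr = fn / (fn + tp) * 100            # Eq. (6): missed attacks
    tnr = tn / (tn + fp) * 100            # Eq. (7): benign nodes kept
    fpr = fp / (fp + tn) * 100            # Eq. (8): benign nodes misflagged
    tpr = tp / (tp + fn) * 100            # Eq. (9): attacks caught
    acc = (tp + tn) / (tp + tn + fp + fn) * 100   # Eq. (10)
    return fnr, tnr, fpr, tpr, acc

# Hypothetical counts for one simulation run.
fnr, tnr, fpr, tpr, acc = detection_metrics(tp=45, tn=40, fp=10, fn=5)
assert tpr == 90.0 and fpr == 20.0 and acc == 85.0
```
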
ETED = (period of time needed to deliver a data packet to the target node) / (number of data packets received at the target node)     (11)

DP = sent data packets − received data packets     (12)

AvgTH = (amount of packets sent) / (specific period of time)     (13)

Node EU = initial energy − (EnergyTx + EnergyRx)     (14)

EnergyTx(d, k) = k·E_elec + k·ε_amp·d²,   d < d₀
EnergyTx(d, k) = k·E_elec + k·ε_amp·d⁴,   d ≥ d₀     (15)

EnergyRx(k) = k·E_elec + k·E_pa     (16)
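Eqs. (15)–(16) follow the shape of the classic first-order radio energy model; a sketch with invented constants (the paper gives no values for E_elec, ε_amp, E_pa or d₀) is:

```python
def energy_tx(k, d, e_elec, e_amp, d0):
    """Eq. (15): energy to transmit k bits over distance d.

    Below the threshold d0 the amplifier cost grows as d^2,
    above it as d^4 (multipath regime).
    """
    if d < d0:
        return k * e_elec + k * e_amp * d ** 2
    return k * e_elec + k * e_amp * d ** 4

def energy_rx(k, e_elec, e_pa):
    """Eq. (16): energy to receive k bits."""
    return k * e_elec + k * e_pa

# Illustrative constants only -- not taken from the paper.
E_ELEC, E_AMP, E_PA, D0 = 50e-9, 100e-12, 10e-12, 87.0
tx_near = energy_tx(1000, 50.0, E_ELEC, E_AMP, D0)    # d < d0: d^2 term
tx_far = energy_tx(1000, 100.0, E_ELEC, E_AMP, D0)    # d >= d0: d^4 term
assert tx_far > tx_near
```

Per Eq. (14), a node's remaining energy after one exchange is its initial energy minus the sum of these two costs.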
5 Conclusions

This study secures the routing path created by the SOL-DSR method in order to enhance DSR performance. The proposed method consists of two phases. In the first phase, the routing paths between the SN and the DN are formed using the SOL-DSR method according to four parameters: battery capacity, node velocity, number of neighbours and the distance between any pair of nodes on the routing path. The ideal path is the one consisting of nodes with the maximum battery capacity, small distances between each pair of nodes, low velocity and a large number of neighbours. In the second phase, the SDSR-ANNETG method selects the more secure routing path between the SN and the DN based on the path entropy of the gained trust information of all the nodes forming the path, computed as the total TV of each path. The TV and two other metrics, total energy usage and dropped-packet count, are then used as input values to an ANN that selects the secure and optimal route between the SN and the DN. The proposed method was evaluated by creating four scenarios in NS-2.35 for the two methods, S-DSR and SDSR-ANNETG, under two types of attack: black hole and forward. The simulation results show that SDSR-ANNETG gives higher detection accuracy than the S-DSR method for both the black hole and the forward attack; it also gives better energy usage and achieves a small delay compared with the S-DSR method.
References

1. Kanellopoulos D, Cuomo F (2021) Recent developments on mobile ad-hoc networks and vehicular ad-hoc networks. Electronics 10(364):1–3
2. Al-Absi M, Al-Absi A, Sain M, Lee H (2021) Moving ad hoc networks—a comparative study. Sustainability 13(6187):1–31
3. Karthigha M, Latha L, Sripriyan K (2020) A comprehensive survey of routing attacks in wireless mobile ad hoc networks. In: 2020 international conference on inventive computation technologies (ICICT), pp 396–402
4. Abdel-Fattah F, Farhan KA, Al-Tarawneh FH, AlTamimi F (2019) Security challenges and attacks in dynamic mobile ad hoc networks MANETs. In: 2019 IEEE Jordan international joint conference on electrical engineering and information technology (JEEIT), pp 28–33
5. Asai N, Goka S, Shigeno H (2019) A trust model focusing on node usage in mobile ad hoc networks. In: 2019 IEEE international conference on pervasive computing and communications workshops (PerCom workshops), pp 517–522
6. Vaseer G, Ghai G, Ghai D (2019) Novel intrusion detection and prevention for mobile ad hoc networks: a single- and multiattack case study. IEEE Consum Electron Mag 8(3):35–39
7. Golchha P, Kumar H (2018) A survey on black hole attack in MANET using AODV. In: 2018 international conference on advances in computing, communication control and networking (ICACCCN), pp 361–365
8. Mwangi EG, Muketha GM, Ndungu GK (2019) A review of security techniques against black hole attacks in mobile ad hoc networks. In: 2019 IST-Africa week conference (IST-Africa), pp 1–8
9. Taranum F, Abid Z, Khan KUR (2020) Legitimate-path formation for AODV under black hole attack in MANETs. In: 2020 4th international conference on electronics, communication and aerospace technology (ICECA), pp 806–813
10. Zarzoor A (2021) Enhancing dynamic source routing (DSR) protocol performance based on link quality metrics. In: 2021 international seminar on application for technology of information and communication (iSemantic), pp 17–21
11. Istikmal, Subekti A, Perdana D, Ridha Muldina N, ArifIndra I, Sussi (2019) Dynamic source routing and optimized link state routing performance in multipath fading environment with dynamic network topology. In: 2019 4th international conference on information technology, information systems and electrical engineering (ICITISEE), pp 373–378
12. Nanda Kishore CV, Bhaskar S (2021) A priority based dynamic DSR protocol for avoiding congestion based issues for attaining QoS in MANETS. In: 2021 international conference on intelligent technologies (CONIT), pp 1–5
13. Dhatarwal R, Nagar D (2017) DoS attack prevention technique DSR protocol based on signal strength. Int J Comput Appl 167(6):17–22
14. Vinayagam J, Balaswamy CH, Soundararajan K (2019) Certain investigation on MANET security with routing and blackhole attacks detection. Procedia Comput Sci 165:196–208
15. Rani P, Kavita SV, Nguyen N (2020) Mitigation of black hole and gray hole attack using swarm inspired algorithm with artificial neural network. IEEE Access 8:121755–121764
16. Pandey S, Singh V (2020) Blackhole attack detection using machine learning approach on MANET. In: 2020 international conference on electronics and sustainable communication systems (ICESC), pp 797–802
17. Basomingera R, Choi Y (2020) Learning from routing information for detecting routing misbehavior in ad hoc networks. Sensors 20(21)
18. Mohana Priya P, Shalinie S (2017) Restricted Boltzmann machine-based cognitive protocol for secure routing in software defined wireless networks. IET Netw 6(6):162–168
19. Mohanapriya M, Joshi N, Soni M (2021) Secure dynamic source routing protocol for defending black hole attacks in mobile Ad hoc networks. Indonesian J Electr Eng Comput Sci 21(1):582–590
20. Yas Q, Khalaf M (2020) A trusted MANET routing algorithm based on fuzzy logic. In: ACRIT 2019, CCIS, vol 1174, pp 185–200
21. Garg M, Singh N, Verma P (2018) Fuzzy rule-based approach for design and analysis of a trust-based secure routing protocol for MANETs. Procedia Comput Sci 132:653–658
22. Wang X, Zhang P, Du Y, Qi M (2020) Trust routing protocol based on cloud-based fuzzy Petri net and trust entropy for mobile ad hoc network. IEEE Access 8:47675–47693
23. Navas RE, Cuppens F, Boulahia Cuppens N, Toutain L, Papadopoulos GZ (2021) MTD, where art thou? A systematic review of moving target defense techniques for IoT. IEEE Internet Things J 8(10):7818–7832
24. Sirajuddin M, Rupa C, Iwendi C, Biamba C (2021) TBSMR: a trust-based secure multipath routing protocol for enhancing the QoS of the mobile ad hoc network. Secur Commun Netw 2021, Article ID 5521713:1–9
25. Huang S, Lei K (2020) IGAN-IDS: an imbalanced generative adversarial network towards intrusion detection system in ad-hoc networks. Ad Hoc Netw 105:1–11
26. Lv Z, Qiao L, Verma S, Kavita (2021) AI-enabled IoT-edge data analytics for connected living. ACM Trans Internet Technol 1–20
27. Mabodi K, Yusefi M, Zandiyan S, Irankhah L, Fotohi R (2020) Multi-level trust-based intelligence schema for securing of internet of things (IoT) against security threats using cryptographic authentication. J Supercomput 1–25
28. Zaminkar M, Fotohi R (2020) SoS-RPL: securing Internet of Things against sinkhole attack using RPL protocol-based node rating and ranking mechanism. Wirel Pers Commun 1–26
29. Ma X, Mao R (2018) Method for effectively utilizing node energy of WSN for coal mine robot. J Robot 2018, Article ID 1487908:1–8
Machine Intelligence for Smart Government
Regional Language Translator and Event Detection Using Natural Language Processing P. Santhi, K. Deepa, M. Sathya Sundaram, and V. Kumararaja
Abstract India is home to many languages and cultures, and most people know only their mother tongue. In everyday life, however, people need to understand the essence of the bills and drafts passed by the government, which are commonly written in Hindi and English. These languages are not known to everyone in India, so when a draft is passed in those specific languages, individuals in other states cannot grasp its essence and need an interpreter to explain the pros and cons of the drafts, bills or amendments. To let individuals grasp the essence of government drafts or bills effortlessly, an automatic regional-language translator is required, which calls for language-translation techniques such as Natural Language Processing. This paper concentrates on various techniques used in language translation. The proposed method uses NLP to convert sentences into strings and to perform the translation; it uses the Knuth-Morris-Pratt algorithm for string searching and the Boyer-Moore-Horspool algorithm to find substrings within a string. With this approach, everyone can understand the essence of bills or drafts passed by individuals as well as by the government. Keywords Natural language processing · Knuth-Morris-Pratt string searching algorithm · Boyer-Moore-Horspool algorithm · Summarization P. Santhi Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India K. Deepa (B) Department of Computer Science and Engineering, K. Ramakrishnan College of Technology, Trichy, Tamilnadu, India e-mail: [email protected] M. S. Sundaram Department of Computer Science and Engineering, Paavai Engineering College, Namakkal, India V. Kumararaja Department of Computer Science and Engineering, K.
Ramakrishnan College of Engineering, Trichy, India © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Adadi and S. Motahhir (eds.), Machine Intelligence for Smart Applications, Studies in Computational Intelligence 1105, https://doi.org/10.1007/978-3-031-37454-8_12
230
P. Santhi et al.
1 Introduction

Many kinds of online and offline applications have been used to translate one language into another using various machine learning and deep learning algorithms. The lack of a language-to-language translator makes things much harder for people in regional areas who cannot understand other languages. For example, the central government of India publishes its common circulars and reports in English and Hindi, but people in other states have trouble understanding these languages. In that situation, a regional-language translator plays a critical role in translating the script into regional languages [1–3]. From the literature, we learnt about the use of GPS modules in predicting and locating accidents, the working of IoT sensors in fire detection, and data-mining techniques for detecting power generation and its regularity [4–6]. Human interpreters fluent in a few essential world languages currently provide translation services in a variety of areas. Translation is the process of converting an expression or text into another language while maintaining the same meaning. It has become a prominent subject of study because of the complexity of language structure and the ease with which contextual meaning may be lost. Although word-for-word translation is useful for simple statements, it is not ideal for discourse that carries critical information, including medicine, law, education, commerce and religion; in those fields the context of a passage matters more than direct translation. Interpreters are skilled and well trained to communicate the intended message precisely and in the appropriate tone. Even though English is widely spoken around the world, culture and regional languages remain important. Translation services are available to bridge communication barriers. Aside from speech translation, there are several additional options [7–10], among them document translation, website localization and sign-language translation. An NLP toolkit can process spoken or written data using artificial intelligence (AI). In recent years, the text data we work with often has to be processed and converted into another language for our applications, and much research has been done in the field of Natural Language Processing to improve the accuracy of text-data processing [11–13]. In this project, we present an automated NLP process to translate scripts from one language into another regional language, based on the studies in [14–16].
Regional Language Translator and Event Detection Using Natural …
231
2 Related Works

In one work, the Boyer-Moore-Horspool (BMH) and Knuth-Morris-Pratt (KMP) methods were integrated to build and demonstrate a knowledge-management system for a company's employees. Using the BMH and KMP algorithms within an ML pipeline, the text data was processed; the method could identify incomplete information and help auto-complete it [17–19]. Similarly, in further work, a knowledge-management system was implemented by applying a combination of fuzzy and KMP algorithms, matching string literals that are similar to one another [20, 21, 23]. Another paper presented the search concept behind the Knuth-Morris-Pratt algorithm; that work helps researchers understand and implement the KMP algorithm in their own models. Some ML algorithms perform well in regional-language translation, but the demands on processing speed and accuracy keep increasing [22–25], so this paper looks for an advanced model or algorithm with higher processing speed alongside model accuracy. Many techniques have been used to complete natural-language-processing tasks for the Sinhala language: straightforward Sinhala natural-language challenges were used to test a rule-based strategy, and rule-based methods were also used for Sinhala machine translation (MT) [26–28]. Some authors have used the KMP algorithm to translate the Palembang language into Bahasa Indonesia; in that approach the input sentence is broken into words, each word acts as a string, and after preprocessing the translated string is used by the algorithm [29].
Owing to the difficulty of the language, statistical methods for Sinhala have been developed and tested: statistical techniques are used for Sinhala natural-language problems, and these techniques are improved by Sinhala language models, gaining traction for statistical models in Sinhala [30–32]. Although deep learning does not require feature selection, it can produce more accurate results than statistical and rule-based models [33–35]. Deep learning has been used for MT, but a deep-learning model requires a larger dataset to learn from, and this requirement is what allows it to generalize over the dataset [36–38]. Deep learning fails to produce superior results for languages with a reduced digital-data presence, so MT for Sinhala has only a marginal chance of producing comparable results [39–42]. By analyzing these papers, we learnt about ways of analyzing the sentiment of text, which is helpful in our work for analyzing the content of drafts and converting them into regional languages [43–45]. From this literature review, we found that machine learning and deep learning algorithms are used in text classification and sentiment analysis over texts, and that they are involved in the extraction process. We also learnt which algorithms can assist us, and that the NLTK library can help translate languages efficiently, which is followed in this work.
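For reference, compact pure-Python renderings of the two matchers discussed above (illustrative implementations, not the cited systems' code):

```python
def kmp_search(text, pattern):
    """Knuth-Morris-Pratt: return the index of the first match, or -1."""
    if not pattern:
        return 0
    # Failure table: longest proper prefix of pattern that is also a suffix.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    k = 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]   # fall back instead of rescanning the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1

def bmh_search(text, pattern):
    """Boyer-Moore-Horspool: shift by the bad-character rule only."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift table from every pattern character except the last.
    shift = {pattern[i]: m - 1 - i for i in range(m - 1)}
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            return i
        i += shift.get(text[i + m - 1], m)
    return -1
```

KMP never re-examines text characters, which suits long scripts; BMH skips ahead using the last character of each window, which is usually faster in practice on natural-language text.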
3 Proposed System Architecture

In the proposed system, we take as input files the amendments, bills and drafts passed by the government, mostly as Hindi and English scripts. The scripts in the input files are translated using an NLP toolkit, and the output is generated in the regional language the user wants. A model has been created that takes text records as input and calls packages to perform the NLP activities; within the NLP model, summarization and tokenization are done. Figure 1 represents the general working flow of the proposed system. The input PDF consists of languages such as Hindi and English, and the output can be any regional language such as Tamil, Malayalam or Telugu. Both the images and the text are extracted and processed. To build a translation tool that leads towards simple, accessible and effective communication, we have used Natural Language Processing, which can comprehend human language while preserving the input words and producing effective text in the specified language. First, we pass the input PDF, which contains Hindi and English scripts. Our model checks whether the input file is in PDF format. While reading the input PDF file, we segregate the images and the text: the images present in the PDF are extracted and stored in a separate folder and then processed to extract and translate their text, while the textual content of the input file is passed for translation and the translated content is stored in a separate text file.
3.1 Text and Image Translation

First, we pass the input PDF, which contains Hindi and English scripts. The PDF file is recognized by our model with the help of the PyPDF2 package, a pure-Python library for manipulating PDF files that lets us extract the words present in the PDF and its metadata. In parallel, the images in the file are extracted using fitz, which is part of PyMuPDF and is used to process PDF files with high performance. Once the PDF is opened with fitz, getImageList() returns the images present in the file; iterating over each image index, extractImage() yields the image as bytes together with additional information such as its extension. The image is then saved by manipulating the bytes data with BytesIO.
Fig. 1 Block diagram of the proposed method
All the images present in the input file are stored in a single folder. Once the images and text have been separated, the text in the body of the file and the text inside the images are extracted: the former with PdfFileReader and the latter with pytesseract. Figure 2 represents the workflow of text extraction, translation, and measuring the accuracy of the translated scripts. Additional information such as the PDF author, creator, subject, title and number of pages is obtained using getDocumentInfo(). The translation part begins once the extraction is completed. For translation we use TextBlob as the main component, and for reference we translate the text in parallel using the Google API. TextBlob is a Python library used to process the extracted textual data and evaluate the tonality of the texts within a few lines of code; it helps translate the text from one language to another. The next step in the flow is to compute the recall and precision scores for the translated scripts. To accomplish this, we use the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric. The primary motivation for using this metric is that it produces accurate results by comparing the machine translation against a set of references: it analyzes the generated summary or translated text against a set of reference summaries (here, the text translated with the Google API). ROUGE includes a parameter called Recall, which measures how much of the reference summary the system summary captures. For individual words, it is computed as
Fig. 2 Work flow of the proposed method
Recall = (No. of overlapping words) / (Total words in reference summary)     (1)

Precision, another ROUGE parameter, measures how much of the system summary is relevant or needed. It is computed as

Precision = (No. of overlapping words) / (Total words in system summary)     (2)
The F-measure is also calculated to obtain the harmonic mean of the precision and recall scores; this combined single score is helpful for comparing the model's performance with other models. We calculate the ROUGE-1, ROUGE-2 and ROUGE-L scores to find the overlap and similarity over unigrams, bigrams and longest common subsequences, respectively, between the translated script and the reference script.
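The ROUGE-1 case of Eqs. (1)–(2), plus the F-measure, reduces to unigram counting; a minimal sketch (not the actual rouge package) is:

```python
from collections import Counter

def rouge1(system, reference):
    """ROUGE-1: unigram overlap recall (Eq. 1), precision (Eq. 2), and F-measure."""
    sys_counts = Counter(system.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum((sys_counts & ref_counts).values())   # clipped word overlap
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(sys_counts.values()), 1)
    f = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return recall, precision, f

# System summary vs. a reference summary (e.g., the Google API translation).
r, p, f = rouge1("the cat sat", "the cat sat here")
assert r == 0.75 and p == 1.0
```

ROUGE-2 follows the same pattern over adjacent word pairs, and ROUGE-L replaces the overlap count with the length of the longest common subsequence.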
Regional Language Translator and Event Detection Using Natural …
3.2 Event Detection

Besides translation using NLP tools, we use the Latent Dirichlet Allocation (LDA) model to find the topic of the scripts present in the input file. In order to detect the event (main topic) of the script, we follow the steps represented in Fig. 3. The main motive behind event detection is to find the main gist of the script present in the input file. Before passing the text into the LDA model, the original text of the scripts is summarized using Latent Semantic Analysis (LSA). This unsupervised learning algorithm performs extractive summarization of the original script: the semantically significant sentences are extracted based on the number of lines we want in the summary. In the summarization, the PlaintextParser module is used to recover the script from any common syntactical errors. Once summarization is done, the summarized script is passed to sent_tokenize(), imported from NLTK, to tokenize it into separate sentences. Gensim is a popular NLP library used to perform complex tasks such as building word vectors. From gensim.utils, simple_preprocess() is the function used to convert the document sentences into lists of tokens. The tokens produced by simple_preprocess() are Unicode strings, which need no further processing; by setting its deacc parameter to True, accents are also removed from the tokens. We then process the summarized content by extracting bigrams, so that words which frequently occur together in the text, and which are essential to the meaning of the script, are kept together. Stop words such as general verbs, conjunctions, and prepositions have to be removed if present in the summarized script; the stopwords module is imported from nltk.corpus to obtain the stop words of the language of the summarized script.
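The tokenization, bigram, and stop-word steps above can be approximated without NLTK or Gensim installed; the following standard-library sketch mirrors them (the stop-word set and sample sentence are illustrative, not the libraries' own lists):

```python
import re

STOP_WORDS = {"the", "is", "a", "an", "and", "of", "in", "to"}  # illustrative subset

def simple_tokenize(sentence):
    """Rough stand-in for gensim's simple_preprocess(): lowercase word tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def bigrams(tokens):
    """Adjacent word pairs -- the raw material for phrase (bigram) detection."""
    return list(zip(tokens, tokens[1:]))

tokens = simple_tokenize("The festival of lights is celebrated in India.")
print(remove_stop_words(tokens))  # ['festival', 'lights', 'celebrated', 'india']
```

Gensim's Phrases model goes further than this sketch by keeping only bigrams that co-occur more often than chance, but the input and output shapes are the same.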
Then those stop words, such as articles and pronouns, are removed from the summarized script, and the bigrams are added to it. The stop-word-free script is then lemmatized using the lemmatize module imported from gensim.utils. Through lemmatization, the words in the script are reduced to the base form found in the dictionary, known as the lemma; the purpose of lemmatization is to recover the meaning behind each word. In the lemmatization process, we keep only the proper nouns present in the script, so as to keep the script crisp and concise. The next step is to create a dictionary from the tokens obtained after lemmatization, so as to maintain unique tokens. A corpus is then created using the dictionary's doc2bow (bag-of-words) module to convert strings to numbers, since the model understands only numbers. The corpus records the unique id of each unique token and the number of occurrences of that token in the script. Then the LDA model is called, the number of topics under which the tokens should be classified is defined, and the dictionary is passed as input to the model. The LDA model maps the entire corpus we have created onto the number of topics specified in the model. The model gives out the most contributing words in
Fig. 3 Workflow of the event detection
the text that we pass, along with the probabilities that represent the contribution of those words. To visualize and analyze the outcome of LDA topic detection, pyLDAvis.gensim is imported and enable_notebook() is called. By passing the ldamodel, corpus, and dictionary as parameters to pyLDAvis.gensim.prepare(), we can see the classified topics and the most contributing words under each topic. One main advantage of this visualization is that we can adjust the relevance parameter (λ) and inspect the most contributing words in the script.
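What the dictionary and doc2bow steps produce can be seen without Gensim itself; the following standard-library sketch reproduces the id mapping and the (token id, count) bag-of-words conversion over two hypothetical lemmatized documents:

```python
from collections import Counter

docs = [["india", "festival", "lights"],      # illustrative lemmatized documents
        ["festival", "india", "india"]]

# Dictionary step: a unique integer id for every unique token
# (what gensim's corpora.Dictionary maintains).
id2word = {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}

def doc2bow(doc):
    """Bag of words: sorted (token id, count) pairs for one document."""
    counts = Counter(doc)
    return sorted((id2word[t], n) for t, n in counts.items())

corpus = [doc2bow(d) for d in docs]
print(corpus)  # [[(0, 1), (1, 1), (2, 1)], [(0, 1), (1, 2)]]
```

This corpus of integer pairs is exactly the numeric form the LDA model consumes; in the actual pipeline the same two objects would come from gensim's corpora.Dictionary and its doc2bow method.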
4 Results Evaluation

4.1 Performance Analysis

The performance of the model is analyzed both in the translation stage and in event detection. In the translation part of the system, performance is analyzed using the ROUGE metric, which evaluates the accuracy of the system through its precision, recall, and F-measure scores. Three different ROUGE measures, Rouge-1, Rouge-2, and Rouge-L, are used to evaluate the performance of the model we have created. To analyze the event detection phase, the pyLDAvis package is used to visualize and differentiate the topics and the related words under each topic; it also gives the percentage of tokens matched under each topic classified and produced by the LDA model. Table 1 shows the performance analysis of the model.

Table 1 Performance measures
Measure    Precision   Recall   F-measure
Rouge-1    0.65        0.74     0.69
Rouge-2    0.45        0.52     0.48
Rouge-L    0.61        0.69     0.65
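As a quick arithmetic check, each F-measure in Table 1 is indeed the harmonic mean of the precision and recall in its row; for Rouge-1, 2 × 0.65 × 0.74 / (0.65 + 0.74) ≈ 0.69:

```python
def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# (precision, recall, reported F-measure) rows from Table 1
for name, p, r, f in [("Rouge-1", 0.65, 0.74, 0.69),
                      ("Rouge-2", 0.45, 0.52, 0.48),
                      ("Rouge-L", 0.61, 0.69, 0.65)]:
    assert abs(f_measure(p, r) - f) < 0.005, name  # consistent to 2 decimals
```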
4.2 Results Analysis

4.2.1 Text Translation
In this section, we translate the text presented in the input PDF, which includes images. The images present in the PDF are stored in a separate folder, and the paragraphs present in the PDF are read and their text extracted with the help of the PyPDF2 package. Figures 4 and 5 show the input and output text of the translation; here, English text is converted into Tamil text.
Fig. 4 Input text for the text translation
Fig. 5 Output text for the text translation
Fig. 6 Event detection
4.2.2 Event Detection
Before proceeding with event detection, we summarize the translated text. In order to gather information about the input PDF and detect its main purpose, we use the Latent Dirichlet Allocation (LDA) model. Using the gensim and nltk libraries, we summarize, tokenize, and lemmatize the script, remove the stop words and accents, extract only the proper nouns, build the corpus, and create the dictionary for it. By passing the created dictionary, the corpus, and the number of topics into the LDA model, we obtain the classification of words under different categories, along with the probability values describing the occurrences of the words present in the script. By visualizing with pyLDAvis, the most relevant words under each topic can be listed, and overlapping topics can be found with the help of the intertopic distance map. The output of the event detection is shown in Fig. 6.
5 Conclusion and Future Work

A regional language translator is a highly needed tool in a country like India, which has many distinct cultures and languages. All scripts are published and circulated by the central government of India in Hindi and English, so states that are not familiar with these languages need a human interpreter to translate them. In this proposed strategy, we present an NLP-based framework that translates the scripts automatically with the help of NLTK packages along with APIs. The proposed strategy will
achieve higher accuracy and operation speed compared to other existing strategies. Existing strategies have only a few languages accessible in their toolkits, so they cannot convert to every language required by the user, and they also require large computational power for translating into all possible languages. In our framework, since translation is based on NLP methods, it can translate into any of the languages that the client requests, within the languages supported by the NLTK packages. Further upgrades planned for this work include a model that supports large-scale data translation, various kinds of images with different fonts and text styles, and other user-specified languages. This will in turn give a fast response to the client in the form of a concise script, so that the client does not need to wait for a human interpreter to know the content of the script.