Studies in Computational Intelligence 1050
Oscar Castillo Patricia Melin Editors
New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics
Studies in Computational Intelligence Volume 1050
Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.
Editors Oscar Castillo Division of Graduate Studies and Research Tijuana Institute of Technology Tijuana, Baja California, Mexico
Patricia Melin Division of Graduate Studies and Research Tijuana Institute of Technology Tijuana, Baja California, Mexico
ISSN 1860-949X ISSN 1860-9503 (electronic) Studies in Computational Intelligence ISBN 978-3-031-08265-8 ISBN 978-3-031-08266-5 (eBook) https://doi.org/10.1007/978-3-031-08266-5 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
In this book we describe recent developments in fuzzy logic, neural networks and metaheuristic optimization algorithms, as well as their hybrid combinations, and their application in areas such as intelligent control and robotics, pattern recognition, medical diagnosis, time series prediction and optimization of complex problems. Some papers focus on type-1 and type-2 fuzzy logic, proposing new concepts and algorithms based on type-1 and type-2 fuzzy logic and their applications. There are also papers that present the theory and practice of metaheuristics in diverse areas of application. There are interesting papers on different applications of fuzzy logic, neural networks and hybrid intelligent systems in medical problems. In addition, there are papers describing applications of fuzzy logic, neural networks and metaheuristics in robotics problems. Another set of papers presents the theory and practice of neural networks in different areas of application, including convolutional and deep learning neural networks. There is also a group of papers that presents the theory and practice of optimization and evolutionary algorithms in different areas of application. Finally, there is a set of papers describing applications of fuzzy logic, neural networks and metaheuristics in pattern recognition problems. The papers of the book are organized into six parts that group together papers with similar topics or applications. In conclusion, the edited book comprises papers on diverse aspects of fuzzy logic, neural networks and nature-inspired optimization metaheuristics for designing and implementing hybrid intelligent systems, and on their application in areas such as intelligent control and robotics, pattern recognition, time series prediction and optimization of complex problems. Both theoretical and application papers are included.
We expect that the book will serve as a reference for researchers and graduate students working in the computational intelligence area. Tijuana, Mexico November 2021
Patricia Melin Oscar Castillo
About This Book
In this book, recent developments in fuzzy logic, neural networks and optimization algorithms, as well as their hybrid combinations, are presented. In addition, these methods are applied to areas such as intelligent control and robotics, pattern recognition, medical diagnosis, time series prediction and optimization of complex problems. The main topic of the book is highly relevant today, as most intelligent systems and devices currently in use rely on some form of intelligent feature to enhance their performance. On the theoretical side, new and advanced models and algorithms of type-2 fuzzy logic are presented, which will be of great interest to researchers in these areas. New nature-inspired optimization algorithms and innovative neural models, both very popular subjects at the moment, are also put forward. There are contributions on theoretical aspects as well as applications, which makes the book appealing to a wide audience, ranging from researchers to professors and graduate students.
Contents
Neural Networks

Automated Medical Diagnosis and Classification of Skin Diseases Using Efficinetnet-B0 Convolutional Neural Network
Harsh Gunwant, Abhisht Joshi, Moolchand Sharma, and Deepak Gupta

Modular Approach for Neural Networks in Medical Image Classification with Enhanced Fuzzy Integration
Sergio Varela-Santos and Patricia Melin

Clustering and Prediction of Time Series for Traffic Accidents Using a Nested Layered Artificial Neural Network Model
Martha Ramirez and Patricia Melin

Ensemble Recurrent Neural Networks and Their Optimization by Particle Swarm for Complex Time Series Prediction
Martha Pulido and Patricia Melin

Filter Estimation in a Convolutional Neural Network with Type-2 Fuzzy Systems and a Fuzzy Gravitational Search Algorithm
Yutzil Poma and Patricia Melin

Optimization

Artificial Fish Swarm Algorithm for the Optimization of a Benchmark Set of Functions
Cinthia Peraza, Patricia Ochoa, Leticia Amador, and Oscar Castillo

Hierarchical Logistics Methodology for the Routing Planning of the Package Delivery Problem
Norberto Castillo-García, Paula Hernández-Hernández, and Edilberto Rodríguez Larkins

A Novel Distributed Nature-Inspired Algorithm for Solving Optimization Problems
J. C. Felix-Saul, Mario García Valdez, and Juan J. Merelo Guervós

Evaluation and Comparison of Brute-Force Search and Constrained Optimization Algorithms to Solve the N-Queens Problem
Alfredo Arteaga, Ulises Orozco-Rosas, Oscar Montiel, and Oscar Castillo

Performance Comparative Between Single and Multi-objective Algorithms for the Capacitated Vehicle Routing Problem
David Bolaños-Rojas, Luis Chávez, Jorge A. Soria-Alcaraz, Andrés Espinal, and Marco A. Sotelo-Figueroa

Fuzzy Logic

Optimization of a Fuzzy Classifier for Obtaining the Blood Pressure Levels Using the Ant Lion Optimizer
Oscar Carvajal, Patricia Melin, Ivette Miramontes, and German Prado-Arechiga

Optimization of Fuzzy-Control Parameters for Path Tracking of a Mobile Robot Using Distributed Genetic Algorithms
Alejandra Mancilla, Oscar Castillo, and Mario Garcia-Valdez

A New Fuzzy Approach to Dynamic Adaptation of the Marine Predator Algorithm Parameters in the Optimization of Fuzzy Controllers for Autonomous Mobile Robots
Felizardo Cuevas, Oscar Castillo, and Prometeo Cortes-Antonio

Evaluation of Times and Best Solutions of MFO, LSA and PSO Using Parallel Computing, Fuzzy Logic Systems and Migration Blocks Together to Evaluate Benchmark Functions
Yunkio Kawano, Fevrier Valdez, and Oscar Castillo

Fuzzy Dynamic Parameter Adaptation in the Mayfly Algorithm: Preliminary Tests for a Parameter Variation Study
Enrique Lizarraga, Fevrier Valdez, Oscar Castillo, and Patricia Melin

Optimization: Theory and Applications

Symmetric-Approximation Energy-Based Estimation of Distribution (SEED) Algorithm for Solving Continuous High-Dimensional Global Optimization Problems
Valentín Calzada-Ledesma, Juan de Anda-Suárez, Lucero Ortiz-Aguilar, Luis Fernando Villanueva-Jiménez, and Rosa Trasviña-Osorio

Optimization Models and Methods for Bin Packing Problems: A Case Study on Solving 1D-BPP
Jessica González-San Martín, Laura Cruz-Reyes, Bernabé Dorronsoro, Marcela Quiroz-Castellanos, Héctor Fraire, Claudia Gómez-Santillán, and Nelson Rangel-Valdez

CMA Evolution Strategy Applied to Optimize Chemical Molecular Clusters MxNz (x + y ≤ 5; M = N or M ≤ N)
J. M. Pérez-Rocha, Andrés Espinal, Erik Díaz-Cervantes, J. A. Soria-Alcaraz, M. A. García-Revilla, and M. A. Sotelo-Figueroa

Specialized Crossover Operator for the Differential Evolution Algorithm Applied to a Car Sequencing Problem with Constraint Smoothing
Javier Manzanares, Elvi Sánchez, Héctor Puga, Manuel Ornelas, and Martin Carpio

A Brave New Algorithm to Maintain the Exploration/Exploitation Balance
Cecilia Merelo, Juan J. Merelo, and Mario García-Valdez

A New Optimization Method Based on the Lotka-Volterra System Equations
Hector Carreon and Fevrier Valdez

Hybrid Intelligent Systems

A Comparison of Replacement Operators in Heuristics for CSP Problems
Lucero Ortiz-Aguilar, Valentín Calzada-Ledesma, Juan de Anda-Suárez, Rogelio Bautista-Sánchez, and Natanael Zapata-Gonzalez

Synchronisms Using Reinforcement Learning as an Heuristic
Omar Zamarrón, Mauricio A. Sanchez, and Antonio Rodríguez-Díaz

A Mathematical Deduction of Variational Minimum Distance in Gaussian Space and Its Possible Application to Artificial Intelligence
Juan de Anda-Suárez, Lucero Ortiz-Aguilar, Valentín Calzada-Ledesma, Luis Fernando Villanueva-Jiménez, Rosa Trasviña-Osorio, and Germán Pérez-Zúñiga

A Model for Learning Cause-Effect Relationships in Bayesian Networks
Jenny Betsabé Vázquez-Aguirre, Nicandro Cruz-Ramírez, and Marcela Quiroz-Castellanos

Eureka-Universe: A Business Analytics and Business Intelligence System
José Fernando Padrón-Tristán, Laura Cruz-Reyes, Rafael Alejandro Espín-Andrade, Carlos Eric Llorente-Peralta, Claudia Guadalupe Gomez-Santillan, Alejandro Castellanos-Alvarez, and Jordan Michelt Aran-Pérez

Neural Networks and Learning

Extension of Windowing as a Learning Technique in Artificial Noisy Domains
David Martínez-Galicia, Alejandro Guerra-Hernández, Xavier Limón, Nicandro Cruz-Ramírez, and Francisco Grimaldo

Why Rectified Linear Activation Functions? Why Max-Pooling? A Possible Explanation
Julio C. Urenda and Vladik Kreinovich

Localized Learning: A Possible Alternative to Current Deep Learning Techniques
Javier Viaña, Kelly Cohen, Anca Ralescu, Stephan Ralescu, and Vladik Kreinovich

What is a Reasonable Way to Make Predictions?
Leonardo Orea Amador and Vladik Kreinovich
Neural Networks
Automated Medical Diagnosis and Classification of Skin Diseases Using Efficinetnet-B0 Convolutional Neural Network Harsh Gunwant, Abhisht Joshi, Moolchand Sharma, and Deepak Gupta
Abstract The skin is the largest and most exposed organ of the body, so skin illnesses are common. However, the significance of skin health is sometimes underestimated. According to one survey, around 1.79% of the world population suffers from skin-related disorders. These disorders can be fatal when they are not treated in their early stages. Skin illnesses must therefore be recognized and diagnosed early to avert severe dangers to life. Patients are also frequently subjected to extensive examinations to determine the severity of their skin condition. As a result, there is a need for an expert system capable of detecting illnesses at an early stage. At the moment, only a few computerized methods are available for detecting skin illnesses, but this is an area in which substantial research is being conducted and which may be further expanded. In this paper, an expert system was created using the EfficientNet-B0 model. The model can assist specialists in identifying and diagnosing, more effectively and efficiently, significant skin diseases such as Eczema, Psoriasis and Lichen Planus, Benign Tumours, Fungal Infections, and Viral Infections. In addition, we conducted a comparison of a sequential Convolutional Neural Network (CNN), EfficientNet-B0, and ResNet50. Through these models, the causes of a recognized skin illness may be identified and therapy may be suggested. The models were implemented in the Python programming language. The dataset was obtained from DERMNET. Using EfficientNet-B0 to train the model and predict outcomes over 10 epochs, we attained an accuracy of 91.36%.
Keywords Expert system · CNNs · EfficientNet B0 · ResNet50 · Skin diseases · Dermnet
H. Gunwant · M. Sharma (B) · D. Gupta
Department of Computer Science, MAIT, GGSIPU, New Delhi, India
e-mail: [email protected]
D. Gupta
e-mail: [email protected]
A. Joshi
Department of Information Technology, MAIT, GGSIPU, New Delhi, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_1
H. Gunwant et al.
1 Introduction
The skin is one of the most important organs of the human body and the one most exposed to the environment. It covers around 1.5–2.5 m² and accounts for roughly 4–5 kg of body mass. The skin is also a sense organ, and it serves several purposes, including protection, sensation, and regulation. The skin has three layers (as illustrated in Fig. 1): the Epidermis, the Dermis, and the Subcutaneous layer. The Dermis is thicker than the Epidermis and contains all oil and sweat glands, whereas the Subcutaneous layer lies beneath the Dermis and offers insulation to the body as well as acting as a cushion for critical organs. Below is a quick summary of several skin conditions:
(i) Rash: Any change in the appearance of the skin is a rash. Most rashes result from simple skin irritation, and some may also be the result of a medical condition.
(ii) Eczema: A skin inflammation that results in an itchy rash. It is primarily caused by an overactive immune system.
(iii) Dandruff: Scaling of the scalp caused by psoriasis, eczema, or seborrheic dermatitis.
(iv) Dermatitis: An umbrella term for skin inflammation. Atopic dermatitis is the most common type.
(v) Acne: Possibly the most frequently occurring skin condition, affecting over 80% of the population at some point in their lives.
(vi) Melanoma: The most dangerous kind of skin cancer, caused by UV radiation from the sun and other factors. It can be identified with a skin biopsy.
(vii) Psoriasis: An autoimmune disorder that manifests as a variety of different skin rashes. The most common symptom is the appearance of silvery, scaly plaques on the skin.
Fig. 1 Skin anatomy
Automated Medical Diagnosis and Classification …
(viii) Rosacea: A skin disorder that appears on the face as a red rash. Acne-like lesions may be visible on the skin, but their characteristics are not well understood. They should be treated as soon as possible.
(ix) Basal cell carcinoma: The most prevalent kind of skin cancer. Because it develops more slowly than melanoma, it is less hazardous.
(x) Warts: A virus infects the skin and causes the body to produce extra growth, which results in the formation of a wart. Warts can be removed by a physician or treated at home.
(xi) Skin abscess (boil or furuncle): A local skin infection that results in the accumulation of pus underneath the skin. Some abscesses must be opened and drained by a physician.
(xii) Cellulitis: Swelling of the dermis and the subcutaneous (hypodermis) tissues, most often induced by infection. It manifests as a red, warm, and painful rash on the skin.
Skin illnesses are increasingly frequent and are among the most common disorders affecting people of all ages. Unless diagnosed and handled early, skin disorders can become harmful and lead to severe health and financial problems. Early detection and diagnosis, on the other hand, may prevent a deadly disease. However, because the symptoms are many and varied, diagnosing skin illness requires extensive knowledge of the skin. A computerized diagnostic system is therefore desirable, since diagnosing skin illness correctly takes time. In addition, each patient faces costly consultations with a dermatologist [1, 2]. In this article, we describe several techniques that can identify skin illnesses, with the aim of complementing established diagnosis methods and producing a more trustworthy, more time-efficient, and less costly diagnosis. We carried out a comparative analysis of the different models to better understand them and to use them according to the requirements. The main goal of our project is to study the ability of various pre-trained networks to extract features. Although many pre-trained models have been put forward as "the best" for knowledge transfer, no consensus has been reached on which model is superior. ResNet50 has been reported to be the best in various studies, while EfficientNet has been found to outperform other pre-trained models in other tests. In our work, we employ PCA to demonstrate that the EfficientNet pre-trained model has superior feature extraction performance. In 2019, Tan and Le presented a new model family called EfficientNet, which requires comparatively few FLOPS. As well as handling routine image classification transfer learning tasks, it reaches state-of-the-art accuracy on ImageNet [3]. The Residual Network (ResNet) is a well-known neural network used for a wide variety of computer vision tasks [4, 5]. ResNet made it possible to train very deep neural networks with 140+ layers effectively, a significant advance. The critical aspects of the proposed technique are as follows:
(i) The system identifies skin problems in photographs of different body regions. It also details the causes and processes of the detected skin problem, along with a summary of how it is treated.
(ii) The dataset is used to train an EfficientNet-B0 model, and results for new images are generated.
(iii) Five main skin conditions are detected: Eczema; Psoriasis and Lichen Planus; Benign Tumours; Viral Infections; and Fungal Infections.
(iv) Previous methods could not reach an accuracy of 91.36%. The approach taken here overcomes these difficulties and is more accurate than the approaches developed in the past.
An expert system consists of an inference engine and a knowledge base. Systems that give advice or make choices are referred to as expert systems. Python was used to build the expert system. As with most expert systems, its typical components are a knowledge base, an inference engine, and a user interface. The rest of the paper is organized as follows. Section 2 covers past system implementations and approaches to image classification. Section 3 describes the proposed approach and the stages of the system's development. Section 4 details the system's execution and the results obtained. The final section presents concluding remarks.
2 Related Work
Skin-related diseases rank as the fourth most common cause of human illness, and their incidence has been growing considerably. There is an urgent need to detect skin-related diseases early, as they can become life-threatening. Skin image identification and classification has become a significant research area in the past few years. Muhammad Zubair Asghar presented an online expert system based on Java technology. The expert rules were represented in a tree graph and inferred using the search-first-progress method. However, advancements in technology have since enabled more accurate approaches than a system based on predefined rules and indicators [1]. M. Shamsul's study proposes the use of feedforward backpropagation artificial neural networks. The system is driven by visual input, namely pre-processed high-resolution color photographs, together with patient history [2]. Damilola uses a prototyping technique for his solution. The purpose of this system is to merge PSL (pigmented skin lesion) image data, analysis, matching observations, and conclusions reached by medical specialists into one image [6]. Munirah M. proposes an online system with a rule-based inference engine that uses forward chaining to search for symptoms associated with a particular skin illness, gathered through an online questionnaire [7]. Finally, Ranjan Parekh suggested an automated technique for identifying illness conditions using skin texture photographs and Grey Level Co-occurrence Matrices (GLCM) [8].
AALC Amarathunga proposed another online expert system in which a user can upload an image and work through a set of questions related to their skin condition [9]. This can increase the accuracy of detecting skin disease, as it uses both image processing and data mining techniques to evaluate the disease further. Pravin S. Ambad proposed an imaging system in which the user can upload an image of a skin rash or mole and get real-time feedback on whether to contact a doctor [10]. Nisha Yadav's work describes a new image processing method for skin disease that combines image pre-processing, filtering, segmentation, feature extraction, and edge detection [11]. Dermatological conditions are among the most prevalent illnesses on the planet. Sourav Kumar Patnaik used three image recognition architectures, Inception V3, Inception ResNet V2, and MobileNet, to develop a maximum voting system [12]. Anurag Kumar Verma proposed a method that uses six machine learning techniques and incorporates three ensemble approaches built on these techniques to identify distinct classes of skin condition. He also evaluated the outcomes of several machine learning algorithms using a feature selection step [13]. Ketut Gede Darma Putra presented a skin disease identification system based on machine learning and image processing. K-Means clustering was used for segmentation, while DWT and color moments were employed for feature extraction [14]. Awareness of skin disease needs to be spread in rural areas, where access to dermatologists is limited. G. Rajasekaran proposed a method that uses image pre-processing and feature extraction in the first step, followed by machine learning techniques [15]. Mingxing Tan and Quoc V. Le proposed a model scaling method for convolutional neural networks, and the resulting family of models was named EfficientNet. Their comprehensive study of model scaling found that properly balancing network depth, width, and resolution may improve performance, and EfficientNet produced superior outcomes [3]. Deevyankar Agarwal proposed an automated diagnosis of COVID-19 patients using a CNN built on the EfficientNet architecture [16]. Jiangchuan Liu used transfer learning to fine-tune the model parameters, so that a small-sample dataset of maize diseases can be diagnosed more accurately and rapidly with an EfficientNet-based identification method [17]. Finally, Nils Dessert suggested using an ensemble of deep learning models, including EfficientNets, ResNet WSL, and SENet, to classify skin lesions [18]. On the DERMNET dataset, we present a system that classifies five distinct types of skin disorders using several image pre-processing techniques and various models: a CNN and two of its architectures, EfficientNet-B0 and ResNet50. Additionally, a comparison of the three has been conducted.
3 Methodology
This section explains how the images are pre-processed and fed to the model for training and validation in order to detect the type of skin disease. It also discusses why deep learning models are successful at image classification tasks. Thanks to significant advances in the deep learning architectures known as Convolutional Neural Networks (CNNs), deep learning approaches have proven quite successful in image classification. AlexNet's win in the ImageNet Large Scale Visual Recognition Challenge in 2012 had a major impact on the field of computer vision [19, 20]. CNN architectures have seen continued advancement in the deep learning community since then. Well-known CNN designs include VGG, GoogLeNet, ResNet, and EfficientNet. CNNs excel at image classification because they capture location-invariant characteristics. Furthermore, during training a CNN learns the filters that serve as feature extractors, so feature extraction is optimized automatically together with the rest of the model. These filters help the model learn to discriminate between the various types of images fed into it. This study leverages a CNN and two of its architectures to classify skin disease images into five classes: Benign Tumours; Fungal Infections; Viral Infections; Eczema; and Psoriasis and Lichen Planus. Figure 2 shows the flow process used in the paper. The architectures of the different expert systems used for the simulation are described next.
Fig. 2 The flow diagram of the expert system used
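To illustrate how a convolutional filter acts as a feature extractor, the following is a minimal pure-Python sketch and not the authors' implementation. The toy image, the `vertical_edge` kernel, and the helper `conv2d_valid` are all hypothetical names chosen for illustration; in a real CNN the kernel weights would be learned from data rather than written by hand.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch under the kernel.
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 4x4 "image": dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written vertical-edge filter (illustrative only).
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

The filter responds only in the column where the dark and bright halves meet, which is the sense in which a filter "extracts" a feature regardless of where in the image it occurs.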
3.1 EfficientNet-B0
EfficientNet is a family of ConvNets that is among the most efficient and accurate [21]. Its authors presented a new scaling methodology for convolutional neural networks in which, to increase accuracy and efficiency, the width (breadth), depth, and resolution of ConvNets are all scaled together in a balanced way [3]. Using the new scaling approach, a new set of models termed EfficientNets was built. The eight original EfficientNet models are called EfficientNet-B0 through EfficientNet-B7; B1 to B7 are scaled variants of the baseline model, EfficientNet-B0. The architectural complexity and the number of model parameters increase from B0 to B7. Following extensive testing and comparison, it was found that EfficientNet-B0 provided the best accuracy and efficiency. EfficientNet was chosen because these models have low FLOPS costs and few parameters while delivering state-of-the-art image classification results that carry over to various datasets. Reference [3] reports that, on ImageNet, EfficientNet-B7 reaches 84.4% top-1 and 97.1% top-5 accuracy while being 8.4× smaller and 6.1× faster than the best existing ConvNet, and that EfficientNets also transfer effectively to CIFAR-100 (91.7%), Flowers (98.8%), and three other transfer learning datasets. The goal was to see whether EfficientNets could reproduce this success on the skin disease dataset. EfficientNet proved to be an excellent choice: not only did it outperform ResNet50 and the CNN in terms of performance, but the model also contains fewer parameters (5.3 million) than ResNet50 and the CNN.
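The balanced scaling idea can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the base coefficients α = 1.2, β = 1.1, γ = 1.15 reported by Tan and Le [3] and a single compound coefficient φ. The function names are hypothetical, and the released B1 to B7 models use rounded multipliers, so the printed values may not match any published configuration exactly.

```python
# Base coefficients for depth, width, and resolution, as reported in [3] (assumed here).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    return depth, width, resolution


def flops_factor(phi):
    """Approximate FLOPS growth, which scales roughly with d * w^2 * r^2."""
    d, w, r = compound_scale(phi)
    return d * w ** 2 * r ** 2


for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPS x{flops_factor(phi):.2f}")
```

Because α · β² · γ² is close to 2 for these coefficients, each unit increase of φ roughly doubles the FLOPS budget while growing all three dimensions together.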
3.1.1 Architecture
A Neural Architecture Search technique was used to create the baseline architecture for EfficientNet-B0, which was optimized for FLOPS and accuracy. The EfficientNetB0 architecture is shown in Fig. 3. The first step is the design of the stem network, which is illustrated in Fig. 4.
Fig. 3 Architecture of EfficientNet-B0
Fig. 4 STEM of each architecture in EfficientNet B0-B7 Model
Fig. 5 Final Layer of the architecture in EfficientNet B0 Model
After the stem, the body of the architecture begins. The final layers, which are the same in all of the EfficientNet models, are shown in Fig. 5. Each model contains seven blocks, and each block has a variable number of sub-blocks. The number of sub-blocks grows from EfficientNet-B0 to EfficientNet-B7: EfficientNet-B0 contains 237 layers, whereas EfficientNet-B7 has 813 layers. All of these layers can nevertheless be built from the five modules listed below, which are combined to form the sub-blocks. The design in Fig. 3 describes the EfficientNet-B0 model.
(i) Module 1: This is the base for the sub-blocks.
(ii) Module 2: This module starts the first sub-block in each of the seven major blocks.
(iii) Module 3: This is attached as a skip connection to all of the sub-blocks.
(iv) Module 4: This is used for connecting the sub-blocks so that each can have its own skip connection.
(v) Module 5: This module connects a sub-block to the preceding sub-block via a skip connection.
The kernel size for convolution operations, as well as the resolution, channels, and layers in EfficientNet-B0, are listed below in Table 1.
3.1.2 Compound Scaling
ConvNets can be scaled in three dimensions: breadth, depth, and resolution. Earlier attempts to scale ConvNets focused solely on scaling one of the three dimensions. It has been demonstrated that scaling while balancing the three dimensions of
Automated Medical Diagnosis and Classification …
Table 1 Input parameters for EfficientNet B0 model
| Stage $i$ | Operator $F_i$ | Resolution $H_i \times W_i$ | No. of channels $C_i$ | No. of layers $L_i$ |
|---|---|---|---|---|
| 1 | Conv3 × 3 | 224 × 224 | 32 | 1 |
| 2 | MBConv1, k3 × 3 | 112 × 112 | 16 | 1 |
| 3 | MBConv6, k3 × 3 | 112 × 112 | 24 | 2 |
| 4 | MBConv6, k5 × 5 | 56 × 56 | 40 | 2 |
| 5 | MBConv6, k3 × 3 | 28 × 28 | 80 | 3 |
| 6 | MBConv6, k5 × 5 | 14 × 14 | 112 | 3 |
| 7 | MBConv6, k5 × 5 | 14 × 14 | 192 | 4 |
| 8 | MBConv6, k3 × 3 | 7 × 7 | 320 | 1 |
| 9 | Conv1 × 1 & Pooling & FC | 7 × 7 | 1280 | 1 |
breadth, network-depth, and resolution yields more efficient, effective, and accurate models than ever before. The intuition behind compound scaling is as follows: “We should enhance network depth for higher resolution images so that the larger receptive fields can help catch similar features that include more pixels in larger images. If the resolution is increased, we also need to extend the breadth of the network to accommodate larger and more detailed patterns in photographs with a higher resolution. This suggests that we should use many scaling dimensions in tandem, rather than utilizing a single dimension that scales everything uniformly” [3].
depth: $d = \alpha^{\phi}$ (1)
width: $w = \beta^{\phi}$ (2)
resolution: $r = \gamma^{\phi}$ (3)
s.t. $\alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2$, (4)
$\alpha \geq 1, \ \beta \geq 1, \ \gamma \geq 1$
In Eqs. (1), (2), (3), and (4), $\phi$ is the compound coefficient, which the user can specify to scale the model based on the resources available. The constants $\alpha$, $\beta$, and $\gamma$ specify how these resources are distributed over the ConvNet’s depth, breadth, and resolution. Grid search can be used to determine $\alpha$, $\beta$, and $\gamma$, and a user-specified compound coefficient $\phi$ can then be used to scale a baseline model (Fig. 6).
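As a rough illustration of Eqs. (1) to (4), the following sketch computes the depth, width, and resolution multipliers for a few values of the compound coefficient. The constants 1.2, 1.1, and 1.15 are the grid-search values reported for the B0 baseline in [3]; the function name `compound_scale` and the use of 224 as the base resolution are assumptions made for this example, and the released EfficientNet variants round these values, so the printed resolutions are only indicative.

```python
# Grid-search constants reported for the EfficientNet-B0 baseline in [3]
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, base_resolution=224):
    """Return (depth multiplier, width multiplier, scaled resolution) for a given phi."""
    depth = ALPHA ** phi                          # Eq. (1)
    width = BETA ** phi                           # Eq. (2)
    resolution = base_resolution * GAMMA ** phi   # Eq. (3), applied to the B0 input size
    return depth, width, resolution


# Constraint of Eq. (4): alpha * beta^2 * gamma^2 should be close to 2
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution ~{r:.0f} px")
```

Because of the constraint in Eq. (4), each unit increase of the compound coefficient roughly doubles the FLOPS of the scaled network.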
Fig. 6 Model size versus accuracy comparison. EfficientNet-B0
3.2 ResNet50
Kaiming He et al. introduced ResNet in 2015 [4]. ResNet, short for residual network, is a well-known neural network used as a backbone for many computer vision tasks. Before ResNet, very deep neural networks were difficult to train because of the vanishing gradient problem. ResNets are designed to learn residual functions with reference to the layer inputs instead of learning unreferenced functions. Rather than requiring each stack of layers to fit a desired mapping directly, residual nets let these layers fit a residual mapping. Residual blocks are stacked on top of each other to construct a network: ResNet-50, for example, has fifty layers made up of these residual blocks. An identity shortcut connection can be created using the technique shown in Fig. 7.
Fig. 7 Residual identity mapping
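The residual idea can be illustrated with a minimal sketch. It is a simplification made for this example: the block uses two small fully connected layers in plain Python rather than the convolutional layers of ResNet50, and the names `linear` and `residual_block` are hypothetical.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Multiply a weight matrix (list of rows) by the vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU in between."""
    fx = linear(w2, relu(linear(w1, x)))            # residual function F(x)
    return relu([f + xi for f, xi in zip(fx, x)])   # identity shortcut adds the input back


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(residual_block(x, zeros, zeros))        # F(x) = 0, so the block passes x through
print(residual_block(x, identity, identity))  # F(x) = x, so the output is 2x
```

When the weights are zero the block reduces to the identity mapping, which is why adding more residual blocks does not, in principle, make the input harder to propagate.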
Fig. 8 Architecture of ResNet 50
The earlier ResNet models used shortcuts that skip two layers, whereas the shortcuts in ResNet50 skip three layers, and 1 × 1 convolution layers were added, as can be seen in the ResNet50 architecture shown in Fig. 8.
3.3 CNN
The heart of AlexNet is a convolutional neural network (CNN), an artificial neural network that closely mimics the human visual system [19]. As Fig. 9 illustrates, a Convolutional Neural Network (ConvNet/CNN) is a deep learning method that takes an input image and uses learnable weights and biases to distinguish between the aspects/objects in it. Convolutional neural networks are composed of many layers of artificial neurons. Artificial neurons are mathematical functions that compute an activation value from the weighted sum of several inputs. A CNN image classifier takes an image, evaluates it, and classifies it into one of several groups (e.g., Dog, Cat,
Fig. 9 Numerous convolutional layers in a neural network
Tiger, Lion). A computer sees an input image as an array of pixels whose size depends on the image resolution: h × w × d (h = height, w = width, d = dimension). Several CNN architectures are used for image processing, feature extraction, and related tasks, including VGG, Inception, MobileNet, and ResNet.
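To make the two notions above concrete, the sketch below shows a single artificial neuron (a weighted sum followed by an activation) and an image stored as an h × w × d array. The pixel values, weights, and the choice of ReLU as the activation are illustrative assumptions.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a ReLU activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, z)


# A tiny 2 x 2 RGB image as a nested list: height x width x channels
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]
h, w, d = len(image), len(image[0]), len(image[0][0])
print(h, w, d)

# Feed the three channel values of the top-left pixel to one neuron
print(neuron([p / 255 for p in image[0][0]], [0.5, -1.0, 0.25], 0.1))
```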
3.4 Dataset
The suggested CNN and its architectures make use of the DERMNET dataset. The collection contains skin conditions classified as Eczema, Psoriasis, Lichen Planus, benign tumors, fungal infections, and viral infections, and it is pre-classified into five classes. Each of the five categories of skin disorders contains roughly 1900 to 2500 photos. These photographs depict skin diseases in various body areas, including the neck, forehead, hands, back, and legs. The acquired dataset is pre-processed using a data augmentation technique that generates multiple copies of each image through rotation, flipping, and resizing operations, increasing the dataset’s size and improving training efficiency and effectiveness. In addition, it helps to avoid overfitting. Figure 10 depicts representative images of the skin disease types in the dataset, whereas Fig. 11 depicts the dataset’s format before pre-processing. The skin illnesses depicted in the dataset sample images (Fig. 10) are as follows:
(i) Eczema
(ii) Psoriasis and Lichen Planus
(iii) Benign Tumor
(iv) Bacterial Infections
(v) Fungal Infections
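The augmentation operations named above (rotation, flipping, and resizing) can be sketched as follows. This is not the pipeline used in the study: it assumes 2-D grayscale images stored as lists of rows, a fixed 90 degree rotation, and nearest-neighbour resizing, and all function names are hypothetical.

```python
def rotate90(image):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror each row of the image."""
    return [row[::-1] for row in image]


def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize to new_h x new_w pixels."""
    h, w = len(image), len(image[0])
    return [[image[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]


def augment(image):
    """Return the original image plus rotated, flipped, and resized copies."""
    return [
        image,
        rotate90(image),
        flip_horizontal(image),
        resize_nearest(image, 2 * len(image), 2 * len(image[0])),
    ]


sample = [[1, 2], [3, 4]]
copies = augment(sample)
print(len(copies), "images generated from one sample")
```

Each source image yields four training samples here; in practice a library routine with random angles and scales would usually replace these fixed transformations.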
Splitting the data into training and testing sets before processing allows a more reliable assessment of whether the model is functioning correctly. The training set consists of 80% of the total images and is used to train the model, which estimates the weights that best fit the ANN so that all categories are predicted with the least amount of error. The remaining 20% of the images make up the testing dataset, also known as the validation subset. This dataset is used to measure the model’s accuracy: the model predicts the test images, and the predictions are compared with the desired output to determine the model’s efficiency and efficacy.
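A minimal sketch of such an 80/20 split is given below. The file names are hypothetical placeholders, and the fixed random seed is an assumption added so that the split is reproducible.

```python
import random


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle the items and split them into a training and a testing list."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical image identifiers standing in for the dataset files
image_ids = [f"img_{i:04d}.jpg" for i in range(2000)]
train_ids, test_ids = train_test_split(image_ids)
print(len(train_ids), len(test_ids))
```

Shuffling before the cut avoids a split in which one class ends up only in the training or only in the testing part when the files are ordered by class.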
3.5 Experimental Setup
The system was built in a computing environment based on a laptop computer. The laptop is based on a seventh-generation Intel Core i5 CPU
Fig. 10 Illustrations from several classes
Fig. 11 Pre-processed representation of the dataset
that has a 2.1 GHz clock speed and 3 MB of cache memory, together with 8 GB of RAM and a 1 TB hard drive (ver. 20H2, 64 bit). The software stack, available on Google Colaboratory, was Python 3.2 with TensorFlow, NumPy, Scikit-learn, and Pandas.
4 Results and Discussion
The Convolutional Neural Network, ResNet50, and EfficientNet-B0 models all produce accurate results for all five skin conditions. However, the best results are obtained with one of the CNN designs, namely EfficientNet-B0, which reaches an accuracy of 91.36%. We pre-processed the dataset by applying data augmentation to increase its size, and then trained and tested the Convolutional Neural Network, ResNet50, and EfficientNet-B0 models on it. EfficientNet-B0 performs classification based on feature extraction, so no separate classifier is required. The maximum accuracy is obtained when the batch size is 32 samples and the number of epochs is set to 10. As shown in Fig. 12 and Table 2, EfficientNet-B0, with an accuracy of 91.36%, has higher accuracy and sensitivity than CNN and ResNet50 when trained on the DERMNET dataset to classify the images into five classes. Sensitivity, also termed recall, measures the proportion of people suffering from the disease who are correctly predicted as suffering from it. ResNet50 and CNN achieved accuracies of 87.49% and 84.52%, respectively (Table 2). EfficientNet-B0 used a 224 × 224 input with ReLU as the activation function, ten epochs, a constant batch size of 32, and 237 layers. The EfficientNet-B0 model is a simple baseline architecture for mobile devices trained on the ImageNet dataset. EfficientNet-B0 has the fewest parameters, around 5.3 million; ResNet50 and CNN have 25.6 million and 60 million, respectively. For all three
Fig. 12 Accuracy of different models
Table 2 Accuracy and sensitivity for different models
| Model | Accuracy (%) | Sensitivity (%) |
|---|---|---|
| CNN | 84.52 | 81.75 |
| ResNet50 | 87.49 | 85.34 |
| EfficientNet-B0 | 91.36 | 92.68 |
models, the learning rate is set at 0.0001. EfficientNet-B0 outperforms ResNet50 and CNN on this dataset in terms of both accuracy and sensitivity. The hyperparameters used in all three models are listed in Table 3.
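For reference, the accuracy and sensitivity (recall) figures reported in Table 2 are metrics of the following kind. The sketch uses a handful of made-up labels, not the study's predictions, and computes sensitivity for one class as TP / (TP + FN).

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred, positive):
    """Recall for one class: true positives / (true positives + false negatives)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


# Made-up labels for illustration only
y_true = ["eczema", "eczema", "psoriasis", "eczema", "psoriasis"]
y_pred = ["eczema", "psoriasis", "psoriasis", "eczema", "psoriasis"]

print(f"accuracy    = {accuracy(y_true, y_pred):.2f}")
print(f"sensitivity = {sensitivity(y_true, y_pred, 'eczema'):.2f}")
```

For a multi-class problem such as the five skin conditions, the per-class sensitivities would typically be averaged; how the study aggregated them is not stated, so no averaging is shown here.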
5 Limitation of the Study
The limitation of our study concerns the models that were used. The EfficientNet family ranges from B0 to B7. However, we used only the B0 model in our study because of machine and memory limitations, as the EfficientNet input size grows from 224 × 224
Table 3 Hyperparameters for the three models
| Hyperparameters | Values for ResNet50 | Values for EfficientNet-B0 | Values for CNN |
|---|---|---|---|
| Batch size | 32 | 32 | 32 |
| Epochs | 10 | 10 | 20 |
| Activation function | ReLU | ReLU | ReLU |
| Dropout | 0.5 | 0.5 | 0.5 |
| No. of layers | 50 | 237 | 6 |
| Input shape | (150,150) | (224,224) | (64,64) |
in B0 to 600 × 600 in B7. As a result, the space and time complexity increases, and EfficientNet-B7 takes a long time to train on a large dataset. The number of parameters also grows, from 5.3 million in B0 to 66 million in B7. Using a model above B0 therefore requires a high-performance machine.
6 Conclusion and Future Scope
This paper conducts a comprehensive examination and comparison of three models, ResNet50, EfficientNet-B0, and CNN, taking into account the many factors used to predict skin disease types. Based on the needs of the application, the research helps to determine which model to select according to the advantages and disadvantages of each design. The three models were compared on the task of detecting and identifying five critical skin illnesses, namely Eczema, Psoriasis and Lichen Planus, Benign Tumors, Fungal Infections, and Viral Infections. It was observed that EfficientNet-B0 yields a significant boost in illness detection accuracy (about 4%) compared with the other two models. In the future, EfficientNet-B0 may be regarded as a safe bet for achieving superior outcomes in image prediction and classification, with accuracy attained at a significantly reduced computational cost. Additionally, we may expand the dataset for skin disease categorization. These algorithms/models may be used in a variety of applications, including biomedical and image processing.
References
1. Asghar, M., Asghar, J., Saqib, S., Ahmad, B., Ahmad, S., Ahmad, H.: Diagnosis of skin diseases using online expert system. Int. J. Comput. Sci. Inf. Secur. (IJCSIS) 9(6), 323–325 (2011)
2. Shamsul Arifin, M., Golam Kibria, M., Firoze, A., Ashraful Amini, M., Yan, H.: Dermatological disease diagnosis using color-skin images. In: 2012 International Conference on Machine Learning and Cybernetics, pp. 1675–1680 (2012). https://doi.org/10.1109/ICMLC.2012.6359626
3. Tan, M., Le, Q.V.: EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (2019). arXiv:1905.11946
4. Albawi, S., Mohammed, T.A., Al-Zawi, S.: Understanding of a convolutional neural network. In: 2017 International Conference on Engineering and Technology (ICET), pp. 1–6 (2017). https://doi.org/10.1109/ICEngTechnol.2017.8308186
5. Sharma, M., Jain, B., Kargeti, C., Gupta, V., Gupta, D.: Detection and diagnosis of skin diseases using residual neural networks (Resnet). Int. J. Image Graph. (2020). https://doi.org/10.1142/s0219467821400027
6. Okuboyejo, D., Olugbara, O., Odunaike, S.: Automating skin disease diagnosis using image classification. In: Lecture Notes in Engineering and Computer Science, Proceedings of the World Congress on Engineering and Computer Science 2013, Vol II WCECS, 23–25 October 2013, San Francisco, USA
7. Mohd Yusof, M., Ab Aziz, R., Fei, C.: The development of online children skin diseases diagnosis system. Int. J. Inf. Educ. Technol. 3(2), 231–234 (2013). https://doi.org/10.7763/IJIET.2013.V3.270
8. Parekh, R., Mittra, A.K.: Automated detection of skin diseases using texture features. Int. J. Eng. Sci. Technol. 3(6), 4801–4808 (2011)
9. Amarathunga, A.A.L.C., Ellawala, E.P.W.C., Amalraj, C.R.J., Abeysekara, G.N.: Expert system for diagnosis of skin diseases. Int. J. Scientific Technol. Res. 4(1), 174–178 (2015). ISSN 2277-8616
10. Ambad, P., Shirsat, A.: An image analysis system to detect skin diseases. IOSR J. VLSI Signal Process. 6, 17–25 (2016). https://doi.org/10.9790/4200-0605011725
11. Yadav, N., Narang, V.K., Shrivastava, U.: Skin diseases detection models using image processing: a survey. Int. J. Comput. Appl. 137(12), 0975–8887 (2016)
12. Patnaik, S.K., Sidhu, M.S., Gehlot, Y., Sharma, B., Muthu, P.: Automated skin disease identification using deep learning algorithm. Biomed. Pharmacol. J. 11(3) (2018). https://doi.org/10.13005/bpj/1507
13. Verma, A.K., Pal, S., Kumar, S.: Prediction of skin disease using ensemble data mining techniques and feature selection method: a comparative study. Appl. Biochem. Biotechnol. 190, 341–359 (2020). https://doi.org/10.1007/s12010-019-03093-z
14. Putra, K.G.D., Wiastini, N.P.A.O., Wibawa, K.S., Putra, I.M.S.: Identification of skin disease using K-means clustering, discrete wavelet transform, color moments and support vector machine. Int. J. Mach. Learn. Comput. 10(4), 542–548 (2020). https://doi.org/10.18178/ijmlc.2020.10.4.970
15. Rajasekaran, G., Aiswarya, N., Keerthana, R.: Skin disease identification using image processing and machine learning techniques. Int. Res. J. Eng. Technol. (IRJET) 7(3) (2020). p-ISSN: 2395–0072
16. Marques, G., Agarwal, D., de la Torre Díez, I.: An automated medical diagnosis of COVID-19 through EfficientNet convolutional neural network. Appl. Soft Comput., 106691 (2020). https://doi.org/10.1016/j.asoc.2020.106691
17. Liu, J., Wang, M., Bao, L., Li, X.: EfficientNet based recognition of maize diseases by leaf image classification. J. Phys. Conf. Series 1693, The 2020 3rd International Conference on Computer Information Science and Artificial Intelligence (CISAI) 2020, 25–27 September 2020, Inner Mongolia, China (2019)
18. Gessert, N., Nielsen, M., Shaikh, M., Werner, R., Schlaefer, A.: Skin lesion classification using ensembles of multi-resolution efficientnets with metadata. MethodsX, 100864 (2020). https://doi.org/10.1016/j.mex.2020.100864
19. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017). https://doi.org/10.1145/3065386
20. Russakovsky, O., Deng, J., Su, H., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
21. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
Modular Approach for Neural Networks in Medical Image Classification with Enhanced Fuzzy Integration
Sergio Varela-Santos and Patricia Melin
Abstract In recent years, digital image analysis has become a subject of interest for computational scientists, partly because today's data-driven society makes large datasets of information such as images widely available and accessible to more users than ever, and partly because of the resurgence in popularity of neural networks with deep architectures, such as Convolutional Neural Networks (CNNs), that specialize in this type of data. Several of these large datasets are dedicated to medical image diagnosis through the classification of medical images such as X-rays or CT scans. CNNs offer a robust learning system that processes and reduces the information contained in an image dataset. Other approaches make use of predefined feature extractors, such as image texture features, and use them as inputs for classifiers that learn and categorize. Both paradigms have been used with success in image-related problems. The contribution of the presented work is a robust modular approach that combines these two paradigms of image analysis into one cooperative system, unified by a fuzzy system designed for output integration, resulting in a single image classifier that uses both perspectives to give one unified response. Experiments were performed on benchmark image datasets and on medical image datasets for diagnosing Pneumonia, COVID-19, and Tuberculosis, with favorable results, giving a more comprehensive image analysis and an alternative to the more commonly used single-model architectures.
Keywords Texture features · Deep learning · Image classification · Local binary patterns · X-ray · Neural networks
S. Varela-Santos · P. Melin (B) Tijuana Institute of Technology, Tijuana, BC, Mexico e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_2
S. Varela-Santos and P. Melin
1 Introduction
Computer Aided Diagnosis (CAD) systems have supported physicians in analyzing many different types of diseases in patients over the years [1]. These systems are the product of research advances in diagnostic techniques created by the combined effort of medical and computer scientists. CAD systems are commonly supported by inference and classification algorithms that perform automatic diagnostics on medical data in the form of medical images [2, 3], electrical signals obtained from the patient’s body [4], readings of medical devices [5], and many other forms of data regarding an individual’s health [6]. Among these supporting classification algorithms, hybrid systems such as neuro-fuzzy models [7–9] have become more important in recent years. With the creation of larger image datasets for testing medical diagnostic models, the models themselves may need to include different inference mechanisms that are not found in a single model but become possible by combining models from different areas, which work differently in nature but provide an accurate response to the medical image diagnostic problem (an image classification problem). In this work, we present an approach to combine classification results from specialized neural network models by using a fuzzy response integrator (Type-1 Fuzzy System) for n-class image classification problems.
1.1 Medical Images
Nowadays, accessibility to digital medical image databases has increased. Since the late 1990s, hospitals have changed their workflow to a modern operation based on digital medical images by adopting Picture Archiving and Communication Systems (PACS) [10]. Thanks to this transition, and to the surge of interest in digital image analysis created by advances in modern classification models such as deep learning, hospitals and research centers have started to collaborate in order to release large medical image datasets for scientific use [11]. In the field of medical image diagnostics, radiology is one of the most important areas in which advances in computer-aided decision support have been made, thanks to the now widely available image datasets for many diseases detectable in chest X-rays [12, 13]. Medical images are commonly defined by the method used to produce them. Ultrasound images are generated by directing sound waves at the body to obtain images of a patient’s anatomy. Radiation-based images such as X-rays are generated by machines that project a single X-ray source onto the tissue, producing two-dimensional images. Computed tomography (CT) scanners generate higher resolution images by using multiple sources of X-rays [14].
Modular Approach for Neural Networks …
1.2 Medical Image Classification Using Neural Networks
Because of these factors, new and better solutions based on diagnostic technology have been developed to help the medical sector. The better a group of data represents a problem, the better the model generated from that data will be able to solve real-world problems. Newer models such as convolutional neural networks (CNNs) require large numbers of images in order to perform an accurate classification and solve the diagnostic problem [15]. To perform a diagnostic assessment on medical images, we first need to pre-label a group of data, such as images, into pre-defined classes that represent the different states of the problem. The diagnostic problem can be considered a “classification problem” because it involves giving a prediction based on the input data and assigning that prediction to one of the pre-defined states of the problem. For example, in a medical problem the patient will show signs that indicate a verdict of “having a disease”, “not having a disease”, or “having more than one disease”. Each of these verdicts can be grouped into a class: commonly, one class is the “healthy patient”, another is “patient diagnosed with disease 1”, another is “patient diagnosed with disease 2”, and so forth. In this regard, neural networks are a commonly used classification model because they have produced great results in several binary and multiclass problems over the years [16, 17], many of which have led to feasible medical diagnosis support systems [18–20]. Particularly in the area of medical image classification, neural networks work in tandem with computer vision algorithms to detect the most important features of the image dataset and use that information to classify with high accuracy [21]. In Chauhan et al. [22], a DenseNet model was trained for classification using transfer learning on chest X-ray images obtained from Covid-19 patients, divided into two classes: a Covid class and a healthy (normal) class. The results seem promising, with 98.45% accuracy for the normal class and 98.32% for the Covid class. In Varshmi et al. [23], a deep convolutional network was used as a feature extraction algorithm, extracting the best features via the convolution operation. Instead of using a classification layer, as in the common CNN model, the authors used another well-known classifier, the Support Vector Machine (SVM). The experiments were done on a chest X-ray image dataset for the diagnosis of pneumonia. The images were resized to 224 by 224, the standard input size of DenseNet models, and the final results were evaluated using the AUC metric, which reached 0.8002 in the best result. In Verma et al. [24], another model focused on pneumonia is created, this time using convolutional neural networks and data augmentation to make up for the lack of images. The augmentation is done by rotating and resizing the images and using them together with the original images in order to extract meaningful features regardless of the angle of the pictures, retaining invariant characteristics of the data.
1.3 Fuzzy Logic Combined with Modular Neural Networks
Fuzzy logic was developed by the mathematician Zadeh [25] as an extension of set theory that considers the values in between 0 and 1. Applied to a classification algorithm, this principle treats a prediction as a combination of memberships in more than one class, which may provide a better solution in some cases. Fuzzy logic has been applied in hybrid models with other machine learning techniques such as neural networks, and in particular with modular or ensemble-based architectures, since the last decade [26], and it remains an important design when extra accuracy is needed in the model’s results. When working with ensemble or modular models, the problem is divided into sections, most of which are treated separately, and each section is analyzed according to its capabilities. Commonly, the models selected have some relationship to the problem to solve, or at least provide a partial solution to it. When we combine different models that have already provided a solution to a classification problem, we are combining good classifiers into one system that provides many solutions to the problem, one for each model. What, then, is the correct way to select the classification criteria based on many solutions? This is done using an integration mechanism [27]. Many algorithms have been proposed for this purpose, including the simple average and the weighted average of the solutions [28], taking the highest of the solutions, and performing a tie-breaking or combination of the outputs [29]. Another way of combining them is to use a fuzzy logic-based system, such as a type-1 Mamdani fuzzy system. These systems work with an input–output scheme similar to the one used in neural networks but base their operation on rules grounded in fuzzy logic.
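The simpler integration mechanisms mentioned above (simple average, weighted average, and taking the highest of the solutions) can be sketched as follows. The class scores and the module weights are made-up values for a 3-class problem, and the function names are hypothetical.

```python
def average(outputs):
    """Simple average of the per-class scores of several modules."""
    return [sum(col) / len(outputs) for col in zip(*outputs)]


def weighted_average(outputs, weights):
    """Average of the per-class scores weighted by a confidence value per module."""
    total = sum(weights)
    return [sum(w * s for w, s in zip(weights, col)) / total for col in zip(*outputs)]


def maximum(outputs):
    """Keep, for every class, the highest score given by any module."""
    return [max(col) for col in zip(*outputs)]


def predicted_class(scores):
    """Index of the class with the largest integrated score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up class scores from two modules for a 3-class problem
module_1 = [0.6, 0.3, 0.1]
module_2 = [0.2, 0.7, 0.1]
outputs = [module_1, module_2]

print(predicted_class(average(outputs)))
print(predicted_class(weighted_average(outputs, [3, 1])))
print(predicted_class(maximum(outputs)))
```

With these scores the simple average and the maximum favor the second class, while weighting the first module three times as heavily favors the first class, which shows how much the final decision can depend on the chosen integrator.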
Fuzzy logic represents an important alternative because, aside from using the outputs of well-known classifiers, it takes into consideration the uncertainty in the problem, which is inherent in medical problems and in the way humans think and see the world. Classification models that include fuzzy logic have been able to provide good solutions to real-world problems. In Ontiveros-Robles et al. [30], the authors present a study using type-2 fuzzy classifiers for medical diagnosis problems, performing classification on datasets for diverse medical problems such as breast cancer (Breast Cancer Wisconsin, Breast Cancer Coimbra), liver patient identification (Indians Liver Dataset), diabetes patient identification (Pima Indians Diabetes Dataset), and symptoms of heart disease (SPECT Heart), primarily using intelligent models based on type-2 fuzzy logic and supervised learning with a focus on modelling the data uncertainty of each medical dataset. In Mousavi et al. [31], a robust hybrid intelligent system is proposed that combines the Harmony Search (HS) algorithm with two type-1 fuzzy systems that perform the classification. The idea is to analyze medical datasets and, by optimizing a fitness function, obtain fuzzy rules using a modified version of the HS algorithm; the fuzzy rules are personalized for each dataset and then used to perform an accurate classification. In Melin et al. [32], another hybrid model for classification is proposed, this time for diagnosing the risk of developing a heart disease in patients with high blood pressure,
the hybrid model comprised modular neural networks for pattern recognition and fuzzy logic classifiers, and the risk assessment used multiple attributes such as age, risk factors, and records of ambulatory blood pressure monitoring. The fuzzy logic systems give a classification in the form of a risk diagnosis based on the input risk factors and the output obtained from the modular neural network trained on the blood pressure records, thus creating a support system capable of helping medical professionals detect heart diseases. This work is organized as follows: Sect. 2 presents and explains the proposed modular model; Sect. 3 explains the methodology used for experimentation, the datasets, and the detailed parameters used in each module of the proposed model; Sect. 4 presents the results and comparisons; and Sect. 5 describes conclusions and future work on the subject.
2 Proposed Method
Advances in computer vision techniques and the adoption of new neural network models have been the main catalyst for the now mainstream use of deep neural models, including convolutional neural networks, which perform very well on image classification tasks. Before this adoption, the common way to solve image-related problems in computational intelligence was first to analyze the images for distinctive features that were present only in images of a specific class, known as the representative features of that class. To perform classification, such features (for example lines, edges, and objects) had to be emphasized. Another way of obtaining useful information about class features is feature extraction, that is, using specially designed formulas and algorithms that perform calculations on the information found in an image sample, such as the quantization of pixel distributions, the extraction of texture-related information, and, for color images, color distribution information. With the creation of convolutional models, these steps have been grouped into a single closed model capable of extracting features from the source image dataset and performing classification. Our hybrid model seeks to present a simple yet powerful way of grouping both paradigms of image classification into a double-agent model. The first paradigm uses a feature extraction algorithm and a neural network classifier separately, and the second uses a convolutional neural network that performs both functions in a single closed model. The outputs of both paradigms are combined by a Mamdani type-1 fuzzy system acting as an integrator, which takes the outputs of both neural networks as its inputs and gives an aggregated final response for the classification. Experiments were done using four databases, and statistical testing compares the best result from one of the two neural networks with the final aggregated response that also takes the other method into consideration. Figure 1 shows the proposed model architecture.
Fig. 1 Proposed method for classification using (a) Multilayered Perceptron and Convolutional Network, (b) Fuzzy Response Integrator that combines the outputs into one integrated classification prediction
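The integration step of Fig. 1 can be sketched with a small Mamdani-style type-1 fuzzy system for a binary problem. This is only an illustration under stated assumptions: the rule base, the linear input sets, and the triangular shape of the output sets are hypothetical choices, while the output ranges [0–0.5], [0.2–0.7], and [0.5–1] follow the Low, Medium, and High classes described in Sect. 3.

```python
def trimf(y, a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    if y <= a or y >= c:
        return 0.0
    return (y - a) / (b - a) if y <= b else (c - y) / (c - b)


# Output sets over the ranges given in Sect. 3; the triangular shapes are assumed
OUTPUT_SETS = {
    "low": (0.0, 0.25, 0.5),
    "medium": (0.2, 0.45, 0.7),
    "high": (0.5, 0.75, 1.0),
}


def integrate(mlp_out, cnn_out, steps=1000):
    """Combine two module outputs in [0, 1] into one crisp value in [0, 1]."""
    # Fuzzify each input with two assumed linear sets: low(x) = 1 - x, high(x) = x
    lo1, hi1 = 1.0 - mlp_out, mlp_out
    lo2, hi2 = 1.0 - cnn_out, cnn_out
    # Assumed rule base: agreement gives low/high, disagreement gives medium
    strength = {
        "low": min(lo1, lo2),
        "high": min(hi1, hi2),
        "medium": max(min(lo1, hi2), min(hi1, lo2)),
    }
    # Clip each output set by its rule strength, aggregate with max,
    # and defuzzify with a discretised centroid
    num = den = 0.0
    for k in range(steps + 1):
        y = k / steps
        mu = max(min(strength[name], trimf(y, *abc)) for name, abc in OUTPUT_SETS.items())
        num += y * mu
        den += mu
    return num / den


print(round(integrate(0.9, 0.95), 3))  # both modules vote for the high class
print(round(integrate(0.9, 0.10), 3))  # the modules disagree
```

The crisp value returned by `integrate` would then be compared with the class ranges to obtain the final label.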
3 Methodology
The experimentation was done using medical image datasets for image classification problems. The first dataset is the Montgomery-Shenzhen Tuberculosis image dataset [33], a binary-class image dataset (400 by 400 pixels per image) for the diagnosis of tuberculosis. The second is a Covid-19 binary dataset divided into Covid and non-Covid classes; its images are a combination of X-rays and CT scans, and only the X-ray part was used for this study (500 by 500 pixels per image). The third dataset is a 3-class Covid dataset created to differentiate Covid, pneumonia, and healthy X-rays; it was created using the second dataset and the pneumonia dataset from Zhang Lab (images were
Table 1 Parameters for module 1 (Multilayered Perceptron)
| Network parameter | Value |
|---|---|
| Inputs | 59 |
| Hidden layers | 4 |
| Neurons | 59 input layer / 30, 30, 30, 30 hidden layers / 1 output layer |
| Outputs | 1 |
| Training algorithm | Scaled conjugate gradient |
| Performance evaluation measure | Cross entropy |
standardized to 500 by 500 pixels per image) [34]. Finally, the fourth and final dataset is the 4-class (covid-19, viral pneumonia, lung opacities and healthy) released by a team of researchers of the university of Qatar and Bangladesh (300 by 300 pixels per image) [35]. In all experiments done a data distribution of train-testing was divided by using 70% of available images for training and the remaining 30% for testing. The classifiers used for experimentation are designated as “Module 1” and “Module 2”, Module 1 is a Multilayered Perceptron also called in some contexts as fully connected neural network, it uses a learning algorithm to learn patterns from samples and uses a loss or error function that gets minimized as the training advances and the results predicted by the model are closer to those expected. The patterns used for the first module are texture features [36] extracted from the images of the experimental datasets, these features calculate the spatial-distributions of gray level pixels from grayscale images like those found in medical datasets. Module 2 is a standard Convolutional Neural Network that uses 4 convolutional layers to reduce the image used as input into features that are emphasize over the pooling layers and finally classified using the classification layer. Tables 1 and 2 contain more specific data on the parameters of both models. Table 2 Parameters of for module 2 (Convolutional Neural Network)
Network parameters Inputs
Size of image (400 × 400/500 × 500/)
Hidden layers
4 Convolutional (8,16,32,64 kernel size) and 1 fully connected
Outputs
1 classification layer
Learning rate
–
Minibatch size
64
Training algorithm
ADAM optimizer
Performance evaluation measure
Cross entropy
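The chapter does not spell out how the 59 texture inputs of Module 1 are computed. The input count in Table 1 and the label MLP-LBP used in the rule tables are consistent with a uniform Local Binary Pattern histogram over an 8-pixel neighborhood (58 uniform codes plus one shared bin for all non-uniform codes). The following is a minimal sketch under that assumption, not the authors' implementation; the function names and the plain 3 × 3 neighborhood without interpolation are illustrative choices.

```python
import numpy as np


def uniform_lookup():
    """Map each 8-bit LBP code to one of 59 bins (58 uniform codes + 1 shared bin)."""
    table = np.full(256, 58, dtype=np.int64)  # bin 58 collects non-uniform codes
    next_bin = 0
    for code in range(256):
        bits = [(code >> i) & 1 for i in range(8)]
        # Count 0/1 transitions around the circular bit pattern.
        transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if transitions <= 2:
            table[code] = next_bin
            next_bin += 1
    return table


def lbp_histogram(image):
    """Return a normalized 59-bin uniform LBP histogram of a 2-D grayscale array."""
    img = np.asarray(image, dtype=np.float64)
    height, width = img.shape
    center = img[1:-1, 1:-1]
    # Offsets of the 8 neighbors, visited in circular order around the center.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy : height - 1 + dy, 1 + dx : width - 1 + dx]
        codes |= (neighbor >= center).astype(np.int64) << bit
    hist = np.bincount(uniform_lookup()[codes].ravel(), minlength=59)
    return hist / hist.sum()
```

Under this assumption, `lbp_histogram(image)` yields one 59-element feature vector per image, which would match the 59-neuron input layer listed in Table 1.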
The model works like a modular neural network in which the same input is either divided among the modules or analyzed independently by each of them, and the module outputs are combined at the end by an integrator. In this model, the use of fuzzy logic in the integrator contributes by modelling the uncertainty of the problem and taking it into consideration in the final prediction. The problem's uncertainty can come from several factors: not knowing for certain which classifiers are best suited to the problem, the noise generated by the medical image-capture device, and several others commonly found when approximating a function instead of having a specific solution. Combined with trained neural networks that each approach the problem from a different perspective, this yields a better classification overall.
The fuzzy integrator has 2 inputs, one for each module of the modular model: input 1 corresponds to the classification value given by the multilayered perceptron, and input 2 to the classification given by the convolutional neural network. Each input has one membership function per class of the problem, called a "range": "range 1" for class 1, "range 2" for class 2, and so on up to class n. The model gives a single output, which is considered the final prediction for a single image sample. The output membership functions are "Low Class", "Medium Class" and "High Class": a low output value [0–0.5] falls mostly into "Low Class", a medium value [0.2–0.7] into "Medium Class", and a high value [0.5–1] into "High Class". After the fuzzy classification is done, the final crisp number obtained from the defuzzification is compared with the actual label of the data in order to test the accuracy of the prediction.
Fig. 2 and Table 3 show the membership functions and the fuzzy rules used for the system. This model was used for Datasets 1 and 2, which are binary-class datasets. The same principle is used for the next experiment, with a third membership function added per input to accommodate the third class of the problem; Fig. 3 and Table 4 present the fuzzy integrator and fuzzy inference rules used for the 3-class dataset. Finally, following the same idea, a fourth membership function was added for the last experiment to make the integrator compatible with the 4-class scenario; Fig. 4 and Table 5 show the specifics of that model. In all cases the defuzzification method used was the centroid and the inference method was Max–Min.

Table 3 Fuzzy rule set for 2-class problem (Datasets 1 and 2)

| Fuzzy rule | Input 1: MLP-LBP prediction | Input 2: CNN prediction | Output |
| --- | --- | --- | --- |
| 1 | Class 1 | Class 1 | Low class |
| 2 | Class 2 | Class 2 | High class |
| 3 | Class 1 | Class 2 | Medium class |
| 4 | Class 2 | Class 1 | Medium class |

Fig. 2 Type-1 fuzzy system applied to 2-class problems

Table 4 Fuzzy rule set for 3-class problem (Dataset 3)

| Fuzzy rule | Input 1: MLP-LBP prediction | Input 2: CNN prediction | Output |
| --- | --- | --- | --- |
| 1 | Class 1 | Class 1 | Low class |
| 2 | Class 2 | Class 2 | Medium class |
| 3 | Class 3 | Class 3 | High class |
| 4 | Class 1 | Class 2 | Medium class |
| 5 | Class 2 | Class 1 | Medium class |
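The integrator for the 2-class case can be sketched as a small Mamdani-type system that uses the rules of Table 3, Max–Min inference and centroid defuzzification. The membership function shapes below are assumptions, since the chapter shows them only in Fig. 2: linear shoulder sets for the two inputs, and triangles over the stated output ranges [0–0.5], [0.2–0.7] and [0.5–1]. All names are illustrative, and the module scores are assumed to lie in [0, 1].

```python
import numpy as np


def tri(x, a, b, c):
    """Triangular membership function with feet a and c and peak b."""
    x = np.asarray(x, dtype=float)
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)


def range_1(score):
    """Assumed input set for class 1: full membership at score 0."""
    return float(np.clip(1.0 - score, 0.0, 1.0))


def range_2(score):
    """Assumed input set for class 2: full membership at score 1."""
    return float(np.clip(score, 0.0, 1.0))


Y = np.linspace(0.0, 1.0, 1001)  # discretized output universe
OUTPUT_SETS = {
    "low": tri(Y, 0.0, 0.25, 0.5),
    "medium": tri(Y, 0.2, 0.45, 0.7),
    "high": tri(Y, 0.5, 0.75, 1.0),
}
# (set for input 1, set for input 2, output set), following Table 3.
RULES = [
    (range_1, range_1, "low"),
    (range_2, range_2, "high"),
    (range_1, range_2, "medium"),
    (range_2, range_1, "medium"),
]


def integrate(mlp_score, cnn_score):
    """Combine two module scores into one crisp value in [0, 1]."""
    aggregated = np.zeros_like(Y)
    for mf_mlp, mf_cnn, label in RULES:
        strength = min(mf_mlp(mlp_score), mf_cnn(cnn_score))  # AND as minimum
        clipped = np.minimum(strength, OUTPUT_SETS[label])    # Min implication
        aggregated = np.maximum(aggregated, clipped)          # Max aggregation
    return float(np.sum(Y * aggregated) / np.sum(aggregated))  # centroid


print(integrate(0.1, 0.2), integrate(0.9, 0.8), integrate(0.1, 0.9))
```

When both modules agree, the crisp output moves toward the low or high end of the range; when they disagree, it is pulled toward the "Medium Class" region. How the crisp value is mapped back to a class label is not specified in the text, so that step is left out here.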
4 Results and Discussion
As explained in the introduction, the techniques selected for training the modules and for extracting patterns and feature data are models created specifically for image-related problems. Accordingly, the results obtained were generally good for all the datasets. Table 6 shows the results of 30 experiments performed for each dataset.
Fig. 3 Type-1 fuzzy system applied to 3-class problems
Fig. 4 Type-1 fuzzy system applied to 4-class problems
Table 5 Fuzzy rule set for 4-class problem (Dataset 4)

| Fuzzy rule | Input 1: MLP-LBP prediction | Input 2: CNN prediction | Output |
| --- | --- | --- | --- |
| 1 | Class 1 | Class 1 | Low class |
| 2 | Class 2 | Class 2 | Medium class |
| 3 | Class 3 | Class 3 | Medium class |
| 4 | Class 4 | Class 4 | High class |
| 5 | Class 1 | Class 2 | Medium class |
| 6 | Class 2 | Class 1 | Medium class |
Table 6 Experimental results

| Dataset | Measure | Multilayered Perceptron Neural Network with LBP Features (MLP-LBP) | Convolutional Neural Network (CNN) | Proposed Modular Neural Network with Fuzzy Integration (MNN-T1FI) |
| --- | --- | --- | --- | --- |
| Montgomery-Shenzhen | Average classification accuracy on test set | 0.8914 | 0.9055 | 0.9210 |
| Montgomery-Shenzhen | Standard deviation (σ) | 0.009343251 | 0.008944153 | 0.005664148 |
| COVID-19 (2-class) | Average classification accuracy on test set | 0.9598 | 0.9448 | 0.9649 |
| COVID-19 (2-class) | Standard deviation (σ) | 0.004334991 | 0.007591418 | 0.00262294 |
| COVID-19 and Zhang Lab (3-class) | Average classification accuracy on test set | 0.9524 | 0.9372 | 0.9643 |
| COVID-19 and Zhang Lab (3-class) | Standard deviation (σ) | 0.006739088 | 0.003434832 | 0.004206383 |
| COVID-19, Viral Pneumonia, Normal and Lung Opacities (4-class) | Average classification accuracy on test set | 0.9440 | 0.9538 | 0.9744 |
| COVID-19, Viral Pneumonia, Normal and Lung Opacities (4-class) | Standard deviation (σ) | 0.002440302 | 0.014117729 | 0.003420383 |
The first dataset contains tuberculosis images and has only 2 classes, so discriminating between classes is a comparatively easy task that both models can solve; it also has only 662 images, which is few in the context of image classification. In this case the best individual model was the CNN, with a testing accuracy of 0.9055, and the proposed model raised this to 0.9210. On the second dataset, which is also a binary classification problem, the MLP-LBP module scored a testing accuracy of 0.9598 and the proposed model 0.9649. For the third dataset the difficulty is increased by adding a third class to the problem. The best individual result was obtained by the MLP-LBP model, with a testing accuracy of 0.9524, and the proposed model increased this to 0.9643. The fourth dataset increased the problem's complexity by adding another class, and the best module was the CNN. This may be due to the CNN's ability to work better when more images or data are available: this dataset has 21,165 images, whereas each of the first 3 datasets used only around 700 images. This gives CNNs the upper hand on large datasets and thus better results. Since a CNN is also used inside the proposed model, combining fuzzy logic with the predictions of the other module increases the accuracy by about 2 percentage points, to a testing accuracy of 0.9744.
These are preliminary results; to support them statistically, we performed a right-tailed two-sample z-test, reported in Table 7. For each dataset, the test compares the best of the two individual modules against the proposed model with fuzzy logic integration. In all 4 cases the test indicates that the proposed model improves on the testing accuracy of the individual models, by including the uncertainty factor and by adding a second opinion from the module that scored lower. This can itself be taken as evidence that using multiple models to solve a problem provides a better understanding and a more precise solution.

Table 7 Comparative statistical analysis

| Dataset | Hypothesis: proposed hybrid model is better than best individual model | Z-value | Significant evidence |
| --- | --- | --- | --- |
| Montgomery-Shenzhen (2-class) | MNN-T1FI vs CNN | 8.019133 | Yes |
| COVID-19 (2-class) | MNN-T1FI vs MLP-LBP | 5.513169 | Yes |
| COVID-19 and Zhang Lab (3-class) | MNN-T1FI vs MLP-LBP | 8.204688 | Yes |
| COVID-19, Viral Pneumonia, Normal and Lung Opacities (4-class) | MNN-T1FI vs CNN | 7.767425 | Yes |
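The Z-values in Table 7 can be checked against the means and standard deviations reported in Table 6. The sketch below assumes the usual two-sample statistic z = (mean_a − mean_b) / sqrt(std_a²/n_a + std_b²/n_b) with n_a = n_b = 30 runs per model. The 0.05 significance level (critical value 1.645) is an assumption, since the chapter does not state one, and the function name is illustrative.

```python
import math


def right_tailed_z(mean_a, std_a, mean_b, std_b, n_a=30, n_b=30):
    """Return (z, p) for the alternative hypothesis mean_a > mean_b."""
    standard_error = math.sqrt(std_a**2 / n_a + std_b**2 / n_b)
    z = (mean_a - mean_b) / standard_error
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability of N(0, 1)
    return z, p


# (proposed mean, proposed std, best individual mean, best individual std),
# copied from the accuracy and standard deviation rows of Table 6.
comparisons = {
    "Dataset 1, MNN-T1FI vs CNN": (0.9210, 0.005664148, 0.9055, 0.008944153),
    "Dataset 2, MNN-T1FI vs MLP-LBP": (0.9649, 0.00262294, 0.9598, 0.004334991),
    "Dataset 3, MNN-T1FI vs MLP-LBP": (0.9643, 0.004206383, 0.9524, 0.006739088),
    "Dataset 4, MNN-T1FI vs CNN": (0.9744, 0.003420383, 0.9538, 0.014117729),
}

for name, values in comparisons.items():
    z, p = right_tailed_z(*values)
    print(f"{name}: z = {z:.4f}, exceeds 1.645: {z > 1.645}")
```

Because the accuracies in Table 6 are rounded to four decimals, the recomputed statistics agree with Table 7 only approximately.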
5 Conclusions
Based on the classification results obtained by combining two well-studied approaches into a single cooperative system, we can confirm that using more than one image analysis technique provides a better understanding of the problem, extracts useful information from each image and, as a result, yields a more accurate classification system. Using fuzzy logic in the problem helps to control the uncertainty factor that is always present in medical diagnosis problems and leads the classifier to a more accurate final result. More experiments need to be conducted to better assess the advantages of a modular image classification system. So far, favorable results have been obtained with the model, which provides an alternative approach to image classification that is effective in analyzing grayscale distribution patterns similar to those found in medical images. Future work includes experimentation on larger multiclass medical datasets, experiments using different membership functions for the fuzzy integrator, and comparison with other methods.
Acknowledgements We would like to express our gratitude to CONACYT and Tijuana Institute of Technology for the facilities and resources granted for the development of this research.
References
1. Brunetti, A., Carnimeo, L., Trotta, G.P., Bevilacqua, V.: Computer-assisted frameworks for classification of liver, breast and blood neoplasias via neural networks: a survey based on medical images. Neurocomputing 335, 274–298 (2018)
2. Abraham, B., Nair, M.S.: Computer-aided detection of COVID-19 from X-ray images using multi-CNN and Bayesnet classifier. Biocybernetics Biomed. Eng. 40(4), 1436–1445 (2020)
3. Moon, W.K., Lee, Y., Ke, H., Lee, S.H., Huang, C., Chang, R.: Computer-aided diagnosis of breast ultrasound images using ensemble learning from convolutional neural networks. Comput. Methods Programs Biomed. 190, 105361 (2020)
4. Ahmadi, A., Kashefi, M., Shahrokhi, H., Nazaari, M.A.: Computer aided diagnosis system using deep convolutional neural networks for ADHD subtypes. Biomed. Signal Process. Control 63, 102227 (2021)
5. Saygili, A.: A new approach for computer-aided detection of coronavirus (COVID-19) from CT and X-ray images using machine learning methods. Appl. Soft Comput. 105, 107323 (2021)
6. Zheng, S., Shen, Z., Pei, C., Ding, W., Lin, H., Zheng, J., Pan, L., Zheng, B., Huang, L.: Interpretative computer-aided lung cancer diagnosis: From radiology analysis to malignancy evaluation. Comput. Methods Programs Biomed. 210, 106363 (2021)
7. Varela-Santos, S., Melin, P.: A new modular neural network approach with fuzzy response integration for lung disease classification based on multiple objective feature optimization in chest X-ray images. Expert Syst. Appl. 168, 114361 (2021)
8. Ramirez, E., Melin, P., Prado-Arechiga, G.: Hybrid model based on neural networks, type-1 and type-2 fuzzy systems for 2-lead cardiac arrhythmia classification. Expert Syst. Appl. 126, 295–307 (2019)
9. Shihabudheen, K.V., Pillai, G.N.: Recent advances in neuro-fuzzy system: a survey. Knowl.-Based Syst. 152, 136–162 (2018)
10. van Ooijen, P.M.A.: From Physical Film to Picture Archiving and Communication Systems. Springer International Publishing, Basic Knowledge of medical Imaging Informatics (2021)
11. Ziyad, S.R., Radha, V., Vayyapuri, T.: Overview of computer aided detection and computer aided diagnosis systems for lung nodule detection in computed tomography. Curr. Med. Imaging 16(6), 16–26 (2020)
12. Ranschaert, E.R., Morozov, S., Algra, P.R.: Artificial Intelligence in Medical Imaging. Springer International Publishing (2019)
13. Lanca, L., Silva, A.: Digital Imaging Systems for Plain Radiography. Springer, New York (2013)
14. Jacques, S., Christe, B.: Chapter 2 - Healthcare technology basics, pp. 21–50. Introduction to Clinical Engineering, Academic Press (2020)
15. Chandola, Y., Virmani, J., Bhadauria, H.S., Kumar, P.: Deep Learning for Chest Radiographs. Computer Aided Classification, Academic Press (2021)
16. Abdelrahman, L., Ghamdi, M.A., Collado-Mesa, F., Abdel-Mottaleb, M.: Convolutional neural networks for breast cancer detection in mammography: a survey. Comput. Biol. Med. 131, 104248 (2021)
17. Calli, E., Sogancioglu, E., van Ginneken, B., van Leeuwen, K.G., Murphy, K.: Deep Learning for chest X-ray analysis: a survey. Med. Image Anal. 72, 102125 (2021)
18. Varela-Santos, S., Melin, P.: A new approach for classifying coronavirus COVID-19 based on its manifestation on chest X-rays using texture features and neural networks. Inf. Sci. 545, 403–414 (2021)
19. Sudharson, S., Kokil, P.: An ensemble of deep neural networks for kidney ultrasound image classification. Comput. Methods Programs Biomed. 197, 105709 (2020)
20. Waheed, A., Goyal, M., Gupta, D., Khanna, A., Al-Turjman, F., Pinheiro, P.R.: CovidGAN: data augmentation using auxiliary classifier GAN for improved Covid-19 detection. IEEE Access 8, 91916–91923 (2020)
21. Varela-Santos, S., Melin, P.: Classification of X-ray images for pneumonia detection using texture features and neural networks. In: Castillo, O., Melin, P., Kacprzyk, J. (eds.) Intuitionistic and Type-2 Fuzzy Logic Enhancements in Neural and Optimization Algorithms: Theory and Applications. Studies in Computational Intelligence, vol. 862. Springer, Cham (2020)
22. Chauhan, T., Palivela, H., Tiwari, S.: Optimization and fine-tunning of DenseNet model for classification of COVID-19 cases in medical imaging. Int. J. Inf. Manag. Data Insights 1(2), 100020 (2021)
23. Varshni, D., Thakral, K., Agarwal, L., Nijhawan, R., Mittal, A.: Pneumonia detection using CNN based feature extraction. 2019 IEEE International Conference on Electrical Computer and Communication Technologies (ICECCT), pp. 1–7 (2019)
24. Verma, G., Prakash, S.: Pneumonia classification using deep learning in healthcare. Int. J. Innov. Technol. Exploring Eng. (IJITEE) 9(4) (2020)
25. Mittal, K., Jain, A., Vaisla, K.S., Castillo, O., Kacprzyk, J.: A comprehensive review on type 2 fuzzy logic applications: past, present and future. Eng. Appl. Artif. Intell. 95, 103916 (2020)
26. Lopez, M., Melin, P.: Response integration in ensemble neural networks using interval type-2 fuzzy logic. 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp. 1503–1508 (2008)
27. Pulido, M., Mancilla, A., Melin, P.: An ensemble neural network architecture with fuzzy response integration for complex time series prediction. In: Castillo, O., Pedrycz, W., Kacprzyk, J. (eds.) Evolutionary Design of Intelligent Systems in Modeling, Simulation and Control. Studies in Computational Intelligence, vol. 257. Springer, Berlin, Heidelberg
28. Melin, P., Soto, J., Castillo, O., Soria, J.: A new approach for time series prediction using ensembles of ANFIS models. Expert Syst. Appl. 39(3), 3494–3506 (2012)
29. Csiszar, O., Csiszar, G., Dombi, J.: Interpretable neural networks based on continuous-valued logic and multicriteria decision operators. Knowl.-Based Syst. 199, 105972 (2020)
30. Ontiveros-Robles, E., Melin, P., Castillo, O.: Towards asymmetric uncertainty modeling in designing General Type-2 Fuzzy classifiers for medical diagnosis. Expert Syst. Appl. 183, 115370 (2021)
31. Mousavi, S.M., Abdullah, S., Niaki, S.T.A., Banihashemi, S.: An intelligent hybrid classification algorithm integrating fuzzy rule-based extraction and harmony search optimization: medical diagnosis applications. Knowl.-Based Syst. 220, 103943 (2021)
32. Melin, P., Miramontes, I., Prado-Arechiga, G.: A hybrid model based on modular neural networks and fuzzy systems for classification of blood pressure and hypertension risk diagnosis. Expert Syst. Appl. 107, 146–164 (2018)
33. Jaeger, S., Candemir, S., Antani, S., Wang, Y.J., Lu, P., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant. Imaging Med. Surg. 4(6), 475–477 (2014)
34. Kermany, D., Goldbaum, M., Cai, W.: Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172(5), 1122–1131 (2014)
35. Vaishnavi Jamdade: COVID-19 dataset 3 classes. IEEE Dataport (2020). https://doi.org/10.21227/q4ds-7j67
36. Zhao, Y., Zhang, D., Lin, L., Yang, Y.: A method for eliminating the disturbance of pseudotextural-direction in ultrasound image feature extraction. Biomed. Signal Process. Control 71, 103176 (2021)
Clustering and Prediction of Time Series for Traffic Accidents Using a Nested Layered Artificial Neural Network Model
Martha Ramirez and Patricia Melin
Abstract Our proposal consists of using a nested layer model in which an unsupervised artificial neural network method is used as the first layer to perform tasks of clustering time series corresponding to the statistics of traffic accidents in Mexico for a particular period. As a second layer, a supervised neural network method is used to carry out tasks for the prediction of the number of accidents registered by place of occurrence. The results show that by combining both methods it is possible to use the unsupervised model (first layer) to find similarities in the information and to highlight attributes such as age range, sex, or geographic location, and later by using the supervised method (second layer) to focus on the prediction of the time series considering these attributes.
Keywords Neural networks · Clustering · Time series · Prediction
1 Introduction
Over the last couple of decades, the amount of data generated within organizations, and the speed at which it is generated, have increased significantly. The analysis of historical information is frequently used as a support tool for the decision-making process [1–3]. A time series is a sequence or set of data recorded chronologically over a period, and predicting the future values of a time series has been a constant challenge for many researchers [4–6]. There are different computational intelligence methods for clustering [7, 8]. These techniques aim to find hidden patterns or groups of similar data in a dataset [9–11]. Likewise, multiple computational models have been designed to perform time series prediction [12–14]. Solving complex real-world problems may require the integration of several of these techniques [15–17].
M. Ramirez · P. Melin (B) Tijuana Institute of Technology, Tijuana, Mexico e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_3
Artificial Neural Networks (ANNs) are computational models [18–20] that simulate the brain's behavior, so they can learn directly from data and be trained for pattern recognition [21, 22]. In supervised neural networks [23–25], the learning algorithm uses input–output training data to model the dynamic system. In unsupervised neural networks [26, 27], by contrast, only the input data is given, so they are used to form representative clusters of all the data [28]. At present, several applications based on ANNs have demonstrated their efficiency and accuracy in solving diverse problems [29–31]. The main contribution of this paper is a method for time series prediction using nested layered neural networks. First, unsupervised neural networks are used to generate data clusters; second, a supervised neural network is used to predict future values for each cluster. This paper is organized as follows. In Sect. 2, the case study is described. The methodology used is explained in Sect. 3. The experimental results are shown in Sect. 4. Finally, in Sect. 5, the general conclusions are presented.
2 Case Study
The problem of road traffic crashes and injuries is a serious public health and development issue, taxing health care systems and undermining their ability to devote limited resources to other areas of need [32]. The study of Ground Traffic Accidents in Urban and Suburban Areas in Mexico (ATUS statistics) is carried out by the National Institute of Statistics and Geography (INEGI). Its objective is to produce annual information on the accident rate of ground transport at the national, federal entity and municipality levels, thereby contributing to the planning and organization of both transport and roads [33]. The selected dataset for the period 2012–2018 consists of daily accident records (see Table 1) for the 32 states in Mexico. Although each record consists of 45 attributes, only six were considered for the clustering and prediction tasks: state, year, month, day, sex, and age range of the driver. Each state's identification number was assigned by state name in alphabetical order, so the ID carries no information about geographic location (see Table 2). Analyzing other traffic accident variables, such as date and time of occurrence, location by urban or suburban area, class and type of accident, type of vehicle involved, rolling surface and driver data, among others, would allow us to understand their behavior.

Table 1 Total annual records of traffic accidents in Mexico 2012–2018

| Year | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Records | 390,411 | 385,722 | 380,573 | 379,948 | 360,051 | 367,789 | 366,498 |
Table 2 List of 32 states of Mexico by number ID

| ID | State | ID | State | ID | State | ID | State |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Aguascalientes | 9 | Ciudad de México | 17 | Morelos | 25 | Sinaloa |
| 2 | Baja California | 10 | Durango | 18 | Nayarit | 26 | Sonora |
| 3 | Baja California Sur | 11 | Guanajuato | 19 | Nuevo Leon | 27 | Tabasco |
| 4 | Campeche | 12 | Guerrero | 20 | Oaxaca | 28 | Tamaulipas |
| 5 | Coahuila | 13 | Hidalgo | 21 | Puebla | 29 | Tlaxcala |
| 6 | Colima | 14 | Jalisco | 22 | Queretaro | 30 | Veracruz |
| 7 | Chiapas | 15 | Mexico | 23 | Quintana Roo | 31 | Yucatan |
| 8 | Chihuahua | 16 | Michoacan | 24 | San Luis Potosi | 32 | Zacatecas |
3 Methodology
Our proposal consists of a nested layer model. In the first layer, an unsupervised artificial neural network method clusters the time series corresponding to the statistics of traffic accidents in Mexico for a particular period. In the second layer, a supervised neural network method predicts the number of accidents registered by place of occurrence (see Fig. 1).
Fig. 1 Conceptual design of the proposed method
The first phase consists of obtaining, selecting, organizing, and processing the dataset of ground traffic accidents in Mexico. The second phase consists of evaluating the unsupervised neural network for the clustering tasks; in this case we use competitive neural networks [34]. The third phase is the evaluation of the supervised neural networks for the prediction tasks, here a Long Short-Term Memory (LSTM) neural network, which is an upgraded version of the Recurrent Neural Network (RNN) [35, 36].
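The clustering phase can be illustrated with a minimal winner-take-all competitive layer. This is a sketch under assumptions, not the configuration used in the chapter: the learning rate, the number of epochs, the initialization from training samples and the optional `init_weights` argument are all hypothetical. Only the default of four competing units mirrors the four cluster classes used in the experiments.

```python
import numpy as np


def train_competitive(data, n_units=4, epochs=50, lr=0.1, seed=0, init_weights=None):
    """Winner-take-all competitive learning; returns one weight vector per unit."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    if init_weights is None:
        # Start each unit at a randomly chosen training sample.
        weights = data[rng.choice(len(data), size=n_units, replace=False)].copy()
    else:
        weights = np.array(init_weights, dtype=float)
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Only the winning unit moves toward the presented sample.
            weights[winner] += lr * (x - weights[winner])
    return weights


def assign_clusters(data, weights):
    """Label each sample with the index of its nearest unit."""
    data = np.asarray(data, dtype=float)
    distances = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(distances, axis=1)
```

In this setting each row of `data` would hold one state's series (for example its monthly accident counts), so that `assign_clusters` returns one cluster index per state.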
4 Experiments and Results
Based on the monthly number of accidents recorded in the period 2013–2018, the states with similar traffic accident statistics were grouped into four classes (very few, few, many or too many accidents) using competitive neural networks. Similar clusters were also formed based on the sex and age range of the driver (see Table 3). For each experiment, 30 executions were performed. A visual representation of the clusters formed is shown in Fig. 2. In all simulations, 18 states remained within the "Very few" accidents cluster, while only the state with ID number 19 remained in the "Too many" accidents cluster, which means that this single state recorded the highest number of accidents in the entire country (see Table 4). In addition, 13 states changed cluster across the simulations, mainly when the age range or sex factor is considered; these results are presented in Table 5.
To perform the prediction tasks, we used a Long Short-Term Memory (LSTM) neural network with 180 neurons in the hidden layer. We used 70% of the data for training and 30% for testing. For each experiment, 30 executions were performed. The % Root Mean Square Error (RMSE) was used to measure the performance of each neural network. First, we predicted the number of accidents by state, age range and sex of the driver, based on the data set of all 32 states (see Table 6). Second, we took the parameters obtained by the competitive neural networks to identify the data of the states that belong to each cluster, and then used the data of each cluster to train the LSTM network on that cluster (see Tables 7, 8, 9 and 10).

Table 3 Clusters of monthly number of accidents recorded by state in the period 2013–2018

| Cluster class | Monthly records | Age range | Age range women | Age range men |
| --- | --- | --- | --- | --- |
| Very few | 28 | 19 | 29 | 18 |
| Few | 2 | 11 | 1 | 12 |
| Many | 1 | 1 | 1 | 1 |
| Too many | 1 | 1 | 1 | 1 |
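The evaluation protocol described in this section (70% of the data for training, 30% for testing, error measured by RMSE) can be sketched as follows. The chapter does not define how "% RMSE" is scaled, so the snippet computes RMSE on a series normalized to [0, 1], which is an assumption. The chronological split, the helper names, the toy data and the naive previous-value forecaster standing in for the LSTM are also illustrative.

```python
import numpy as np


def chronological_split(series, train_fraction=0.7):
    """Split a 1-D series into its earliest part (train) and latest part (test)."""
    series = np.asarray(series, dtype=float)
    cut = int(round(train_fraction * len(series)))
    return series[:cut], series[cut:]


def min_max_normalize(series):
    """Rescale a series linearly to the interval [0, 1]."""
    series = np.asarray(series, dtype=float)
    return (series - series.min()) / (series.max() - series.min())


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Toy monthly counts (hypothetical values; 72 months, as in 2013-2018).
months = np.arange(72)
counts = 1000 + 50 * np.sin(2 * np.pi * months / 12)
train, test = chronological_split(min_max_normalize(counts))

# Placeholder forecaster: repeat the previously observed value (not an LSTM).
previous = np.concatenate(([train[-1]], test[:-1]))
print(f"train={len(train)} test={len(test)} rmse={rmse(test, previous):.4f}")
```

Repeating the same split and error measure on the series of each cluster separately would correspond to the per-cluster experiments summarized in Tables 7 to 10.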
Fig. 2 Clusters by monthly number of accidents by state 2013–2018

Table 4 Clusters by monthly number of accidents by state 2013–2018

| ID | State | Monthly records | Age range | Age range women | Age range men |
| --- | --- | --- | --- | --- | --- |
| 1 | Aguascalientes | Very few | Very few | Very few | Very few |
| 3 | Baja California Sur | Very few | Very few | Very few | Very few |
| 4 | Campeche | Very few | Very few | Very few | Very few |
| 6 | Colima | Very few | Very few | Very few | Very few |
| 7 | Chiapas | Very few | Very few | Very few | Very few |
| 10 | Durango | Very few | Very few | Very few | Very few |
| 12 | Guerrero | Very few | Very few | Very few | Very few |
| 13 | Hidalgo | Very few | Very few | Very few | Very few |
| 18 | Nayarit | Very few | Very few | Very few | Very few |
| 19 | Nuevo Leon | Too many | Too many | Too many | Too many |
| 20 | Oaxaca | Very few | Very few | Very few | Very few |
| 21 | Puebla | Very few | Very few | Very few | Very few |
| 23 | Quintana Roo | Very few | Very few | Very few | Very few |
| 24 | San Luis Potosi | Very few | Very few | Very few | Very few |
| 25 | Sinaloa | Very few | Very few | Very few | Very few |
| 27 | Tabasco | Very few | Very few | Very few | Very few |
| 29 | Tlaxcala | Very few | Very few | Very few | Very few |
| 31 | Yucatan | Very few | Very few | Very few | Very few |
| 32 | Zacatecas | Very few | Very few | Very few | Very few |
Table 5 Multiple clusters by monthly number of accidents by state 2013–2018

| ID | State | Monthly records | Age range | Age range women | Age range men |
| --- | --- | --- | --- | --- | --- |
| 2 | Baja California | Very few | Few | Very few | Few |
| 5 | Coahuila | Very few | Few | Very few | Few |
| 8 | Chihuahua | Few | Few | Many | Few |
| 9 | Ciudad de México | Very few | Few | Very few | Few |
| 11 | Guanajuato | Few | Few | Very few | Few |
| 14 | Jalisco | Many | Many | Few | Many |
| 15 | Mexico | Very few | Few | Very few | Few |
| 16 | Michoacan | Very few | Few | Very few | Few |
| 17 | Morelos | Very few | Few | Very few | Few |
| 22 | Queretaro | Very few | Few | Very few | Few |
| 26 | Sonora | Very few | Few | Very few | Few |
| 28 | Tamaulipas | Very few | Few | Very few | Few |
| 30 | Veracruz | Very few | Very few | Very few | Few |
Table 6 Prediction of the number of accidents by attributes 2013–2018

| Attribute | Average %RMSE | Best %RMSE | Worst %RMSE |
| --- | --- | --- | --- |
| State | 0.001099408 | 0.000892990 | 0.001426313 |
| Age range | 0.002031722 | 0.001478331 | 0.002679007 |
| Age range women | 0.002571199 | 0.001437401 | 0.003503766 |
| Age range men | 0.002084991 | 0.001511632 | 0.002845864 |
Table 7 Prediction of number of accidents by cluster records 2013–2018

| Cluster class | Number of elements | Average %RMSE | Best %RMSE | Worst %RMSE |
| --- | --- | --- | --- | --- |
| Very few | 28 | 0.001089330 | 0.000966223 | 0.001405463 |
| Few | 2 | 0.004298581 | 0.001711493 | 0.010981270 |
| Many | 1 | 0.015911502 | 0.013330322 | 0.019791860 |
| Too many | 1 | 0.004718022 | 0.002490341 | 0.007786377 |
Table 8 Prediction of number of accidents by cluster age range 2013–2018

| Cluster class | Number of elements | Average %RMSE | Best %RMSE | Worst %RMSE |
| --- | --- | --- | --- | --- |
| Very few | 19 | 0.001705369 | 0.001559329 | 0.002143049 |
| Few | 11 | 0.001379145 | 0.000869062 | 0.002850381 |
| Many | 1 | 0.086625433 | 0.055248528 | 0.093774513 |
| Too many | 1 | 0.006650431 | 0.002160609 | 0.008214891 |
Table 9 Prediction of number of accidents by cluster age range–women 2013–2018

| Cluster class | Number of elements | Average %RMSE | Best %RMSE | Worst %RMSE |
| --- | --- | --- | --- | --- |
| Very few | 29 | 0.001621082 | 0.001378867 | 0.002058618 |
| Few | 1 | 0.067443688 | 0.046078795 | 0.081193765 |
| Many | 1 | 0.008311026 | 0.007225442 | 0.010490512 |
| Too many | 1 | 0.005329147 | 0.001745912 | 0.009691400 |
Table 10 Prediction of number of accidents by cluster age range–men 2013–2018

| Cluster class | Number of elements | Average %RMSE | Best %RMSE | Worst %RMSE |
| --- | --- | --- | --- | --- |
| Very few | 18 | 0.001679730 | 0.001476107 | 0.002014695 |
| Few | 12 | 0.001598917 | 0.001189301 | 0.002646550 |
| Many | 1 | 0.087618763 | 0.056619465 | 0.102783290 |
| Too many | 1 | 0.006796464 | 0.001795046 | 0.008346521 |
The results show that, by combining both methods, the unsupervised model (first layer) can be used to find similarities in the information and to highlight attributes such as age range, sex, or geographic location, and the supervised method (second layer) can then focus on predicting the time series considering these attributes.
5 Conclusions
In this work we have presented a nested model for the clustering and prediction of time series of traffic accidents in Mexico using supervised and unsupervised neural networks. As the unsupervised method, we used a competitive neural network for the clustering tasks. The simulation results show the geographical areas with similar data (clusters) in terms of number of accidents, age range, and gender of the driver. An LSTM neural network, as the supervised method, was used to predict the number of accidents by state, age range and sex of the driver. As future work, we plan to conduct tests with new case studies in which the geographic location of the event is significantly relevant data within the time series, in order to obtain clusters with a greater number of elements; by then dividing the data corresponding to these groups, we intend to improve the prediction for each region.
References 1. Tsai, Y., Zeng, Y., Chang, Y.: Air pollution forecasting using RNN with LSTM. 2018 IEEE 16th International Conference on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), 2018, pp. 1074–1079. https://doi.org/10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00178 2. Melin, P., Mancilla, A., Lopez, M., Mendoza, O.: A hybrid modular neural network architecture with fuzzy Sugeno integration for time series forecasting. Appl. Soft Comput. J. 7(4), 1217– 1226 (2007), ISSN 1568–4946. https://doi.org/10.1016/j.asoc.2006.01.009 3. Melin, P., Castillo, O.: An intelligent hybrid approach for industrial quality control combining neural networks, fuzzy logic and fractal theory. Inf. Sci. 177(7), 1543–1557 (2007). https:// doi.org/10.1016/j.ins.2006.07.022 4. Sfetsos, A., Siriopoulos, C.: Combinatorial time series forecasting based on clustering algorithms and neural networks. Neural Comput. Appl. 13, 56–64 (2004). https://doi.org/10.1007/ s00521-003-0391-y 5. Li, Y., Bao, T., Gong, J., Shu, X., Zhang, K.: The prediction of dam displacement time series using STL, extra-trees, and stacked LSTM neural network IEEE. Access 8, 94440–94452 (2020). https://doi.org/10.1109/ACCESS.2020.2995592 6. Castillo, O., Melin, P.: Forecasting of COVID-19 time series for countries in the world based on a hybrid approach combining the fractal dimension and fuzzy logic. Chaos Solitons Fract. 140 (2020). https://doi.org/10.1016/j.chaos.2020.110242 7. Ding, X., Hao, K., Cai, X., Tang, X.-S., Chen, L., Zhang, H.: A novel similarity measurement and clustering framework for time series based on convolution neural networks. IEEE Access 8, 173158–173168 (2020). https://doi.org/10.1109/ACCESS.2020.3025048 8. 
Melin, P., Amezcua, J., Valdez, F., Castillo, O.: A new neural network model based on the LVQ algorithm for multi-class classification of arrhythmias. Inf. Sci. 279, 483–497 (2014). https:// doi.org/10.1016/j.ins.2014.04.003 9. Austin, E., Coull, B., Zanobetti, A., Koutrakis, P.: A framework to spatially cluster air pollution monitoring sites in US based on the PM2.5 composition. Environ. Int. 59, 244–254 (2013). https://doi.org/10.1016/j.envint.2013.06.003 10. Melin, P., Monica, J.C., Sanchez, D., Castillo, O.: analysis of spatial spread relationships of coronavirus (COVID-19) pandemic in the world using self organizing. Maps Chaos Solitons Fract 138 (2020). https://doi.org/10.1016/j.chaos.2020.109917 11. Melin, P., Castillo, O.: Spatial and temporal spread of the COVID-19 Pandemic using self organizing neural networks and a fuzzy fractal approach. Sustainability 13(8295), 1–17 (2021). https://doi.org/10.3390/su13158295 12. Melin, P., Monica, J.C., Sanchez, D., Castillo, O.: A new prediction approach of the COVID19 virus pandemic behavior with a hybrid ensemble modular nonlinear autoregressive neural network. Soft Comput. (2020). https://doi.org/10.1007/s00500-020-05452-z 13. Sánchez, D., Melin, P.: Modular neural networks for time series prediction using Type-1 fuzzy logic integration. In: Melin, P., Castillo, O., Kacprzyk, J. (eds.) Design of Intelligent Systems Based on Fuzzy Logic, Neural Networks and Nature-Inspired Optimization. Studies in Computational Intelligence, vol. 601, pp. 141–154. Springer, Cham (2015). https://doi.org/10.1007/ 978-3-319-17747-2_11 14. Castillo, O., Melin, P.: Hybrid intelligent systems for time series prediction using neural networks, fuzzy logic, and fractal theory. IEEE Trans. Neural Netw. 13(6), 1395–1408 (2002). https://doi.org/10.1109/TNN.2002.804316 15. Chacón, H.D., Kesici, E., Najafirad, P.: Improving financial time series prediction accuracy using ensemble empirical mode decomposition and recurrent neural networks. 
IEEE Access 8, 117133–117145 (2020). https://doi.org/10.1109/ACCESS.2020.2996981 16. Melin, P., Soto, J., Castillo, O., Soria, J.: A new approach for time series prediction using ensembles of ANFIS models. Expert Syst. Appl. 39(3), 3494–3506 (2012). ISSN 0957–4174, https://doi.org/10.1016/j.eswa.2011.09.040
Clustering and Prediction of Time Series …
45
17. Soto, J., Melin, P., Castillo, O.: Time series prediction using ensembles of ANFIS models with genetic optimization of interval Type-2 and Type-1 fuzzy integrators. Hybrid Intell. Syst. 11(3), 211–226 (2014). https://doi.org/10.3233/HIS-140196. 18. Valdez, F., Melin, P., Castillo, O.: Modular neural networks architecture optimization with a new nature inspired method using a fuzzy combination of particle swarm optimization and genetic algorithms. Inf. Sci. 270, 143–153 (2014). https://doi.org/10.1016/j.ins.2014.02.091 19. Pulido, M., Melin, P.: Comparison of genetic algorithm and particle swarm optimization of ensemble neural networks for complex time series prediction. In: Melin, P., Castillo, O., Kacprzyk, J. (eds.) Recent Advances of Hybrid Intelligent Systems Based on Soft Computing. Studies in Computational Intelligence, vol. 915, pp. 51–77. Springer, Cham (2021). https://doi. org/10.1007/978-3-030-58728-4_3 20. Melin, P., Sánchez, D., Castillo, O.: Genetic optimization of modular neural networks with fuzzy response integration for human recognition. Inf. Sci. 197, 1–19 (2012). https://doi.org/ 10.1016/j.ins.2012.02.027 21. Sotirov, S., Sotirova, E., Melin, P., Castillo, O., Atanassov, K.: Modular neural network preprocessing procedure with intuitionistic fuzzy intercriteria analysis method. In: Andreasen, T. et al. (eds) Flexible Query Answering Systems 2015. Advances in Intelligent Systems and Computing, vol. 400, pp. 175–186. Springer, Cham (2016). https://doi.org/10.1007/978-3319-26154-6_14 22. Ramirez, E., Melin, P., Prado-Arechiga, G.: Hybrid model based on neural networks, type-1 and type-2 fuzzy systems for 2-lead cardiac arrhythmia classification. Expert Syst. Appl. 126, 295–307 (2019). https://doi.org/10.1016/j.eswa.2019.02.035 23. Moghar, A., Hamiche, M.: Stock market prediction using LSTM recurrent neural network. Procedia Comput. Sci. 170, 1168–1173 (2020). ISSN 1877–0509. https://doi.org/10.1016/j. procs.2020.03.049 24. 
Barbounis, T.G., Theocharis, J.B.: Locally recurrent neural networks for wind speed prediction using spatial correlation. Inf. Sci. 177(24), 5775–5797 (2007). ISSN 0020–0255. https://doi. org/10.1016/j.ins.2007.05.024 25. Wei, D.: Prediction of stock price based on LSTM neural network. 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), pp. 544–547 (2019).https://doi.org/10.1109/AIAM48774.2019.00113 26. Cherif, A., Cardot, H., Boné, R.: SOM time series clustering and prediction with recurrent neural networks. Neurocomputing 74(11), 1936–1944 (2011). https://doi.org/10.1016/j.neu com.2010.11.026 27. Melin, P., Castillo, O.: Spatial and temporal spread of the COVID-19 Pandemic using self organizing neural networks and a fuzzy fractal approach. Sustainability 13, 8295 (2021). https:// doi.org/10.3390/su13158295 28. Melin, P.: Introduction to Type-2 fuzzy logic in neural pattern recognition systems. In: Modular Neural Networks and Type-2 Fuzzy Systems for Pattern Recognition, Studies in Computational Intelligence, vol. 389, pp. 3–6. Springer, Berlin, Heidelberg (2012). https://doi.org/10.1007/ 978-3-642-24139-0_1 29. Zhang, J., Chen, F., Shen, Q.: Cluster-based LSTM network for short-term passenger flow forecasting in Urban Rail Transit. IEEE Access 7, 147653–147671 (2019). https://doi.org/10. 1109/ACCESS.2019.2941987 30. Li, T., Hua, M., Wu, X.: A hybrid CNN-LSTM model for forecasting particulate matter (PM2.5). IEEE Access 8, 26933–26940 (2020). https://doi.org/10.1109/ACCESS.2020.2971348 31. Qian, F., Chen, X.: Stock prediction based on LSTM under different stability. 2019 IEEE 4th International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), pp. 483– 486 2019. https://doi.org/10.1109/ICCCBDA.2019.8725709 32. Peden, M. et al.: World report on road traffic injury prevention, World Health Organization, p. 60 (2004). ISBN 92 4 156260 9. 
https://www.who.int/publications/i/item/world-report-onroad-traffic-injury-prevention 33. “Ground traffic accidents in urban and suburban areas”, INEGI.org.mx. http://en.www.inegi. org.mx/programas/accidentes/#Documentation. Accesed 22 Sep 2020
46
M. Ramirez and P. Melin
34. Méndez, E., Lugo, O., Melin, P.: A competitive modular neural network for long-term time series forecasting. In: Melin, P., Castillo, O., Kacprzyk, J. (eds.) Nature-Inspired Design of Hybrid Intelligent Systems, Studies in Computational Intelligence, vol. 667, pp.243–254. Springer (2012). https://doi.org/10.1007/978-3-319-47054-2_16 35. Hu, Y., Sun, X., Nie, X., Li, Y., Liu, L.: An Enhanced LSTM for trend following of time series. IEEE Access 7, 34020–34030 (2019). https://doi.org/10.1109/ACCESS.2019.2896621 36. Zhelev, S., Avresky, D.R.: Using LSTM neural network for time series predictions in financial markets. 2019 IEEE 18th International Symposium on Network Computing and Applications (NCA), pp. 1–5 (2019). https://doi.org/10.1109/NCA.2019.8935009
Ensemble Recurrent Neural Networks and Their Optimization by Particle Swarm for Complex Time Series Prediction

Martha Pulido and Patricia Melin
Abstract In this article, we combine recurrent neural networks and particle swarm optimization to design ensemble Recurrent Neural Network (RNN) architectures. The proposed particle swarm optimization searches for the number of modules, the number of layers, and the number of neurons per layer. The method is applied to time series prediction using the Petroleum, US Dollar/MX Peso, and Taiwan Stock Exchange series, and the objective of the optimization algorithm is to minimize the prediction error for each time series. The simulation results show that the recurrent neural network approach achieves a low prediction error.

Keywords Recurrent neural networks · Particle swarm optimization · Time series prediction
1 Introduction

A time series is an ordered sequence of data or observations measured over time. One of the most important uses of time series is analysis for predicting the measured variable [1–3]. In organizations, short- and medium-term predictions are very useful, for example to anticipate the demand for a certain product, future sales, or decisions on inventory and supplies. Recurrent Neural Networks (RNNs) can perform a wide variety of computational tasks, including sequence processing, trajectory continuation, non-linear prediction, and modeling of dynamical systems.
M. Pulido (B) · P. Melin
Tijuana Institute of Technology, Tijuana, Mexico

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_4
M. Pulido and P. Melin
These networks are also known as space-time or dynamic networks; they attempt to establish a correspondence between input and output sequences, which are simply temporal patterns. An RNN can be classified as partially or totally recurrent. In totally recurrent networks, each neuron can be connected to any other neuron and the recurrent connections are variable. In partially recurrent networks, the recurrent connections are fixed; these are the usual choice for recognizing or reproducing sequences, and they generally consist mostly of feedforward connections plus a set of feedback connections. RNNs are a field within machine learning in constant development that has recently gained enormous popularity thanks to the promising results obtained in very diverse fields of application. One of particular interest is the prediction of time series. In a time series, each data point has a temporal dependence on the previous and subsequent samples, so a case-specific analysis is required to make a prediction from the past [4]. Examples of predictive models based on recurrent neural networks can be found in finance [5], electronic commerce [6, 7], capital markets [8, 9], macroeconomic variables [10, 11], health [12, 13], signal processing, meteorology [14, 15], voice recognition [16] and traffic control [17]. In this work we use particle swarm optimization because it helps to predict a time series and to find good solutions; we have previously worked with this metaheuristic with good results, and here we apply it to the optimization of ensemble neural networks for time series. This work describes the creation of the ensemble recurrent neural network (ERNN). The model is applied to time series prediction [18, 19], and its architecture is optimized with particle swarm optimization (PSO) [20–23].
The responses of the ERNN are integrated with type-1 and interval type-2 fuzzy systems (IT2FS) [24–27]. The optimization of the recurrent neural network covers the number of hidden layers (NL), the number of neurons per layer (NN), and the number of modules (NM) in the ERNN. To integrate the responses and obtain the final prediction, a Mamdani fuzzy inference system (FIS) is created; the number of inputs of the fuzzy system (FS) matches the number of outputs of the ERNN. The FIS has five inputs, Pred1, Pred2, Pred3, Pred4, and Pred5, each with range 0 to 1.4, and one output, called Prediction, whose range also goes from 0 to 1.4 and which is granulated into two MFs, “Low” and “High”, as linguistic values [28–30]. This chapter is organized as follows: Sect. 2 describes the problem and the proposed method, Sect. 3 presents the simulation results, and Sect. 4 gives the conclusions.
Ensemble Recurrent Neural Networks …
2 Problem Statement and Proposed Method

The proposed method combines recurrent neural networks and particle swarm optimization to design the ERNN architectures. The particle swarm optimization searches for the number of modules, the number of layers, and the number of neurons per layer. The general architecture of the proposed method is shown in Fig. 1.
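The search described above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: it assumes a standard global-best PSO that uses the inertia weight and acceleration constants of Table 1 and the integer bounds of Table 2, and `toy_error` is a hypothetical placeholder for "train an ERNN with this architecture and return its prediction error".

```python
import random

# Bounds taken from Table 2: (modules, hidden layers, neurons per layer).
BOUNDS = [(1, 5), (1, 3), (1, 30)]


def pso(fitness, bounds, particles=100, iterations=100,
        w=0.8, c1=2.0, c2=2.0, seed=0):
    """Minimise `fitness` over integer vectors inside `bounds` (global-best PSO)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(particles)]
    vel = [[0.0] * len(bounds) for _ in range(particles)]

    def decode(x):
        # Round each continuous coordinate to the nearest admissible integer.
        return tuple(min(hi, max(lo, round(v))) for v, (lo, hi) in zip(x, bounds))

    pbest = [p[:] for p in pos]
    pbest_val = [fitness(decode(p)) for p in pos]
    g = min(range(particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iterations):
        for i in range(particles):
            for k, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                # Keep the particle inside the search space.
                pos[i][k] = min(hi, max(lo, pos[i][k] + vel[i][k]))
            val = fitness(decode(pos[i]))
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return decode(gbest), gbest_val


def toy_error(arch):
    # Hypothetical stand-in for the real objective (training and testing an ERNN).
    nm, nl, nn = arch
    return abs(nm - 3) + abs(nl - 2) + abs(nn - 16) / 10


if __name__ == "__main__":
    print(pso(toy_error, BOUNDS, particles=20, iterations=30))
```

In an actual experiment, `toy_error` would be replaced by a function that builds the ensemble, trains it on the training portion of the series, and returns the averaged module error on the test portion.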
2.1 Description of the Particle Swarm Optimization Applied to Recurrent Neural Network

The parameters of the PSO are taken from previous work [31], as shown in Table 1. The following equations define the objective function that the PSO uses to minimize the prediction error of the time series:

$$ERM = \frac{\sum_{i=1}^{d} |p_i - x_i|}{d}$$
Fig. 1 General architecture of the proposed method (the PSO optimizes the ERNN, which consists of Modules 1 to 5 whose outputs feed the Integration block)
Table 1 Parameters for the PSO

| Parameter | Value |
|---|---|
| Particles | 100 |
| Maximum iterations | 100 |
| C1 | 2 |
| C2 | 2 |
| w | 0.8 |

Table 2 Parameters for the search space

| Parameter | Minimum | Maximum |
|---|---|---|
| NM | 1 | 5 |
| NL | 1 | 3 |
| NN | 1 | 30 |
$$\text{Prediction Error} = \frac{ERM_1 + ERM_2 + \cdots + ERM_N}{N} \qquad (1)$$
where $p_i$ is the value predicted by a module of the ensemble recurrent network, $x_i$ is the corresponding real value of the time series, d is the number of data points used from the time series, ERM is the prediction error of one module of the ERNN, N is the number of modules determined by the PSO, and Prediction Error is the average prediction error achieved by the ERNN. The search space parameters are shown in Table 2.
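As a small worked illustration of the objective function, the following sketch computes the error of each module as a mean absolute error and then averages it over the modules. The function names and the sample numbers are assumptions made for this example only; they do not come from the experiments.

```python
def module_error(predicted, actual):
    """ERM: mean absolute difference between one module's predictions and the real data."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    return sum(abs(p - x) for p, x in zip(predicted, actual)) / len(actual)


def ensemble_error(module_predictions, actual):
    """Prediction Error: average of the ERM values over the N modules."""
    errors = [module_error(p, actual) for p in module_predictions]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    real = [1.0, 3.0, 2.0]                 # illustrative data, not from the chapter
    modules = [[1.0, 2.0, 4.0], [1.0, 3.0, 2.0]]
    print(ensemble_error(modules, real))
```

Because every module is scored against the same real series, the ensemble error is simply the arithmetic mean of the per-module errors, which is the quantity the optimizer minimizes.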
2.2 Database

Figure 2 shows the Petroleum time series [32]. We use 800 data points that correspond to the period from 07/04/08 to 09/05/11, with 70% of the data for training the RNN and 30% for testing it. Figure 3 shows the US Dollar/MX Peso time series [33]. We use 800 data points that correspond to the period from 07/04/08 to 09/05/11, with 70% of the data for training the RNN and 30% for testing it. Figure 4 shows the Taiwan Stock Exchange time series [34]. We use 800 data points that correspond to the period from 07/04/08 to 09/05/11, with 70% of the data for training the RNN and 30% for testing it.
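The 70/30 partition can be illustrated as follows. The sketch assumes a chronological split (the first 70% of the observations for training and the remaining 30% for testing), which is the usual choice for time series; the text does not state how the partition was made, so this is an assumption.

```python
def chronological_split(series, train_fraction=0.7):
    """Split a series into (train, test) without shuffling, preserving time order."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


if __name__ == "__main__":
    data = list(range(800))          # placeholder for 800 observations
    train, test = chronological_split(data)
    print(len(train), len(test))
```

With 800 observations this yields 560 training points and 240 test points.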
Fig. 2 Series data of Petroleum time series
Fig. 3 Series data of US Dollar/MX pesos time series
2.3 Description of Type-1 and IT2FS

The next step is the description of the type-1 fuzzy system and IT2FS. The following equation shows how the total result of the FS is calculated:

$$y = \frac{\sum_{i=1}^{n} x_i \, u(x_i)}{\sum_{i=1}^{n} u(x_i)} \qquad (2)$$
Fig. 4 Series data of Taiwan stock exchange
where u represents the MFs and x corresponds to the input data. Figure 5 shows the Mamdani fuzzy inference system (FIS) that was created. This FIS has five inputs, Pred1, Pred2, Pred3, Pred4, and Pred5, each with range 0 to 1.4; the output is called Prediction, its range goes from 0 to 1.4, and it is granulated into two MFs, “Low” and “High”, as linguistic values. The rules of the fuzzy system are shown in Fig. 6. Since the fuzzy system has five input variables with two MFs each and one output with two MFs, the number of possible rules is 2^5 = 32.
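To make Eq. (2) concrete, the sketch below integrates five module predictions with a deliberately reduced type-1 Mamdani system. It rests on several assumptions that are not taken from the chapter: Gaussian "Low" and "High" MFs centred at the ends of the range 0 to 1.4 with an arbitrary width, min for "and", max for aggregation, and only the two extreme rules (all inputs Low gives Low, all inputs High gives High) instead of the full rule base.

```python
import math

LO, HI = 0.0, 1.4   # range of the inputs and of the output stated in the text


def low(x, sigma=0.5):
    # Assumed Gaussian "Low" MF centred at the lower end of the range.
    return math.exp(-0.5 * ((x - LO) / sigma) ** 2)


def high(x, sigma=0.5):
    # Assumed Gaussian "High" MF centred at the upper end of the range.
    return math.exp(-0.5 * ((x - HI) / sigma) ** 2)


def centroid(xs, mu):
    """Eq. (2): y = sum(x_i * u(x_i)) / sum(u(x_i))."""
    return sum(x * m for x, m in zip(xs, mu)) / sum(mu)


def integrate(preds, steps=141):
    """Combine the module predictions with two illustrative Mamdani rules."""
    fire_low = min(low(p) for p in preds)     # all inputs Low  -> output Low
    fire_high = min(high(p) for p in preds)   # all inputs High -> output High
    xs = [LO + (HI - LO) * k / (steps - 1) for k in range(steps)]
    # Clip each output MF at its firing strength and aggregate with max.
    mu = [max(min(fire_low, low(x)), min(fire_high, high(x))) for x in xs]
    return centroid(xs, mu)


if __name__ == "__main__":
    print(integrate([0.2, 0.3, 0.25, 0.2, 0.3]))
```

Low predictions pull the defuzzified output toward the lower half of the range and high predictions toward the upper half; a complete integrator would evaluate all rules of the rule base in the same way.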
Fig. 5 IT2FS
1. If (Pred1 is P1L) and (Pred2 is P2L) and (Pred3 is P3L) and (Pred4 is P4L) and (Pred5 is P5L) then (Pred is L)
2. If (Pred1 is P1H) and (Pred2 is P2H) and (Pred3 is P3H) and (Pred4 is P4H) and (Pred5 is P5H) then (Pred is H)
3. If (Pred1 is P1L) and (Pred2 is P2L) and (Pred3 is P3L) and (Pred4 is P4L) and (Pred5 is P5H) then (Pred is L)
4. If (Pred1 is P1H) and (Pred2 is P2H) and (Pred3 is P3H) and (Pred4 is P4H) and (Pred5 is P5L) then (Pred is H)
5. If (Pred1 is P1L) and (Pred2 is P2L) and (Pred3 is P3L) and (Pred4 is P4H) and (Pred5 is P5H) then (Pred is L)
6. If (Pred1 is P1H) and (Pred2 is P2H) and (Pred3 is P3H) and (Pred4 is P4L) and (Pred5 is P5L) then (Pred is H)
7. If (Pred1 is P1L) and (Pred2 is P2L) and (Pred3 is P3H) and (Pred4 is P4H) and (Pred5 is P5H) then (Pred is H)
8. If (Pred1 is P1H) and (Pred2 is P2H) and (Pred3 is P3L) and (Pred4 is P4L) and (Pred5 is P5L) then (Pred is L)
9. If (Pred1 is P1L) and (Pred2 is P2H) and (Pred3 is P3H) and (Pred4 is P4H) and (Pred5 is P5H) then (Pred is H)
10. If (Pred1 is P1H) and (Pred2 is P2L) and (Pred3 is P3L) and (Pred4 is P4L) and (Pred5 is P5L) then (Pred is L)
11. If (Pred1 is P1L) and (Pred2 is P2H) and (Pred3 is P3L) and (Pred4 is P4H) and (Pred5 is P5L) then (Pred is L)
12. If (Pred1 is P1H) and (Pred2 is P2L) and (Pred3 is P3H) and (Pred4 is P4L) and (Pred5 is P5H) then (Pred is H)
13. If (Pred1 is P1L) and (Pred2 is P2H) and (Pred3 is P3L) and (Pred4 is P4L) and (Pred5 is P5L) then (Pred is L)
14. If (Pred1 is P1H) and (Pred2 is P2L) and (Pred3 is P3H) and (Pred4 is P4H) and (Pred5 is P5H) then (Pred is H)
15. If (Pred1 is P1L) and (Pred2 is P2L) and (Pred3 is P3H) and (Pred4 is P4L) and (Pred5 is P5L) then (Pred is L)
16. If (Pred1 is P1H) and (Pred2 is P2H) and (Pred3 is P3L) and (Pred4 is P4H) and (Pred5 is P5H) then (Pred is H)
17. If (Pred1 is P1L) and (Pred2 is P2L) and (Pred3 is P3L) and (Pred4 is P4H) and (Pred5 is P5L) then (Pred is L)
18. If (Pred1 is P1H) and (Pred2 is P2H) and (Pred3 is P3H) and (Pred4 is P4L) and (Pred5 is P5H) then (Pred is H)
19. If (Pred1 is P1L) and (Pred2 is P2L) and (Pred3 is P3H) and (Pred4 is P4H) and (Pred5 is P5L) then (Pred is L)
20. If (Pred1 is P1H) and (Pred2 is P2H) and (Pred3 is P3L) and (Pred4 is P4L) and (Pred5 is P5H) then (Pred is H)
21. If (Pred1 is P1L) and (Pred2 is P2H) and (Pred3 is P3H) and (Pred4 is P4L) and (Pred5 is P5L) then (Pred is L)
22. If (Pred1 is P1H) and (Pred2 is P2L) and (Pred3 is P3L) and (Pred4 is P4H) and (Pred5 is P5H) then (Pred is H)
23. If (Pred1 is P1L) and (Pred2 is P2L) and (Pred3 is P3H) and (Pred4 is P4H) and (Pred5 is P5L) then (Pred is L)
24. If (Pred1 is P1H) and (Pred2 is P2H) and (Pred3 is P3L) and (Pred4 is P4L) and (Pred5 is P5H) then (Pred is H)
25. If (Pred1 is P1L) and (Pred2 is P2H) and (Pred3 is P3H) and (Pred4 is P4L) and (Pred5 is P5H) then (Pred is H)
26. If (Pred1 is P1H) and (Pred2 is P2L) and (Pred3 is P3L) and (Pred4 is P4H) and (Pred5 is P5H) then (Pred is L)
27. If (Pred1 is P1L) and (Pred2 is P2H) and (Pred3 is P3L) and (Pred4 is P4H) and (Pred5 is P5L) then (Pred is L)
28. If (Pred1 is P1H) and (Pred2 is P2L) and (Pred3 is P3H) and (Pred4 is P4L) and (Pred5 is P5H) then (Pred is H)
29. If (Pred1 is P1L) and (Pred2 is P2H) and (Pred3 is P3H) and (Pred4 is P4H) and (Pred5 is P5H) then (Pred is L)
30. If (Pred1 is P1H) and (Pred2 is P2L) and (Pred3 is P3L) and (Pred4 is P4H) and (Pred5 is P5H) then (Pred is L)
31. If (Pred1 is P1L) and (Pred2 is P2H) and (Pred3 is P3L) and (Pred4 is P4H) and (Pred5 is P5H) then (Pred is H)
32. If (Pred1 is P1H) and (Pred2 is P2L) and (Pred3 is P3H) and (Pred4 is P4L) and (Pred5 is P5L) then (Pred is L)
Fig. 6 Rules used for the IT2FS
3 Simulation Results

The proposed method is applied to time series prediction, and the results achieved are shown in this section. The results of the particle swarm optimization are shown for the Petroleum, US Dollar/MX Peso, and Taiwan Stock Exchange time series. Table 3 shows the results of 10 experiments of the optimization of the ERNN with particle swarm optimization for the Petroleum time series. Table 4 shows the results of 10 experiments with the type-1 fuzzy integrator for the Petroleum time series. Table 5 shows the prediction results using the IT2FS for the Petroleum time series. Table 6 shows the results of 10 experiments of the optimization of the ERNN with particle swarm optimization for the US Dollar/MX Peso time series. Table 7 shows the results of 10 experiments with the type-1 fuzzy integrator for the US Dollar/MX Peso time series. Table 8 shows the prediction results using the IT2FS for the US Dollar/MX Peso time series. Table 9 shows the results of 10 experiments of the optimization of the ERNN with particle swarm optimization for the Taiwan Stock Exchange time series. Table 10 shows the results of 10 experiments with the type-1 fuzzy integrator for the Taiwan Stock Exchange time series. Table 11 shows the prediction results using the IT2FS for the Taiwan Stock Exchange time series.

Table 3 Results of the optimization of the ERNN with PSO for the Petroleum time series

| No | Iterations | Particles | Number of modules | Number of layers | Duration | Prediction error |
|---|---|---|---|---|---|---|
| 1 | 100 | 100 | 2 | 16,6 | 01:38:47 | 0.019806 |
| 2 | 100 | 100 | 1 | 10 | 01:40:41 | 0.01848 |
| 3 | 100 | 100 | 2 | 24 16 | 01:35:11 | 0.019806 |
| 4 | 100 | 100 | 2 | 25 15 | 01:40:41 | 0.018997 |
| 5 | 100 | 100 | 2 | 25 15 | 01:15:42 | 0.018907 |
| 6 | 100 | 100 | 2 | 6,9 28,4 | 01:45:35 | 0.019806 |
| 7 | 100 | 100 | 1 | 3 | 01:35:35 | 0.01842 |
| 8 | 100 | 100 | | | 01:42:47 | 0.019888 |
| 9 | 100 | 100 | 2 | 25 15 | 01:15:42 | 0.0122111 |
| 10 | 100 | 100 | 2 | 6,9 26,2 | 01:26:15 | 0.022751 |

Table 4 Results for the type-1 fuzzy integrator for the Petroleum time series

| No | Type-1 fuzzy integrator |
|---|---|
| 1 | 0.3075 |
| 2 | 0.1343 |
| 3 | 0.1383 |
| 4 | 0.1322 |
| 5 | 0.3075 |
| 6 | 0.2012 |
| 7 | 0.1125 |
| 8 | 0.1822 |
| 9 | 0.1383 |
| 10 | 0.3022 |

Table 5 Results for the type-2 fuzzy integrator for the Petroleum time series

| Experiments | Prediction error, 0.3 uncertainty | Prediction error, 0.4 uncertainty | Prediction error, 0.5 uncertainty |
|---|---|---|---|
| 1 | 0.2258 | 0.2261 | 0.2282 |
| 2 | 0.2261 | 0.2265 | 0.2268 |
| 3 | 0.2212 | 0.2233 | 0.2333 |
| 4 | 0.2247 | 0.2251 | 0.2258 |
| 5 | 0.229 | 0.2335 | 0.2338 |
| 6 | 0.2219 | 0.2222 | 0.2229 |
| 7 | 0.2218 | 0.2224 | 0.2230 |
| 8 | 0.2251 | 0.2258 | 0.2310 |
| 9 | 0.2255 | 0.2255 | 0.2387 |
| 10 | 0.2248 | 0.2254 | 0.2394 |
Table 6 Results of the optimization of the ERNN with PSO for the US Dollar/MX Peso time series

| No | Iterations | Particles | Number of modules | Number of layers | Duration | Prediction error |
|---|---|---|---|---|---|---|
| 1 | 100 | 100 | 2 | 18 15 | 01:05:22 | 0.00062062 |
| 2 | 100 | 100 | 3 | 5,18 15,13 17,18 | 01:19:33 | 0.00060268 |
| 3 | 100 | 100 | 3 | 17,5 12,25 20,27 | 01:28:04 | 0.00062102 |
| 4 | 100 | 100 | 4 | 10 11 15 18 | 01:45:20 | 0.00062632 |
| 5 | 100 | 100 | 2 | 18,19 13,13 | 01:10:15 | 0.00060568 |
| 6 | 100 | 100 | 3 | 21,19,14 26,17,12 16,17,18 | 01:11:03 | 0.00061166 |
| 7 | 100 | 100 | 3 | 8,22 11,23 16,6 | 01:33:18 | 0.00062424 |
| 8 | 100 | 100 | 3 | 18,18,25 15,20,19 16,11,23 | 01:22:20 | 0.0062805 |
| 9 | 100 | 100 | 3 | 7,6,11 26,2,18 26,14,16 | 01:51:21 | 0.0060541 |
| 10 | 100 | 100 | 4 | 5,27,28 20,20,6 13,12,12 12,26,6 | 01:44:09 | 0.00061135 |

4 Conclusions

In this paper, the architecture of the recurrent neural network was optimized for the Petroleum, Taiwan Stock Exchange, and US Dollar/MX Peso time series, and good results were obtained. Recurrent neural networks allow us to analyze sequences: the network has two inputs, the current data and the previous prediction, and by combining these elements it is possible to generate the output of the network as well as to preserve the information obtained at previous time instants, which is precisely the memory of the network. The prediction of time series with recurrent neural networks gives very positive results, in certain circumstances even better than those of a statistical method known for its good performance. This means that these models are really useful for predictive applications. However, they need to be improved by special approaches or new experimentation to compensate for the high computational load required for their training.
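The memory mechanism mentioned above can be illustrated with a single-unit, Elman-style recurrence. The sketch is an assumption-laden toy, not the networks used in the experiments: the weights are arbitrary, there is one scalar state, and the fed-back quantity is the previous hidden state rather than the previous prediction.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """New state computed from the current input and the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(series, w_x=0.5, w_h=0.8, b=0.0, w_out=1.0):
    """Run the recurrence over a series and return one output per time step."""
    h = 0.0
    outputs = []
    for x_t in series:
        h = rnn_step(x_t, h, w_x, w_h, b)   # h carries information forward in time
        outputs.append(w_out * h)
    return outputs


if __name__ == "__main__":
    print(run_rnn([1.0, 0.0, 0.0]))              # the first input still echoes later
    print(run_rnn([1.0, 0.0, 0.0], w_h=0.0))     # no feedback, so no memory
```

Setting the feedback weight `w_h` to zero removes the dependence on earlier inputs, which shows that the recurrent connection is what provides the memory.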
Table 7 Results for the type-1 fuzzy integrator for the US Dollar/MX Peso time series

| No | Type-1 fuzzy integrator |
|---|---|
| 1 | 0.4112 |
| 2 | 0.1120 |
| 3 | 0.1422 |
| 4 | 0.4521 |
| 5 | 0.4425 |
| 6 | 0.4852 |
| 7 | 0.1586 |
| 8 | 0.4220 |
| 9 | 0.4217 |
| 10 | 0.4825 |

Table 8 Results for the type-2 fuzzy integrator for the US Dollar/MX Peso time series

| Experiments | Prediction error, 0.3 uncertainty | Prediction error, 0.4 uncertainty | Prediction error, 0.5 uncertainty |
|---|---|---|---|
| 1 | 0.4785 | 0.4261 | 0.42282 |
| 2 | 0.4261 | 0.4265 | 0.42268 |
| 3 | 0.4268 | 0.4233 | 0.42333 |
| 4 | 0.4247 | 0.4251 | 0.4258 |
| 5 | 0.4499 | 0.4335 | 0.433 |
| 6 | 0.4887 | 0.4222 | 0.4229 |
| 7 | 0.4558 | 0.4544 | 0.4277 |
| 8 | 0.4490 | 0.4877 | 0.4325 |
| 9 | 0.4441 | 0.4443 | 0.4337 |
| 10 | 0.4896 | 0.4987 | 0.42394 |
As future work, we will consider optimizing the recurrent neural network with other optimization methods and comparing the results, again with type-1 and IT2FS integration [35–38]. We will also test the ability of our method to predict other complex time series.
Table 9 Results of the optimization of the ERNN with PSO for the Taiwan Stock Exchange time series

| No | Iterations | Particles | Number of modules | Number of layers | Duration | Prediction error |
|---|---|---|---|---|---|---|
| 1 | 100 | 100 | 2 | 15,16,19 3,25,28 | 01:42:14 | 0.00081941 |
| 2 | 100 | 100 | 3 | 4,3 2,11 14,22 | 01:39:22 | 0.00082986 |
| 3 | 100 | 100 | 3 | 16 17 15 | 01:40:13 | 0.00081966 |
| 4 | 100 | 100 | 3 | 21,17,16 17,19,11 20,15,17 | 01:42:22 | 0.00082388 |
| 5 | 100 | 100 | 2 | 15,16,19 3,25,28 | 01:29:41 | 0.00089 |
| 6 | 100 | 100 | 2 | 15,6,19 3,25,28 | 01:23:05 | 0.00081941 |
| 7 | 100 | 100 | 3 | 4,3 11,26 14,22 | 01:42:53 | 0.00082986 |
| 8 | 100 | 100 | 3 | 16 17 15 | 01:38:17 | 0.00081966 |
| 9 | 100 | 100 | 3 | 21,17,16 17,19,11 20,15,17 | 01:41:55 | 0.00082388 |
| 10 | 100 | 100 | 2 | 17,25,15 36,22,21 | 01:42:00 | 0.00073167 |
Table 10 Results for the type-1 fuzzy integrator for the Taiwan Stock Exchange time series

| No | Type-1 fuzzy integrator |
|---|---|
| 1 | 0.0399 |
| 2 | 0.0233 |
| 3 | 0.0256 |
| 4 | 0.3861 |
| 5 | 0.3876 |
| 6 | 0.3784 |
| 7 | 0.0246 |
| 8 | 0.2347 |
| 9 | 0.1383 |
| 10 | 0.2148 |
Table 11 Results for the type-2 fuzzy integrator for the Taiwan Stock Exchange time series

| Experiments | Prediction error, 0.3 uncertainty | Prediction error, 0.4 uncertainty | Prediction error, 0.5 uncertainty |
|---|---|---|---|
| 1 | 0.0228 | 0.0030 | 0.0375 |
| 2 | 0.0218 | 0.0224 | 0.02196 |
| 3 | 0.0254 | 0.0287 | 0.3015 |
| 4 | 0.00207 | 0.0245 | 0.0264 |
| 5 | 0.0235 | 0.0238 | 0.0244 |
| 6 | 0.0252 | 0.0278 | 0.0294 |
| 7 | 0.0228 | 0.2557 | 0.0275 |
| 8 | 0.02299 | 0.0278 | 0.0317 |
| 9 | 0.0222 | 0.0244 | 0.0277 |
| 10 | 0.02014 | 0.01999 | 0.0197 |
Acknowledgements We would like to express our gratitude to CONACYT and Tijuana Institute of Technology for the facilities and resources granted for the development of this research.
References

1. Brockwell, P.D., Davis, R.A.: Introduction to Time Series and Forecasting, pp. 1–219. Springer, New York (2002)
2. Davey, N., Hunt, S., Frank, R.: Time Series Prediction and Neural Networks. University of Hertfordshire, Hatfield (1999)
3. Cowpertwait, P., Metcalfe, A.: “Time series”, Introductory Time Series with R, pp. 2–5. Springer Dordrecht, Heidelberg, London, New York (2009)
4. Mikolov, T., Kombrink, S., Burget, L., Černocký, J., Khudanpur, S.: Extensions of recurrent neural network language model. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5528–5531 (2011)
5. Castillo, O., Melin, P.: Comparison of hybrid intelligent systems, neural networks and interval Type-2 fuzzy logic for time series prediction. Proceedings IJCNN, pp. 3086–3091 (2007)
6. Castillo, O., Melin, P.: Hybrid intelligent systems for time series prediction using neural networks, fuzzy logic, and fractal theory. IEEE Trans. Neural Netw. 13(6), 1395–1408 (2002)
7. Castro, J., Castillo, O., Melin, P., Mendoza, O., Rodríguez, A.: An interval Type-2 fuzzy neural network for chaotic time series prediction with cross-validation and Akaike test. Soft Computing for Intelligent Control and Robotics, pp. 269–285 (2011)
8. Pulido, M., Melin, P., Mendoza, O.: Optimization of ensemble neural networks with Type-1 and interval Type-2 fuzzy integration for forecasting the Taiwan stock exchange. Advances in Data Analysis with Computational, pp. 169–181 (2018)
9. Correa, P., Cipriano, A., Nuñez, F., Salas, C., Label, H.: Forecasting copper electrorefining cathode rejection by means of recurrent neural networks with attention mechanism. IEEE Access 9, 79080–79088 (2021)
10. Güzelis, C., Yildiz, O.: Recurrent Trend Predictive Neural Network for Multi-Sensor Fire Detection, pp. 84204–84216 (2021)
11. Hoffmann, L.F., Parquet Bizarria, F.C., Parquet Bizarria, J.W.: Detection of liner surface defects in solid rocket motors using multilayer perceptron neural networks. Polym. Test. (2020)
12. Sudipto, S., Raghava, G.: Prediction of Continuous B-Cell Epitopes in an Antigen Using Recurrent Neural Network, National Library of Medicine, pp. 40–48 (2006)
13. Walid, Alamsyah, A.: Recurrent Neural Network For Forecasting Time Series With Long Memory Pattern, pp. 1–8 (2016)
14. Yao, Q., Dongjin, S., Haifeng, C., Wei, C., Guofei, J., Garrison, C.: A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction. Computer Science, Cornell University, pp. 1–7 (2017)
15. Zhang, J., Man, K.F.: Time series prediction using recurrent neural network in multi-dimension embedding phase space. IEEE Int. Conf. Syst. Man Cybern. 2, 11–14 (1998)
16. Zhang, J., Xio, X.: Predicting chaotic time series using recurrent neural network. IOP Science, pp. 88–90 (2000)
17. Min, H., Jianhui, X., Shiguo, X., Fu-Liang, Y.: Prediction of chaotic time series based on the recurrent predictor neural network, pp. 3409–3416. IEEE (2004)
18. Petnehazi, G.: Recurrent Neural Networks for Time Series Forecasting. University of Debrecen, pp. 1–22 (2019)
19. Rohitash, C., Mengjie, Z.: Cooperative coevolution of Elman recurrent neural networks for chaotic time series prediction. Neurocomputing 116–123 (2012)
20. Jerome, T., Connor, R., Douglas, M.: Recurrent neural networks and robust time series prediction. IEEE Trans. Neural Netw. 5(2), 240–254 (1994)
21. Pulido, M., Melin, P.: Ensemble recurrent neural networks for complex time series prediction with integration methods. Fuzzy Logic Hybrid Extensions of Neural and Optimization Algorithms, pp. 71–83 (2021)
22. Sherstinsky, A.: Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network. Physica D: Nonlinear Phenomena, pp. 1–28 (2020)
23. Mendel, J.: Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions, pp. 213–231. Prentice-Hall, Inc. (2001)
24. Castillo, O., Melin, P.: Simulation and forecasting complex economic time series using neural networks and fuzzy logic. Proc. Int. Neural Netw. Conf. 3, 1805–1810 (2001)
25. Castillo, O., Melin, P.: “Type-2 Fuzzy Systems”, Type-2 Fuzzy Logic Theory and Application, pp. 30–43. Springer (2008)
26. Castillo, O.: Type-2 Fuzzy Logic in Intelligent Control Applications. Springer, Berlin, Germany (2012)
27. Karnik, N., Mendel, J.M.: Introduction to type-2 fuzzy logic systems. IEEE Trans. Signal Process. 2, 915–920 (1998)
28. Eberhart, R.C., Kennedy, J.: A new optimizer using particle swarm theory. In: Proceedings of the Sixth Symposium on Micromachine and Human Science, pp. 39–43 (1995)
29. Eberhart, R.C.: Fundamentals of Computational Swarm Intelligence, pp. 93–129. Wiley, New York (2005)
30. Melin, P., Pulido, M., Castillo, O.: Ensemble Neural Network with Type-1 and Type-2 Fuzzy Integration for Time Series Prediction and Its Optimization with PSO. Imprecision and Uncertainty in Information Representation and Processing, pp. 375–388 (2016)
31. Petroleum Database: https://mx.investing.com/commodities/crude-oil-historical-data (June 05, 2020)
32. Dollar/MX pesos Database: http://www.banxico.org.mx (June 05, 2021)
33. Taiwan Bank Database: www.twse.com/en (June 06, 2021)
34. Melin, P., Sanchez, D.: Multi-objective optimization for modular granular neural networks applied to pattern recognition. Inf. Sci. 460–461, 594–610 (2018)
35. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings Intelligent Symposium, pp. 80–87 (2003)
36. Olivas, F., Valdez, F., Castillo, O., Melin, P.: Dynamic parameter adaptation in particle swarm optimization using interval type-2 fuzzy logic. Soft Comput. 20(3), 1057–1070 (2016)
37. Castillo, O., Castro, J.R., Melin, P., Rodriguez Dias, A.: Application of interval type-2 fuzzy neural networks in non-linear identification and time series prediction. Soft Comput. 18(6), 1213–1224 (2014)
38. Ontiveros, E., Melin, P., Castillo, O.: High order α-planes integration: a new approach to computational cost reduction of general Type-2 fuzzy systems. Eng. Appl. Artif. Intell. 74, 186–197 (2018)
Filter Estimation in a Convolutional Neural Network with Type-2 Fuzzy Systems and a Fuzzy Gravitational Search Algorithm

Yutzil Poma and Patricia Melin
Abstract We propose adapting a parameter, the number of filters (NF), of each convolution layer of a convolutional neural network (CNN). Type-2 fuzzy logic is used together with the Fuzzy Gravitational Search Algorithm (FGSA), which is based on the Gravitational Search Algorithm (GSA), itself inspired by Newton's second law and the law of gravity. The FGSA helps to build the membership functions of a Type-2 fuzzy system, which we propose for searching and adapting the number of filters of each convolutional layer of the convolutional neural network. The uncertainty between the upper and lower membership functions of the Type-2 fuzzy system is modified manually, with values of 0.02, 0.05 and 0.08. This work aims to demonstrate that Type-2 fuzzy logic, used in conjunction with a bio-inspired algorithm and a convolutional neural network, obtains better results in image recognition than Type-1 fuzzy logic. Keywords Optimization · Parameter adaptation · Convolutional neural networks · Fuzzy system · Type-2 fuzzy logic · Fuzzy gravitational search algorithm
Y. Poma · P. Melin (B) Tijuana Institute of Technology, Tijuana, Mexico e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_5
1 Introduction Convolutional neural networks (CNNs) are currently experiencing a boom thanks to their characteristics and their formidable capacity for image recognition and classification. Several CNN architectures are available today for the classification and recognition of images, such as ResNet [1], AlexNet [2] and LeNet [3]. CNNs have been used in conjunction with different methods, mainly for image classification or recognition, for example in the recognition of Chinese fingerspelling sign language [4], in speaker recognition [5], or in face recognition [6–10]. Some of the models obtained in these experiments are used in conjunction with other methods, such as transfer learning [11, 12], to solve problems and to extend the solutions to increasingly complicated ones. The optimization of neural networks (NNs) has also helped to obtain better outcomes, as in [13, 14], where COVID-19 conditions are differentiated from healthy people in chest radiographs using an optimized CNN, in [15], where an optimized CNN is used for skin cancer detection, or in [16], where a convolutional neural network is optimized for video intra prediction. The case study selected for this research is the ORL database [17], which is widely used for testing image recognition methods and techniques, as in [18, 19].
Since the publication of "Fuzzy sets" by Zadeh in 1965 [20], fuzzy logic (FL) has been introduced in various methods hybridized with neural networks and bio-inspired algorithms, among others, as in [21], where fuzzy logic is applied to system monitors, or in [22], where prospects for learning foreign languages through games are identified. Type-1 FL has also been used in [23], where a neural network is optimized using a fuzzy classifier, in [24], where Type-1 FL is compared with interval Type-2 FL for control problems, and in [25], where Type-1 and interval Type-2 FL are statistically compared in a dynamic adaptation of the parameters of bee colony optimization (BCO). Type-2 fuzzy systems (FS) have given remarkable results in experiments when compared with Type-1 FL, although whether Type-1 FL or Type-2 FL is required clearly depends a great deal on the problem to be solved. Some of the works in which Type-2 fuzzy logic has been applied are [26, 27], where it is used for control, and [28], where parameters are dynamically adapted in the harmony search algorithm with interval Type-2 FL; other works where it has been used are [29–31].
There are many methods for the optimization of neural networks and other algorithms; the Fuzzy Gravitational Search Algorithm [32] has stood out in experimentation thanks to its search capacity and the application of fuzzy logic within it. Some of the most relevant related works are [33, 34]. This work is organized as follows: Sect. 2 presents concepts related to CNNs and Type-2 fuzzy logic. Section 3 presents the proposal of this work, in which Type-2 FL is used to adapt the network parameters. Section 4 presents the results obtained with the proposal and compares the Type-1 FS, the Type-2 FS, and the optimization of the network for the adaptation of its parameters. Finally, Sect. 5 presents the conclusions of the experimentation carried out.
2 Literature Review This section presents the concepts needed to understand the method proposed in this work.
2.1 Convolutional Neural Networks One of the fastest-growing networks today is the CNN, an artificial neural network that imitates the natural learning of human beings: it identifies objects through their characteristics and thus manages to classify or recognize images. This network is organized in layers, each of which is in charge of extracting characteristics from the images by "filtering" the information; at the end, a fully connected classification layer classifies the image. The first layer is the convolution layer [35], which extracts characteristics by means of a filter: the filter values are multiplied by the image values and the products are added into a space called the "characteristic map" (feature map). An activation function [36] is applied in this layer. The feature map then goes to the next layer of the network, called the "pooling" layer or sometimes the "aggregation" layer [37]. This layer continues to filter the characteristics with an empty mask of variable size (generally 2 × 2) that moves across the feature map and selects either the maximum or the average of the values currently within the mask. The last layer is the classification layer, which is fully connected and has as many outputs as classes required for the classification or recognition of the images [38]. The number of layers of each kind varies with the depth of the network; a deeper network takes more time and more computational resources.
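The convolution, activation and pooling steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the chapter: the toy 6 × 6 image, the 2 × 2 filter values, the ReLU activation and the helper names `conv2d_valid`, `relu` and `pool2d` are all assumptions made for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the filter over the image; multiply element-wise and sum.

    As is usual in CNNs, the filter is not flipped (cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """A common activation function (assumed here for illustration)."""
    return np.maximum(x, 0.0)

def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling; rows/columns that do not fill a mask are dropped."""
    h = fmap.shape[0] // size * size
    w = fmap.shape[1] // size * size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])       # hypothetical 2x2 filter
feature_map = relu(conv2d_valid(image, kernel))    # feature map, shape (5, 5)
pooled = pool2d(feature_map, size=2, mode="max")   # pooled map, shape (2, 2)
```

The number of filters (NF) that the chapter adapts corresponds to how many such kernels each convolution layer applies, each producing its own feature map.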
2.2 Type-2 Fuzzy Logic System In 1999, Mendel and co-authors introduced Type-2 fuzzy logic systems, which consist of a rule base; a fuzzifier, in charge of converting crisp (real) values to fuzzy values; a fuzzy inference engine, which applies fuzzy reasoning to obtain a fuzzy output; and an output processor consisting of a type reducer, which transforms a Type-2 fuzzy set into a Type-1 fuzzy set, and a defuzzifier, which translates the output into a crisp value [39]. The footprint of uncertainty (FOU) is the region between the upper membership function (MF) and the lower MF; the width of the interval between them influences the result obtained for the problem in question. Both the upper and the lower MF are Type-1 MFs. Figure 1 shows a Type-2 membership function.
3 Proposed Method This work builds on previous experiments [40], in which the number of filters (NF) of each convolution layer of the CNN was adapted; the difference is that a Type-1 FS was used there to adapt these parameters.
Fig. 1 Example of Type-2 MF
Our proposal consists of adapting the parameters of the convolutional neural network: a Type-2 fuzzy system, together with the Fuzzy Gravitational Search Algorithm method, adapts the number of filters of each convolutional layer of the network. The Type-2 FS that was designed (see Fig. 2) has one input, the error (E1) produced by the CNN, and two outputs, which correspond to NF1 and NF2, the numbers of filters of the convolution layers. RC stands for recognition and labels the membership functions (−RC, 1/2RC and +RC). Each input and output of the FS has 3 MFs, and the fuzzy rules can be found in Table 1. This FS works together with the FGSA method, which is in charge of building the Gaussian membership functions. Every membership function of the input and of the outputs is Gaussian with uncertain standard deviation, as defined in Eq. 1 and shown in Fig. 3.
Fig. 2 Type-2 FS
Table 1 Rules of Type-2 FS
| Rule | If | Then |
| --- | --- | --- |
| 1 | E1 is −RC | NF1 is −RC and NF2 is −RC |
| 2 | E1 is 1/2RC | NF1 is 1/2RC and NF2 is 1/2RC |
| 3 | E1 is +RC | NF1 is +RC and NF2 is +RC |
Fig. 3 Gaussian MF with uncertainty SD
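$$
\begin{aligned}
\mu_{\tilde{F}}(x) &= \exp\left[-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^{2}\right], \quad \sigma \in [\sigma_1, \sigma_2] \\
\overline{\mu}_{\tilde{F}}(x) &= \exp\left[-\frac{1}{2}\left(\frac{x-m}{\sigma_2}\right)^{2}\right] \\
\underline{\mu}_{\tilde{F}}(x) &= \exp\left[-\frac{1}{2}\left(\frac{x-m}{\sigma_1}\right)^{2}\right]
\end{aligned}
\tag{1}
$$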
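To make Eq. 1 concrete, the following minimal sketch evaluates the lower and upper membership grades of a Gaussian MF with uncertain standard deviation. It rests on assumptions that the chapter does not state: the function names are hypothetical, the numeric values of m and σ1 are arbitrary, and the FOU value is interpreted here as the difference σ2 − σ1, which is only one possible reading of how the 0.02, 0.05 and 0.08 values are applied.

```python
import numpy as np

def gaussian(x, m, sigma):
    """Type-1 Gaussian membership function."""
    return np.exp(-0.5 * ((x - m) / sigma) ** 2)

def it2_gaussian_uncertain_sd(x, m, sigma1, sigma2):
    """Lower and upper membership grades for sigma in [sigma1, sigma2] (Eq. 1)."""
    if not 0 < sigma1 <= sigma2:
        raise ValueError("expected 0 < sigma1 <= sigma2")
    lower = gaussian(x, m, sigma1)   # narrower curve bounds the FOU from below
    upper = gaussian(x, m, sigma2)   # wider curve bounds the FOU from above
    return lower, upper

x = np.linspace(0.0, 1.0, 11)
m, sigma1 = 0.5, 0.10    # hypothetical center and smaller standard deviation
fou = 0.05               # assumption: FOU taken as sigma2 - sigma1
lower, upper = it2_gaussian_uncertain_sd(x, m, sigma1, sigma1 + fou)
```

Both curves reach 1 at x = m, and the gap between them away from the center is the footprint of uncertainty that the experiments vary.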
Figure 4 shows the diagram of our proposed approach. The data are entered; then the FGSA method creates the agents that build the MFs of the Type-2 FS, and the FS is in charge of adapting the parameters that correspond to the NFs of convolution layer one and convolution layer two. These parameters enter the convolutional neural network, so that the NF of each convolution layer takes the optimal value found by the FS. The data (images) then pass through the CNN to obtain their recognition.
Fig. 4 Proposed approach (flowchart: Start → FGSA creates the points corresponding to the MFs → the values are input to the fuzzy system → the fuzzy system obtains NF1 and NF2 → the NFs enter the CNN → the CNN error is fed back → End of process)
4 Results and Discussion The selected case study is the ORL database [17], which consists of 400 images of human faces taken at different angles: there are 40 different people, each with 10 images of their face. Each image is black and white and has a size of 112 × 92 pixels. Figure 5 shows some of the images of this database.
Fig. 5 Examples ORL database
The CNN architecture used has two convolution layers, each followed by a pooling layer, and one classification layer. Three main experiments were carried out, in which the footprint of uncertainty (FOU) takes the values 0.02, 0.05 and 0.08. Each experiment was repeated 30 times for each number of training epochs of the convolutional neural network: 50, 70, 100, 200, 500, 1000 and 2000. 60% of the data from the ORL database was used for training the network and 40% for testing the CNN. Table 2 shows the results obtained when the FOU is 0.02: the maximum recognition value found is 96.25%, obtained when the neural network is trained for 2000 epochs with 24 filters in each convolution layer, with an average image recognition of 94.18% and a standard deviation (SD) of 1.28.

Table 2 Results with Type-2 FS with FOU = 0.02
| Epochs | Experiment number | Recognition rate | NF1 | NF2 | Average | SD |
| --- | --- | --- | --- | --- | --- | --- |
| 50 | 8 | 95.62 | 25 | 27 | 92.77 | 1.22 |
| 70 | 19 | 95.62 | 43 | 39 | 93.10 | 1.28 |
| 100 | 20 | 95 | 29 | 39 | 93.16 | 1.12 |
| 200 | 28 | 95.62 | 23 | 25 | 93.37 | 0.93 |
| 500 | 12 | 95 | 36 | 34 | 93.33 | 0.87 |
| 2000 | 2 | 96.25 | 24 | 24 | 94.18 | 1.28 |

Table 3 shows the results when the FOU is 0.05. The best result, taken as the row with the highest average, was obtained with 2000 training epochs: 96.25% image recognition with 10 filters in convolution layer 1 and 9 filters in convolution layer 2, an average of 94.31% and an SD of 1.15.

Table 3 Results with Type-2 FS with FOU = 0.05
| Epochs | Experiment number | Recognition rate | NF1 | NF2 | Average | SD |
| --- | --- | --- | --- | --- | --- | --- |
| 50 | 13 | 96.87 | 18 | 19 | 92.33 | 1.45 |
| 70 | 10 | 96.81 | 20 | 21 | 92.36 | 1.51 |
| 100 | 2 | 95 | 23 | 23 | 92.45 | 1.26 |
| 200 | 12 | 96.25 | 28 | 29 | 93.18 | 1.40 |
| 500 | 25 | 96.87 | 43 | 44 | 93.25 | 1.27 |
| 1000 | 17 | 96.01 | 26 | 23 | 93.97 | 1.21 |
| 2000 | 2 | 96.25 | 10 | 9 | 94.31 | 1.15 |

Table 4 shows that, when the FOU of the Type-2 FS is 0.08, the best average was again obtained when the network was trained for 2000 epochs: 95.62% image recognition with 40 filters in each of the convolution layers of the convolutional neural network, an average of 93.75% and an SD of 1.21.

Table 4 Results with Type-2 FS with FOU = 0.08
| Epochs | Experiment number | Recognition rate | NF1 | NF2 | Average | SD |
| --- | --- | --- | --- | --- | --- | --- |
| 50 | 1 | 93.12 | 13 | 14 | 92 | 0.92 |
| 70 | 6 | 93.75 | 31 | 33 | 92.25 | 1.07 |
| 100 | 30 | 95.62 | 45 | 49 | 92.54 | 1.12 |
| 200 | 6 | 95 | 41 | 34 | 93.04 | 1.20 |
| 500 | 6 | 96.25 | 22 | 22 | 93.18 | 1.23 |
| 1000 | 13 | 95 | 20 | 20 | 93.31 | 0.99 |
| 2000 | 2 | 95.62 | 40 | 40 | 93.75 | 1.21 |

Table 5 summarizes the best values obtained when the FOU is manually set to 0.02, 0.05 and 0.08. The highest image recognition value, 96.25%, was obtained with FOU values of 0.02 and 0.05, and the best average, 94.31%, was obtained with an FOU of 0.05. In all cases the best average recognition was obtained when the convolutional neural network was trained for 2000 epochs.

Table 5 Comparison of best results
| FOU | Epochs | Recognition rate | NF1 | NF2 | Average | SD |
| --- | --- | --- | --- | --- | --- | --- |
| 0.02 | 2000 | 96.25 | 24 | 24 | 94.18 | 1.28 |
| 0.05 | 2000 | 96.25 | 10 | 9 | 94.31 | 1.15 |
| 0.08 | 2000 | 95.62 | 40 | 40 | 93.75 | 1.21 |

Table 6 compares the results of the CNN under different methods: the optimization of the CNN with the FGSA method without the help of any FS [41], the adaptation of parameters with a Type-1 FS [40], and the adaptation presented here of the NF of each convolution layer with a Type-2 FS in which the FOU is modified manually. The best image recognition result, 97.5%, is obtained when the CNN is optimized with the FGSA. We can also observe that adapting the parameters with the Type-1 fuzzy system gives lower results than using the Type-2 fuzzy system with an FOU of 0.05, which reaches an average recognition of 94.31%.

Table 6 Comparison with other optimization methods for CNN
| CNN detail | Optimized with | Recognition rate (%), max | Recognition rate (%), best average |
| --- | --- | --- | --- |
| Convolutional Neural Network (70 epochs) optimized with FGSA [41] | FGSA | 97.5 | 94.43 |
| Convolutional Neural Network (70 epochs), adaptation of parameters with FL [40] | FL T1 | 95.62 | 93.77 |
| Convolutional Neural Network (2000 epochs), adaptation of parameters with FL, FOU = 0.02 | FL T2 | 96.25 | 94.18 |
| Convolutional Neural Network (2000 epochs), adaptation of parameters with FL, FOU = 0.05 | FL T2 | 96.25 | 94.31 |
| Convolutional Neural Network (2000 epochs), adaptation of parameters with FL, FOU = 0.08 | FL T2 | 95.62 | 93.75 |
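As a reading aid for the recognition rates in this section, the short sketch below works out the sizes implied by the 60/40 split of the 400 ORL images and the granularity of a recognition rate measured on the test portion. It assumes that the split is exactly 240/160 images and that the rates are computed on the test set; the chapter does not state either point explicitly, and the function name is hypothetical.

```python
# ORL database: 40 people x 10 images each
total_images = 40 * 10

# Assumption: the 60/40 split is exact
test_images = total_images * 40 // 100      # images used for testing
train_images = total_images - test_images   # images used for training

def recognition_rate(correct, total=test_images):
    """Recognition rate in percent for a number of correctly recognized images."""
    return 100.0 * correct / total

# One additional correct (or wrong) test image changes the rate by this much
rate_step = recognition_rate(1)

# Rates for a few hypothetical counts of correctly recognized test images
examples = {n: recognition_rate(n) for n in (150, 153, 154, 156)}
```

Under these assumptions each test image is worth 0.625 percentage points, so 154 correct images give 96.25% and 156 give 97.5%; a value such as 95.62 appears to be 153/160 = 95.625 cut to two decimals.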
5 Conclusions Based on the experiments carried out, we can state that the best results, both in maximum and in average recognition, are obtained when the CNN is optimized with the FGSA method, above the values obtained with parameter adaptation using a Type-1 or Type-2 FS. The average and maximum image recognition values obtained using a Type-1 FS are lower than those obtained using a Type-2 FS (with FOU values of 0.02 and 0.05) to adapt the NF of each convolution layer of the CNN. The same maximum image recognition is obtained with the Type-2 FS for FOU values of 0.02 and 0.05, but if the averages are compared, the best solution is found when the FOU is 0.05. As future work, we intend to carry out more experiments and to test this method with more complex databases. Acknowledgements We are honored and grateful to be sponsored by CONACYT and the Tijuana Institute of Technology, who support us financially with scholarship number 816488.
References
1. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of IEEE Conference Computer Vision Pattern Recognition (CVPR), pp. 770–778 (Jun 2016)
2. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
3. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
4. Gao, Y., Jia, C., Chen, H., Jiang, X.: Chinese fingerspelling sign language recognition using a nine-layer convolutional neural network. EAI Endorsed Trans. e Learn. 7(20), e2 (2021)
5. Hourri, S., Nikolov, N.S., Kharroubi, J.: Convolutional neural network vectors for speaker recognition. Int. J. Speech Technol. 24(2), 389–400 (2021)
6. Liu, W., Zhou, L., Chen, J.: Face recognition based on lightweight convolutional neural networks. Information 12(5), 191 (2021)
7. Li, K., Jin, Y., Waqar Akram, M., Han, R., Chen, J.: Facial expression recognition with convolutional neural networks via a new face cropping and rotation strategy. Vis. Comput. 36(2), 391–404 (2020)
8. Felea, I.I., Dogaru, R.: Improving light-weight convolutional neural networks for face recognition targeting resource constrained platforms. ESANN 2020, pp. 199–204 (2020)
9. Deng, Z., Peng, X., Li, Z., Qiao, Y.: Mutual component convolutional neural networks for heterogeneous face recognition. IEEE Trans. Image Process. 28(6), 3102–3114 (2019)
10. Yang, Z., Xiong, H., Chen, X., Liu, H., Kuang, Y., Gao, Y.: Dairy cow tiny face recognition based on convolutional neural networks. CCBR 2019, pp. 216–222 (2019)
11. Jialin Pan, S., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (Oct 2010)
12. Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C.: A survey on deep transfer learning. In: Proceedings of International Conference on Artificial Neural Networks, pp. 270–279 (2018)
13. Govindarajan, S., Swaminathan, R.: Differentiation of COVID-19 conditions in planar chest radiographs using optimized convolutional neural networks. Appl. Intell. 51(5), 2764–2775 (2021)
14. Govindarajan, S., Swaminathan, R.: Correction to: differentiation of COVID-19 conditions in planar chest radiographs using optimized convolutional neural networks. Appl. Intell. 51(5), 2776 (2021)
15. Zhang, N., Cai, Y.X., Wang, Y.Y., Tian, Y.T., Wang, X.L., Badami, B.: Skin cancer diagnosis based on optimized convolutional neural network. Artif. Intell. Med. 102, 101756 (2020)
16. Meyer, M., Wiesner, J., Rohlfing, C.: Optimized convolutional neural networks for video intra prediction. ICIP 2020, pp. 3334–3338 (2020)
17. Face recognition using ortho-diffusion bases-Scientific Figure on ResearchGate. https://www.researchgate.net/figure/The-face-image-data-from-the-ORL-database_fig2_233545388. Accessed 02 Feb 2021
18. Eleyan, A., Demirel, H.: PCA and LDA Based Neural Networks for Human Face Recognition (2007). https://doi.org/10.5772/4833
19. Wang, Q., Cheng, J., Gao, Q., Zhao, G., Jiao, L.: Deep multi-view subspace clustering with unified and discriminative learning. IEEE Trans. Multimed. https://doi.org/10.1109/TMM.2020.3025666
20. Zadeh, L.A.: Fuzzy sets. Inf. Control 8(3), 338–353 (1965). ISSN 0019-9958
21. Khan, N., Elizondo, D.A., Deka, L., Molina-Cabello, M.A.: Fuzzy logic applied to system monitors. IEEE Access 9, 56523–56538 (2021). https://doi.org/10.1109/ACCESS.2021.3072239
22. Yanes, N., Bououd, I., Alanazi, S.A., Ahmad, F.: Fuzzy logic based prospects identification system for foreign language learning through serious games. IEEE Access 9, 63173–63187 (2021). https://doi.org/10.1109/ACCESS.2021.3074374
23. Guzmán, J.C., Melin, P., Prado-Arechiga, G.: Optimization for type-1 and interval type-2 fuzzy systems for the classification of blood pressure load using genetic algorithms. In: Intuitionistic and Type-2 Fuzzy Logic Enhancements in Neural and Optimization Algorithms, pp. 63–71 (2020)
24. Castillo, O., Angulo, L.A., Castro, J.R., Valdez, M.G.: A comparative study of type-1 fuzzy logic systems, interval type-2 fuzzy logic systems and generalized type-2 fuzzy logic systems in control problems. Inf. Sci. 354, 257–274 (2016)
25. Angulo, L.A., Mendoza, O., Castro, J.R., Díaz, A.R., Melin, P., Castillo, O.: Fuzzy sets in dynamic adaptation of parameters of a bee colony optimization for controlling the trajectory of an autonomous mobile robot. Sensors 16(9), 1458 (2016)
26. Bernal, E., Lagunes, M.L., Castillo, O., Soria, J., Valdez, F.: Optimization of type-2 fuzzy logic controller design using the GSO and FA algorithms. Int. J. Fuzzy Syst. 23(1), 42–57 (2021)
27. Cuevas, F., Castillo, O., Cortés-Antonio, P.: Design of a control strategy based on type-2 fuzzy logic for omnidirectional mobile robots. J. Multiple Valued Log. Soft Comput. 37(1–2), 107–136 (2021)
28. Valdez, F., Peraza, C.: Dynamic parameter adaptation in the harmony search algorithm for the optimization of interval type-2 fuzzy logic controllers. Soft Comput. 24(1), 179–192 (2020)
29. Karmaka, S., Seikh, M.R., Castillo, O.: Type-2 intuitionistic fuzzy matrix games based on a new distance measure: application to biogas-plant implementation problem. Appl. Soft Comput. 106, 107357 (2021)
30. Valdez, F.: A review of optimization swarm intelligence-inspired algorithms with Type-2 fuzzy logic parameter adaptation. Soft Comput. 24(1), 215–226 (2020)
31. Ontiveros Robles, E., Melin, P., Castillo, O.: Comparative analysis of noise robustness of type 2 fuzzy logic controllers. Kybernetika 54(1), 175–201 (2018)
32. Sombra, A., Valdez, F., Melin, P., Castillo, O.: A new gravitational search algorithm using fuzzy logic to parameter adaptation. In: 2013 IEEE Congress on Evolutionary Computation, no. 3, pp. 1068–1074 (2013)
33. Mirjalili, S., Hashim, S.Z.M., Sardroudi, H.M.: Training feedforward neural networks using hybrid particle swarm optimization and gravitational search algorithm. Appl. Math. Comput. 218(22), 11125–11137 (2012)
34. Hatamlou, A., Abdullah, S., Othman, Z.: Gravitational search algorithm with heuristic search for clustering problems. In: Conference Data Mining Optimization, pp. 190–193 (June 2011)
35. LeCun, Y., Bengio, Y.: Convolution networks for images, speech, and time-series. Igarss 2014(1), 1–5 (1998)
36. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings 27th International Conference Machine Learning, no. 3, pp. 807–814 (2010)
37. Yang, J., Yu, K., Gong, Y., Beckman, T.H.: Linear spatial pyramid matching using sparse coding for image classification. IEEE Computer Society Conference Computer Vision Pattern Recognition, pp. 1794–1801 (2009)
38. Venkatesan, R., Li, B.: Convolutional Neural Networks in Visual Computing: A Concise Guide. CRC Press (2017)
39. Karnik, N.N., Mendel, J.M., Liang, Q.: Type-2 fuzzy logic systems. IEEE Trans. Fuzzy Syst. 7, 16 (1999)
40. Poma, Y., Melin, P.: Estimation the number of filters in the convolution layers of a convolutional neural network using a fuzzy logic system. In: Fuzzy Logic Hybrid Extensions of Neural and Optimization Algorithms, pp. 1–14 (2021)
41. Poma, Y., Melin, P., González, C.I., Martinez, G.E.: Optimization of convolutional neural networks using the fuzzy gravitational search algorithm. J. Autom. Mob. Robot. Intell. Syst. 14(1), 109–120 (2020)
Optimization
Artificial Fish Swarm Algorithm for the Optimization of a Benchmark Set of Functions Cinthia Peraza, Patricia Ochoa, Leticia Amador, and Oscar Castillo
Abstract This paper develops an efficient artificial fish swarm algorithm (AFSA) for the optimization of a benchmark set of functions. An analysis of the principal parameters that affect exploration and exploitation is presented. Two important parameters, S and V, are changed manually in order to analyze their impact on the performance of the AFSA. The performance and efficiency of the AFSA demonstrate that it is a good algorithm for solving benchmark sets of functions, and it has proven successful on various benchmark optimization problems. A comparison with other metaheuristics is presented. Keywords Artificial fish swarm · Benchmark set of functions · Exploitation · Exploration
1 Introduction Meta-heuristic algorithms are currently an important field for solving various optimization problems. An interesting way to check the efficiency and performance of meta-heuristics is to use mathematical functions. Some bio-inspired algorithms that have been applied to benchmark functions are the following: in [1], Bernal et al. use galactic swarm optimization (GSO) for the optimization of mathematical functions; in [2], Castillo et al. apply an algorithm based on the behavior of bees (BCO) to benchmark functions; in [16], Long et al. implement an inspired grey wolf optimizer (GWO) to solve large-scale function optimization problems; in [18], Ochoa et al. use differential evolution (DE) on benchmark functions; in [19], Peraza et al. use harmony search (HS), an algorithm based on music, for this case study; and in [21], Valdez et al. implement an improved particle swarm optimization (PSO) for mathematical functions.
C. Peraza (B) · P. Ochoa · L. Amador · O. Castillo Division of Graduate Studies, Tijuana Institute of Technology, Tijuana, México e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_6
Meta-heuristic algorithms have parameters that define the heuristic part; these parameters determine the efficiency of exploration and exploitation in the algorithms. For example, in [2] BCO has the alpha and beta parameters, in [18] DE has the F and Cr parameters, and in [19] HS has the PARate and HMR parameters. The AFSA has the S and V parameters. Several authors are interested in finding the optimal values of these parameters using different strategies; to mention some, in [4] Chen et al. present an improved AFSA, in [5] Gao et al. analyze some aspects of fish behavior, and in [25] Zhang et al. present a proposal to find the optimal values of S and V applied to robot path planning. This paper makes two main contributions: the first is to demonstrate that the AFSA is an efficient tool for the optimization of a benchmark set of functions, and the second is to explore the AFSA, in particular the S and V parameters, in order to analyze how these two parameters influence finding the best solutions for the case study. An analysis of the performance of the AFSA based on experiments is presented. The rest of this paper is organized as follows: Sect. 2 reviews relevant related works. Section 3 describes the artificial fish swarm algorithm. Section 4 describes the benchmark set of functions used. Section 5 presents in detail the analysis of the effect of the S and V parameters on the behavior of the algorithm. Simulation results are presented in Sect. 6. Section 7 shows a comparison with other meta-heuristics. Finally, conclusions and future work are given in Sect. 8.
2 Related Works Bio-inspired algorithms based on the behavior of animals in nature are currently an important tool for solving problems in the field of computing. In this paper, the interest is focused on the behavior of fish when finding food. Several researchers have analyzed and implemented the AFSA in different fields of intelligent computing, such as fuzzy control [11, 17, 28], clustering [7, 12], medicine [15], smart cities [8], and other important areas [3, 4, 6, 7, 13–15, 20, 26, 27]. In the state of the art, a large list of authors have used this algorithm to solve various problems; to mention some: in [9] a misalignment fault prediction of wind turbines based on an improved AFSA is presented, in [10] a layout optimization of a fiber Bragg grating strain sensor network based on a modified AFSA, in [20] a review of the family of AFSAs with recent advances and applications, in [22] a parameter analysis of the AFSA, in [23] a modified AFSA, in [24] a new AFSA for dynamic optimization problems, in [26] an improved AFSA, and in [28] an improved AFSA for neighborhood rough set reduction and its application.
3 Artificial Fish Swarm Algorithm The main idea proposed by Li in 2002 was to simulate a number of ecological behaviors of fish schooling in the water. This idea led to the AFSA, as presented by Yazdani et al. in 2012 [24]. In the AFSA the basic behavior is preying or foraging, in which every fish searches for its prey individually within its visual distance. Figure 1 illustrates each movement that a fish performs in the algorithm. X represents the current position of an artificial fish (AF), Step (S) indicates the maximum step length that a fish can take in each movement, and Visual (V) is the visual distance. Each AF inspects the search space within its visual distance. The best solution in each iteration is updated according to the following criterion: a fish moves only if the new position is better than its current location. The AFSA starts with a random distribution of n fish in the search space. A fish can present four behaviors in this algorithm, called preying, swarming, following, and random behavior; the position in the search space with the highest concentration resulting from these behaviors is considered an excellent result.
Fig. 1 Graphical representation of the movements for an artificial fish
The dynamics of the AFSA are given by the following equations. Equation (1) represents the preying behavior, in which the fish try to move to the locations with the most food in the search space. Let $X_i^{(t)}$ be the current position of the ith AF, and let $X_j$ be a state randomly selected within the visual distance of the ith AF as follows:
$$X_j = X_i^{(t)} + V \times \mathrm{rand}() \tag{1}$$
where $V$ is the visual distance of the AF, $d_{i,j} < V$, and rand() is a random vector with each element between 0 and 1. If $Y_i < Y_j$, $X_i^{(t)}$ moves a step toward $X_j$, as shown in Eq. (2):
$$X_i^{(t+1)} = X_i^{(t)} + S \times \mathrm{rand}() \mathbin{.*} \frac{X_j - X_i^{(t)}}{\left\| X_j - X_i^{(t)} \right\|} \tag{2}$$
where $.*$ represents the multiplication of the corresponding elements of two vectors. If the forward condition is not satisfied, a new state $X_j$ is randomly selected using Eq. (1). If the criterion is still not satisfied after a given number of attempts, $X_i^{(t)}$ moves a step randomly, as indicated by Eq. (3):
$$X_i^{(t+1)} = X_i^{(t)} + V \mathbin{.*} \mathrm{rand}() \tag{3}$$
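The preying behavior of Eqs. (1)–(3) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `prey`, the `try_number` parameter (standing in for the unspecified number of attempts) and the use of NumPy's random generator are assumptions. Y is treated as a food concentration to be maximized, so for a minimization benchmark the negated objective would be passed as `fitness`.

```python
import numpy as np

def prey(x_i, fitness, visual, step, try_number=5, rng=None):
    """One preying move for the fish at position x_i (sketch of Eqs. 1-3)."""
    rng = np.random.default_rng() if rng is None else rng
    y_i = fitness(x_i)
    for _ in range(try_number):
        # Eq. (1): candidate state, with rand() a vector of values in [0, 1)
        x_j = x_i + visual * rng.random(x_i.shape)
        if y_i < fitness(x_j):
            direction = x_j - x_i
            norm = np.linalg.norm(direction)
            if norm > 0.0:
                # Eq. (2): bounded step toward x_j, element-wise random scaling
                return x_i + step * rng.random(x_i.shape) * direction / norm
    # Eq. (3): no better state found within try_number attempts, move randomly
    return x_i + visual * rng.random(x_i.shape)
```

A possible call, with a hypothetical objective whose maximum lies at the origin, is `prey(np.array([-1.0, -1.0]), lambda p: -np.sum(p**2), visual=0.5, step=0.3)`. The values of `visual` and `step` play the roles of V and S, the two parameters whose influence the chapter studies.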
The second behavior is called swarming, which in nature represents a defense mechanism against predators. Let $n$ be the total number of AFs, $n_f$ the number of companions within the visual distance ($d_{i,j} < V$), and $X_c = (x_{c1}, x_{c2}, \ldots, x_{cD})$ the central position of these companions, where $x_{cd}$ is the dth dimension of $X_c$. Equation (4) describes this center:
$$x_{cd} = \frac{\sum_{j=1}^{n_f} x_{jd}}{n_f} \tag{4}$$
If the conditions in Eq. (5) are satisfied, i.e.,
$$Y_c > Y_i \quad \text{and} \quad \frac{n_f}{n} < \delta \tag{5}$$
where $\delta$ is the crowding factor, there is more food in the center and the area is not overcrowded. In that case, $X_i^{(t)}$ moves a step toward the companion center using Eq. (6):
$$X_i^{(t+1)} = X_i^{(t)} + S \times \mathrm{rand}() \mathbin{.*} \frac{X_c - X_i^{(t)}}{\left\| X_c - X_i^{(t)} \right\|} \tag{6}$$
However, if the conditions in Eq. (5) are not satisfied, the preying behavior is executed. If $n_f = 0$, there is no companion within the visual distance of the ith AF, and the preying behavior is also executed. The third behavior, called following, is executed when a fish finds a location with a better concentration of food and other fish observe and follow it. Let $X_i^{(t)}$ be the current position of the ith AF. If a companion $X_j$ contains more food, i.e., $Y_j > Y_i$, and it is not overcrowded, i.e., $n_f/n < \delta$, $X_i^{(t)}$ moves a step toward $X_j$ and Eq. (2) is executed.
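The swarming test described above (companion center and crowding check) can be sketched as follows. The helper names `swarm_center` and `should_swarm` are hypothetical, the sketch assumes the "not overcrowded" condition is $n_f/n < \delta$ as stated for the following behavior, and fitness is again treated as "more food is better".

```python
import numpy as np

def swarm_center(x_i, school, visual):
    """Return (center, n_f) over companions within the visual distance (Eq. 4)."""
    dists = np.linalg.norm(school - x_i, axis=1)
    # Companions satisfy 0 < d_ij < V; the fish itself (d = 0) is excluded.
    mask = (dists > 0.0) & (dists < visual)
    n_f = int(mask.sum())
    if n_f == 0:
        return None, 0
    return school[mask].mean(axis=0), n_f

def should_swarm(x_i, school, fitness, visual, delta):
    """Check the swarming conditions: Y_c > Y_i and n_f / n < delta."""
    center, n_f = swarm_center(x_i, school, visual)
    if n_f == 0:
        return False, None  # no companions: fall back to preying
    crowded_ok = n_f / len(school) < delta
    more_food = fitness(center) > fitness(x_i)
    return bool(more_food and crowded_ok), center
```

When `should_swarm` returns `False`, the text above says the fish falls back to the preying behavior; when it returns `True`, the step toward `center` has the same form as Eq. (2) with $X_c$ in place of $X_j$.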
Artificial Fish Swarm Algorithm for the Optimization …
Fig. 2 Flowchart of the procedure in the AFSA
Otherwise, the preying behavior is executed. As in the swarming behavior, preying is also executed if $n_f = 0$. The final behavior, called random, represents in nature a swarm of fish moving freely to find food. This behavior enables an AF to search for food, or to follow a swarm, in a larger space. The AF selects the random behavior when none of the criteria pertaining to the preying, swarming, and following behaviors are satisfied. Equation (7) represents this final behavior:

$$X_i^{(t+1)} = X_i^{(t)} + V \times \mathrm{rand}() \qquad (7)$$

Figure 2 shows the general steps of AFSA.
4 Benchmark Sets of Functions

The case study implemented in this paper is a benchmark set of functions. This section describes six classical benchmark functions used to validate AFSA: Ackley, Griewank, Quarticnoise, Rastrigin, Rosenbrock and Sphere. All functions were evaluated with 10, 30 and 50 dimensions (artificial fish). Figure 3 illustrates the plot, and Table 1 shows the equation, for each case.
Fig. 3 Plot of the six benchmark mathematical functions; a Ackley, b Griewank, c Quarticnoise, d Rastrigin, e Rosenbrock, and f Sphere
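Table 1 Benchmark sets of functions

| Number | Function name | Mathematical representation |
|---|---|---|
| F1 | Ackley | $f(x) = -20\exp\left(-0.2\sqrt{\frac{1}{n_x}\sum_{j=1}^{n_x} x_j^2}\right) - \exp\left(\frac{1}{n_x}\sum_{j=1}^{n_x}\cos(2\pi x_j)\right) + 20 + e$ |
| F2 | Griewank | $f(x) = \sum_{i=1}^{d}\frac{x_i^2}{4000} - \prod_{i=1}^{d}\cos\left(\frac{x_i}{\sqrt{i}}\right) + 1$ |
| F3 | Quarticnoise | $f(x) = \sum_{i=1}^{d} i\,x_i^4 + \mathrm{rand}[0, 1]$ |
| F4 | Rastrigin | $f(x) = 10d + \sum_{i=1}^{d}\left[x_i^2 - 10\cos(2\pi x_i)\right]$ |
| F5 | Rosenbrock | $f(x) = \sum_{i=1}^{d-1}\left[100\,(x_{i+1} - x_i^2)^2 + (x_i - 1)^2\right]$ |
| F6 | Sphere | $f(x) = \sum_{i=1}^{d} x_i^2$ |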
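For reference, the six functions of Table 1 can be written compactly with NumPy. This is a sketch under the standard textbook definitions of these benchmarks; the function names and the optional `rng` argument for the noise term are assumptions, not part of the original study.

```python
import numpy as np

def ackley(x):
    x = np.asarray(x, dtype=float)
    n = x.size
    return float(-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / n))
                 - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / n) + 20.0 + np.e)

def griewank(x):
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    return float(np.sum(x**2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i))) + 1.0)

def quartic_noise(x, rng=None):
    # The additive noise term makes repeated evaluations differ.
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    return float(np.sum(i * x**4) + rng.random())

def rastrigin(x):
    x = np.asarray(x, dtype=float)
    return float(10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x)))

def rosenbrock(x):
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (x[:-1] - 1.0)**2))

def sphere(x):
    x = np.asarray(x, dtype=float)
    return float(np.sum(x**2))
```

All of these are minimization problems; apart from the noise term in F3, each has a global minimum of 0 (at the origin, or at the all-ones vector for Rosenbrock), which gives a quick sanity check for any implementation.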
5 Experimental Results

Several experiments were executed for each benchmark mathematical function, with a total of 5000 generations; the size of the solution vector (artificial fish) is 30. The dimensions tested are 10, 30 and 50. With V fixed at 0.5, S was varied over the values 0.1, 0.3, 0.5, 0.7 and 0.9. Four metrics are used in each experiment; Table 2 presents the results for 10 dimensions. Table 2 shows that better results are found when S is in the range [0.7, 0.9]. The italic term represents the better average and bold indicates the best results for each mathematical function. Table 3 shows the results of changing V over the values 0.1, 0.3, 0.5, 0.7 and 0.9 while
Table 2 Benchmark sets of functions varying the “S” value for 10 dimensions

| Function | Performance index | S = 0.1 | S = 0.3 | S = 0.5 | S = 0.7 | S = 0.9 |
|---|---|---|---|---|---|---|
| F1 | Average | 2.04E+01 | 2.00E+01 | 2.01E+01 | 1.97E+01 | 1.97E+01 |
| F1 | Standard deviation (σ) | 1.24E-01 | 1.84E-01 | 4.02E-01 | 7.86E-02 | 3.56E-01 |
| F1 | Best | 2.01E+01 | 1.98E+01 | 1.91E+01 | 1.96E+01 | 1.80E+01 |
| F1 | Worst | 2.06E+01 | 2.03E+01 | 2.10E+01 | 1.99E+01 | 1.99E+01 |
| F2 | Average | 3.57E+02 | 3.56E+02 | 2.03E+02 | 9.96E-01 | 5.12E-01 |
| F2 | Standard deviation (σ) | 1.81E+01 | 1.06E+01 | 2.02E+01 | 4.83E-01 | 2.32E-01 |
| F2 | Best | 3.12E+02 | 3.38E+02 | 1.52E+02 | 1.75E-01 | 7.64E-02 |
| F2 | Worst | 3.93E+02 | 3.72E+02 | 2.26E+02 | 1.85E+00 | 9.06E-01 |
| F3 | Average | 2.69E+01 | 2.59E+01 | 1.84E+01 | 3.75E+00 | 3.24E+00 |
| F3 | Standard deviation (σ) | 2.48E+00 | 1.55E+00 | 4.12E+00 | 6.96E-01 | 9.98E-01 |
| F3 | Best | 2.09E+01 | 2.24E+01 | 7.83E+00 | 2.83E+00 | 1.87E+00 |
| F3 | Worst | 3.20E+01 | 2.85E+01 | 2.27E+01 | 4.91E+00 | 5.76E+00 |
| F4 | Average | 1.47E+02 | 1.45E+02 | 1.23E+02 | 1.18E+02 | 7.16E+01 |
| F4 | Standard deviation (σ) | 1.27E+01 | 6.49E+00 | 1.27E+01 | 1.99E+01 | 2.88E+01 |
| F4 | Best | 1.08E+02 | 1.38E+02 | 9.09E+01 | 7.86E+01 | 3.58E+01 |
| F4 | Worst | 1.70E+02 | 1.61E+02 | 1.35E+02 | 1.49E+02 | 1.26E+02 |
| F5 | Average | 7.29E+06 | 7.21E+06 | 6.27E+06 | 2.72E+02 | 6.29E+01 |
| F5 | Standard deviation (σ) | 2.52E+03 | 1.49E+04 | 9.77E+04 | 3.17E+02 | 8.86E+01 |
| F5 | Best | 7.28E+06 | 7.18E+06 | 6.14E+06 | 1.58E+01 | 7.28E+00 |
| F5 | Worst | 7.29E+06 | 7.22E+06 | 6.39E+06 | 1.12E+03 | 3.89E+02 |
| F6 | Average | 2.62E+02 | 2.57E+02 | 2.25E+02 | 3.03E-03 | 9.98E-04 |
| F6 | Standard deviation (σ) | 2.69E-01 | 1.92E+00 | 2.66E+00 | 5.54E-03 | 9.28E-04 |
| F6 | Best | 2.61E+02 | 2.53E+02 | 2.23E+02 | 9.76E-04 | 1.58E-05 |
| F6 | Worst | 2.62E+02 | 2.58E+02 | 2.30E+02 | 2.32E-02 | 5.50E-03 |
S is fixed at 0.5. The metrics are the best, worst, average and standard deviation. Table 3 shows that better results are found when V is 0.9. The italic term represents the better average and bold indicates the best results for each mathematical function. The simulation results with 30 dimensions, varying S and V, are presented in Tables 4 and 5, respectively. Table 5 reports the values found by AFSA with 30 dimensions when varying the V parameter. Tables 4 and 5 show that better results are found when the varied parameter (S or V, respectively) takes the value 0.9. Tables 6 and 7 show the simulation results with 50 dimensions, with S and V varying, respectively.
Table 3 Benchmark sets of functions varying the “V” value for 10 dimensions

| Function | Performance index | V = 0.1 | V = 0.3 | V = 0.5 | V = 0.7 | V = 0.9 |
|---|---|---|---|---|---|---|
| F1 | Average | 2.03E+01 | 2.01E+01 | 2.01E+01 | 1.97E+01 | 1.86E+01 |
| F1 | Standard deviation (σ) | 1.07E-01 | 1.77E-01 | 1.87E-01 | 8.87E-02 | 3.68E+00 |
| F1 | Best | 2.01E+01 | 1.98E+01 | 1.98E+01 | 1.95E+01 | 3.23E+00 |
| F1 | Worst | 2.05E+01 | 2.04E+01 | 2.05E+01 | 1.99E+01 | 1.99E+01 |
| F2 | Average | 3.62E+02 | 3.51E+02 | 2.57E+02 | 1.43E+00 | 7.83E-01 |
| F2 | Standard deviation (σ) | 1.79E+01 | 1.61E+01 | 3.01E+01 | 7.99E-01 | 5.38E-01 |
| F2 | Best | 3.24E+02 | 3.07E+02 | 2.01E+02 | 4.11E-01 | 1.62E-01 |
| F2 | Worst | 3.93E+02 | 3.81E+02 | 3.21E+02 | 3.43E+00 | 2.13E+00 |
| F3 | Average | 2.65E+01 | 2.65E+01 | 1.79E+01 | 3.53E+00 | 3.18E+00 |
| F3 | Standard deviation (σ) | 2.09E+00 | 2.65E+00 | 2.25E+00 | 7.01E-01 | 1.01E+00 |
| F3 | Best | 2.18E+01 | 1.92E+01 | 1.29E+01 | 2.00E+00 | 1.97E+00 |
| F3 | Worst | 3.04E+01 | 3.21E+01 | 2.31E+01 | 4.76E+00 | 6.90E+00 |
| F4 | Average | 1.45E+02 | 1.48E+02 | 1.38E+02 | 1.01E+02 | 8.21E+01 |
| F4 | Standard deviation (σ) | 1.17E+01 | 1.01E+01 | 1.26E+01 | 3.09E+01 | 3.12E+01 |
| F4 | Best | 1.24E+02 | 1.29E+02 | 1.14E+02 | 1.62E+01 | 3.38E+01 |
| F4 | Worst | 1.68E+02 | 1.72E+02 | 1.67E+02 | 1.44E+02 | 1.47E+02 |
| F5 | Average | 7.29E+06 | 7.22E+06 | 6.39E+06 | 2.17E+02 | 4.81E+01 |
| F5 | Standard deviation (σ) | 2.07E+03 | 6.19E+03 | 7.34E+04 | 1.96E+02 | 5.94E+01 |
| F5 | Best | 7.28E+06 | 7.20E+06 | 6.22E+06 | 5.85E+00 | 2.12E-01 |
| F5 | Worst | 7.29E+06 | 7.23E+06 | 6.52E+06 | 8.85E+02 | 2.50E+02 |
| F6 | Average | 2.62E+02 | 2.58E+02 | 2.24E+02 | 3.15E-03 | 8.75E-04 |
| F6 | Standard deviation (σ) | 1.78E-01 | 1.12E+00 | 3.70E+00 | 5.30E-03 | 4.83E-04 |
| F6 | Best | 2.61E+02 | 2.54E+02 | 2.14E+02 | 8.57E-05 | 2.32E-06 |
| F6 | Worst | 2.62E+02 | 2.58E+02 | 2.31E+02 | 2.70E-02 | 2.70E-03 |
When the number of dimensions is increased to 50, the best results are obtained with S values of 0.7 and 0.9. Table 7 shows that AFSA finds the best results when V has a value of 0.9.
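The four performance indices reported in Tables 2 to 7 can be computed from the final objective values of repeated runs as in the sketch below. The function name `summarize_runs` is hypothetical, and the sketch assumes minimization (best = smallest value) and a sample standard deviation; the source does not state how many runs were used or which form of the standard deviation was applied.

```python
import numpy as np

def summarize_runs(final_values):
    """Average, standard deviation, best and worst over independent runs."""
    v = np.asarray(final_values, dtype=float)
    # Assumption: sample standard deviation (ddof=1) when there are >= 2 runs.
    ddof = 1 if v.size > 1 else 0
    return {
        "average": float(v.mean()),
        "std": float(v.std(ddof=ddof)),
        "best": float(v.min()),    # minimization: smaller is better
        "worst": float(v.max()),
    }
```

A useful consistency check on any results table follows directly from these definitions: for each configuration, best ≤ average ≤ worst must hold.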
6 Analysis of the Parameters in AFSA

This paper presents an analysis of the performance of AFSA when varying two important parameters, the S and V values. In the simulations, each parameter was changed individually over the values 0.1, 0.3, 0.5, 0.7 and 0.9; for example, S was fixed at 0.5 while V was changed, and V was fixed at 0.5 while changing
Table 4 Benchmark sets of functions varying the “S” value for 30 dimensions

| Function | Performance index | S = 0.1 | S = 0.3 | S = 0.5 | S = 0.7 | S = 0.9 |
|---|---|---|---|---|---|---|
| F1 | Average | 2.09E+01 | 2.09E+01 | 2.07E+01 | 1.99E+01 | 1.97E+01 |
| F1 | Standard deviation (σ) | 4.02E-02 | 1.01E-01 | 2.15E-01 | 1.28E-01 | 9.73E-02 |
| F1 | Best | 2.08E+01 | 2.07E+01 | 2.01E+01 | 1.97E+01 | 1.96E+01 |
| F1 | Worst | 2.10E+01 | 2.10E+01 | 2.10E+01 | 2.03E+01 | 1.99E+01 |
| F2 | Average | 1.26E+03 | 1.27E+03 | 1.09E+03 | 9.98E+01 | 4.92E+01 |
| F2 | Standard deviation (σ) | 3.75E+01 | 2.91E+01 | 7.48E+01 | 4.37E+01 | 2.59E+01 |
| F2 | Best | 1.21E+03 | 1.19E+03 | 8.98E+02 | 5.18E+01 | 6.17E+00 |
| F2 | Worst | 1.34E+03 | 1.30E+03 | 1.22E+03 | 2.40E+02 | 9.39E+01 |
| F3 | Average | 2.92E+02 | 2.78E+02 | 2.35E+02 | 3.51E+01 | 4.92E+01 |
| F3 | Standard deviation (σ) | 1.51E+01 | 1.40E+01 | 2.76E+01 | 1.30E+01 | 2.59E+01 |
| F3 | Best | 2.35E+02 | 2.55E+02 | 1.80E+02 | 2.30E+01 | 1.11E+01 |
| F3 | Worst | 3.17E+02 | 3.10E+02 | 3.05E+02 | 8.04E+01 | 3.62E+01 |
| F4 | Average | 5.67E+02 | 5.61E+02 | 5.42E+02 | 4.06E+02 | 2.74E+02 |
| F4 | Standard deviation (σ) | 5.22E+01 | 1.61E+01 | 2.94E+01 | 4.03E+01 | 4.63E+01 |
| F4 | Best | 2.99E+02 | 5.32E+02 | 4.59E+02 | 3.29E+02 | 2.02E+02 |
| F4 | Worst | 6.04E+02 | 5.96E+02 | 5.86E+02 | 4.66E+02 | 3.77E+02 |
| F5 | Average | 2.35E+07 | 2.32E+07 | 2.06E+07 | 7.44E+03 | 3.61E+02 |
| F5 | Standard deviation (σ) | 9.27E+03 | 4.28E+04 | 3.59E+03 | 5.48E+03 | 2.97E+02 |
| F5 | Best | 2.35E+07 | 2.31E+07 | 1.99E+07 | 1.04E+03 | 5.94E+01 |
| F5 | Worst | 2.35E+07 | 2.33E+07 | 2.15E+07 | 2.56E+04 | 1.19E+03 |
| F6 | Average | 7.84E+02 | 7.69E+02 | 6.69E+02 | 1.25E-02 | 1.44E-03 |
| F6 | Standard deviation (σ) | 9.81E-01 | 4.83E+00 | 7.95E+00 | 2.19E-02 | 7.88E-04 |
| F6 | Best | 7.83E+02 | 7.60E+02 | 6.48E+02 | 3.35E-04 | 8.38E-06 |
| F6 | Worst | 7.86E+02 | 7.75E+02 | 6.86E+02 | 8.13E-02 | 3.90E-03 |
S over the values mentioned above. Figure 4 illustrates the general idea of this paper. A general analysis of both parameters is presented based on experimentation. An important contribution of this paper is to identify which parameters affect exploitation and exploration in the behavior of AFSA. Tables 8 and 9 show an analysis for the Sphere function with the values obtained by AFSA for the S and V parameters. Based on the experiments, the V parameter in AFSA represents exploitation: when V reaches 0.9, better results are found in the benchmark mathematical functions. For example, for the Sphere function (F6), with a V value of 0.5 and an S value of 0.9 the best value found by AFSA is 1.58E-05, and with an S value of 0.5 and a V value of 0.9 it is 2.32E-06. Similarly, for the S parameter, better results are found when this value is in the range
Table 5 Benchmark sets of functions varying the “V” value for 30 dimensions

| Function | Performance index | V = 0.1 | V = 0.3 | V = 0.5 | V = 0.7 | V = 0.9 |
|---|---|---|---|---|---|---|
| F1 | Average | 2.09E+01 | 2.07E+01 | 2.07E+01 | 1.99E+01 | 1.98E+01 |
| F1 | Standard deviation (σ) | 5.71E-02 | 1.30E-01 | 1.52E+01 | 9.79E-02 | 8.44E-02 |
| F1 | Best | 2.08E+01 | 2.04E+01 | 2.04E+01 | 1.98E+01 | 1.95E+01 |
| F1 | Worst | 2.10E+01 | 2.10E+01 | 2.10E+01 | 2.02E+01 | 1.99E+01 |
| F2 | Average | 1.27E+03 | 1.25E+03 | 1.03E+03 | 1.10E+02 | 5.92E+01 |
| F2 | Standard deviation (σ) | 3.63E+01 | 3.02E+01 | 1.98E+02 | 4.29E+01 | 2.42E+01 |
| F2 | Best | 1.20E+03 | 1.16E+03 | 2.58E+01 | 4.90E+01 | 2.71E+01 |
| F2 | Worst | 1.33E+03 | 1.31E+03 | 1.17E+03 | 2.42E+02 | 1.36E+02 |
| F3 | Average | 2.86E+02 | 2.82E+02 | 2.26E+02 | 3.70E+02 | 5.92E+01 |
| F3 | Standard deviation (σ) | 1.68E+01 | 1.11E+01 | 2.50E+01 | 5.10E+01 | 2.42E+01 |
| F3 | Best | 2.57E+02 | 2.58E+02 | 1.78E+02 | 2.53E+02 | 1.72E+01 |
| F3 | Worst | 3.20E+02 | 3.08E+02 | 2.77E+02 | 4.64E+02 | 1.36E+02 |
| F4 | Average | 5.61E+02 | 5.63E+02 | 5.50E+02 | 3.70E+02 | 3.05E+02 |
| F4 | Standard deviation (σ) | 2.59E+01 | 2.19E+01 | 2.61E+01 | 5.01E+01 | 5.96E+01 |
| F4 | Best | 5.10E+02 | 4.92E+02 | 4.90E+02 | 2.53E+02 | 2.00E+02 |
| F4 | Worst | 6.04E+02 | 5.98E+02 | 6.03E+02 | 4.64E+02 | 4.26E+02 |
| F5 | Average | 2.35E+07 | 2.32E+07 | 2.06E+07 | 8.32E+03 | 4.45E+02 |
| F5 | Standard deviation (σ) | 8.97E+03 | 4.79E+04 | 2.72E+05 | 6.54E+03 | 2.48E+02 |
| F5 | Best | 2.35E+07 | 2.31E+07 | 2.02E+07 | 1.45E+03 | 7.42E+01 |
| F5 | Worst | 2.35E+07 | 2.33E+07 | 2.12E+07 | 2.76E+04 | 1.14E+03 |
| F6 | Average | 7.84E+02 | 7.68E+02 | 6.68E+02 | 4.47E-03 | 1.19E-03 |
| F6 | Standard deviation (σ) | 1.09E+00 | 7.15E+00 | 1.03E+01 | 5.44E-03 | 1.20E-03 |
| F6 | Best | 7.82E+02 | 7.51E+02 | 6.49E+02 | 4.20E-04 | 1.48E-07 |
| F6 | Worst | 7.86E+02 | 7.75E+02 | 6.93E+02 | 2.14E-02 | 6.30E-03 |
of [0.7, 0.9]; therefore, this parameter is considered to aid exploration. The main analysis consists in identifying the optimal values of the S and V parameters. With respect to the variation of S, all the functions present better results when 10 dimensions are used (see Table 2). For the Sphere function, good results are obtained in 10, 30 and 50 dimensions; when the number of dimensions is increased, the results are higher for most of the functions, except Sphere. Regarding the variation of V, for few dimensions better results are obtained when V has a high value, for example 0.7 or 0.9; however, for 30 and 50 dimensions the results are not very competitive compared to 10 dimensions. With 30 dimensions, the results are better using
Table 6 Benchmark sets of functions varying the “S” value for 50 dimensions

| Function | Performance index | S = 0.1 | S = 0.3 | S = 0.5 | S = 0.7 | S = 0.9 |
|---|---|---|---|---|---|---|
| F1 | Average | 2.10E+01 | 9.27E+00 | 1.50E+01 | 7.87E+00 | 1.53E+01 |
| F1 | Standard deviation (σ) | 5.03E-02 | 4.54E+00 | 4.65E-01 | 3.85E+00 | 1.50E+00 |
| F1 | Best | 2.09E+01 | 1.31E+00 | 1.40E+01 | 1.53E+00 | 1.15E+01 |
| F1 | Worst | 2.11E+01 | 2.09E+01 | 1.57E+01 | 1.46E+01 | 1.76E+01 |
| F2 | Average | 9.27E+00 | 2.19E+03 | 1.70E+03 | 1.00E+03 | 4.12E+02 |
| F2 | Standard deviation (σ) | 4.54E+00 | 2.87E+01 | 1.53E+02 | 4.97E+00 | 1.50E+02 |
| F2 | Best | 1.31E+01 | 2.13E+03 | 1.44E+03 | 2.93E+02 | 1.33E+02 |
| F2 | Worst | 2.09E+01 | 2.24E+03 | 1.98E+03 | 1.71E+03 | 6.56E+02 |
| F3 | Average | 8.17E+02 | 8.17E+02 | 6.85E+02 | 1.71E+02 | 1.37E+02 |
| F3 | Standard deviation (σ) | 2.18E+01 | 2.18E+01 | 6.36E+01 | 4.97E+00 | 2.91E+01 |
| F3 | Best | 7.77E+02 | 7.77E+02 | 5.78E+02 | 1.64E+02 | 6.32E+01 |
| F3 | Worst | 8.59E+02 | 5.89E+02 | 8.21E+02 | 1.94E+02 | 1.82E+02 |
| F4 | Average | 9.89E+02 | 9.89E+02 | 1.07E+03 | 7.36E+02 | 6.01E+02 |
| F4 | Standard deviation (σ) | 1.28E+01 | 1.28E+01 | 6.24E+01 | 8.39E+01 | 2.30E+01 |
| F4 | Best | 9.47E+02 | 9.47E+02 | 9.44E+02 | 6.00E+02 | 5.47E+02 |
| F4 | Worst | 1.01E+03 | 1.03E+03 | 1.17E+03 | 8.97E+02 | 7.05E+02 |
| F5 | Average | 3.94E+07 | 3.94E+07 | 3.40E+07 | 1.59E+04 | 5.57E+03 |
| F5 | Standard deviation (σ) | 3.92E+04 | 3.92E+04 | 6.00E+05 | 1.05E+04 | 2.62E+03 |
| F5 | Best | 3.93E+07 | 3.93E+07 | 3.30E+07 | 1.01E+03 | 1.06E+03 |
| F5 | Worst | 3.94E+07 | 3.94E+07 | 3.51E+07 | 4.74E+04 | 9.83E+03 |
| F6 | Average | 1.27E+03 | 1.27E+03 | 1.10E+03 | 1.06E-01 | 5.16E-02 |
| F6 | Standard deviation (σ) | 4.29E+00 | 4.29E+00 | 1.69E+01 | 6.03E-02 | 3.64E-02 |
| F6 | Best | 1.27E+03 | 1.27E+03 | 1.07E+03 | 8.55E-04 | 1.92E-04 |
| F6 | Worst | 1.29E+03 | 1.29E+03 | 1.13E+03 | 2.03E-01 | 1.14E-01 |
a V of 0.9 than a V of 0.1, and the same happens with 50 dimensions, where the best results are obtained with a V of 0.9. This means that the higher the V value, the better the results obtained.
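The one-parameter-at-a-time design used in this analysis (fix one parameter at 0.5, sweep the other over five values, repeat for each dimension) can be enumerated as in the following sketch. The function name and the dictionary keys are hypothetical and only illustrate the structure of the experiment.

```python
def one_at_a_time_configs(values=(0.1, 0.3, 0.5, 0.7, 0.9),
                          fixed=0.5, dims=(10, 30, 50)):
    """Enumerate the S-sweep and V-sweep configurations for each dimension."""
    configs = []
    for d in dims:
        for s in values:  # vary S while V stays fixed
            configs.append({"dims": d, "varied": "S", "S": s, "V": fixed})
        for v in values:  # vary V while S stays fixed
            configs.append({"dims": d, "varied": "V", "S": fixed, "V": v})
    return configs
```

This yields 30 configurations (3 dimensions × 2 sweeps × 5 values), matching the layout of Tables 2 to 7. Note that such a design does not explore joint settings such as S = 0.9 with V = 0.9.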
7 Comparison with Other Metaheuristics

Based on the experimentation and analysis of the behavior of AFSA, the best values for the S and V parameters are 0.5 and 0.9, respectively. A total of three metaheuristics are analyzed for three mathematical functions. Table 10 shows a comparison with other important metaheuristics, specifically for the Rastrigin (F4) function
Table 7 Benchmark sets of functions varying the “V” value for 50 dimensions

| Function | Performance index | V = 0.1 | V = 0.3 | V = 0.5 | V = 0.7 | V = 0.9 |
|---|---|---|---|---|---|---|
| F1 | Average | 2.10E+01 | 2.09E+01 | 2.09E+01 | 2.02E+01 | 1.98E+01 |
| F1 | Standard deviation (σ) | 2.22E-01 | 8.54E-02 | 1.27E-01 | 1.30E-01 | 9.03E-02 |
| F1 | Best | 2.01E+01 | 2.08E+01 | 2.05E+01 | 1.99E+01 | 1.96E+01 |
| F1 | Worst | 2.11E+01 | 2.11E+01 | 2.11E+01 | 2.04E+01 | 2.02E+01 |
| F2 | Average | 2.04E+03 | 2.20E+03 | 1.90E+03 | 3.90E+02 | 2.35E+02 |
| F2 | Standard deviation (σ) | 5.43E+02 | 3.62E+01 | 3.18E+02 | 6.39E+01 | 5.01E+01 |
| F2 | Best | 3.96E+02 | 2.12E+03 | 2.67E+02 | 2.75E+02 | 1.50E+02 |
| F2 | Worst | 2.28E+03 | 2.31E+03 | 2.13E+03 | 4.97E+02 | 3.37E+02 |
| F3 | Average | 8.59E+02 | 8.39E+02 | 7.36E+02 | 1.50E+02 | 9.24E+01 |
| F3 | Standard deviation (σ) | 3.43E+02 | 2.81E+01 | 5.73E+01 | 2.83E+01 | 2.62E+01 |
| F3 | Best | 7.71E+02 | 7.81E+02 | 5.78E+02 | 1.00E+02 | 4.97E+01 |
| F3 | Worst | 9.06E+02 | 8.97E+02 | 8.32E+02 | 2.24E+02 | 1.55E+02 |
| F4 | Average | 1.01E+03 | 1.00E+03 | 9.83E+02 | 7.19E+02 | 5.74E+02 |
| F4 | Standard deviation (σ) | 2.79E+01 | 3.35E+01 | 3.77E+01 | 6.04E+01 | 6.47E+01 |
| F4 | Best | 9.55E+02 | 9.25E+02 | 8.87E+02 | 6.00E+02 | 4.72E+02 |
| F4 | Worst | 1.06E+03 | 1.06E+03 | 1.04E+03 | 8.28E+02 | 7.77E+02 |
| F5 | Average | 3.97E+07 | 3.92E+07 | 3.48E+07 | 1.77E+04 | 2.06E+03 |
| F5 | Standard deviation (σ) | 2.10E+04 | 7.89E+04 | 4.97E+05 | 1.09E+04 | 1.32E+03 |
| F5 | Best | 3.96E+07 | 3.91E+07 | 3.41E+07 | 5.06E+03 | 3.78E+02 |
| F5 | Worst | 3.97E+07 | 3.93E+07 | 3.58E+07 | 5.87E+04 | 5.14E+03 |
| F6 | Average | 1.31E+03 | 1.28E+03 | 1.09E+03 | 1.83E-02 | 3.90E-03 |
| F6 | Standard deviation (σ) | 2.18E+00 | 1.32E+01 | 1.82E+02 | 2.49E-02 | 6.70E-03 |
| F6 | Best | 1.30E+03 | 1.25E+03 | 1.32E+02 | 4.07E-04 | 9.00E-05 |
| F6 | Worst | 1.31E+03 | 1.29E+03 | 1.15E+03 | 1.26E-01 | 3.46E-02 |
with the variation of the V parameter; this comparison uses 30 dimensions and two metaheuristics, namely Bee Colony Optimization (BCO) [2] and the Harmony Search algorithm (HS) [19]. Table 10 shows better results for HS than for AFSA with respect to the average; this comparison shows that AFSA has an error close to that of HS. Table 11 shows a comparison with the metaheuristic called Differential Evolution (DE) [18], specifically for the Ackley function with 30 dimensions, using an S value of 0.5 and a V value of 0.9. In Table 11, the best result (minimal value) is obtained by AFSA, with a value of 1.95E+01 compared to 3.21E+02 for DE. Table 12 shows a comparison with DE [18] specifically for
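Fig. 4 Flowchart of the general idea of the proposal in this paper

Table 8 Results for the Sphere function when varying the “S” value

| Function | 10 dimensions, S = 0.7 | 10 dimensions, S = 0.9 | 30 dimensions, S = 0.7 | 30 dimensions, S = 0.9 | 50 dimensions, S = 0.7 | 50 dimensions, S = 0.9 |
|---|---|---|---|---|---|---|
| Sphere (F6) | 3.03E-03 | 9.98E-04 | 1.25E-02 | 1.44E-03 | 1.06E-01 | 5.16E-02 |

Table 9 Results for the Sphere function when varying the “V” value

| Function | 10 dimensions, V = 0.7 | 10 dimensions, V = 0.9 | 30 dimensions, V = 0.7 | 30 dimensions, V = 0.9 | 50 dimensions, V = 0.7 | 50 dimensions, V = 0.9 |
|---|---|---|---|---|---|---|
| Sphere (F6) | 3.15E-03 | 8.75E-04 | 4.47E-03 | 1.19E-03 | 1.83E-02 | 9.00E-05 |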
Table 10 Comparative results with meta-heuristic algorithms for Griewank function

| Meta-heuristic algorithm | Average | Standard deviation (σ) | Best | Worst |
|---|---|---|---|---|
| Proposed method (AFSA) | 3.05E+02 | 5.96E+01 | 2.00E+02 | 4.26E+02 |
| BCO [2] | 5.23E-03 | N/A | 1.14E-04 | 4.96E-02 |
| HS [19] | 5.56E+01 | N/A | N/A | N/A |
Table 11 Comparative results with meta-heuristic algorithms for Ackley function

| Meta-heuristic algorithm | Average | Standard deviation (σ) | Best | Worst |
|---|---|---|---|---|
| Proposed method (AFSA) | 1.98E+01 | 8.44E-02 | 1.95E+01 | 1.99E+01 |
| DE [18] | 3.21E+02 | 3.21E+02 | 3.21E+02 | 3.21E+02 |
Table 12 Comparative results with meta-heuristic algorithms for Rosenbrock function

| Meta-heuristic algorithm | Average | Standard deviation (σ) | Best | Worst |
|---|---|---|---|---|
| Proposed method (AFSA) | 4.45E+02 | 2.48E+02 | 7.42E+01 | 1.14E+03 |
| DE [18] | 1.06E+03 | 4.21E+01 | 9.31E+02 | 1.13E+03 |
the Rosenbrock function with 30 dimensions, using an S value of 0.5 and a V value of 0.9. Finally, with respect to Rosenbrock, Table 12 shows that the best algorithm is AFSA, with an average of 4.45E+02 compared to DE, which has a value of 4.06E+03; regarding the standard deviation, the value is 2.48E+02 for AFSA and 4.21E+01 for DE. These metrics and this analysis demonstrate the efficiency of the proposed method of varying the values of the S and V parameters.
8 Conclusions

An analysis of the behavior of the S and V parameters is presented in this paper. The most important conclusions are the following. For lower dimensions, for example 10, AFSA presents its best results with the variation of the S value. When the number of dimensions is increased to 50, all mathematical functions have their best results when the V value is 0.9 (see Tables 6 and 7). Thus, based on experimentation, an important conclusion is that the V parameter represents exploitation in the algorithm and S represents exploration: with a larger V (0.9) the search area is limited to finding local minima, whereas a larger S means that the algorithm is performing exploration in the search space.
As future work, the analysis proposed in this paper is the basis for a dynamic adjustment of the values of the S and V parameters using a fuzzy logic system, and for exploring other case studies such as fuzzy controllers and fuzzy recognition systems.
References

1. Bernal, E., Castillo, O., Soria, J., Valdez, F.: Galactic swarm optimization with adaptation of parameters using fuzzy logic for the optimization of mathematical functions. In: Fuzzy Logic Augmentation of Neural and Optimization Algorithms: Theoretical Aspects and Real Applications, pp. 131–140. Springer, Cham (2018)
2. Castillo, O., Amador-Angulo, L.: A generalized type-2 fuzzy logic approach for dynamic parameter adaptation in bee colony optimization applied to fuzzy controller design. Inf. Sci. 460, 476–496 (2018)
3. Chen, Y., Zhu, Q., Xu, H.: Finding rough set reducts with fish swarm algorithm. Knowl. Based Syst. 81, 22–29 (2015)
4. Chen, G.Z., Wang, J.Q., Li, C.J., Lu, X.Y.: An improved artificial fish swarm algorithm and its applications. Syst. Eng. 27(12), 105–110 (2009)
5. Gao, S., Wen, Y.: An improved artificial fish swarm algorithm and its application. In: 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS), pp. 649–652. IEEE (2018)
6. He, Q., Hu, X., Ren, H., Zhang, H.: A novel artificial fish swarm algorithm for solving large-scale reliability–redundancy application problem. ISA Trans. 59, 105–113 (2015)
7. He, S., Belacel, N., Hamam, H., Bouslimani, Y.: Fuzzy clustering with improved artificial fish swarm algorithm. In: 2009 International Joint Conference on Computational Sciences and Optimization, vol. 2, pp. 317–321. IEEE (2009)
8. Huang, X., Xu, G., Xiao, F.: Optimization of a novel Urban growth simulation model integrating an artificial fish swarm algorithm and cellular automata for a smart city. Sustainability 13(4), 2338 (2021)
9. Hua, Z., Xiao, Y., Cao, J.: Misalignment fault prediction of wind turbines based on improved artificial fish swarm algorithm. Entropy 23(6), 692 (2021)
10. Huang, J., Zeng, J., Bai, Y., Cheng, Z., Feng, Z., Qi, L., Liang, D.: Layout optimization of fiber Bragg grating strain sensor network based on modified artificial fish swarm algorithm. Opt. Fiber Technol. 65, 102583 (2021)
11. Jibril, Y., Salawudeen, A.T., Salawu, A., Zainab, M.: An optimized PID controller for deep space antenna DC motor position control using modified artificial fish swarm algorithm. Yanbu Journal of Engineering and Science 13(1), 45–54 (2021)
12. Krishnaraj, N., Jayasankar, T., Kousik, N.V., Daniel, A.: 2 Artificial Fish Swarm Optimization Algorithm with Hill Climbing Based Clustering Technique for Throughput Maximization in Wireless Multimedia Sensor Network (2021)
13. Li, T., Yang, F., Zhang, D., Zhai, L.: Computation scheduling of multi-access edge networks based on the artificial fish swarm algorithm. IEEE Access 9, 74674–74683 (2021)
14. Lin, M., Hong, H., Yuan, X., Fan, J., Ji, Z.: Inverse kinematic analysis of bionic hands based on fish swarm algorithm. J. Phys. Conf. Ser. 1965(1), 012006 (2021). (IOP Publishing)
15. Liu, Q., Odaka, T., Kuroiwa, J., Ogura, H.: Application of an artificial fish swarm algorithm in symbolic regression. IEICE Trans. Inf. Syst. 96(4), 872–885 (2013)
16. Long, W., Jiao, J., Liang, X., Tang, M.: Inspired grey wolf optimizer for solving large-scale function optimization problems. Appl. Math. Model. 60, 112–126 (2018)
17. Luo, Y., Zhang, J., Li, X.: The optimization of PID controller parameters based on artificial fish swarm algorithm. In: 2007 IEEE International Conference on Automation and Logistics, pp. 1058–1062. IEEE (2007)
18. Ochoa, P., Castillo, O., Soria, J.: The differential evolution algorithm with a fuzzy logic approach for dynamic parameter adjustment using benchmark functions. In: Hybrid Intelligent Systems in Control, Pattern Recognition and Medicine, pp. 169–179. Springer, Cham (2020)
19. Peraza, C., Valdez, F., Castillo, O.: Harmony search with dynamic adaptation of parameters for the optimization of a benchmark set of functions. In: Hybrid Intelligent Systems in Control, Pattern Recognition and Medicine, pp. 97–108. Springer, Cham (2020)
20. Pourpanah, F., Wang, R., Lim, C.P., Yazdani, D.: A review of the family of artificial fish swarm algorithms: recent advances and applications (2020). arXiv:2011.05700
21. Valdez, F., Vazquez, J.C., Melin, P., Castillo, O.: Comparative study of the use of fuzzy logic in improving particle swarm optimization variants for mathematical functions using co-evolution. Appl. Soft Comput. 52, 1070–1083 (2017)
22. Wang, L.G., Shi, Q.H.: Parameters analysis of artificial fish swarm algorithm. Comput. Eng. 36(24), 169–171 (2010)
23. Xiao, J., Zheng, X., Wang, X., Huang, Y.: A modified artificial fish-swarm algorithm. In: 2006 6th World Congress on Intelligent Control and Automation, vol. 1, pp. 3456–3460. IEEE (2006)
24. Yazdani, D., Akbarzadeh-Totonchi, M.R., Nasiri, B., Meybodi, M.R.: A new artificial fish swarm algorithm for dynamic optimization problems. In: 2012 IEEE Congress on Evolutionary Computation, pp. 1–8. IEEE (2012)
25. Zhang, Y., Guan, G., Pu, X.: The robot path planning based on improved artificial fish swarm algorithm. In: Mathematical Problems in Engineering (2016)
26. Zhang, C., Zhang, F.M., Li, F., Wu, H.S.: Improved artificial fish swarm algorithm. In: 2014 9th IEEE Conference on Industrial Electronics and Applications, pp. 748–753. IEEE (2014)
27. Zhou, J., Qi, G., Liu, C.: A Chaotic parallel artificial fish swarm algorithm for water quality monitoring sensor networks 3D coverage optimization. J. Sens. (2021)
28. Zou, L., Li, H., Jiang, W., Yang, X.: An improved fish swarm algorithm for neighborhood rough set reduction and its application. IEEE Access 7, 90277–90288 (2019)
Hierarchical Logistics Methodology for the Routing Planning of the Package Delivery Problem Norberto Castillo-García, Paula Hernández-Hernández, and Edilberto Rodríguez Larkins
Abstract This chapter proposes a hierarchical logistics methodology based on artificial intelligence and optimization for the routing problem of package delivery companies. These companies face several operational problems on a daily basis. The routing problem in the context of package delivery companies consists in determining the best route to locally deliver the packages to the customers. The hierarchical logistics methodology consists of two phases. In the first phase, the packages whose geographic distances are relatively close are grouped. In the second phase, once the clusters are obtained, the route in each one is optimized. In the literature, this methodology implements the classical fuzzy c-means algorithm and exact optimization. Unlike the literature, in the first phase we propose a variant of fuzzy c-means which measures the proximity between each pair of locations by means of the geographic distance instead of the Euclidean distance. In the second phase we propose the use of approximate optimization through an ant colony optimization algorithm.

Keywords Fuzzy C-means · Ant colony optimization · Traveling salesman problem · Package delivery problem · Routing problem
1 Introduction

The purpose of package delivery companies is to send products from one place to another. The products are typically packed in rectangular-shaped boxes, and hence they are called packages. Due to the nature of these companies, they face several operational problems every day. Two major problems that delivery companies have to overcome are documented in the literature: how to stow the packages in the delivery truck and how to determine the best route for locally delivering the packages [1].

N. Castillo-García · P. Hernández-Hernández (B) · E. Rodríguez Larkins
Department of Engineering, Tecnológico Nacional de México/I.T, Altamira, Mexico
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_7
N. Castillo-García et al.
In this chapter we focus on the second major problem, that is, the problem of determining the best route to efficiently deliver the packages. Specifically, the optimization of a route consists in finding the sequence of destinations (delivery locations) to be visited such that the total cost of the route is minimized. Clearly, there are some benefits in optimizing the routes. One of the benefits is the reduction of fuel consumption. This is so because the cost of the route is typically expressed as the distance that the delivery truck must travel to deliver the packages to the customers (destinations). Since the total cost of the route is minimum, the distance traveled by a delivery truck is also minimum, as well as the fuel consumed. Another benefit is the minimization of pollutant emissions, since the delivery truck spends less time on the road. In the literature there is one work that deals with the optimization of routes for package delivery companies [2]. In that work the authors propose a two-phase methodology. In the first phase, the methodology groups together the destinations that are relatively close to each other by means of the well-known Fuzzy C-Means algorithm (FCM). Then, in the second phase, each cluster is optimized by solving the mathematical formulation for the asymmetric variant of the Traveling Salesman Problem (TSP) with the subtour elimination constraints proposed by Miller, Tucker and Zemlin [3]. It is important to point out that the optimization process is performed by the optimization software CPLEX. In this study we propose a hierarchical logistics methodology which is based on the methodology proposed in [2]. Unlike that work, in the first phase we cluster the destinations by means of a variant of FCM. The variant consists in the use of the geographic distance instead of the classical Euclidean distance. It is worth mentioning that FCM has been successfully used in several domains [4–6].
Additionally, in the second phase we propose a heuristic approach rather than the exact approach used in [2]. Specifically, we use the well–known Ant Colony Optimization algorithm (ACO) in this phase. The heuristic approach used in the second phase allows us to solve larger instances without depending on any commercial optimization software. The remainder of this chapter is organized as follows. In Sect. 2 we formally describe the problem faced. Section 3 describes the hierarchical logistics methodology previously mentioned. In Sect. 4 we report the computational experiment conducted to measure the effect of the parameter β (used by ACO in the second phase) on the behavior of our methodology. Finally, in Sect. 5 we discuss the major findings of this study.
2 Problem Definition

Let $D = \{d_1, d_2, \ldots, d_n\}$ represent the discrete and finite set of $|D| = n$ destinations. Each destination $d_i$ is described by an ordered pair that represents its global position on the Earth, i.e., $d_i = (\varphi_i, \lambda_i)$. In the ordered pair, $\varphi_i$ corresponds to the latitude and $\lambda_i$ corresponds to the longitude.
Hierarchical Logistics Methodology for the Routing Planning …
As mentioned previously, in this study we use the haversine formula [1] to measure the geographic distance between two destinations di and d j . It is important to point out that we use this formula since it is capable to compute the distance between two points placed on a spherical surface such as the Earth. The haversine formula is shown in Eq. (1). (
δ di , d j
)
√ a = 2r tan ∀i, j = 1, . . . , n, i / = j, √ 1−a −1
(1)
where: ) ( • δ di , d j is the geographic ( ) distance (in kilometers) between destinations di = = ϕ , λ d (ϕi , λi ) and j j j . ( ) ϕ j −ϕi λ −λ 2 • a = sin + cos(ϕi )cos ϕ j sin2 j 2 i . 2 • r represents √ the radius of the Earth. In this study we consider r = 6, 371 km. • tan−1 √ a is the angle (in radians) resulting from converting the rectangular 1−a ) (√ √ coordinates a, 1 − a to polar coordinates. The goal of the route planning of package delivery companies is to find a tour of destinations with the minimum cost [2]. In this chapter we model this problem as an application of the well–known Traveling Salesman Problem (TSP). Therefore, the problem consists in finding a permutation of destinations π ★ such that the total cost of the tour (permutation) is the minimum. Formally:
$$\pi^{\star} = \underset{\pi \in \Pi}{\operatorname{argmin}} \left[ \sum_{i=1}^{n-1} \delta\left(d_{\pi(i)}, d_{\pi(i+1)}\right) + \delta\left(d_{\pi(n)}, d_{\pi(1)}\right) \right],$$

where $d_{\pi(i)}$ is the $i$–th destination in the permutation $\pi$, and $\Pi$ is the set of all permutations of destinations, that is, the solution space. Notice that the tour cost is computed from the geographic distances among the set of destinations $D$.
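The distance of Eq. (1) and the cost of a closed tour can be sketched in a few lines of Python. This is only an illustration, not the authors' Java implementation: the function names `haversine_km` and `tour_cost` are hypothetical, and the sketch assumes that latitudes and longitudes are given in degrees, which the chapter does not state.

```python
import math

EARTH_RADIUS_KM = 6371.0  # value of r used in the chapter


def haversine_km(p, q):
    """Geographic distance of Eq. (1) between p = (lat, lon) and q = (lat, lon).

    Assumption: coordinates are expressed in degrees.
    """
    phi_i, lam_i = math.radians(p[0]), math.radians(p[1])
    phi_j, lam_j = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((phi_j - phi_i) / 2) ** 2
         + math.cos(phi_i) * math.cos(phi_j) * math.sin((lam_j - lam_i) / 2) ** 2)
    a = min(1.0, max(0.0, a))  # guard against tiny rounding errors
    # atan2(sqrt(a), sqrt(1 - a)) is the angle tan^-1(sqrt(a) / sqrt(1 - a))
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def tour_cost(destinations, perm):
    """Cost of the closed tour that visits destinations in the order given by perm."""
    n = len(perm)
    return sum(
        haversine_km(destinations[perm[t]], destinations[perm[(t + 1) % n]])
        for t in range(n)
    )
```

The modulo in `tour_cost` adds the closing arc from the last destination back to the first one, which corresponds to the second term of the objective function.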
3 Hierarchical Logistics Methodology This section describes the hierarchical methodology proposed to solve the routing process of package delivery companies described in the previous section. Our methodology consists of two phases: clustering and optimization. The first phase is described in Sect. 3.1 and the second phase in Sect. 3.2.
3.1 Phase 1: Clustering by FCM The idea of the first phase of the methodology is to cluster the delivery locations according to their global position. Thus, the destinations that are relatively close to each other (in geographic distance) will be in the same cluster. The goal is to assign one delivery truck to one particular cluster. In order to perform the clustering process, we use the well–known Fuzzy C-Means (FCM) algorithm. Nevertheless, unlike in the literature, we do not use the Euclidean distance. Instead, we use the geographic distance among the destinations according to Eq. (1). Algorithm 1 shows the pseudocode of FCM. Algorithm 1: Fuzzy c–means algorithm
As can be observed in Algorithm 1, FCM requires the number of clusters ($k$), the level of fuzzification ($m'$), the level of tolerance ($\varepsilon$), and the entire set of destinations ($D$). In the methodology, the number of clusters is a given value since it depends on the number of delivery trucks of the company, and hence, FCM does not determine it. The level of fuzzification $m'$ can theoretically take any value in the domain $(1, \infty)$. Nevertheless, in this study we set $m' = 2$ since it is the value recommended in the literature [2, 7]. The level of tolerance $\varepsilon$ is a small value used to determine whether FCM should continue its execution. Here we also use the value recommended in the literature, i.e., $\varepsilon = 0.001$ [2, 7]. In addition to the previous parameters, the algorithm also requires the set of delivery locations. As mentioned previously, the set of delivery locations consists of a collection of ordered pairs $(\varphi_i, \lambda_i)$ with $1 \le i \le n$, in which $\varphi_i$ and $\lambda_i$ stand for the latitude and longitude of the $i$–th destination, respectively.

FCM operates in the following way. The algorithm first initializes the matrix $U_{k \times n}$, which is used to record the membership levels of the $n$ delivery locations to the $k$ clusters. Since this matrix stores membership levels, all the entries are real values between zero and one. As recommended in the literature [2], we initialize this matrix in the following way. Each column of $U$ has exactly one entry whose value is 1 and the remaining entries are zero. In addition, no row of $U$ contains only zero values. Figure 1 depicts an example of a matrix $U$ for $k = 3$ clusters and $n = 6$ destinations.

Fig. 1 Initialization of a matrix U with k = 3 rows (clusters) and n = 6 columns (destinations)

The second step of FCM is to assign the value of false to the flag stop. Steps 3–8 constitute the main loop of the algorithm. Step 4 (first step in the main loop) consists in computing the centroids $v_1, v_2, \ldots, v_k$ considering both the current values of $U$ and the set of destinations $D = \{(\varphi_1, \lambda_1), (\varphi_2, \lambda_2), \ldots, (\varphi_n, \lambda_n)\}$. The computation of the centroids is given by the following equations:

$$v_{i,\varphi} = \frac{\sum_{j=1}^{n} (U_{ij})^{m'} \times \varphi_j}{\sum_{j=1}^{n} (U_{ij})^{m'}} \quad \forall i = 1, \ldots, k, \qquad \text{and} \qquad v_{i,\lambda} = \frac{\sum_{j=1}^{n} (U_{ij})^{m'} \times \lambda_j}{\sum_{j=1}^{n} (U_{ij})^{m'}} \quad \forall i = 1, \ldots, k.$$

Once the $k$ centroids have been computed, FCM executes step 5. In this step, the algorithm computes the distances from the centroids to the destinations. Unlike the literature, in this study we propose to compute the geographic distances [see Eq. (1)] rather than the Euclidean distance. Thus, the geographic distances from the centroids to the destinations are given by $\delta(v_i, d_j)$. The following step of FCM consists in updating the matrix $U$ according to the recently computed geographic distances from the centroids to the destinations. The purpose of this process is to recompute the membership levels of the destinations to the new centroids. This process is performed by the following equation:

$$U_{ij}^{upt} = \left[ \sum_{l=1}^{k} \left( \frac{\delta(v_i, d_j)}{\delta(v_l, d_j)} \right)^{2/(m'-1)} \right]^{-1} \quad \forall i = 1, \ldots, k,\; j = 1, \ldots, n.$$
Step 7 of FCM determines whether the algorithm should continue its execution. This is achieved as follows. If $|U_{ij}^{upt} - U_{ij}^{prev}| \le \varepsilon$ for all $i, j$, then the procedure stopExecution() returns the value of true and the execution of FCM must be stopped. Otherwise, the procedure returns false and the execution of FCM must continue. Here, matrix $U^{prev}$ represents the membership values of the previous iteration. Finally, in Step 9 FCM outputs the $k$ clusters of delivery locations.
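The steps described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the chapter's Java code: the name `fcm_geographic` is hypothetical, coordinates are assumed to be in degrees, the crisp initialization of $U$ is done round-robin (one convenient way to satisfy the two initialization conditions when $n \ge k$), each destination is finally assigned to its highest-membership cluster, and the iteration cap `max_iter` is an added safeguard that the text does not mention.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Geographic distance of Eq. (1); p and q are (lat, lon) pairs in degrees."""
    phi_i, lam_i = math.radians(p[0]), math.radians(p[1])
    phi_j, lam_j = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((phi_j - phi_i) / 2) ** 2
         + math.cos(phi_i) * math.cos(phi_j) * math.sin((lam_j - lam_i) / 2) ** 2)
    a = min(1.0, max(0.0, a))
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def fcm_geographic(destinations, k, m_prime=2.0, eps=0.001, max_iter=100):
    """Fuzzy c-means with geographic distances; returns (U, labels)."""
    n = len(destinations)
    # Step 1: each column has exactly one 1, and no row is all zeros (needs n >= k).
    U = [[1.0 if j % k == i else 0.0 for j in range(n)] for i in range(k)]
    exponent = 2.0 / (m_prime - 1.0)
    for _ in range(max_iter):
        # Step 4: centroids as membership-weighted means of latitude and longitude.
        centroids = []
        for i in range(k):
            w = [U[i][j] ** m_prime for j in range(n)]
            total = sum(w)
            centroids.append((
                sum(w[j] * destinations[j][0] for j in range(n)) / total,
                sum(w[j] * destinations[j][1] for j in range(n)) / total,
            ))
        # Step 5: geographic distances from every centroid to every destination.
        dist = [[haversine_km(centroids[i], destinations[j]) for j in range(n)]
                for i in range(k)]
        # Step 6: membership update.
        U_new = [[0.0] * n for _ in range(k)]
        for j in range(n):
            at_centroid = [i for i in range(k) if dist[i][j] == 0.0]
            if at_centroid:  # destination coincides with a centroid: avoid 0/0
                for i in at_centroid:
                    U_new[i][j] = 1.0 / len(at_centroid)
            else:
                for i in range(k):
                    U_new[i][j] = 1.0 / sum(
                        (dist[i][j] / dist[l][j]) ** exponent for l in range(k))
        # Step 7: stop when no membership changed by more than eps.
        stop = all(abs(U_new[i][j] - U[i][j]) <= eps
                   for i in range(k) for j in range(n))
        U = U_new
        if stop:
            break
    # Assumed defuzzification: assign each destination to its strongest cluster.
    labels = [max(range(k), key=lambda i: U[i][j]) for j in range(n)]
    return U, labels
```

Because the update divides each distance by the distances to all centroids, the memberships of every destination sum to one after each iteration.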
98
N. Castillo-García et al.
3.2 Phase 2: Optimization by ACO
In the second phase of the methodology, we perform an optimization process in order to determine the best route for each cluster. Unlike the literature [2], in this chapter we approach this phase heuristically. More precisely, we propose an Ant Colony Optimization (ACO) algorithm to optimize the route of each cluster found in the previous phase. ACO is an algorithm based on the collaborative behavior of real ants [8, 9]. Algorithm 2 shows the general pseudocode of ACO.
Algorithm 2: Ant Colony Optimization algorithm
As can be observed from Algorithm 2, ACO requires the number of ants ($m$), the parameter used to intensify the contribution of the pheromone trail ($\alpha$), the parameter used to intensify the contribution of the heuristic information ($\beta$), the evaporation rate in the local update ($\phi$), the evaporation rate in the global update ($\rho$), the initial value of the pheromone trail ($\tau_0$), the geographic distances among the $n$ destinations in the current cluster ($D_{n \times n}$), and the maximum number of ant generations ($G_{MAX}$). ACO starts by setting the initial values of both the pheromone table $\tau$ and the heuristic information table $\eta$. On the one hand, the pheromone table is initialized with a small value $\tau_0$, i.e., $\tau_{ij} \leftarrow \tau_0$ for all $i, j = 1, \ldots, n$. On the other hand, the heuristic information is initialized as follows:

$$\eta_{ij} \leftarrow \frac{1}{D_{ij}} \quad \forall i, j = 1, \ldots, n,$$

where $D_{ij}$ is the geographic distance between destinations $i$ and $j$. Once the pheromone trails and heuristic information have been initialized, each ant constructs a tour (solution) by iteratively selecting the next destination to be visited.
This selection is performed probabilistically by means of the roulette wheel selection and taking into account both the pheromone trail and the heuristic information according to the following equation:

$$P_{ij} = \frac{\tau_{ij}^{\alpha} \cdot \eta_{ij}^{\beta}}{\sum_{l} \tau_{il}^{\alpha} \cdot \eta_{il}^{\beta}},$$

where $i$ is the current node, $j$ is an unselected node, the sum runs over the nodes $l$ that are not yet in the current tour, and $P_{ij}$ is the probability of selecting arc $(i, j)$ to be included in the current tour. The algorithm computes the probabilities of the nodes which are not in the current tour and selects one of them according to the roulette wheel selection. In this chapter we follow the roulette wheel implementation documented in [10]. When an ant has selected the following node, the algorithm locally updates the pheromone trail of the selected arc. This process is performed as follows:

$$\tau_{ij} = (1 - \phi)\,\tau_{ij} + \phi\,\tau_0,$$

where $i$ is the current node and $j$ is the selected node. Once an ant has completed a tour, the algorithm globally updates the pheromone trails as follows:

$$\tau_{ij} = (1 - \rho)\,\tau_{ij} + \rho\,\Delta\tau_{ij} \quad \text{with} \quad \Delta\tau_{ij} = \frac{1}{OV(\pi)},$$

where $(i, j)$ ranges over the pairs of consecutive destinations in the tour $\pi$, $OV(\pi)$ is the objective value of permutation $\pi$, and $\pi(i)$ is the $i$-th destination in the tour. This process is repeated until the maximum number of ant generations ($G_{MAX}$) is reached.
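The construction and update rules above can be put together in a short Python sketch. It is illustrative only and makes several assumptions that the chapter does not confirm: the name `aco_tsp` is hypothetical, every ant starts from a random destination, `random.choices` stands in for the roulette wheel implementation of [10], the closing arc of the tour also receives the global update, and the distance matrix has at least two destinations with strictly positive off-diagonal entries.

```python
import random


def tour_length(tour, D):
    """Objective value OV(pi): length of the closed tour under distance matrix D."""
    n = len(tour)
    return sum(D[tour[t]][tour[(t + 1) % n]] for t in range(n))


def aco_tsp(D, m=10, alpha=1.0, beta=2.0, phi=0.35, rho=0.50,
            tau0=0.001, g_max=100, seed=None):
    """Ant Colony Optimization sketch; returns (best_tour, best_cost)."""
    rng = random.Random(seed)
    n = len(D)
    tau = [[tau0] * n for _ in range(n)]                       # pheromone table
    eta = [[0.0 if i == j else 1.0 / D[i][j] for j in range(n)]
           for i in range(n)]                                  # heuristic table
    best_tour, best_cost = None, float("inf")
    for _ in range(g_max):                 # ant generations
        for _ in range(m):                 # each ant builds one tour
            start = rng.randrange(n)       # assumption: random starting node
            tour = [start]
            unvisited = [j for j in range(n) if j != start]
            while unvisited:
                i = tour[-1]
                weights = [tau[i][j] ** alpha * eta[i][j] ** beta
                           for j in unvisited]
                # Roulette wheel: pick j with probability weights[j] / sum(weights).
                j = rng.choices(unvisited, weights=weights, k=1)[0]
                # Local pheromone update of the selected arc.
                tau[i][j] = (1 - phi) * tau[i][j] + phi * tau0
                tour.append(j)
                unvisited.remove(j)
            cost = tour_length(tour, D)
            # Global pheromone update along the completed tour.
            for t in range(n):
                i, j = tour[t], tour[(t + 1) % n]
                tau[i][j] = (1 - rho) * tau[i][j] + rho / cost
            if cost < best_cost:
                best_tour, best_cost = tour, cost
    return best_tour, best_cost
```

The default values of `m`, `alpha`, `phi`, `rho`, `tau0` and `g_max` mirror those the chapter reports for the second phase, while `beta=2.0` is just one of the values studied. With `beta=0.0` the factor `eta[i][j] ** beta` equals one, so the ants ignore the distances and rely on the pheromone trail alone.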
4 Computational Experiment In this section we report the computational experiment conducted to assess the behavior of the hierarchical logistics methodology proposed. More precisely, the goal of the experiment is to measure the effect of the parameter β on the methodology performance. Recall that β is used to intensify the contribution of the heuristic information of the problem in the second phase of the methodology.
Table 1 Parameter values used to conduct the experiment

| Methodology phase | Parameter value | Meaning |
|---|---|---|
| 1 | k = 5 | Number of clusters |
| 1 | m′ = 2.0 | Level of fuzzification |
| 1 | ε = 0.001 | Level of tolerance |
| 2 | m = 10 | Number of ants |
| 2 | α = 1.0 | Level of contribution of the pheromone trail |
| 2 | ϕ = 0.35 | Evaporation rate in the local update |
| 2 | ρ = 0.50 | Evaporation rate in the global update |
| 2 | τ0 = 0.001 | Initial value of the pheromone trails |
| 2 | G_MAX = 100 | Maximum number of ant generations |
The experiment was conducted on a workstation with an AMD Ryzen Threadripper 3960X 24-core processor at 3.8 GHz and 128 GB of RAM. The methodology was implemented in Java and executed in the Java Runtime Environment 1.8.0_201b09. In this study we solved 30 artificial instances generated by the automatic tool proposed in [1]. All of these instances follow a uniform distribution and are categorized as small, medium and large according to the number of destinations. Specifically, there are 10 small instances whose number of destinations ranges from 28 to 100 (i.e., 28 ≤ n ≤ 100); 10 medium instances with 107 ≤ n ≤ 199; and 10 large instances with 209 ≤ n ≤ 287. In the experiment, we evaluate 11 different values for the parameter β, namely β ∈ {0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0}. All the remaining parameters were kept fixed. The specific values of each parameter are reported in Table 1. The hierarchical logistics methodology was executed 30 times for each instance and for each β-value in order to mitigate the effect of randomness. Since there are 30 instances, 11 β-values and 30 repetitions, the total number of executions is 30 × 11 × 30 = 9900. An execution refers to the solution of a given instance with a specific value of β. At each execution, we record the objective value for each cluster. The global objective value and the global computing time of an execution are obtained by summing the objective values and the computing times of its clusters, respectively. Table 2 reports the results of the experiment, that is, the average objective values of the 30 repetitions. As can be observed from Table 2, the value of β influences the performance of the methodology. Specifically, small β-values lead to poor objective values whereas large values of β yield better objective values. The behavior of the methodology can be clearly observed in Fig. 2. In this figure we depict 11 box plots that represent the objective values (y-axis) with respect to the β-values (x-axis).
Table 2 Experimental results for the different values for the parameter β

| Dataset | Instance | β = 0.0 | β = 0.5 | β = 1.0 | β = 1.5 | β = 2.0 | β = 2.5 | β = 3.0 | β = 3.5 | β = 4.0 | β = 4.5 | β = 5.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Small | 1 | 84.19 | 76.17 | 70.69 | 67.43 | 65.83 | 64.80 | 64.35 | 64.17 | 64.13 | 64.11 | 64.06 |
| Small | 2 | 137.35 | 117.11 | 98.80 | 88.09 | 81.79 | 78.22 | 76.66 | 75.59 | 75.26 | 74.84 | 74.56 |
| Small | 3 | 92.55 | 83.76 | 76.77 | 71.62 | 69.15 | 68.00 | 67.13 | 66.50 | 66.25 | 66.11 | 66.03 |
| Small | 4 | 76.10 | 68.54 | 63.05 | 59.37 | 57.59 | 56.81 | 56.23 | 56.09 | 55.98 | 55.91 | 55.87 |
| Small | 5 | 101.05 | 88.57 | 79.14 | 72.48 | 68.19 | 66.41 | 65.46 | 64.99 | 64.77 | 64.53 | 64.51 |
| Small | 6 | 46.18 | 46.17 | 46.17 | 46.17 | 46.17 | 46.17 | 46.17 | 46.27 | 46.33 | 46.43 | 46.57 |
| Small | 7 | 142.53 | 123.38 | 107.70 | 95.57 | 87.67 | 82.97 | 80.78 | 79.11 | 78.34 | 77.70 | 77.32 |
| Small | 8 | 109.60 | 96.65 | 85.26 | 78.88 | 74.27 | 71.88 | 70.82 | 69.77 | 69.40 | 69.08 | 69.06 |
| Small | 9 | 120.67 | 103.97 | 91.40 | 82.29 | 77.93 | 75.41 | 73.73 | 72.71 | 72.15 | 72.03 | 71.83 |
| Small | 10 | 119.48 | 106.52 | 95.16 | 86.58 | 80.44 | 77.19 | 75.49 | 74.50 | 74.17 | 73.93 | 73.77 |
| Medium | 1 | 213.76 | 178.48 | 147.42 | 126.27 | 113.22 | 104.71 | 100.16 | 96.77 | 94.49 | 93.25 | 92.21 |
| Medium | 2 | 297.73 | 244.44 | 192.04 | 157.91 | 137.69 | 126.38 | 120.15 | 115.10 | 112.69 | 110.17 | 109.19 |
| Medium | 3 | 151.54 | 129.95 | 108.98 | 95.45 | 87.42 | 81.59 | 78.44 | 76.31 | 75.03 | 74.46 | 74.06 |
| Medium | 4 | 315.35 | 254.29 | 196.41 | 159.11 | 139.91 | 128.53 | 120.49 | 116.11 | 113.19 | 111.41 | 109.53 |
| Medium | 5 | 177.04 | 148.20 | 123.46 | 106.88 | 97.76 | 92.04 | 88.68 | 86.91 | 85.62 | 85.02 | 84.59 |
| Medium | 6 | 232.20 | 193.18 | 156.70 | 131.83 | 119.12 | 111.87 | 105.84 | 102.12 | 99.40 | 98.29 | 97.44 |
| Medium | 7 | 159.06 | 134.42 | 115.47 | 101.13 | 93.48 | 88.00 | 85.69 | 83.43 | 82.62 | 81.91 | 81.49 |
| Medium | 8 | 227.20 | 186.31 | 148.45 | 126.06 | 113.11 | 105.15 | 100.26 | 97.68 | 95.84 | 94.53 | 93.56 |
| Medium | 9 | 164.96 | 141.53 | 118.52 | 104.33 | 95.48 | 90.28 | 86.83 | 84.82 | 82.79 | 82.08 | 81.35 |
| Medium | 10 | 157.66 | 136.79 | 119.21 | 105.01 | 96.56 | 91.16 | 87.99 | 85.91 | 84.49 | 83.46 | 83.28 |
| Large | 1 | 415.98 | 337.53 | 253.47 | 201.58 | 171.23 | 153.51 | 143.10 | 136.62 | 132.57 | 129.40 | 127.54 |
| Large | 2 | 384.23 | 309.26 | 232.06 | 181.40 | 155.57 | 139.60 | 130.45 | 124.35 | 120.65 | 118.08 | 116.57 |
| Large | 3 | 434.81 | 350.27 | 261.14 | 202.10 | 173.70 | 156.90 | 147.79 | 141.38 | 137.03 | 134.26 | 132.16 |
| Large | 4 | 430.97 | 347.51 | 256.86 | 199.95 | 168.34 | 150.91 | 140.23 | 133.86 | 129.12 | 126.35 | 124.28 |
| Large | 5 | 338.21 | 274.21 | 212.30 | 171.55 | 148.02 | 134.96 | 126.97 | 122.06 | 118.78 | 116.26 | 114.71 |
| Large | 6 | 374.48 | 301.80 | 229.24 | 181.22 | 156.52 | 142.01 | 132.65 | 126.61 | 122.94 | 120.71 | 118.54 |
| Large | 7 | 478.77 | 377.89 | 271.46 | 209.15 | 175.90 | 159.32 | 148.72 | 141.92 | 138.14 | 134.84 | 132.57 |
| Large | 8 | 425.38 | 338.68 | 251.61 | 197.54 | 169.16 | 152.60 | 141.89 | 135.59 | 131.75 | 128.86 | 126.66 |
| Large | 9 | 467.83 | 373.64 | 276.83 | 214.70 | 181.59 | 162.69 | 151.23 | 143.50 | 139.35 | 135.94 | 133.87 |
| Large | 10 | 485.19 | 380.91 | 278.64 | 214.31 | 179.86 | 163.06 | 151.42 | 145.29 | 139.80 | 137.20 | 134.54 |

Fig. 2 Box plots representing the behavior of the hierarchical logistics methodology with different values of the parameter β

Figure 2 provides a clear perspective of the methodology behavior. In this figure we can observe that small values of β not only produce poor objective values on average, but also a large variability. On the contrary, large values of β produce good objective values and a small variability. Furthermore, from the figure we can observe that the methodology behavior is more stable and produces the best results for β-values ranging from 3.0 to 5.0. In order to statistically validate the experimental results and observations, we conducted the Friedman test on the values reported in Table 2. The Friedman test is used to determine if at least one treatment is significantly different from the others. In the context of this study, there are 11 treatments (the β-values) and 30 blocks (the instances). We used the statistical software R to automatically conduct the test. The Friedman test found a p-value less than $2.2 \times 10^{-16}$, which is significant at a confidence level of 99.99%. The result of the Friedman test provides strong evidence that the methodology behavior is significantly affected by the value of the parameter β.
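To make the role of treatments and blocks concrete, the Friedman statistic can be computed by hand on a small excerpt of the results. The sketch below is a pure-Python illustration, not the R procedure used in the chapter: the function names are hypothetical, only the statistic is computed (no p-value), and the usual correction for ties is omitted. The excerpt uses the rows of small instances 1 to 4 and the columns β = 0.0, 2.5 and 5.0 of Table 2.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for t in range(pos, end + 1):
            ranks[order[t]] = shared
        pos = end + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic; blocks[b][t] is the result of treatment t in block b."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for t, r in enumerate(average_ranks(row)):
            rank_sums[t] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Excerpt of Table 2: small instances 1-4 (blocks), beta = 0.0, 2.5, 5.0 (treatments).
excerpt = [
    [84.19, 64.80, 64.06],
    [137.35, 78.22, 74.56],
    [92.55, 68.00, 66.03],
    [76.10, 56.81, 55.87],
]
q = friedman_statistic(excerpt)
print(q)
```

In this excerpt every instance ranks the three β-values in the same order, so the rank sums are 12, 8 and 4 and the statistic reaches its largest possible value for four blocks and three treatments, namely 8. A statistic this far from zero is what a consistent effect of β looks like; the full test in the chapter applies the same idea to all 30 blocks and 11 treatments.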
5 Conclusions
In this study we have addressed the routing planning process of package delivery companies. In particular, we propose a hierarchical methodology consisting of two phases. The first phase consists in clustering the delivery locations by means of the well-known Fuzzy C-Means (FCM) algorithm. In the second phase we perform a heuristic optimization process on each cluster through the Ant Colony Optimization (ACO) algorithm. In both phases, we measure the geographic distances among the delivery locations by means of the haversine formula since it considers the curvature of the Earth. Unlike the methodology documented in [2], the hierarchical logistics methodology proposed here uses a variant of FCM in which the proximity between each pair of destinations is measured by the geographic distance rather than the Euclidean distance. In addition, in the second phase of the methodology proposed we use a metaheuristic algorithm (ACO) instead of an integer linear programming formulation. Clearly, the metaheuristic algorithm allows us to solve larger instances in a relatively small period of time. We conducted a computational experiment in order to assess the performance of the methodology proposed. Specifically, the experiment consisted in measuring the effect of the parameter β (used by ACO in the second phase) on the behavior of the methodology. The experimental results clearly indicated that the value of this parameter has a significant effect on the quality of the solutions found, with 99.99% confidence. Therefore, taking into account the evidence of this study, we conclude that our methodology is a good alternative to heuristically solve the routing planning of package delivery companies.

Acknowledgements The authors would like to thank Tecnológico Nacional de México, and especially the authorities of Instituto Tecnológico de Altamira for their support in this research. The first author also thanks the Mexican Council for Science and Technology (CONACYT) for its support through the Mexican National System of Researchers (SNI) (Grant No. 70157).
References
1. Hernández, P.H., Castillo-García, N., Larkins, E.R., Ruiz, J.G.G., Díaz, S.V.M., Resendiz, E.S.: A fuzzy logic classifier for the three dimensional bin packing problem deriving from package delivery companies application. In: Handbook of Research on Metaheuristics for Order Picking Optimization in Warehouses to Smart Cities, pp. 433–442. IGI Global (2019)
2. Hernández–Hernández, P., Castillo–García, N.: Optimization of route planning for the package delivery problem using fuzzy clustering. In: Technological and Industrial Applications Associated with Intelligent Logistics, pp. 239–252. Springer, Cham (2021)
3. Chen, D.S., Batson, R.G., Dang, Y.: Applied integer programming: modeling and solution. Wiley (2011)
4. Asha, G.R.: Energy efficient clustering and routing in a wireless sensor networks. Proc. Comput. Sci. 134, 178–185 (2018)
5. Miyamoto, S., Ichihashi, H., Honda, K., Ichihashi, H.: Algorithms for fuzzy clustering, pp. 9–12. Springer, Heidelberg (2008)
6. Su, S., Zhao, S.: An optimal clustering mechanism based on Fuzzy-C means for wireless sensor networks. Sustain. Comput. Inf. Syst. 18, 127–134 (2018)
7. Cruz, P.P.: Inteligencia artificial con aplicaciones a la ingeniería. Alfaomega (2011)
8. Hernández, P., Gómez, C., Cruz, L., Ochoa, A., Castillo, N., Rivera, G.: Hyperheuristic for the parameter tuning of a bio-inspired algorithm of query routing in P2P networks. In: Mexican International Conference on Artificial Intelligence, pp. 119–130. Springer, Berlin, Heidelberg (Nov 2011)
9. Hernández, P.H., Cruz-Reyes, L., Melin, P., Mar-Ortiz, J., Huacuja, H.J.F., Soberanes, H.J.P., Barbosa, J.J.G.: An ant colony algorithm for improving ship stability in the containership stowage problem. In: Mexican International Conference on Artificial Intelligence, pp. 93–104. Springer, Berlin, Heidelberg (Nov 2013)
10. Huacuja, H.J.F., Castillo-García, N.: Optimization of the vertex separation problem with genetic algorithms. In: Handbook of Research on Military, Aeronautical, and Maritime Logistics and Operations, pp. 13–31. IGI Global (2016)
A Novel Distributed Nature-Inspired Algorithm for Solving Optimization Problems J. C. Felix-Saul, Mario García Valdez, and Juan J. Merelo Guervós
Abstract Several bio-inspired algorithms use population evolution as analogies of nature. In this paper, we present an algorithm inspired by the biological Life-Cycle of animal species, which consists of several stages: birth, growth, reproduction, and death. As in nature, we intend to execute all these stages in parallel and asynchronously on a population that evolves constantly. From the ground up, we designed the algorithm as a cloud-native solution using the cloud available resources to divide the processing workload among several computers or running the algorithm as a cloud service. The algorithm works concurrently and asynchronously on a constantly evolving population, using different computers (or containers) independently, eliminating waiting times between processes. This algorithm seeks to imitate the natural life cycle, where new individuals are born at any moment and mature over time, where they age and suffer mutations throughout their lives. In reproduction, couples match by mutual attraction, where they may have offspring. Death can happen to everyone: from a newborn to an aged adult, where the individual’s fitness will impact their longevity. As a proof-of-concept, we implemented the algorithm with Docker containers by solving the OneMax problem comparing it with a basic (sequential) GA algorithm, where it shows favorable and promising results. Keywords Distributed bioinspired algorithms · Genetic algorithms · Cloud computing
J. C. Felix-Saul (B) · M. G. Valdez Department of Graduate Studies, Tijuana Institute of Technology, Tijuana, Mexico e-mail: [email protected] J. J. M. Guervós Department of Computer Architecture and Technology, Universidad de Granada, Granada, Spain © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_8
1 Introduction
Bio-inspired algorithms have been very successful when used to solve complex optimization problems [1, 4, 21], but as their complexity increases so does the computing power required [14]. One strategy to address this processing need is to use distributed computing [20], or the resources available in the cloud [5, 7] to help find the solution. This strategy gives us elasticity to adjust (increase or reduce) the computing power to achieve a balance according to the nature of the problem. One of the problems that we have identified in this research is that most bio-inspired algorithms are designed with a traditionally sequential perspective [3, 15], where each process must wait for the previous task to finish before continuing. Some architectures address this issue [9, 12, 22]; what distinguishes our proposal is that it presents an algorithm designed in its entirety as a cloud-native solution, fully distributed, where its processes are executed in parallel and asynchronously. This technique makes it easy to scale the computing power according to the complexity required by the problem [2]. We designed our algorithm using observation, analysis, and abstraction in nature to identify what most species in the animal kingdom have in common, where individuals of a population that best adapt to the environment have greater chances of survival, reproduction, and improvement of the species. We imagined that the task of writing these rules of evolution was in our hands and asked ourselves how we would solve it. Like nature, we identified what most species have in common in the life cycle. Our algorithm works on a constantly evolving population, experiencing the stages that all living beings undergo [17]: being born, growing up, reproducing, and dying, where each of these stages is a process (of our algorithm) that randomly affects the individuals of the population, emulating the organic way.
The main contribution of this paper is the design and creation of a new algorithm with the purpose of demonstrating, through its implementation and testing, that it is possible to evolve a population of individuals, similar to a Genetic Algorithm (GA), using a distributed, parallel, and asynchronous methodology. The remainder of this paper is organized as follows. First, we introduce our algorithm inspired by the Life-Cycle of animal species in Sect. 2, followed by our experimental configuration and results in Sect. 3. We then analyze and describe some of our research findings in Sect. 4, and we conclude by presenting some inferences based on our experiments' results in Sect. 5.
2 Proposal Many of the publications with the most influence on our research [8, 11, 12, 22] propose a cloud-native optimization architecture that features population-based algorithms. They present ideas such as those mentioned in Fig. 1, which planted the seed of thought of our current paper.
Fig. 1 These ideas planted the seed of thought in our algorithm proposal
Our main inspiration for this algorithm was born from the study and observation of nature, where many successful optimization algorithms have previously found their own. Our focus was not on a single animal species but on all of them, using general abstraction to identify what those species have in common. Some questions in our train of thought were: How does nature work? Is evolution real? How do we enforce natural selection in evolution? What is the role of the attraction between couples in reproduction? How much of a factor is the attraction's role in the evolution of the species? What is the role of death in all of this? How much of an impact does death have on the evolution of a species? How do we include those ideas in our algorithm? In our analysis, we consider a broad animal spectrum, from bacteria to humans and lions, finding our inspiration in the biological Life Cycle of the animal species, which consists of several stages [17]: birth, growth, reproduction, and death. As in nature, we intend to execute all these stages in parallel and asynchronously on a population that evolves constantly. The combination of those processes is what we call evolution. One of the main challenges when designing the algorithm was how to capture the essence of life by reflecting the evolution of a population. In our minds, it was clear that this task requires a combination of simultaneously working forces that influence the improvement of the population over time. Our strategy was simple: divide and conquer, looking into the clouds. Enter cloud computing: previous cloud-computing algorithms have proven successful in optimization problems [8, 22]. We turned back to our genetic algorithm knowledge, mixed it with our inspiration and cloud computing, and designed the algorithm with the aim of dividing the processing workload among multiple computers. This strategy would also make it possible to run our algorithm as a cloud service.
Theoretically, one consequence of the workload distribution is to obtain lower execution times. This algorithm seeks to imitate the natural life cycle, where new individuals are born at any moment and mature over time, where they age and suffer mutations throughout their lives. In reproduction, couples match by mutual attraction, where they may have offspring. Death can happen to everyone: from a newborn to an aged adult, where the individual’s fitness will impact their longevity. The general model concept is shown in Fig. 2. The general flowchart for the Life-Cycle algorithm is described in Fig. 3.
Fig. 2 Algorithm model inspired by the animal species’ biological Life-Cycle
Fig. 3 Life-Cycle algorithm general flowchart
2.1 Birth This algorithm starts with a randomly generated population, where the processes will interact with the population independently. This means that at any given time, any individual can experience any of these processes. Birth is the first process of our algorithm, and it is responsible for the initial generation of individuals.
2.2 Growth As in nature, all individuals constantly grow, mature, or age. With increasing age, individuals may lose strength but also gain more knowledge to solve problems. We represent this with a possible mutation in each increment of age. The growth process will take an individual to assess whether it’s ready to mature and undergo changes.
2.3 Reproduction
The attraction of a couple will depend on fitness: the better an individual's fitness, the more attractive it will be, making it easier to find mating matches. This process takes random pairs of individuals and evaluates their attraction as a couple in an attempt to breed; when the gestation is successful, a new pair of individuals is born (as the offspring). Not all couples will be compatible, so reproduction will not always be possible, but a problem arises: how do we quantify the attraction between two individuals? We could have used several strategies to evaluate this attraction. This algorithm takes a focus similar to the study of bacterial growth in microbiology, where we can observe and analyze the evolution of a population over time. As in most species in nature, the bigger a specimen is, the more attractive it is to its mating candidates. To be consistent with both ideas in our algorithm, we use the equation of Newton's Law of Universal Gravitation (1) to calculate the attraction between two individuals, where we treat the individual's fitness as its mass when using the equation. In previous work, Newton's Law of Universal Gravitation has shown success in helping to solve optimization problems [16, 18].

$$F = G \frac{m_1 m_2}{r^2}, \qquad (1)$$

where:
• $F$ is the gravitational force acting between two objects.
• $m_1$ and $m_2$ are the masses of the objects.
• $r$ is the distance between the centers of their masses.
• $G$ is the gravitational constant: $6.67430(15) \times 10^{-11}\ \mathrm{m^3 \cdot kg^{-1} \cdot s^{-2}}$.
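As a small worked example of Eq. (1) with fitness playing the role of mass, consider the sketch below. The function name `attraction` is hypothetical, and the text does not specify how the distance $r$ between two individuals is measured, so the sketch simply takes it as an argument supplied by the caller.

```python
G = 6.67430e-11  # gravitational constant, in m^3 kg^-1 s^-2


def attraction(fitness_a, fitness_b, distance, g=G):
    """Eq. (1) with the fitness of each individual used as its mass.

    Assumption: `distance` is a positive number provided by the caller, because
    the definition of r between two individuals is not given in the text.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return g * fitness_a * fitness_b / distance ** 2


# With g = 1 for readability: fitter couples attract each other more strongly.
print(attraction(20, 10, 2.0, g=1.0))  # 1 * 20 * 10 / 2^2
print(attraction(10, 10, 2.0, g=1.0))  # 1 * 10 * 10 / 2^2
```

Since $G$ only rescales every attraction by the same factor, any positive constant gives the same ordering of couples when attractions are compared with each other.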
2.4 Death
The death stage represents the challenges and adversities that life presents to overcome. This process evaluates the individual's resistance to survive in the environment. The better an individual's fitness, the greater its chances of survival. As time progresses, the demands of nature will also increase, pushing for only the best individuals to survive.
3 Experiments
As a proof of concept, we implemented the algorithm with Docker containers and solved the OneMax problem to compare it with a traditional GA from the DEAP (Distributed Evolutionary Algorithms in Python) library [6]. In the OneMax problem [10, 23], a genetic algorithm evolves a population of randomly generated individuals made of zeros and ones, and it stops when a solution of only ones is found.
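For readers unfamiliar with OneMax, the following sketch shows the fitness function and a plain sequential GA that solves it. It is neither the DEAP configuration nor the Life-Cycle algorithm of this paper: the function names, the one-point crossover, the per-bit mutation probability `p_mut` and the generation cap are all illustrative assumptions.

```python
import random


def onemax(individual):
    """OneMax fitness: the number of ones in the bit string."""
    return sum(individual)


def tournament(pop, size, rng):
    """Return the fittest of `size` individuals sampled at random."""
    return max(rng.sample(pop, size), key=onemax)


def simple_ga(length=20, pop_size=60, max_gen=500, p_mut=0.05, tourn=3, seed=0):
    """Sequential GA sketch; returns (best_individual, generations_used)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for gen in range(max_gen):
        best = max(pop, key=onemax)
        if onemax(best) == length:       # stop when a solution of only ones is found
            return best, gen
        offspring = []
        while len(offspring) < pop_size:
            a, b = tournament(pop, tourn, rng), tournament(pop, tourn, rng)
            cut = rng.randint(1, length - 1)            # one-point crossover
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                # Bit-flip mutation applied independently to every gene.
                offspring.append([bit ^ 1 if rng.random() < p_mut else bit
                                  for bit in child])
        pop = offspring[:pop_size]
    return max(pop, key=onemax), max_gen


best, generations = simple_ga(seed=1)
print(onemax(best), generations)
```

Every generation in this sketch waits for selection, crossover and mutation to finish before the next one starts, which is the sequential pattern that the Life-Cycle algorithm replaces with independent, asynchronous processes.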
3.1 Experimental Setup
One strategic difference in our algorithm implementation is the reproduction process, which is flexible enough to work in parallel with multiple couple selection methods, for example, tournament selection, random selection (Closed Reproduction process), and couple matching by creating a new mating individual (Open Reproduction process). We tried many combinations of the containerized processes, down to the minimum number required to obtain good results. Figure 4 shows a comparison of only two alternatives for the implementation of the algorithm. Our experimentation started with a configuration of ten processes and ended with five, shown on the left and right sides of Fig. 4, respectively.
3.2 Experiment Configuration
In our four experiments, we needed to match or balance the experimentation parameters according to the specifications of the algorithm used by the DEAP library. This allowed us to verify whether our algorithm would also converge on the solutions in a similar number of evaluations and execution time. Table 1 shows the OneMax initial configuration for DEAP and Life-Cycle experiment.

Fig. 4 Life-Cycle algorithm processes in containers comparison

Table 1 OneMax initial configuration for DEAP and Life-Cycle experiment
| Configuration | DEAP | Life-Cycle |
|---|---|---|
| Population | 60 | 60 |
| Max generation | 20 | 20 |
| Stagnation | Off | 10 |
| Chromosome length | 20 | 20 |
| Target fitness | 20 | 20 |
| Crossover rate | 100 | 100 |
| Mutation rate | 7 | 7 |
| Tournament rate | 100 | 50 |
| Tournament sample | 3 | 4 |
| Open reproduction rate | NA | 5 |
| Closed reproduction rate | NA | 45 |
| Max age | NA | 80 |
| Base approval | NA | 80 |
| Goal approval | NA | 100 |
3.3 Experiment Results
For each experiment, we ran 60 independent executions per algorithm and recorded the following results: Last Generation, Total Evaluations, Time (seconds), Evaluations/second. The labels used in our summarized results tables are the following:
• Last Gen. is the generation that found the solution.
• Eval. is the total number of evaluations the algorithm executed.
• Time (sec) is the total elapsed or wall clock time, in seconds.
• Eval/sec is the calculated rate of evaluations per second.
3.3.1 Experiment 1
The goal of our first OneMax experiment was to make sure our Life-Cycle algorithm worked as expected, converging on the solution. We intentionally used delays (on the order of tenths of a second) so that we could follow the Life-Cycle algorithm behavior on the console output. We compared the DEAP algorithm against the Life-Cycle algorithm using ten processes (in containers). Table 2 shows the summarized results, and Fig. 5 shows their Box and Whisker chart.
Table 2 Experiment 1 results: OneMax DEAP versus Life-Cycle (10 processes)
| Algorithm (Avg of runs 1–60) | Last Gen | Eval | Time (s) | Eval/s |
|---|---|---|---|---|
| OneMax DEAP | 6.5 | 391 | 0.036 | 10,836 |
| Life-Cycle (10P) | 9.6 | 578 | 6.830 | 85 |
Fig. 5 Box and Whisker chart for the Experiment 1: OneMax DEAP versus Life-Cycle
Table 3 Experiment 2 results, Life-Cycle Tournament selection: 10 versus 5 processes
| Algorithm (Avg of runs 1–60) | Last Gen | Eval | Time (s) | Eval/s |
|---|---|---|---|---|
| Life-Cycle (Tourn. 10P) | 7.2 | 434 | 0.740 | 587 |
| Life-Cycle (Tourn. 5P) | 8.2 | 495 | 0.826 | 599 |
Fig. 6 Box and Whisker chart for the Experiment 2: Life-Cycle Tournament selection (10 vs. 5 processes)
3.3.2 Experiment 2
The goal of our second experiment was to confirm that the Life-Cycle algorithm continued working as expected, converging on the solution. We reduced the delays used to follow the Life-Cycle algorithm behavior (now on the order of thousandths of a second). We used only tournament selection for reproduction and compared two configurations: the first ran the Life-Cycle algorithm on ten processes, and the second reduced the processes to the basic minimum of five. Table 3 shows the summarized results, and Fig. 6 shows their Box and Whisker chart.
3.3.3 Experiment 3
The goal of our third experiment was to follow and study the behavior of the Life-Cycle algorithm as it continued converging on the solution. We kept using minimal delays (on the order of thousandths of a second) and only tournament selection for reproduction. The first configuration ran the Life-Cycle algorithm on six processes, two of which were Death processes; the second configuration reduced the processes to the minimum (five) but increased the (Death) goal approval range to 115. Table 4 shows the summarized results, and Fig. 7 shows their Box and Whisker chart.
Table 4 Experiment 3 results, Life-Cycle Tournament: 6 processes (Death × 2) versus 5 processes (80–115 goal)
| Algorithm (Avg of runs 1–60) | Last Gen | Eval | Time (s) | Eval/s |
|---|---|---|---|---|
| Life-Cycle (Tourn. 6P, Death × 2) | 8.0 | 483 | 0.846 | 571 |
| Life-Cycle (Tourn. 5P, 80–115 g) | 6.4 | 384 | 0.695 | 553 |
Fig. 7 Box and Whisker chart for the Experiment 3: Life-Cycle Tournament (6 processes double Death, vs. 5 processes with an 80–115 goal approval)
3.3.4 Experiment 4
The goal of our fourth and last experiment was to follow and study the behavior of the Life-Cycle algorithm as it continued converging on the solution. We kept using minimal delays (on the order of thousandths of a second). We compared the OneMax DEAP implementation against the Life-Cycle algorithm, using only tournament selection for reproduction, with the Life-Cycle configuration running on the minimum (five) processes but with the (Death) goal approval range increased to 125. Table 5 shows the summarized results, and Fig. 8 shows their Box and Whisker chart.
Table 5 Experiment 4 results, OneMax DEAP versus Life-Cycle Tournament: 5 processes (80 to 125 goal)
| Algorithm (Avg of runs 1–60) | Last Gen | Eval | Time (s) | Eval/s |
|---|---|---|---|---|
| OneMax DEAP | 6.5 | 391 | 0.036 | 10,836 |
| Life-Cycle (Tourn. 5P, 80–125 g) | 6.3 | 377 | 0.678 | 556 |
Fig. 8 Box and Whisker chart for the Experiment 4: OneMax DEAP versus Life-Cycle Tournament (5 processes with an 80 to 125 goal approval)
4 Discussion
At some crucial moments during our research, we felt we were reverse engineering nature. One strategy was to let our imagination run wild and go for the simple solution. One of our earliest experimental findings was the need for balance, like the yin and yang, between death and reproduction. In our experiments, we found that when solving simple problems, the cost of communication between containers can become a factor that increases the total execution time; even so, we believe that the opposite must also be true: when solving complex problems, distributing the work over multiple resources should reduce the total processing time, making the communication cost negligible. We must acknowledge that the DEAP library [6] is highly efficient and its execution time short, due in part to the work being executed on a single processor with multiple cores, where the communication time is nearly non-existent. In contrast, our algorithm implementation requires a minimum of five containers and a message queue server. Even though container-based work has proven to be very effective and efficient [11, 22], we must consider that some time was added to our experiments by the minimal delays (on the order of thousandths of a second) we used to keep the process execution relatively random, to mimic how the life-cycle stages work in nature [17]. As future work, we could test the Life-Cycle algorithm behavior by eliminating all remaining delays. This scheme allows fine-tuning of multiple parameters, granting us the freedom to experiment with different selection and reproduction strategies simultaneously, which will impact how fast the algorithm finds the solution and the quality obtained.
5 Conclusions
We implemented the algorithm with Docker containers and used it to solve the OneMax problem, comparing it with a traditional (sequential) GA, where it showed favorable and promising results. To further validate this work, we could use control problems or other more complex and demanding problems that require computing with real numbers [13, 19]. As the complexity of problems increases, it is essential to have a scalable, replicable, and fault-tolerant model that uses collaborative techniques to work in the cloud, where multiple resources will be communicating asynchronously. This research has shown that it is possible to evolve a population of individuals, similar to a Genetic Algorithm (GA), using a distributed, parallel, and asynchronous methodology.
Acknowledgements This paper has been supported in part by projects DeepBio (TIN2017–85727–C4–2–P) and TecNM Project 11356.21.
References
1. Acherjee, B., Maity, D., Kuar, A.S.: Ultrasonic machining process optimization by cuckoo search and chicken swarm optimization algorithms. Int. J. Appl. Metaheuristic Comput. (IJAMC) 11(2), 1–26 (2020)
2. Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., et al.: A view of cloud computing. Commun. ACM 53(4), 50–58 (2010)
3. Back, T.: Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford University Press (1996)
4. Castillo, O., Valdez, F., Soria, J., Amador-Angulo, L., Ochoa, P., Peraza, C.: Comparative study in fuzzy controller optimization using bee colony, differential evolution, and harmony search algorithms. Algorithms 12(1), 9 (2019)
5. Eshratifar, A.E., Esmaili, A., Pedram, M.: Bottlenet: a deep learning architecture for intelligent mobile cloud computing services. In: 2019 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), pp. 1–6. IEEE (2019)
6. Fortin, F.A., De Rainville, F.M., Gardner, M.A.G., Parizeau, M., Gagné, C.: DEAP: Evolutionary algorithms made easy. J. Mach. Learn. Res. 13(1), 2171–2175 (2012)
7. García-Valdez, M., Mancilla, A., Trujillo, L., Merelo, J.J., Fernández-de Vega, F.: Is there a free lunch for cloud-based evolutionary algorithms? In: 2013 IEEE Congress on Evolutionary Computation, pp. 1255–1262. IEEE (2013)
8. García-Valdez, M., Merelo, J.J.: Event-driven multi-algorithm optimization: mixing swarm and evolutionary strategies. In: International Conference on the Applications of Evolutionary Computation (Part of EvoStar), pp. 747–762. Springer (2021)
9. García-Valdez, M., Trujillo, L., Merelo, J.J., de Vega, F.F., Olague, G.: The evospace model for pool-based evolutionary algorithms. J. Grid Comput. 13(3), 329–349 (2015)
10. Krejca, M.S., Witt, C.: Lower bounds on the run time of the univariate marginal distribution algorithm on onemax. Theoret. Comput. Sci. 832, 143–165 (2020)
11. Merelo, J.J., Castillo, P.A., García-Sánchez, P., de las Cuevas, P., Rico, N., García Valdez, M.: Performance for the masses: experiments with a web based architecture to harness volunteer resources for low cost distributed evolutionary computation. In: Proceedings of the Genetic and Evolutionary Computation Conference 2016, pp. 837–844 (2016)
12. Merelo, J.J., García-Valdez, M., Castillo, P.A., García-Sánchez, P., Cuevas, P., Rico, N.: Nodio, a javascript framework for volunteer-based evolutionary algorithms: first results (2016). arXiv:1601.01607
13. Miikkulainen, R., Liang, J., Meyerson, E., Rawal, A., Fink, D., Francon, O., Raju, B., Shahrzad, H., Navruzyan, A., Duffy, N., et al.: Evolving deep neural networks. In: Artificial Intelligence in the Age of Neural Networks and Brain Computing, pp. 293–312. Elsevier (2019)
14. Ontiveros, E., Melin, P., Castillo, O.: High order α-planes integration: a new approach to computational cost reduction of general type-2 fuzzy systems. Eng. Appl. Artif. Intell. 74, 186–197 (2018)
15. Porto, V.W.: Evolutionary programming. In: Evolutionary Computation 1, pp. 127–140. CRC Press (2018)
16. Rashedi, E., Nezamabadi-Pour, H., Saryazdi, S.: GSA: a gravitational search algorithm. Inf. Sci. 179(13), 2232–2248 (2009)
17. Read, K., Ashford, J.: A system of models for the life cycle of a biological organism. Biometrika 55(1), 211–221 (1968)
18. Sanchez, M.A., Castillo, O., Castro, J.R., Melin, P.: Fuzzy granular gravitational clustering algorithm for multivariate data. Inf. Sci. 279, 498–511 (2014)
19. Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002)
20. Thain, D., Tannenbaum, T., Livny, M.: Distributed computing in practice: the condor experience. Concurr. Comput. Pract. Exp. 17(2–4), 323–356 (2005)
21. Valdez, F.: Swarm intelligence: a review of optimization algorithms based on animal behavior. In: Recent Advances of Hybrid Intelligent Systems Based on Soft Computing, pp. 273–298 (2021)
22. Valdez, M.G., Guervós, J.J.M.: A container-based cloud-native architecture for the reproducible execution of multi-population optimization algorithms. Futur. Gener. Comput. Syst. 116, 234–252 (2021)
23. Witt, C.: Upper bounds on the running time of the univariate marginal distribution algorithm on onemax. Algorithmica 81(2), 632–667 (2019)
Evaluation and Comparison of Brute-Force Search and Constrained Optimization Algorithms to Solve the N-Queens Problem
Alfredo Arteaga, Ulises Orozco-Rosas, Oscar Montiel, and Oscar Castillo
Abstract The N-Queens problem is relevant in Artificial Intelligence (AI); its solution methodology has been used in different computational intelligence approaches. Max Bezzel proposed the problem in 1848 for eight queens on an 8 × 8 chessboard. After that, the formulation was generalized to N queens on an N × N chessboard. There are several ways of posing the problem and several algorithms to solve it. We describe two commonly used mathematical models that handle the positions of the queens and the restrictions. The first and easiest way is to find one combination that satisfies the restrictions. The second model uses a more compact notation to represent the queens' positions. This generic problem has been solved with many different algorithms. However, there is no comparison of the performance among the methods. In this work, a comparison of performance for different problem sizes is presented. We tested the Backtracking, Branch and Bound, and Linear Programming algorithms for different numbers of queens, up to 17. In addition, we present statistical comparative experimental results of the different methods.
A. Arteaga · O. Montiel (B) Instituto Politécnico Nacional, CITEDI-IPN, Nueva Tijuana, 1310, 22435 Tijuana, Baja California, México e-mail: [email protected] U. Orozco-Rosas CETYS Universidad, Av. CETYS Universidad No. 4, Fracc. El Lago, 22210 Tijuana, Baja California, México e-mail: [email protected] O. Castillo Tijuana Institute of Technology, Calzada del Tecnológico S/N, Tomás Aquino 22414 Tijuana, Baja California, México e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_9
1 Introduction
This chapter evaluates and compares brute-force search and constrained optimization algorithms for solving the N-Queens problem; specifically, we tested the Backtracking, Branch and Bound, and Linear Programming algorithms, and we give a brief review of each of them. The general idea of the N-Queens problem is to place the queens on a chessboard so that none of them threatens another, as shown in Fig. 1. The N-Queens problem originated in 1848, when Max Bezzel, a German chess player, proposed placing eight queens on a chessboard in such a way that no queen threatens another [1]; that is, at most one queen may occupy each row, column, and diagonal. The problem can be solved in two ways: the first and most straightforward is to find a single arrangement in which no queen threatens another; the second and more complex is to find every arrangement in which the queens do not threaten each other. In 1850, Carl Gauss [2] claimed that there were 72 combinations when positioning eight queens on the board. However, this was wrong; it was not until 1874 that James Glaisher proved that there are 92 combinations. The N-Queens problem can be solved with multiple methods, for example, Backtracking [3], heuristic approximations [4], genetic algorithms [5], and local search [6], among others.
Fig. 1 An 8 × 8 chessboard with eight queens
The contribution of this work is to show the efficiency of specific, commonly used algorithms, which is important since there are no reported comparisons of their computational behavior when solving different instances of the N-Queens problem. This chapter is organized as follows. Section 2 presents the related work, which consists of a selection of representative algorithms of different techniques. Section 3 provides two mathematical models for solving the N-Queens problem. Section 4 describes the algorithms used for solving the N-Queens problem, specifically the Backtracking, Branch and Bound, and Linear Programming algorithms. Section 5 shows the experimental results. Finally, Sect. 6 presents the conclusions and future work.
2 Related Work
The N-Queens problem has been solved using mathematical and statistical techniques, heuristic and metaheuristic algorithms, neural networks, fuzzy logic, and quantum computing. It has applications in object detection [7], pixel space sampling [8], traffic control [9], and deadlock prevention [10], among others. In these applications, the processing time grows exponentially as the input (number of queens) increases, which leads to classifying the N-Queens problem as a non-deterministic polynomial-time problem of the NP-Complete class. The N-Queens problem has a computational complexity of $O(n^n)$ when it is solved with the Backtracking algorithm, which is a brute-force approach. In that sense, several approaches have been proposed to solve the N-Queens problem efficiently and effectively. Among the heuristic and metaheuristic algorithms, we can find the work presented by Al-Gburi et al. [11], which hybridizes the bat algorithm and the genetic algorithm to solve the N-Queens problem. In that work, the results show that the proposal can solve the N-Queens problem with an input of 8, 20, 50, 100, and 500 queens. In [12], Jain and Prasad present a genetic algorithm that solves the N-Queens problem using an advanced mutation operator; two instances, with 8 and 50 queens, were tested against the classical genetic algorithm, permutation PSO (Particle Swarm Optimization), and swarm refinement PSO, and the proposed algorithm obtained the best results. The PSO algorithm is one of the most efficient and effective swarm intelligence-based algorithms; in that sense, [13] presents a study that inspects the effectiveness of PSO in producing an optimal solution for the N-Queens problem. Concerning the use of heuristic and metaheuristic algorithms with parallel computing, Cao et al. [14] presented a parallel implementation on CPU and GPU (Graphics Processing Unit) of the candidate solution evaluation algorithm for the N-Queens problem; in that work, three schemes are applied to the simulated annealing algorithm for problems with fewer than 3,000 queens. Another example is presented in [15], where Jianli et al. employed a parallel genetic algorithm for the N-Queens problem based on a message passing interface using a GPU.
An alternative GPU implementation was presented by Janssen and Liew [16]. In their work, the authors accelerate the genetic algorithm using a GPU, modifying some of the evolutionary operations to fit the GPU architecture; in that way, it is possible to speed up the computation of the solutions. Among the neural network and fuzzy logic approaches is the neuro-fuzzy architecture for solving the N-Queens problem presented by Nunes et al. [17]. In that work, a modified Hopfield neural network and a valid-subspace technique were employed to create a subspace that contains only feasible solutions to the problem analyzed. Another contribution employing the Hopfield neural network to solve the N-Queens problem was presented by Waqas and Bhatti [18]. Their proposal presents a solution for the N + 1-Queens variation of the problem; that variation makes the solution possible for large values of N. In [19], Funabiki et al. present a maximum neural network approach for the N-Queens problem. Lakshmi and Muthuswamy [20] presented a proposal to solve the N-Queens problem using a predictive neural network, incorporating the AHP-Fuzzy algorithm to reduce the neural network prediction time; with this incorporation, the speedup is increased. In the quantum computing field, where quantum computers are based on quantum physics principles, solving the N-Queens problem requires defining a quantum state for all the possible solutions, followed by the application of the Hadamard gate to produce a quantum register. Once the quantum register is completed, two unitary operators are combined to employ Grover's algorithm through an oracle, which evaluates a function $f_\omega(x)$ that verifies the proposed solution. For quantum computing implementations, we found the work presented by Souza and Mello [21]. Here, Grover's algorithm is employed to identify all the safe places where a queen can be placed and to get a combination that satisfies the restrictions. Another approach is presented in [22], where a quantum-inspired differential evolution algorithm is proposed. In that work, a hybridization of the quantum genetic algorithm with the differential evolution algorithm, which employs the differential evolution operator as an alternative to the classical genetic algorithm's mutation operator, is used to solve the N-Queens problem, obtaining better results in terms of computation time and fitness evolution.
3 Mathematical Models
The N-Queens problem is constrained because the queens need to be safe; this condition is satisfied if and only if there is one queen in each row, one queen in each column, and at most one queen on any diagonal. In this way, no queen threatens another. There are several algorithmic approaches to solve this problem. In [23], the row-based and column-based approaches and several issues regarding problem representation are described in detail. In [24], the authors survey known results and research areas for the N-Queens problem; they describe methods and explain some theorems that help to better understand the problem mathematically. This section explains two mathematical models chosen according to the software implementations that we developed.
3.1 Mathematical Model No. 1
This model is the most general and the easiest to implement. It uses a binary variable $x_{ij}$, where $i$ is a row and $j$ is a column, to represent the existence of a queen in the square $(i, j)$: if $x_{ij} = 1$ there is a queen in that square, and if $x_{ij} = 0$ there is no queen. This model imposes restrictions for the rows, columns, and positive and negative diagonals. We say that a diagonal is positive if the line traced through its squares forms a positive angle in the Cartesian plane; conversely, the negative diagonals form a negative angle. The row restriction prevents two queens from being in the same row, which can be verified by adding the values of the squares $x_{ij}$ of a row; the sum is expected to be 1 if there is only one queen in the row. This restriction can be formulated as follows,
$$\sum_{j=1}^{n} x_{ij} = 1 \quad \forall i \in \{1, \ldots, n\} \tag{1}$$
Similarly, the column restriction prevents two or more queens from being placed in the same column. The restriction is as follows,
$$\sum_{i=1}^{n} x_{ij} = 1 \quad \forall j \in \{1, \ldots, n\} \tag{2}$$
The diagonals are more complicated, since it is necessary to identify whether two or more queens are on the same positive or negative diagonal. Figure 2 shows a 6 × 6 chessboard describing the positive and negative diagonals. Using this figure, we can verify that the queens on the same positive diagonal can be determined using
$$k = i + j \tag{3}$$
By (3), all the pairs $(i, j)$ with the same value of $k$ are on the same positive diagonal; for example, the pairs (4, 5) and (3, 6) are on the same diagonal, and their $k$ value is 9. For this chessboard, $k$ ranges over $\{2, \ldots, 12\}$ when all the diagonals are considered. In general, for a chessboard of size $n \times n$, $k$ ranges over $\{2, \ldots, 2n\}$. Depending on the implementation, the corner squares may be left out, since each of them is a diagonal of a single square; in that case, $k$ ranges over $\{3, \ldots, 2n - 1\}$.
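The indexing of Eq. (3) can be checked with a few lines of code. The sketch below groups the squares of a board by $k = i + j$ using 1-based indices; the function name `positive_diagonals` is a hypothetical helper introduced only for illustration.

```python
from collections import defaultdict


def positive_diagonals(n):
    """Group the squares (i, j) of an n x n board by k = i + j (1-based indices)."""
    groups = defaultdict(list)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            groups[i + j].append((i, j))
    return dict(groups)


diagonals = positive_diagonals(6)
print(sorted(diagonals))  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(diagonals[9])       # [(3, 6), (4, 5), (5, 4), (6, 3)]
```

The two corner diagonals, $k = 2$ and $k = 12$, contain a single square each, which is why an implementation may skip them.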
Fig. 2 Chessboard of size 6 × 6. There are four queens in the positive column and six queens in the negative column. The index i identifies the row number and j the column number. A position is defined as a pair (i, j)
When all the squares of the chessboard are considered, we have expression (4). When the corner squares are not considered, expression (4) changes only in the valid range of $k$; i.e., we write $\forall k \in \{3, \ldots, 2n - 1\}$.
$$\sum_{i=1}^{n} \sum_{\substack{j=1 \\ i+j=k}}^{n} x_{ij} \le 1 \quad \forall k \in \{2, \ldots, 2n\} \tag{4}$$
Similarly, for the negative diagonals, the $k$ value is calculated by
$$k = i - j \tag{5}$$
and all the pairs $(i, j)$ with the same $k$ value are on the same negative diagonal. For Fig. 2, $k$ ranges over $\{-5, \ldots, 5\}$ considering all the squares of the chessboard; without the corner squares, $k$ ranges over $\{-4, \ldots, 4\}$. For the general case, the negative diagonal restriction produces values of $k$ in the range $\{-n + 1, \ldots, n - 1\}$, which can be expressed by
$$\sum_{i=1}^{n} \sum_{\substack{j=1 \\ i-j=k}}^{n} x_{ij} \le 1 \quad \forall k \in \{-n + 1, \ldots, n - 1\} \tag{6}$$
The search space of this model is $2^{n \times n}$. For Fig. 2, where $n = 6$, the search space has $2^{36} = 68{,}719{,}476{,}736$ possible combinations. For $n = 17$, which is the biggest case that we tested, it has $2^{289} \approx 9.9 \times 10^{86}$ possible combinations.
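As an illustration of how restrictions (1), (2), (4) and (6) act on a candidate board, the following sketch checks a 0/1 matrix against them. The function names are hypothetical, the board is stored as a list of rows, and the sample arrangement (queens in columns 2, 4, 6, 1, 3, 5 of rows 1 to 6) is one valid 6-queens placement chosen for the example, not necessarily the one drawn in Fig. 2.

```python
def satisfies_model_1(x):
    """Check Eqs. (1), (2), (4) and (6) on a square 0/1 matrix given as a list of rows."""
    n = len(x)
    rows_ok = all(sum(x[i]) == 1 for i in range(n))                       # Eq. (1)
    cols_ok = all(sum(x[i][j] for i in range(n)) == 1 for j in range(n))  # Eq. (2)
    positive = {}  # queens per positive diagonal, key k = i + j (1-based)
    negative = {}  # queens per negative diagonal, key k = i - j
    for i in range(n):
        for j in range(n):
            if x[i][j]:
                # 0-based to 1-based: (i + 1) + (j + 1) = i + j + 2; i - j is unchanged.
                positive[i + j + 2] = positive.get(i + j + 2, 0) + 1
                negative[i - j] = negative.get(i - j, 0) + 1
    diagonals_ok = (all(count <= 1 for count in positive.values())      # Eq. (4)
                    and all(count <= 1 for count in negative.values()))  # Eq. (6)
    return rows_ok and cols_ok and diagonals_ok


def board_from_columns(columns):
    """Build the 0/1 matrix from the 1-based column of the queen in each row."""
    n = len(columns)
    return [[1 if j == columns[i] - 1 else 0 for j in range(n)] for i in range(n)]


def search_space_size(n):
    """Number of 0/1 assignments of an n x n board."""
    return 2 ** (n * n)


print(satisfies_model_1(board_from_columns([2, 4, 6, 1, 3, 5])))  # True
print(satisfies_model_1(board_from_columns([1, 2, 3, 4, 5, 6])))  # False
print(search_space_size(6))                                       # 68719476736
```

The second board places every queen on the main diagonal, so it passes the row and column restrictions and fails only the negative diagonal restriction.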
3.2 Mathematical Model No. 2
This model uses only the variable $x_i$, where the index $i$ identifies a row and the value of $x_i$, in the domain $\{1, \ldots, n\}$, is the column where the queen of that row is placed; the restrictions only consider the columns and diagonals (positive and negative). The row restriction is satisfied by construction, because each row is assigned exactly one value and therefore two or more queens cannot be on the same row. The column restriction does not allow two queens in the same column. This can be verified by checking the variable values: if they are all different, the restriction is fulfilled; otherwise, it is not satisfied. This restriction can be written as follows,
$$x_i \neq x_j \quad \forall i \neq j;\ i, j \in N \tag{7}$$
Analogously,
$$x_i - x_j \neq 0 \quad \forall i \neq j;\ i, j \in N \tag{8}$$
Similar to the first model, each diagonal, positive or negative, can hold only one queen. All the diagonals have angles of $\pm 45^\circ$, depending on whether they are positive or negative. Using the model definition, where the variable value indicates the column and the index indicates the row, we can determine whether two queens are on the same diagonal. Viewing the positions of the queens as points in the Cartesian plane, $Q_i = (x_i, i)$ for the first queen and $Q_j = (x_j, j)$ for the second queen, we have $\Delta x = x_j - x_i$ and $\Delta y = j - i$. The tangent of the angle of the line joining them is $\Delta y / \Delta x$, and we know that $\tan 45^\circ = 1$; therefore, two queens share a diagonal when $|\Delta x| = |\Delta y|$. The diagonal restriction can thus be written as,
$$|x_i - x_j| \neq |i - j| \quad \forall i \neq j;\ i, j \in N \tag{9}$$
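Restrictions (7) to (9) translate almost literally into a pairwise check. The sketch below assumes the compact representation described above, with `x[i - 1]` holding the 1-based column of the queen in row `i`; the function name `satisfies_model_2` is hypothetical.

```python
from itertools import combinations


def satisfies_model_2(x):
    """Check Eqs. (7)-(9), where x[i - 1] is the column of the queen in row i."""
    for (i, x_i), (j, x_j) in combinations(enumerate(x, start=1), 2):
        if x_i == x_j:                    # Eqs. (7)/(8): same column
            return False
        if abs(x_i - x_j) == abs(i - j):  # Eq. (9): same diagonal
            return False
    return True


print(satisfies_model_2([2, 4, 6, 1, 3, 5]))        # True
print(satisfies_model_2([1, 5, 8, 6, 3, 7, 2, 4]))  # True
print(satisfies_model_2([1, 2, 3, 4]))              # False
```

Compared with the binary model, a candidate here is a vector of n values instead of an n × n matrix, and the row restriction never needs to be tested.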
4 N-Queens Problem Solution Algorithms
This section explains the algorithms that we used to solve the N-Queens problem: the Backtracking, Branch and Bound, and Linear Programming algorithms.
4.1 Backtracking Algorithm
This algorithm performs a recursive search on trees. It performs a depth-first search, branch by branch, exploring all the sub-branches of a node and selecting valid values for one variable at a time; otherwise, it backtracks. Thus, the procedure can find several valid solutions at the branches, finding success nodes. If we find an invalid solution in any part of a branch, it is tagged as a dead-end, and the search in that branch is finished. Hence, the general idea of this algorithm is to build solutions incrementally, removing those candidates that fail to satisfy the imposed constraints. Figure 3 describes the search process of the backtracking algorithm on a binary tree.
Fig. 3 Backtracking algorithm representation
To solve the N-Queens problem, we used mathematical model no. 1, explained in Sect. 3.1. Here, it is necessary to consider partial solutions to determine which combinations are valid and which are invalid: if a valid partial solution is detected in a branch, the algorithm keeps searching in the same branch until a valid combination is found or a dead-end is reached. In this way, the algorithm searches all the branches, moving forward or backward to place the queens without them attacking each other [25]. Güldal, Baugh, and Allehaibi [26] solved the problem in 2006 using the Backtracking algorithm. They found it very efficient; however, a bigger chessboard implies more queens, which increases the processing time exponentially. Table 1 shows some processing times using this algorithm for chessboard sizes up to 13 queens. As mentioned, the Backtracking algorithm needs to check all the possible candidate solutions in each branch. Hence, it is a brute-force search method in which it is necessary to enumerate all the possible candidates that can be the solution to the problem. For example, applying it to the N-Queens problem requires finding all the possible combinations on a chessboard of size $N \times N$.
Table 1 Backtracking processing time for obtaining different results
| Number of queens | Time |
|---|---|
| 1 | ≈0 s |
| 2 | ≈0 s |
| 3 | ≈0 s |
| 4 | ≈0 s |
| 5 | ≈0 s |
| 6 | 0.03 s |
| 7 | 0.17 s |
| 8 | 0.74 s |
| 9 | 3.57 s |
| 10 | 17.81 s |
| 11 | 96.86 s |
| 12 | 558.07 s |
| 13 | 3,599.7 s |
A block diagram of the algorithm is shown in Fig. 4. It is divided into three parts. The first part is dedicated to initializing variables, such as the number of queens, and to the commands that measure the execution time. The second part registers all the obtained solutions and controls the placement of the queens on the board. Finally, the third part determines whether the position of a placed queen is safe; with this, the Backtracking algorithm can discard the threatened positions.
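The three parts described above can be sketched as follows. This is a minimal illustration on the binary board of model no. 1, not the authors' program: the function names are hypothetical, and solutions are reported as the 1-based column of the queen in each row.

```python
import time


def is_safe(board, row, col):
    """Part 3: True if no queen placed in the rows above attacks (row, col)."""
    n = len(board)
    for r in range(row):
        offset = row - r
        if board[r][col]:                                  # same column
            return False
        if col - offset >= 0 and board[r][col - offset]:   # diagonal to the left
            return False
        if col + offset < n and board[r][col + offset]:    # diagonal to the right
            return False
    return True


def place_queens(board, row, solutions):
    """Part 2: place one queen per row and register every complete board."""
    n = len(board)
    if row == n:
        solutions.append([line.index(1) + 1 for line in board])
        return
    for col in range(n):
        if is_safe(board, row, col):
            board[row][col] = 1
            place_queens(board, row + 1, solutions)
            board[row][col] = 0  # backtrack and try the next column


def solve_backtracking(n):
    """Part 1: initialize the board, start the timer, and launch the search."""
    board = [[0] * n for _ in range(n)]
    solutions = []
    start = time.perf_counter()
    place_queens(board, 0, solutions)
    elapsed = time.perf_counter() - start
    return solutions, elapsed


solutions, elapsed = solve_backtracking(8)
print(len(solutions))  # 92
```

Because the safety test rescans all the rows above for every candidate square, the cost of each node grows with n, on top of the exponential growth of the tree itself.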
Fig. 4 Block diagram representation of the Backtracking algorithm
4.2 Branch and Bound Algorithm
This algorithm is used to solve combinatorial optimization problems. It searches for solutions in a finite combination tree [27], using an alternating sequence that generates further partial sequences and evaluating these partial classifications to determine optimum solutions [28]. The algorithm finds the optimum solution through combinations, and it can be represented as a combinational tree, as shown in Fig. 5. When a branch holds a valid partial result, it is marked as a solution; otherwise, it is omitted and the algorithm moves to another branch until it obtains a solution, which reduces the execution time needed to get all the combinations for the problem. The Branch and Bound algorithm backtracks only when the row has no options left to evaluate; that branch is then eliminated, which produces the optimization [29].
Fig. 5 Branch and Bound algorithm representation using a tree
Harold and Janice Stone recorded the size of the combinational tree generated to solve the N-Queens problem [29]. They used two variants, a lexicographic one and one with restrictions. The results obtained are shown in Table 2.
Table 2 Branch number for N-Queens problem using lexicographic and restrictions variants
| Number of queens | Lexicographic size | Restriction size |
|---|---|---|
| 7 | 552 | 416 |
| 8 | 2,057 | 1,415 |
| 9 | 8,394 | 5,610 |
| 10 | 35,539 | 20,863 |
| 11 | 166,926 | 89,670 |
| 12 | 853,189 | 432,103 |
| 13 | 4,674,890 | 2,230,980 |
We divided the Branch and Bound algorithm into a diagram consisting of five parts, as shown in Fig. 6. The first part sets the initial values for the solution counter and the number of queens to be placed. The second part places the queens on the board and counts the number of solutions found. In the third part, the algorithm alternates between the rows where the queens are going to be placed in a safe position. In the fourth part, the positions that are a threat to the queens are identified. Finally, the last part dismisses the threatening positions for the queens. In the Branch and Bound algorithm, we used mathematical model no. 1, explained in Sect. 3.1, to solve the N-Queens problem.
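One common way to realize the five parts above is to keep lookup tables of the threatened columns and diagonals, so that a threatened square is dismissed in constant time before its branch is expanded. The sketch below follows that idea; it is an assumed variant written for illustration, with hypothetical names, and it is not taken from the authors' implementation.

```python
def solve_branch_and_bound(n):
    """Enumerate all N-Queens solutions, pruning threatened squares with lookup tables."""
    # Part 1: initial values (solution list and bookkeeping for n queens).
    solutions = []
    placement = [0] * n                      # placement[row] = 0-based column
    used_cols = [False] * n
    used_positive = [False] * (2 * n - 1)    # indexed by row + col
    used_negative = [False] * (2 * n - 1)    # indexed by row - col + n - 1

    def place(row):
        # Part 2: a full placement is registered as a solution.
        if row == n:
            solutions.append([col + 1 for col in placement])
            return
        # Part 3: alternate over the candidate squares of the current row.
        for col in range(n):
            p = row + col
            q = row - col + n - 1
            # Part 4: identify threatened positions with the lookup tables.
            if used_cols[col] or used_positive[p] or used_negative[q]:
                continue                     # Part 5: dismiss the threatened square
            used_cols[col] = used_positive[p] = used_negative[q] = True
            placement[row] = col
            place(row + 1)
            used_cols[col] = used_positive[p] = used_negative[q] = False

    place(0)
    return solutions


print(len(solve_branch_and_bound(8)))  # 92
```

The two diagonal tables use the indices of Eqs. (3) and (5), shifted so that they start at zero; the pruning test replaces the row-by-row scan of a plain backtracking safety check.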
4.3 Linear Programming Algorithm
Linear Programming is a methodology initiated by George B. Dantzig, John von Neumann, Leonid Kantorovich, and others in the 1940s. This mathematical methodology includes modeling techniques to formulate real-world problems as linear programs and algorithms for numerically solving linear programs. Linear Programming is employed in several areas such as administration, economy, science, business, and engineering, among others. Linear Programming is capable of solving optimization problems that minimize or maximize a linear objective function subject to linear equality or inequality constraints. It is composed of two elements: the objective function and the restrictions, which limit the results through the decision variables [30]. To use Linear Programming for the N-Queens problem we employ the formulation described in Sect. 3.2. Given n queens and an n × n chessboard, we define the total of n queens as follows,
$$\sum_{i=1}^{n} \sum_{j=1}^{n} X[i, j] = n \tag{10}$$
Fig. 6 Block diagram representation of the branch and bound algorithm
A. Arteaga et al.
where the following rules are considered.
Restriction 1: at most one queen may be placed on a column.
$$\sum_{j=1}^{n} X[i, j] \le 1 \quad \forall i = 1, \ldots, n \tag{11}$$
Restriction 2: at most one queen may be placed on a row.
$$\sum_{i=1}^{n} X[i, j] \le 1 \quad \forall j = 1, \ldots, n \tag{12}$$
Restriction 3: at most one queen may be placed on a positive diagonal.
$$\sum_{i, j \in \{1, \ldots, n\},\; i - j = k} X[i, j] \le 1 \quad \forall k = 2 - n, \ldots, n - 2 \tag{13}$$
Restriction 4: at most one queen may be placed on a negative diagonal.
$$\sum_{i, j \in \{1, \ldots, n\},\; i + j = k} X[i, j] \le 1 \quad \forall k = 3, \ldots, 2n - 1 \tag{14}$$
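As an illustration of how restrictions (10) to (14) act on a candidate board, the following Python sketch checks a 0/1 matrix against each of them. The function name satisfies_restrictions is hypothetical, and the matrix uses 0-based indices, so the diagonal sums are shifted to match the 1-based indices of the formulation. It only verifies a given assignment; it does not solve the linear program.

```python
def satisfies_restrictions(X):
    """Check a 0/1 matrix X (list of lists) against Eqs. (10)-(14)."""
    n = len(X)
    idx = range(n)
    # Eq. (10): exactly n queens on the board
    if sum(X[i][j] for i in idx for j in idx) != n:
        return False
    # Eq. (11): for every i, the sum over j is at most 1
    if any(sum(X[i][j] for j in idx) > 1 for i in idx):
        return False
    # Eq. (12): for every j, the sum over i is at most 1
    if any(sum(X[i][j] for i in idx) > 1 for j in idx):
        return False
    # Eq. (13): diagonals with i - j = k, k = 2 - n, ..., n - 2
    for k in range(2 - n, n - 1):
        if sum(X[i][j] for i in idx for j in idx if i - j == k) > 1:
            return False
    # Eq. (14): diagonals with i + j = k (1-based), k = 3, ..., 2n - 1
    for k in range(3, 2 * n):
        if sum(X[i][j] for i in idx for j in idx if (i + 1) + (j + 1) == k) > 1:
            return False
    return True
```

The corner diagonals that contain a single square are left out of the loops, as in Eqs. (13) and (14), because they can never hold more than one queen.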
In [31], Al-Rudaini applied the restrictions previously stated to solve the N-Queens problem with N = 16 using Linear Programming. Figure 7 shows a result obtained under these conditions.
Fig. 7 The N-Queens problem for sixteen queens was solved with Linear Programming
Fig. 8 Block diagram representation of the Linear Programming algorithm
To implement the Linear Programming algorithm to solve the N-Queens problem, we propose to divide the algorithm into two parts, as shown in Fig. 8. The first part establishes the number of queens and the chessboard size. The second part, named “Queens funct”, determines the best positions of the queens that comply with the established restrictions.
5 Experiments and Results
This section presents the experimental results of the three tested algorithms for solving the N-Queens problem: Backtracking, Branch and Bound, and Linear Programming. These algorithms were tested on a 16-core Intel Xeon Gold 5218 CPU @ 2.3 GHz with 512 GB of RAM running Ubuntu 20.04.2 LTS, and each was executed 20 times to obtain the mean and the standard deviation of the execution time for each number of queens.
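The measurement protocol described above (repeated executions, then mean and standard deviation of the execution time) can be sketched in Python as follows. The helper name time_solver is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics
import time


def time_solver(solver, n, runs=20):
    """Run solver(n) several times; return (mean, standard deviation) in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        solver(n)
        samples.append(time.perf_counter() - start)
    # statistics.stdev is the sample standard deviation (assumed estimator)
    return statistics.mean(samples), statistics.stdev(samples)
```

Any callable that takes the number of queens can be passed as solver, so the same harness would serve the three algorithms.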
5.1 Experiment 1: Backtracking
The results are registered in Table 3. An exponential increase in the execution time can be observed after solving the problem for 11 queens. Before this instance, the times required for solving the problem were very close to 0 seconds. As for the standard deviation, the data remain close to the mean until the algorithm works with 14 queens, where the standard deviation starts to increase exponentially. In other words, the precision of the algorithm is good, but as the number of queens increases the data move away from the mean.
Table 3 Execution time for the N-Queens problem using the Backtracking algorithm
Number of queens    Execution time (mean)    Standard deviation
4                   0.00007 s                1.39046E-20
5                   0.0002 s                 2.78092E-20
6                   0.0008 s                 1.11237E-19
7                   0.003405 s               2.23607E-05
8                   0.015385 s               0.000113671
9                   0.074975 s               0.000990415
10                  0.22796 s                0.017698718
11                  1.199305 s               0.021038798
12                  6.877155 s               0.069985235
13                  42.3444 s                0.744777032
14                  278.702635 s             6.718637124
15                  1896.905275 s            35.63734744
16                  13,887.82939 s           158.5726299
17                  109,411.7678 s           4167.875636
5.2 Experiment 2: Branch and Bound
Branch and Bound is considered a Backtracking variant; however, it has many optimizations that reduce the processing time, allowing it to obtain the results in a shorter time. It works by generating a solution tree in which each branch leads to a possible solution. The algorithm analyzes whether the current branch is optimal; otherwise, the branch is trimmed, which discards that path as a possible solution, reduces the number of combinations, and makes better use of the resources. The obtained results are registered in Table 4. An exponential increase in the execution time can be observed after solving the problem for 12 queens; before this instance, the time required for solving the problem was very close to zero seconds. Regarding the standard deviation, the data remain close to the mean until the algorithm works with 15 queens, where the standard deviation increases exponentially, almost the same as with the Backtracking algorithm.
Table 4 Execution time for the N-Queens problem using Branch and Bound algorithm
Number of queens    Execution time (mean)    Standard deviation
4                   0.0004 s                 5.56184E-20
5                   0.0005 s                 2.22474E-19
6                   0.0009 s                 1.11237E-19
7                   0.00243 s                4.70162E-05
8                   0.008565 s               8.75094E-05
9                   0.03553 s                0.000231926
10                  0.104655 s               0.01866814
11                  0.471255 s               0.013809245
12                  2.548845 s               0.021257432
13                  14.530725 s              0.110244355
14                  89.19125 s               0.622579839
15                  583.959185 s             3.048646538
16                  4115.949215 s            40.31555712
17                  30,531.33199 s           1152.404159
Table 5 Execution time for the N-Queens problem using the linear programming algorithm
Number of queens    Execution time (mean)    Standard deviation
4                   0.006335 s               8.12728E-05
5                   0.01257 s                0.000126074
6                   0.01768 s                0.000119649
7                   0.035475 s               0.007626883
8                   0.11619 s                0.009265039
9                   0.524145 s               0.012362783
10                  2.00735 s                0.011765628
11                  11.46305 s               0.05936269
12                  63.38689 s               0.549429424
13                  534.99516 s              1.906083907
14                  –                        –
15                  –                        –
16                  –                        –
17                  –                        –
The results registered in Tables 3, 4, and 5 were used to analyze which of the tested algorithms had the best execution time. Starting with Linear Programming, we can observe that the execution time grows exponentially beginning from N = 14, which means that the growth can be observed from 14 onwards; for sixteen and seventeen queens, we got no result because the processing time was enormous. From the results obtained with the Backtracking algorithm, we can observe notable exponential growth starting at N = 15: the time passes from 359 to 2,457 s, which lets us identify the behavior of the computational complexity and the amount of resources needed to perform this operation.
Fig. 9 The results show the mean time obtained from Linear Programming (blue), Backtracking (orange), and Branch and Bound (gray) algorithms
The Branch and Bound algorithm is the fastest and best-optimized of the three because its execution time is shorter than that of the other two algorithms. For N = 16 and N = 17 it took 5,821 and 42,613 s, respectively, which is a shorter time than the one obtained using the Backtracking algorithm, as shown in Fig. 9.
5.3 Experiment 3: Linear Programming
Linear Programming is a mathematical method used to optimize an objective function with restrictions that limit the values that the decision variables can take. In that sense, the N-Queens problem can be defined as follows.
$$X_{ij} = f(N) = \begin{cases} 1 & \text{there is no threat in } (i, j) \\ 0 & \text{otherwise} \end{cases} \in \{1, 0\} \tag{15}$$
where:
• $X_{ij}$ is the ordered pair where the queens are placed.
• For the first case: 1 represents the queen on the position (i, j).
• And, for the second case: 0 represents the position (i, j) where it is not possible to place a queen.
Fig. 10 The standard deviation for the N-Queens problem with Backtracking, Branch and Bound and Linear Programming algorithms
The results obtained with the Linear Programming algorithm in terms of execution time are presented in Table 5. An exponential increase in the execution time can be observed after solving the problem for N = 10 queens; before this instance, the time required for solving the problem is almost zero (≈0 s). As for the standard deviation, the results remain close to the mean; in other words, the precision of the Linear Programming algorithm is good compared with the Backtracking and Branch and Bound algorithms. However, the Linear Programming algorithm requires a huge amount of processing time when N > 13 queens; for this reason, the solutions for 14, 15, 16, and 17 queens could not be obtained with this implementation. As we can see in Table 5, the standard deviation indicates how scattered the execution times are with respect to the mean. Figure 10 shows the level of accuracy that each algorithm has when it is executed 20 times for each instance. The results in Fig. 10 show that the Backtracking algorithm presents the lowest accuracy and that the Branch and Bound algorithm has the best accuracy.
6 Conclusions and Future Work
The present work evaluated three different algorithms to solve the N-Queens problem; brute-force and constrained optimization algorithms were tested and evaluated. With the obtained results, we concluded that the Branch and Bound algorithm is the fastest one because it can solve the N-Queens problem and get all the combinations that satisfy the restrictions. The Backtracking algorithm also solved the problem; however, the time it required is longer than that of the Branch and Bound algorithm. The Linear Programming algorithm is less optimal regarding time for a larger number of queens; some results could not be obtained in a reasonable time, as is the case of fifteen, sixteen, and seventeen queens. The three algorithms show an exponential behavior that is more notable when they work with thirteen queens or more. This allows us to confirm that the input size increases the computational complexity, which affects the time needed to get the expected results. In future work, we will test other methods to solve the N-Queens problem, such as heuristic and metaheuristic algorithms, among others, including quantum algorithms, to perform a complexity analysis. In addition, other metaheuristics could be used, like [32–34].
References
1. Rivin, I., Vardi, I., Zimmermann, P.: The n-queens problem. Am. Math. Mon. 101(629–639), 08 (1994)
2. Letavec, C., Ruggiero, J.: The queens problem - delta. INFORMS Trans. Educ. 2(101–103), 05 (2002)
3. Mu, S.C.: Calculating a backtracking algorithm: an exercise in monadic program derivation (2021)
4. Osaghae, E.: Solution to n-queens problem: heuristic approach. Trans. Mach. Learn. Artif. Intell. 9, 26–35 (2021)
5. Sharma, S., Jain, V.: Solving n-queen problem by genetic algorithm using novel mutation operator. IOP Conference Series: Materials Science and Engineering 1116, 012195 (2021)
6. Andov, L.: Local search analysis n-queens problem. In: Griffith University, School of Information and Communication Technology, Intelligent Systems – 2802ICT (2018)
7. Stojkoska, B., Davcev, D., Vladimir, T.: N-queens-based algorithm for moving object detection in distributed wireless sensor networks. In: ITI2008 - 30th International Conference on Information Technology Interfaces, pp. 899–904 (2008)
8. Wang, C.-N., Yang, S.-W., Liu, C.-M., Chiang, T.: A hierarchical decimation lattice based on n-queen with an application for motion estimation. IEEE Signal Process. Lett. 10(8), 228–231 (2003)
9. Sosic, R., Gu, J.: A polynomial time algorithm for the n-queens problem. ACM SIGART Bull 1 (1996)
10. Erbas, C., Tanik, M., Aliyazicioglu, Z.: Linear congruence equations for the solutions of the n-queens problem. Inf. Process. Lett. 41, 301–306 (1992)
11. Al-Gburi, A., Naim, S., Boraik, A.: Hybridization of bat and genetic algorithm to solve N-queens problem. Bull. Electr. Eng. Inform. 7, 626–632 (2018)
12. Jain, V., Prasad, J.S.: Solving N-queen problem using genetic algorithm by advance mutation operator. Int. J. Electr. Comput. Eng. 8, 4519–4523 (2018)
13. Ahmed, A., Shah, S., Kamran, A., Sani, A., Bukhari, H.S.: Particle swarm optimization for N-queens problem. J. Adv. Comput. Sci. Technol. 1 (2012)
14. Cao, J., Chen, Z., Wang, Y., Guo, H.: Parallel implementations of candidate solution evaluation algorithm for N-queens problem. Complexity 2021, 1–15 (2021)
15. Jianli, C., Zhikui, C., Yuxin, W., He, G.: Parallel genetic algorithm for N-Queens problem based on message passing interface-compute unified device architecture. Comput. Intell. (2020)
16. Janssen, D.M., Liew, A.W.: Acceleration of genetic algorithm on GPU CUDA Platform. In: 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2019, pp. 208–213 (2019)
17. Nunes, I., Ulson, J., Nunes, A.: Development of neurofuzzy architecture for solving the N-Queens problem. Int. J. Gen Syst 34(6), 717–734 (2005)
18. Waqas, M., Bhatti, A.: Optimization of N+1 queens problem using discrete neural network. Neural Netw. World. 27, 295–308 (2017)
19. Funabiki, N., Takenaka, Y., Nishikawa, S.: A maximum neural network approach for N-queens problems. Biol. Cybern. 76, 251–255 (1997)
20. Lakshmi, A.J., Muthuswamy, V.: A predictive context aware collaborative offloading framework for compute-intensive applications. J. Intell. Fuzzy Syst. 40, 1–12 (2020)
21. de Souza, F., de Mello, F.: N-queens problem resolution using the quantum computing model. IEEE Lat. Am. Trans. 15(3), 534–540 (2017)
22. Draa, A., Meshoul, S., Talbi, H., Batouche, M.: A quantum-inspired differential evolution algorithm for solving the N-queens problem. Int. Arab. J. Inf. Technol. 7, 21–27 (2010)
23. Nadel, B.: Representation selection for constraint satisfaction: a case study using n-queens. IEEE Expert. 5, 16–23 (1990)
24. Bell, J., Stevens, B.: A survey of known results and research areas for N-queens. Discret. Math. 309, 1–31 (2009)
25. Kondrak, G., van Beek, P.: A theoretical evaluation of selected backtracking algorithms. Artif. Intell. 89, 365–387 (1995)
26. Güldal, S., Baugh, V., Allehaibi, S.: N-queens solving algorithm by sets and backtracking (2016)
27. Hazama, K., Ebara, H.: Branch and bound algorithm for parallel many-core architecture. 272–277 (2018)
28. Koontz, W., Narendra, P., Fukunaga, K.: A branch and bound clustering algorithm. IEEE Trans. Comput. C-24, 908–915 (1975)
29. Stone, H., Stone, J.: Efficient search techniques: an empirical study of the n-queens problem. IBM J. Res. Dev. 31, 464–474 (1987)
30. Nasira, G., Kumar, S.: A backpropagation neural network implementation for hybrid algorithm in solving integer linear programming problems. In: Computing Communication and Networking Technologies (ICCCNT), pp. 1–6 (2010)
31. Al-Rudaini, M.: N-queens problem solving using linear programming in gnu linear programming kit (GLPK) (2016)
32. Olivas, F., Valdez, F., Castillo, O., Gonzalez, C.I., Martinez, G., Melin, P.: Ant colony optimization with dynamic parameter adaptation based on interval type-2 fuzzy logic systems. Appl. Soft Comput. 53, 74–87 (2017)
33. Olivas, F., Valdez, F., Castillo, O., Melin, P.: Dynamic parameter adaptation in particle swarm optimization using interval type-2 fuzzy logic. Soft Comput. 20(3), 1057–1070 (2016)
34. Olivas, F., Valdez, F., Melin, P., Sombra, A., Castillo, O.: Interval type-2 fuzzy logic for dynamic parameter adaptation in a modified gravitational search algorithm. Inf. Sci. 476, 159–175 (2019)
Performance Comparative Between Single and Multi-objective Algorithms for the Capacitated Vehicle Routing Problem
David Bolaños-Rojas, Luis Chávez, Jorge A. Soria-Alcaraz, Andrés Espinal, and Marco A. Sotelo-Figueroa
Abstract Heuristic optimization algorithms are an important and relevant area in Artificial Intelligence. The objective of these algorithms is to find the best solution to a problem from a set of available and feasible alternatives using a guided search based on an empirical criterion at execution time. There is a wide range of algorithms that can be applied to a specific optimization problem with single or multiple objectives. Usually, the novel researcher must deal with the selection of an algorithm to solve their problem. In this work, we compare the performance of two types of optimization algorithms; namely, single objective and multi-objective algorithms to gather information about which type of strategy achieves the best result over the capacitated vehicle routing problem. The aim of this work can be summarized as follows: to find if there exist qualitative differences in the solutions obtained by the usage of mono and multi-objective algorithms and to identify which type of strategy achieves the best quantitative solution.
1 Introduction
The Vehicle Routing Problem (VRP) is a combinatorial problem concerned with finding solutions to a goods distribution situation [1]. It tries to find the set of optimal routes in which a delivery entity or vehicle can traverse a set of delivery points once and return to the origin point or depot. It is also a generalization of the famous travelling salesman problem. Many ways of solving this problem exist, including dynamic programming and the use of evolutionary algorithms. A variant of the VRP is the Capacitated Vehicle Routing Problem (CVRP), which adds the constraint of the capacity of each delivery entity or vehicle. The CVRP is also commonly paired with a distance constraint to assess the cost of transit of the goods.
D. Bolaños-Rojas · L. Chávez · J. A. Soria-Alcaraz (B) · A. Espinal · M. A. Sotelo-Figueroa
Departamento de Estudios Organizacionales, Division de Ciencias Económico Administrativas, Universidad de Guanajuato, Guanajuato, México
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_10
2 Important Concepts
2.1 Continuous Optimization
Optimization is the search for the best answer to a particular problem [2]. Therefore, continuous optimization is the search for the optimal solution to a problem which can be modelled through a continuous-valued mathematical function.
2.2 Multi-objective Optimization
Multi-objective optimization is very similar to continuous optimization. The main difference is that the algorithms utilized optimize more than one function. This yields a frontier of optimal values known as the Pareto front. The Pareto front is a set of solutions for which an increase of fitness in one function requires a decrease of fitness in another one.
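The notion of a Pareto front can be made concrete with a short Python sketch. It assumes that every objective is minimized and that solutions are given as tuples of objective values; the names dominates and pareto_front are chosen here for illustration only.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates (minimization)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

For instance, among the points (1, 5), (2, 3), (4, 1), (3, 4) and (5, 5), the point (3, 4) is dominated by (2, 3) and (5, 5) is dominated by every other point, so only the first three remain on the front.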
2.3 Multi-objective Genetic Algorithms
For this research, two novel multi-objective algorithms with known performance were chosen to experiment with. The chosen algorithms are NSGA-II and WASFGA.
2.4 NSGA-II
The Non-dominated Sorting Genetic Algorithm II is a multi-objective genetic algorithm that utilizes a non-dominated sorting approach with $O(MN^2)$ complexity. It also utilizes a selection operator that combines the parent and offspring populations and selects the best N solutions (according to the fitness functions) for recombination. It can find a wide spread of solutions along the real Pareto front of the problems according to previous statistical analysis [3].
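The non-dominated sorting step can be sketched as follows. This is a textbook-style illustration under the assumption that all M objectives are minimized, not the code of any particular library; the pairwise comparison of N solutions over M objectives is what gives the quadratic cost.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points):
    """Return a list of fronts, each a list of indices into points."""
    n = len(points)
    dominated_by = [[] for _ in range(n)]   # solutions that p dominates
    counts = [0] * n                        # how many solutions dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated_by[p].append(q)
            elif dominates(points[q], points[p]):
                counts[p] += 1
        if counts[p] == 0:                  # nobody dominates p: first front
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                counts[q] -= 1
                if counts[q] == 0:          # q is only dominated by earlier fronts
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]                      # drop the trailing empty front
```

The first front returned corresponds to the current approximation of the Pareto front; later fronts rank the remaining solutions.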
2.5 WASFGA
The Global Weighting Achievement Scalarizing Function Genetic Algorithm is an aggregation-based genetic algorithm. Its main objective is to approximate the whole Pareto front of the functions provided. This algorithm classifies solutions at each iteration into different fronts and then grades them with an achievement scalarizing function. This algorithm also allows the decision-maker to weigh in their own observations on the heuristics of the fitness functions [4].
2.6 Library
jMetalPy is a Python port of the jMetal framework [5]. It contains classes, functions and methods that ease the process of defining and solving optimization problems utilizing several available metaheuristics. By working on single solutions, it can also apply its algorithms to mono-objective problems. jMetalPy contains several base classes for use as parents of new problem classes. The relevant base class for this paper is Permutation Problem. This class is used as a base for combinatorial problems that can be solved by permuting the elements of a solution. Implementing a class derived from Permutation Problem required overriding the methods that query the name, create an initial solution, and evaluate and create a new solution. jMetalPy also contains a predefined method to run the experiments on an instance, specifying the probability and technique parameters of the algorithm for crossover, mutation and selection. For this research, the operators used included the Permutation Swap Mutation operator, PMX Crossover operator, CX Crossover operator, Roulette Wheel Selection operator, and Binary Tournament Selection operator.
3 Mono-objective Approach
3.1 Class’s Description
The main classes which were directly used for this comparison were of three types: algorithm utility classes, crossover operator classes and selection classes. The only algorithm utility class used was the genetic algorithm class. The classes are described below. Genetic algorithm is a utility class which is constructed from the desired parameters. Such parameters are the problem instance, population size, offspring population size, mutation operator, crossover operator, termination criterion, selection operator, and a class defining the evaluator type. The evaluator type was set to the multiprocessor evaluator class to make use of every available thread on the evaluating computer.
• Permutation Swap Mutation does a random index value exchange on the solution array depending on the result of random.random(). If it is greater than the defined mutation probability, it will exchange the values at the randomly selected indexes. If it is lower than it, then it does not mutate the solution.
• PMX Crossover is the implementation of the Partially Mapped Crossover genetic crossover algorithm [6, 7]. This operator randomly selects two cutting points. Then, it passes the outer alleles of the first parent and the inner alleles of the second parent to the first offspring, and the other way around for the second offspring.
• CX Crossover is the implementation of the Cycle Crossover genetic crossover algorithm [6, 7]. This operator passes alleles to an offspring by selecting from either parent's alleles in a particular position such that an invalid individual is not created.
• Roulette Wheel Selection is a simple selection operator which first uniformly assigns probabilities of being picked to the solutions in the solution set and then picks n solutions from them.
• Binary Tournament Selection starts by uniformly picking pairs of solutions which are then evaluated for fitness. The fittest solution of the pair gets selected for crossover.
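Two of the operators listed above can be sketched in plain Python to show why they keep permutations valid. These are textbook-style illustrations, not jMetalPy's own code: the function names, the choice to exchange only the first cycle, and the assumption that lower fitness is better are all made here for the example.

```python
import random


def cycle_crossover(p1, p2):
    """Exchange one cycle between two permutations; both children stay valid."""
    child1, child2 = list(p2), list(p1)        # start from the opposite parent
    position_in_p1 = {gene: i for i, gene in enumerate(p1)}
    i = 0
    while True:
        child1[i], child2[i] = p1[i], p2[i]    # copy the cycle from own parent
        i = position_in_p1[p2[i]]              # follow the cycle
        if i == 0:
            break
    return child1, child2


def binary_tournament(population, fitness, rng=random):
    """Pick two distinct solutions uniformly and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) <= fitness(b) else b  # minimization assumed
```

With p1 = [1, 2, 3, 4, 5, 6, 7, 8] and p2 = [8, 5, 2, 1, 3, 6, 4, 7], the cycle covers positions 0, 7, 6 and 3, and every city still appears exactly once in each child.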
3.2 Restrictions
We chose to analyze mono-objective metaheuristics for the CVRP; therefore, we evaluated fitness by adding the individual scores of trip distance and total weight. Since we evaluated single-vehicle instances, the weight value was a constant. The trip distance score was calculated by adding the distances of travel between the nodes in the order given by the solution.
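A minimal sketch of this fitness computation is given below. Several details are assumptions rather than statements from the text: distances are taken as Euclidean, the route is not closed back to its first node, and the names route_fitness, coords and total_weight are hypothetical.

```python
import math


def route_fitness(order, coords, total_weight):
    """Sum the leg distances in visiting order and add the constant weight score."""
    distance = sum(math.dist(coords[a], coords[b])
                   for a, b in zip(order, order[1:]))
    return distance + total_weight
```

For three nodes at (0, 0), (3, 4) and (3, 0) visited in that order, the legs measure 5 and 4, so a constant weight of 10 gives a fitness of 19.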
4 Methodology
In order to evaluate several test cases, the library split [1] was used as a source of CVRP problems. This library contains data for several CVRP problems. The first step was creating a simple parser for the CVRP files. The files were parsed using Python 3's regular expressions with capture groups. The files were thus parsed into Python objects, which were used to instantiate objects of a class which inherited from jMetalPy's Permutation problem class. The problem class implements the methods required by jMetalPy in order to compute the problem, using the overridable methods of the Permutation problem class. This class was responsible for the evaluation of each solution and for the generation of solutions for each generation of the evolutionary algorithm. After the problem class was instantiated, the genetic algorithm class was also instantiated by calling its constructor with the required operators as parameters. To ensure consistency among all the tests, the genetic algorithm instance was constructed using the same population size, offspring population size, mutation operator, termination criterion and evaluator type. These parameters are listed in Table 1.
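The parsing step can be sketched with regular expressions and capture groups as follows. The file layout assumed here (KEY : value header lines followed by NODE_COORD_SECTION and DEMAND_SECTION blocks) is the TSPLIB-style layout commonly used for CVRP benchmarks; it is an assumption, as are the names parse_cvrp and SAMPLE and the toy values in the sample.

```python
import re

HEADER = re.compile(r"^(\w+)\s*:\s*(.+?)\s*$")
COORD = re.compile(r"^(\d+)\s+(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)$")
DEMAND = re.compile(r"^(\d+)\s+(\d+)$")


def parse_cvrp(text):
    """Parse a TSPLIB-style CVRP description into plain dictionaries."""
    instance = {"header": {}, "coords": {}, "demands": {}}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "EOF":
            continue
        if line.endswith("_SECTION"):          # switch to a data section
            section = line
            continue
        if section is None:                    # still in the header block
            m = HEADER.match(line)
            if m:
                instance["header"][m.group(1)] = m.group(2)
        elif section == "NODE_COORD_SECTION":
            m = COORD.match(line)
            if m:
                instance["coords"][int(m.group(1))] = (float(m.group(2)),
                                                       float(m.group(3)))
        elif section == "DEMAND_SECTION":
            m = DEMAND.match(line)
            if m:
                instance["demands"][int(m.group(1))] = int(m.group(2))
    return instance


# Toy instance with made-up values, used only to exercise the parser.
SAMPLE = """NAME : toy
DIMENSION : 3
CAPACITY : 100
NODE_COORD_SECTION
1 82 76
2 96 44
3 50 5
DEMAND_SECTION
1 0
2 19
3 21
DEPOT_SECTION
1
-1
EOF
"""
```

The resulting dictionaries could then be handed to the constructor of a problem class; lines of sections the sketch does not know, such as the depot block, are simply skipped.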
Table 1 Common parameters for the genetic algorithm instance’s constructor call
Population size              200
Offspring population size    200
Mutation operator            Scramble mutation
Mutation probability         0.7
Crossover probability        0.1
Termination criterion        100,000
Evaluator type               Multi process evaluator
5 Experiments
The evolutionary algorithm was run 30 times on 3 different CVRP instances for each of the following parameter combinations:
• PMX Crossover and Roulette Wheel Selection
• PMX Crossover and Binary Tournament Selection
• CX Crossover and Roulette Wheel Selection
• CX Crossover and Binary Tournament Selection.
The mutation operator was set to permutation swap mutation for every run. Each run was recorded into a Pandas data frame and then exported to a csv file. The exported dataset was then analyzed statistically utilizing R.
6 Results
The first test was a Shapiro–Wilk normality test. This test’s null hypothesis is that the data evaluated comes from a normal distribution with equal parameters. The results of the test showed that the data was indeed normal. This result is favorable, as it allows us to apply an Analysis of Variance (ANOVA) test. The results of the Shapiro–Wilk normality test on the A-n33-k5-results.csv instance are found in Table 2.
Table 2 Results of Shapiro–Wilk test on A-n33-k5-results.csv instance
Combination                                        W          p-value
PMX Crossover and Roulette Wheel Selection         0.98695    0.9447
PMX Crossover and Binary Tournament Selection      0.97372    0.5531
CX Crossover and Roulette Wheel Selection          0.97809    0.6961
CX Crossover and Binary Tournament Selection       0.96961    0.4327
Since the data proved to be normal, we proceeded by applying an ANOVA test. This test’s null hypothesis is that the fitness score’s variation for every combination comes
Table 3 Anova test’s results for A-n33-k5-results.csv instance
Df    Sum squared    Mean squared    F value    Pr (>F)
3     1,373,717      457,906         162.47
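The quantities reported in an ANOVA table are related by simple arithmetic: the mean square is the sum of squares divided by its degrees of freedom, and the F value is the ratio of the between-group to the within-group mean square. The following Python sketch computes them for a one-way layout; the function name one_way_anova is hypothetical and the sketch is not the R routine used in the study.

```python
def one_way_anova(groups):
    """Return (Df, sum of squares, mean square, F) for the between-group effect."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return df_between, ss_between, ms_between, ms_between / ms_within
```

With four operator combinations as groups, the between-group degrees of freedom are 4 − 1 = 3, and 1,373,717 / 3 ≈ 457,906, which agrees with the Df and mean square listed for this instance.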
C R i f Dku > Dku other wise
(5)
Scheme 2 (Easy Classes)
This scheme is the inverse of scheme 1: it creates the $X_t$ crossover individual by placing at the beginning of the sequence those classes with lower difficulty and then those with higher difficulty, letting the chromosomal repairer take care of interleaving the classes of greater difficulty among those of lesser difficulty.
Specialized Crossover Operator for the Differential …
In contrast to the previous scheme, a lighter load is set at the beginning so that the last vehicles receive a higher workload. This is done following the same comparisons previously mentioned, with the only modification being the sign (Eq. 6), where $D_k^u$ means the difficulty of class k in individual u, with $u = X_m$ in the first condition, $u = X_{rj}$ in the second, and, in the third, $u = X_m$ for the left term and $u = X_{rj}$ for the right term.
$$X_t = \begin{cases} X_{rj} & \text{if } D_k^u < CR \\ X_m & \text{if } D_k^u < CR \\ X_m & \text{if } D_k^u < D_k^u \\ X_{rj} & \text{otherwise} \end{cases} \tag{6}$$
3.2 Specialized Crossover Operator for Chromosome B (Color)
This operator focuses on grouping vehicles of the same color based on the demand for a color, denoted $d_c$, while respecting the hard constraint of the problem Q (lot size; maximum number of consecutive cars of the same color that can be grouped). The demand for color c is calculated by dividing the number of vehicles requiring that color, denoted $v_c$, by the ceiling (the smallest integer greater than or equal to its argument) of the product of the input sequence size n and the hard constraint Q.
$$d_c = v_c \div \lceil n * Q \rceil \tag{7}$$
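A worked instance of Eq. (7) may help. The sketch below follows the formula exactly as stated, with the ceiling applied to the product of n and Q; the function name color_demand and the representation of the input sequence as a list of color labels are assumptions for illustration.

```python
import math
from collections import Counter


def color_demand(colors, lot_size):
    """Demand d_c of each color: v_c divided by ceil(n * Q), as in Eq. (7)."""
    n = len(colors)                       # input sequence size
    denominator = math.ceil(n * lot_size)
    return {color: count / denominator
            for color, count in Counter(colors).items()}
```

For a sequence of ten vehicles, six red and four blue, with Q = 2, the denominator is 20, giving a demand of 0.3 for red and 0.2 for blue.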
This operator uses two schemes to create the crossover individual by grouping vehicles in small groups at the beginning or at the end of the sequence. Scheme 1 starts with the vehicles with lower demand for the color and ends with those with higher demand; scheme 2 starts with those with higher demand and ends with those with lower demand. This is done by Eqs. 8 and 9, where $d_c^u$ means the difficulty of class k in individual u, with $u = X_m$ in the first condition, $u = X_{rj}$ in the second, and, in the third, $u = X_m$ for the left term and $u = X_{rj}$ for the right term.
$$X_t = \begin{cases} X_{rj} & \text{if } D_k^u < CR \\ X_m & \text{if } D_k^u < CR \\ X_m & \text{if } D_k^u < D_k^u \\ X_{rj} & \text{otherwise} \end{cases} \tag{8}$$
$$X_t = \begin{cases} X_{rj} & \text{if } D_k^u > CR \\ X_m & \text{if } D_k^u > CR \\ X_m & \text{if } D_k^u > D_k^u \\ X_{rj} & \text{otherwise} \end{cases} \tag{9}$$
J. Manzanares et al.
Table 1 Cases of application of the crossover operator within the DECR-s
Chromosome     Case 1      Case 2      Case 3      Case 4
A (classes)    Scheme 1    Scheme 2    Scheme 1    Scheme 2
B (color)      Scheme 1    Scheme 2    Scheme 2    Scheme 2
Table 2 Instances of set A ranked by level of difficulty and priority of objectives
Groups       # Instances    Identifier
Color        4              PHEL1, PHEL2, PHEL3, PHE
Easy         5              HEPL1, HEPL2, HEPL3, HEPL4, HELP
Difficult    7              HDPL1, HDPL2, HDPL3, HDLP1, HDLP2, HDLP3, HDP
4 Experiment Design
For the crossover operator and its schemes, 4 cases combining the A and B chromosomes were designed for the assembly line, as shown in Table 1. In the experiments we used set A of the instances provided by RENAULT, taken from the Constraint Satisfaction Problem Lib repository (see www.csplib.org). This set consists of 16 instances (Table 2) which are divided, in terms of difficulty level and goal priority, into three groups: color, easy and difficult. The names assigned to each instance correspond to their characteristics; for example, PHEL1 means that target P (color changes) is more important than target H (high priority overloads), and this in turn is more important than L (low priority overloads). The letter E means that the objective H is considered easy to optimize by RENAULT and the letter D that it is considered difficult to optimize; if there is more than one instance of a type, it is assigned a consecutive integer. In some instances, the objective L is not considered. The DECR-s algorithm was implemented in Java and compiled with IntelliJ IDEA Community Edition 2018. The experiments were run on a Toshiba computer with an Intel i3-3217u processor and 6 GB of RAM with Windows 10. The parameters used for the DECR-s algorithm were mutation factor ϕ = 1, crossover factor CR = 0.9 and population size N = 10, all of them values used in the state of the art. For the execution approach, two values of generations were used: G1 = 25 in general and G1.1 = 100 for each objective.
5 Results
The performance of the DECR-s algorithm was compared against the results of the simulated annealing algorithm reported by RENAULT and the DECR algorithm. Relative percentages of the results and a statistical analysis are presented. For the statistical analysis of the smoothing objective, a Friedman test was applied since the results are not normally distributed, and for the color changes objective the Student’s t-test was applied since the results are normally distributed; in both cases a significance level α = 0.05 was used. The number of overloads and color changes obtained for the group of color instances are shown in Table 3, which records the results obtained by the DECR-s algorithm for all experimental cases and the average value (fifth row) of the cases, together with the results reported by the DECR algorithm (average value) and RENAULT.
Table 3 Results for color instances
           P1     P2     P3      P4      H1      H2     H3       H4      L1     L2    L3      L4
Case 1     30     10     65.2    69      197     43     464.6    398     61     5     842.8   NA
Case 2     31.2   10     71      69.5    179     50.2   488.8    394.4   63.8   8.1   850     NA
Case 3     30     11     64      69.6    197     48     463.8    398.6   61     5     842.6   NA
Case 4     30     11     64      69      196     53.6   461.8    392     59.6   5.9   843     NA
DECR-s     30     10.5   64      69      196.5   45.5   464.2    395     61     5     846.5   NA
DECR       31.2   11     73.88   74.37   200     49.8   502.55   435.3   63.5   4.7   854.4   NA
RENAULT    30     11     64      69      197     48     462      392     61     5     883     NA
Figure 2a plots the average of the results of the DECR-s, DECR and RENAULT algorithms for the group of color instances. For the other groups, the overall performance of the DECR-s algorithm with respect to the DECR and RENAULT algorithms is shown in Fig. 2b for the easy ones and in Fig. 2c for the difficult ones. Figures 2a and b show that the DECR-s algorithm (solid line) competed with RENAULT in most instances and improved on the DECR algorithm. In the instances of Fig. 2c the results are more unstable; however, the DECR-s algorithm, in general, improves the results obtained by the DECR algorithm and achieves a better result than the RENAULT algorithm.
5.1 Results Per Case of the Crossover Operator

From the information obtained from all the experimentation, the relative percentages of the crossover operator with respect to the results reported by RENAULT were calculated, where better counts the comparisons in which the fitness is lower than that of RENAULT and worse counts those in which it is higher. Table 4 summarizes the total relative percentages of the 46 comparisons performed for the 4 types of cases. The better and worse categories show that the DECR-s algorithm improved the performance of the DECR algorithm in all of its cases: every case has a higher better percentage and a lower worse percentage than DECR.
Fig. 2 Fitness of the P, H and L objectives obtained by the DECR-s, DECR and RENAULT algorithms for the instances of (a) the color group, (b) the easy group and (c) the difficult group
Table 4 Total relative percentages of the 4 cases of the experimental design
          Case 1 (%)   Case 2 (%)   Case 3 (%)   Case 4 (%)   DECR (%)
Better    52           29           50           39           19
Worse     22           27           28           35           44
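The way such relative percentages can be obtained is sketched below in Python. This is only an illustrative sketch, not the authors' code: the function name `relative_percentages`, the handling of missing values (NA) and the example numbers are assumptions. It also assumes that comparisons that are neither better nor worse are ties, which would explain why the two rows of the table do not add up to 100%.

```python
def relative_percentages(candidate, reference):
    """Percentage of comparisons in which `candidate` is better, equal or worse.

    Lower fitness is better. Pairs where either value is None (NA) are skipped.
    Both arguments are lists of fitness values, one per comparison.
    """
    pairs = [(c, r) for c, r in zip(candidate, reference)
             if c is not None and r is not None]
    total = len(pairs)
    better = sum(c < r for c, r in pairs)
    worse = sum(c > r for c, r in pairs)
    equal = total - better - worse
    return {
        "better": 100.0 * better / total,
        "equal": 100.0 * equal / total,
        "worse": 100.0 * worse / total,
    }


# Hypothetical fitness values for five comparisons (the last one is NA).
case_results = [30, 10, 65.2, 69, None]
state_of_the_art = [30, 11, 64, 69, None]
summary = relative_percentages(case_results, state_of_the_art)
print(summary)
```

In this toy example one of the four valid comparisons is better, one is worse and two are ties.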
5.2 Statistical Analysis of Cases

This section first presents an analysis to determine whether one of the 4 cases is better than the others. Subsequently, an analysis is conducted to determine whether the performance of the 4 cases is the same as or different from that of RENAULT for each of the objectives (P, H, L).

To compare the 4 cases with each other, a Friedman test with α = 0.05 was applied. The p-values obtained for objectives H and L are shown in Table 5; both exceed the level of significance, so the null hypothesis cannot be rejected, i.e., it is not possible to determine whether any of the 4 cases is better.

To contrast the performance of the cases against that reported by RENAULT, a Friedman test was applied again (Table 6), finding that only for objective H does the p-value show a significant difference in performance. A rank test was then applied to identify which of the algorithms performs better (Table 7), where it can be observed that RENAULT has the best performance.

Table 5 Friedman test results of the crossover operator cases for the H and L objectives
Objective   Significance
H           0.325
L           0.417
Table 6 Friedman test results for cases and RENAULT
Objective   Significance
H           0.011
L           0.180
Table 7 Average ranks for cases and RENAULT
Objective   RENAULT   Case 1   Case 2   Case 3   Case 4
H           1.91      2.75     3.31     3.53     3.50
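Average ranks of this kind are the quantity on which the Friedman test is built: on every instance the algorithms are ranked from best (rank 1) to worst, ties share the mean of the ranks they occupy, and the ranks are averaged over the instances. The following Python sketch illustrates the computation; the function name and the data are hypothetical and are not taken from the experiments reported here.

```python
def average_ranks(results):
    """Average rank of each algorithm over all instances (lower value is better).

    `results` maps an algorithm name to its list of fitness values, one per
    instance. Tied algorithms share the mean of the ranks they span.
    """
    names = list(results)
    n_instances = len(results[names[0]])
    totals = dict.fromkeys(names, 0.0)
    for i in range(n_instances):
        row = sorted((results[name][i], name) for name in names)
        pos = 0
        while pos < len(row):
            end = pos
            # Extend the block while the next value ties with the current one.
            while end + 1 < len(row) and row[end + 1][0] == row[pos][0]:
                end += 1
            shared_rank = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1..end+1
            for _, name in row[pos:end + 1]:
                totals[name] += shared_rank
            pos = end + 1
    return {name: totals[name] / n_instances for name in names}


# Hypothetical fitness values of three algorithms on three instances.
toy = {"A": [1, 2, 3], "B": [2, 2, 1], "C": [3, 5, 2]}
ranks = average_ranks(toy)
best = min(ranks, key=ranks.get)
print(ranks, best)
```

The algorithm with the lowest average rank is the one that tends to obtain the lowest fitness across instances.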
For objective P, Student’s t-test with α = 0.05 was applied pairwise between each case and RENAULT. All p-values were in the range [0.203, 0.990], so no significant difference was found in the performance of the algorithms.
6 Conclusion

In this work, a specialized crossover operator was proposed to improve the performance of the DECR algorithm. This operator is included in the DECR-s algorithm, which does not depend on randomness to generate the crossover individual, but on specific information of the MCS problem such as the proportion constraints, the classes and their difficulty, the demands of the options and the paint color. The crossover process selects a gene (vehicle) and places it in a certain position (the best position), achieving balance and separability among the vehicles that make up the individual (sequence), which decreases the number of overloads and color changes.

It was observed that for chromosome A, depending on the case used, it is possible to decide the order of the sequence, entering first either the classes with greater difficulty or those with less difficulty. For chromosome B, in contrast, the sequence order is decided by the paint color, which forms large and small groupings of vehicles that share the same color and allows the chromosome repairer to make intercalations.

The analysis of the total relative percentages by scheme shows that case 1 provides the best results: it beats the state of the art in 52% of the results, ahead of case 3 by 2 percentage points, of case 2 by 23 and of case 4 by 13. This indicates that by starting a sequence with a vehicle belonging to a class of greater difficulty and interspersing one or more vehicles belonging to a class of lesser difficulty, a greater distance is obtained between equal vehicles, ensuring that the one that requires a greater workload is attended to without leaving others waiting or overloading the workstation.

The overall results show that DECR-s improves the performance of its predecessor DECR. With reference to RENAULT the DECR-s algorithm achieves competitive results in 10 out of 16 instances which represents 61%.

However, some aspects were observed that could improve the performance of DECR-s in future research, including making the crossover operator consider the number of options to be installed per class, and specializing the classical mutation operator so that it supports the specialized crossover operator in gene recombination.

Acknowledgements The authors wish to thank the Consejo Nacional de Ciencia y Tecnología (CONACYT) of México, through grants for postgraduate studies: 482923 (J. Manzanares), 634738 (E. Sánchez) and the Tecnológico Nacional de México/Instituto Tecnológico de León, for the support provided for this research.
A Brave New Algorithm to Maintain the Exploration/Exploitation Balance
Cecilia Merelo, Juan J. Merelo, and Mario García-Valdez
Abstract At the beginning of this year one of the authors read “Brave New World”, a novel by Aldous Huxley. This book describes a dystopia that anticipates the development of world-scale breeding technology, and how this technology creates the optimal human race. When talking about genetic algorithms our goal is to achieve the optimum solution of a problem, and this book describes, in a way, the process for making the “perfect human”, or rather the “perfect human population”. In this paper we work on this parallelism, trying to find the key to the evolution processes described in the book. The goal is to develop a genetic algorithm based on the fecundation process of the book and compare it to other algorithms to see how it behaves, by investigating how the division in castes affects the diversity of the population. We describe the implementation of such an algorithm in the programming language Julia, and how design and implementation decisions impact algorithmic and runtime performance.
Keywords Evolutionary algorithm · Metaheuristics · Literary-inspired algorithms · Exploration/exploitation balance
C. Merelo (B) · J. J. Merelo, University of Granada, Granada, Spain
M. García-Valdez, Tijuana Technological Institute, Tijuana, Mexico
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_20

1 Introduction

Population-based algorithms [21] are stochastic global optimization methods that use different techniques for exploring the search space in such a way that finding the solution becomes feasible in a reasonable amount of time. As such, they must strike a balance between feasibility (exploitation of the features found in the current population of solutions to optimize fitness) and off-line performance (exploration of parts of the search space that might hold the key to those solutions) [26]. In this balance, diversity is one of the keys [1]. This drives the search for new algorithms that explicitly try to keep this balance in a comfort zone.

In this search, inspiration may arrive from unexpected places. At the beginning of the year one of the authors read the famous book by Aldous Huxley, Brave New World [11]. The novel is a dystopia that describes developments in reproductive technology, psychological manipulation (including virtual-reality plays called feelies [13]) and classical conditioning [2]. It introduced interesting ideas in several areas, but in this paper we are going to focus on how its society optimizes the population. In order to maximize efficiency and decrease unhappiness, the book describes how the population is divided in castes, assigned from birth, where everyone knows and accepts their place. That way, they achieve an “optimum world”, whose optimization is based on these reproduction restrictions and on the overall balance that the division in castes creates, not on any individual. This tension between individual happiness or realization and overall harmony or optimal state is, precisely, the main plot driver, which makes the novel a literary harbinger of the comparison between individual-based and population-based optimization algorithms, although of course the intention of the author was to compare collectivist and individualist culture [14].

When we talk about evolutionary algorithms the target is reaching the optimum solution for a problem, and this book describes the process through which its society has reached the perfect human race. Therefore we want to develop an algorithm based on the book’s fecundation process and compare its behaviour with a basic genetic algorithm.
Our main assumption during the design of this process is that the division in castes affects the population’s diversity; we can draw a parallel between the collectivism/individualism tensions inherent in the novel and the exploration/exploitation balance in population-based global algorithms, and thus try to apply their solutions to our problems. The main conclusion we draw from the book is that what makes a collective society happy might make some individuals extremely isolated and sad; but if we apply this to our optimization realm we can see that what Aldous Huxley might be unwittingly proposing is the rough layout of an optimization algorithm, where individuals with different fitnesses undergo different differentiation/evolution/reproduction processes. We will take it from here to design an optimization algorithm, which of course we call Brave New Algorithm (BNA). Essentially, BNA is an evolutionary algorithm with mating restrictions and, additionally, differentiated application of operators depending on the fitness value.

When designing an algorithm from scratch, we must also make technical choices on how the implementation is going to take place, as well as on how the whole process of going from design (our user stories) to final implementation (product) can be performed according to best practices in software engineering. Implementation matters [16], and an agile development process can help us get the final result in the most efficient way, guaranteeing the quality of any software product [17]. The agile approach also encourages open science, which is the model that we have followed from the onset of this research line.
The rest of the paper is organized as follows: next, the state of the art in this area is examined. The algorithm (and its implementation) are described in Sects. 3 and 4. The first experiments performed with this implementation of the algorithm are shown in Sect. 5. Finally, we will discuss the results and present our conclusions in Sect. 6.
2 State of the Art

One of the koans written by Goldberg in “Zen and the art of the evolutionary algorithm” states that we should let Nature be our guide. This has profitably led to many population-based metaheuristics that are inspired by the behavior of different species [19], going as far as the behavior of pigeons in a park [4]. The exhaustion of the pool of species with collective behavior to mimic has led further away, for instance to zombies [20], which eventually has led to a backlash [24] claiming that metaphors obscure the understanding of new algorithms and do not advance the field of optimization. This does not imply, however, that metaphors are necessarily unhelpful. In fact, evolutionary algorithms are not the best way to reflect social processes [6], but in this case the intent of Aldous Huxley was exactly the opposite: to show how an industrial, at-scale version of biological evolution applied to the whole human race (except for what they called “savages”) could determine social processes.

The potential of a book such as the one we deal with here to inspire optimization has, however, not been realized, although it has been mentioned at least once in relation with an evolutionary algorithm: [25] mentioned one of the “methods” of the book, “screening out savages”, as a way of, apparently, giving 0 fitness to missiles that did not meet the constraints of a “commanded flight profile”. Curiously enough, another oblique reference to Brave New World, via the sentence “Consider the horse. They considered it.” in [23], brings us to the main theme of this paper. By “considering the horse”, the author of the novel refers to how exploration, the enhancement of diversity, is able to find new solutions to problems.

Keeping diversity high becomes especially necessary in dynamic environments [7], since it allows the population to maintain a certain memory of what happened in the past. Many different techniques have been used to enhance diversity in evolutionary algorithms, from migration policies [10] through simple mutation [12] to reactive methods that are “aware” of diversity and kick in when diversity is low [27]. However, these reactive methods add a layer of complexity to the algorithm, since they require establishing thresholds and/or dynamic measures. In this paper we present a proof of concept and initial experiments for BNA, which preserves diversity through natural mechanisms, and uses it to achieve higher performance in, initially, static problems.
3 Algorithm’s Nature

As mentioned before, the algorithm is based on the optimization process of the human race described in the book. Since we are talking about an algorithm based on the evolution of a population, it follows the structure of evolutionary algorithms, specifically that of a genetic algorithm. The book describes how the perfect human race was achieved by working with an assembly line with different phases. This is reflected in a generational evolutionary algorithm with selection, crossover, mutation and replacement operators.

The process begins in the Fecundation Room, where the eggs are created and fertilized. Once the fertilization is finished all the eggs go to the Hatchery, where the caste to which each individual will belong is decided. Huxley describes how the higher castes (Alpha and Beta) are supplied with a higher amount of nutrients and hormones during the incubation, while the lower castes (Gamma, Delta, and Epsilon) are deprived of these elements, which are needed for development. To imitate this “lack of nutrients”, in the algorithm developed here the lower castes are deprived of operators: they will only mutate. With all this in mind, the castes are developed in the following way:

– Alphas: in the book they are the most intelligent, and the elite belongs to this group. They have responsibilities and are the ones that take decisions. In our implementation they reproduce with other individuals of the same caste and evolve with all the operators.
– Betas: in the book they are less intelligent than the Alphas and their main role is working in administrative tasks. In the implementation, their crossover will only be with individuals from the Alpha caste.
– Gammas: in the book they are subordinates whose tasks require skill. In the implementation they will only mutate, but using local search.
– Deltas and Epsilons: in the book both these castes are employees of the other castes and do repetitive work. In the implementation they will only have mutation by fixed segment.

With this structure in mind, the metaheuristic is divided in the following phases:

– Fecundation room: the individuals are created in a randomized way.
– Hatchery room: in this phase the population is divided in castes, using the fitness value of the individual as the criterion. Furthermore, each caste receives a different percentage of the population, because the book mentions that the lower castes are produced with Bokanovsky’s process, where an embryo is divided into 96 identical twins. In the algorithm this is reflected in the caste size, which decreases as the caste gets higher.
– Caste evolution: each caste follows a different process, as mentioned before.
The castes are not static: they are generated at the beginning of each generation. Let us imagine that we have a population of size ten, each individual with a fitness value. In the first iteration the population is divided according to that value. After that, each individual follows the evolution process corresponding to its caste. At the end of each generation all the chromosomes are mixed, regardless of their caste. The next generation starts by dividing these chromosomes in castes again.
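The phases described above can be sketched as follows. This is a minimal illustrative sketch in Python, not the authors' Julia implementation: the caste shares, the OneMax toy fitness, the operator details (one-point crossover, bit-flip mutation, a single-flip local search, re-randomizing a fixed-length segment) and all function names are assumptions made only to show the control flow of one generation.

```python
import random

# Hypothetical caste shares: higher castes are smaller, lower castes larger.
CASTE_SHARES = {"alpha": 0.1, "beta": 0.2, "gamma": 0.3, "lower": 0.4}


def fitness(chromosome):
    """Toy OneMax fitness (assumption): number of ones, to be maximized."""
    return sum(chromosome)


def hatchery(population):
    """Split the population into castes by fitness, best individuals first."""
    ranked = sorted(population, key=fitness, reverse=True)
    names = list(CASTE_SHARES)
    castes, start = {}, 0
    for name in names[:-1]:
        size = max(1, round(CASTE_SHARES[name] * len(ranked)))
        castes[name] = ranked[start:start + size]
        start += size
    castes[names[-1]] = ranked[start:]  # Deltas and Epsilons take the rest
    return castes


def crossover(a, b, rng):
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def bit_flip(chromosome, rng, rate=0.05):
    return [1 - g if rng.random() < rate else g for g in chromosome]


def local_search(chromosome, rng):
    """Flip one random bit and keep the change only if it does not hurt."""
    i = rng.randrange(len(chromosome))
    neighbour = chromosome[:i] + [1 - chromosome[i]] + chromosome[i + 1:]
    return neighbour if fitness(neighbour) >= fitness(chromosome) else chromosome


def segment_mutation(chromosome, rng, length=4):
    """Re-randomize a segment of fixed length (one reading of 'fixed segment')."""
    start = rng.randrange(len(chromosome) - length + 1)
    segment = [rng.randint(0, 1) for _ in range(length)]
    return chromosome[:start] + segment + chromosome[start + length:]


def next_generation(population, rng):
    castes = hatchery(population)  # castes are rebuilt every generation
    alphas = castes["alpha"]
    offspring = [bit_flip(crossover(a, rng.choice(alphas), rng), rng) for a in alphas]
    offspring += [crossover(b, rng.choice(alphas), rng) for b in castes["beta"]]
    offspring += [local_search(g, rng) for g in castes["gamma"]]
    offspring += [segment_mutation(d, rng) for d in castes["lower"]]
    rng.shuffle(offspring)  # all chromosomes are mixed regardless of caste
    return offspring


def run(pop_size=20, length=16, generations=30, seed=1):
    rng = random.Random(seed)
    # Fecundation room: individuals are created at random.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        population = next_generation(population, rng)
    return population


final_population = run()
print(max(fitness(c) for c in final_population))
```

Because every individual produces exactly one successor, the population size stays constant from one generation to the next; whether the original implementation also works this way is not stated in the text.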
4 Implementation

In its first implementations and versions this algorithm was written in Python [18], but many performance issues arose, which made the algorithm hard to analyze. This is why, for this second proof of concept, one of the requirements was to use a language that provided more performance and, at the same time, had the potential for parallelism. For this version we have therefore chosen a language that has joined the “Petaflop club”, which includes the languages that have exceeded 1 petaflop/second of peak performance: the Julia [3] programming language, a multiparadigm, dynamically typed language. In this section we give some insight into the data structures and the implementation.

The agile manifesto has seldom been used in the scientific realm. This manifesto has the customer at its center, and exhorts practitioners to follow best practices that guarantee quality software through the elaboration of minimally valuable products. Applied to science in general, as suggested by the agile science manifesto [17], it also includes results and any other artifacts in the agile development cycles, with frequent interaction among the different stakeholders. Besides, agile science is open science, and all experiments, configuration files, and data files, as well as, obviously, the source code, are available with an open source license in GitHub https://github.com/cecimerelo/UnAlgoritmoFeliz.

Following best practices in this area, the Domain Driven Design [8] methodology was applied to the problem domain. This allowed us to go from the existing user stories (which can be read in the above-mentioned GitHub repository) to the data structures used here. As we mentioned, Julia is a dynamically typed language, but it shares some advantages with statically typed ones, since it makes it possible to indicate the types of some variables. To make good use of this paradigm, the castes have been defined as different types; for example, for the alpha caste:
@with_kw struct ALPHA 1)
      group ← group[class == commonClass]
    end if
  end for
  examples ← ungroup(groups)
  return windowing(examples, size)   // call to Algorithm 1
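The filtering step that closes the pseudocode above (keep only the most common class inside each group of examples that share the same attribute values) can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the function name `inconsistency_filter` and the representation of an example as an `(attributes, label)` pair are assumptions, and the final call to Windowing (Algorithm 1) is left out because that algorithm is not reproduced here.

```python
from collections import Counter, defaultdict


def inconsistency_filter(examples):
    """Remove minority-class examples from every inconsistent group.

    `examples` is a list of (attributes, label) pairs, where `attributes` is a
    hashable tuple. Examples with identical attributes but different labels
    form an inconsistency; only those with the most common label are kept.
    """
    groups = defaultdict(list)
    for attributes, label in examples:
        groups[attributes].append(label)

    kept = []
    for attributes, labels in groups.items():
        if len(set(labels)) > 1:  # the group is inconsistent
            # Note: with a perfectly balanced group the choice is arbitrary.
            common_class, _ = Counter(labels).most_common(1)[0]
            labels = [label for label in labels if label == common_class]
        kept.extend((attributes, label) for label in labels)
    return kept


# Hypothetical data: three examples share attributes but disagree on the class.
data = [
    (("a", "x"), "pos"),
    (("a", "x"), "pos"),
    (("a", "x"), "neg"),
    (("b", "y"), "neg"),
]
filtered = inconsistency_filter(data)
print(filtered)
```

The filtered examples would then be handed to the Windowing procedure in place of the full training set.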
3.3 Methodology

The experiments run the selected methods on the thirty artificial datasets generated as explained in Sect. 3.1. Regarding the configuration of J48, the default parameter setting in Weka is adopted, except that the Laplace smoothing correction is enabled. Provost [13] suggests this correction to improve the probability estimation on tree leaves when the sampling method modifies the class distribution, as Windowing does [10]. Most of the parameters remain fixed during the experimentation; only the parameter that enables J48’s post-pruning method is modified, in order to assess its impact. Concerning the setting of the Windowing methods, stratified random sampling is performed to obtain initial windows with 20% of the training set. All the approaches are evaluated with a 10-fold cross-validation process repeated ten times.

Three factors are considered in this work to assess the performance: the models’ predictive performance, the data usage, and the noise filtered. The predictive performance of the obtained models is measured using two metrics: accuracy and Area Under the ROC Curve (AUC). The decision trees are tested on clean and noisy test sets. In the first scenario, an instantiated NCAR model generates the clean test sets. However, this situation represents an ideal case. In the second scenario, a cross-validation process generates the noisy test sets. For datasets with noise levels higher than 10%, test sets contain a similar noise distribution since the validation split is a random stratified sample.

Considering the data usage, the percentage of examples used for the induction reveals the reductions performed by the methods. Values closer to 90% imply that all the training set is used, as the traditional approach does in a tenfold cross-validation process. Three metrics are adopted to measure the inconsistency filtering efficiency in the final Window: the percentage of inconsistency, the proportion of clean examples removed erroneously (ER1) [1], and the proportion of noise not removed (ER2) [1]. ER2 is also adopted to measure the percentage of noise that remains in the samples, in order to compare the performance of Windowing and WIF.
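The two filtering error metrics can be illustrated with a short Python sketch. The denominators used here are an assumption based on the verbal definitions above (ER1 relative to the number of clean examples, ER2 relative to the number of noisy examples); the function and variable names are hypothetical.

```python
def filtering_errors(is_noisy, removed):
    """Return (ER1, ER2) as percentages.

    `is_noisy[i]` tells whether example i was corrupted by class noise and
    `removed[i]` whether the filter deleted it.
    ER1: share of clean examples that were removed erroneously.
    ER2: share of noisy examples that were not removed.
    """
    clean_total = sum(not noisy for noisy in is_noisy)
    noisy_total = sum(is_noisy)
    clean_removed = sum((not noisy) and gone for noisy, gone in zip(is_noisy, removed))
    noisy_kept = sum(noisy and not gone for noisy, gone in zip(is_noisy, removed))
    er1 = 100.0 * clean_removed / clean_total if clean_total else 0.0
    er2 = 100.0 * noisy_kept / noisy_total if noisy_total else 0.0
    return er1, er2


# Hypothetical dataset: 8 clean and 2 noisy examples.
noise_flags = [False] * 8 + [True] * 2
removed_flags = [True] + [False] * 7 + [True, False]
er1, er2 = filtering_errors(noise_flags, removed_flags)
print(er1, er2)
```

Here one of the eight clean examples is deleted and one of the two noisy examples survives; values of both metrics close to zero indicate a filter that removes noise without discarding clean data.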
4 Results

Although not shown because of the available space, accuracy and AUC results show that the decision trees of all the methods perform similarly over noisy test sets: noise causes a linear performance degradation. The performance is around 90% in test sets with 10% of noise, decreasing to 60% in sets with 40% of noise.

Regarding the predictive performance over clean test sets, Fig. 3 shows the results in terms of AUC. The columns represent the size of the training dataset, and the rows represent the noise level. Higher numbers of available training examples tend to improve the performance in noisy domains, probably because decision trees are robust enough to deal with noise if there is enough data for the induction. Indeed, the worst results are obtained with noise levels higher than 30% and fewer than 2,000 examples. However, the post-pruning method can improve the performance in these conditions. Although the results in terms of accuracy seem similar to those of AUC, models get slightly lower levels of accuracy. This observation could be due to the predictive accuracy achieved in each class and to an imbalanced class distribution. However, this must be validated with experimentation.

It is well known that Windowing achieves drastic reductions of the training data [15], but this behavior has not been corroborated in noisy domains up to now. Figure 4 shows that the percentage of training examples used by the original version of Windowing is near 20% with no noise. However, these reductions tend to shrink with higher noise levels; when there is 40% of noise, they are minimal. The reductions performed by the proposed variant are considerably better: for the dataset of 10,000 examples and 40% of noise, Windowing uses 88% of the training data while WIF uses 43%.

Table 2 shows the number of inconsistent examples in each dataset. The proportion of inconsistency varies depending on two factors: the size of the dataset and the noise level.
Unlike the noisy datasets, data with 0% noise have a low proportion of inconsistencies; this is probably explained by the probability distribution of TrueClass when the attribute Competitiveness takes the value N. However, such inconsistencies are unlikely to occur, hence they can be disregarded.

Table 2 Inconsistent examples in the datasets
            Noise level
Examples    0%      10%      20%      30%      40%
250         2       69       89       111      130
500         0       158      265      289      304
1,000       0       478      641      729      738
2,000       7       1,225    1,581    1,673    1,752
5,000       76      3,951    4,475    4,660    4,738
10,000      182     8,775    9,402    9,704    9,762

Table 3 presents the number of examples after the preprocessing. Results suggest that the higher the level of noise, the more data is deleted. Indeed, the proportion of deleted examples is close to the noise percentage.

Percentages of ER1 (clean examples removed) and ER2 (noise not removed) closer to zero suggest that most of the noise is filtered without deleting non-corrupted data. Table 4 describes the results of these metrics. Even though this preprocessing removes just inconsistencies, it deletes up to 95% of the noise when there is a low percentage of noise and a high amount of data. Furthermore, the reductions achieved in the dataset with 40% of noise and 10,000 examples are not negligible. The preprocessing stage gets results near the desirable levels of ER1 and ER2. However, some specific scenarios affect its performance, e.g., an inconsistency set with a perfectly balanced class distribution. This problem could be solved by deleting the whole inconsistency.

Figure 5 presents the comparison of the methods using the ER2 metric. This figure corroborates that Windowing aggregates all the noise in the Window, since it does not adopt a strategy to filter it and, therefore, noisy examples and counterexamples are indistinguishable. WIF, in contrast, manages to reduce the noise by 40% up to 95%. As mentioned earlier, the data size is a determining factor for the success of WIF.

Fig. 3 AUC results for J48, Windowing (WND), and Windowing Inconsistency Filtering (WIF). Columns vary the number of training examples and rows the percentage of noise

Fig. 4 Percentage of used examples for J48, Windowing (WND), and Windowing Inconsistency Filtering (WIF). Columns vary the number of training examples and rows the percentage of noise
Table 3 Examples after the inconsistency filtering algorithm
            Noise level
Examples    0%      10%      20%      30%      40%
250         249     231      221      211      202
500         500     469      418      398      388
1,000       1,000   921      835      780      725
2,000       1,997   1,816    1,621    1,474    1,355
5,000       4,983   4,518    4,084    3,543    3,226
10,000      9,962   9,005    8,079    6,958    6,221
Table 4 Results of ER1 and ER2 (%) after the inconsistency deletion algorithm
            Noise 10%        Noise 20%        Noise 30%        Noise 40%
Examples    ER1     ER2      ER1     ER2      ER1     ER2      ER1     ER2
250         1.85    55.88    5.03    62.75    8.98    71.08    11.68   68.75
500         2.44    60.78    7.97    54.05    9.28    57.23    14.56   69.69
1,000       1.75    28.40    3.75    32.84    8.29    46.00    14.23   52.95
2,000       0.99    15.30    2.97    20.86    5.24    23.68    15.13   42.90
5,000       0.46    11.58    0.91    9.89     4.00    17.03    10.95   29.08
10,000      0.44    4.51     0.93    6.58     1.91    9.96     6.71    16.93
5 Conclusions

Previous works have studied the behavior of Windowing in noisy domains focusing on its predictive performance, the complexity of the learned models, and its learning time. However, their conclusions discouraged the use of Windowing since it is not able to obtain more accurate models than the traditional induction does using all the training data. The lack of a clear definition of noise makes it difficult to understand how Windowing is affected by noisy examples. This paper proposes using a probabilistic model that describes the behavior of class noise in well-known domains and generates artificial datasets under controlled noise conditions. The generated datasets provide information about which examples are noisy and enable the description of the Window composition.

Regarding the sampling, results suggest that the big data reductions performed by Windowing tend to decrease in noisy domains. Indeed, there are no significant reductions in datasets with 30% of noise or more. This behavior implies that all the noise is incorporated into the Window.

This paper also proposes a Windowing extension with a preprocessing stage to delete inconsistencies. The preprocessing algorithm, inconsistency filtering, is founded on a simple heuristic: if there is class noise in low proportions, the corrupted examples should belong to the minority class in an inconsistency. Results support that the proposed extension can reduce up to 55% of the data usage and 75% of the noise compared to its original formulation in datasets with 40% noise. This work suggests future lines of research on Windowing, including:
Fig. 5 ER2 results for J48, Windowing (WND), and Windowing Inconsistency Filtering (WIF). Columns vary the number of training examples and rows the percentage of noise
1. Studying the generalization of the inconsistency deletion algorithm. Results show that the J48 algorithm is robust to noise when the post-pruning method is enabled. However, when this method is disabled, WIF gets better results in domains with 30% of noise or more. It is necessary to determine if the preprocessing stage is helpful when other classifiers are adopted, particularly those that do not employ a method to deal with noise, e.g., naïve Bayes.
2. Analyzing the Windowing behavior in other class noise models. This paper proposes three types of class noise that are not usually studied, i.e., NAR1, NAR2 and NNAR. These models represent more realistic scenarios of noise that can lead to a better understanding of how Windowing methods deal with noise.
3. Generating artificial continuous domains for the evaluation of learning techniques. The suggested model works with discrete BN, restricting the data generation to nominal datasets. However, real-world data does not only contain discrete variables. The creation of mixed datasets opens a new research line focused on proposing new noise filtering techniques.
Acknowledgements The first author was funded by a scholarship from Consejo Nacional de Ciencia y Tecnología (CONACYT), Mexico, CVU: 895160.
References
1. Brodley, C.E., Friedl, M.A.: Identifying mislabeled training data. JAIR 11, 131–167 (1999). https://doi.org/10.1613/jair.606
2. Catlett, J.: Mega induction: a test flight. In: Machine Learning Proceedings 1991. Elsevier, pp. 596–599 (1991). https://doi.org/10.1016/b978-1-55860-200-7.50121-5
3. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery in databases. AI Mag. 17, 3 (1996). https://doi.org/10.1609/aimag.v17i3.1230
4. Frenay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE Trans. Neural Netw. Learn. Syst. 25, 845–869 (2014). https://doi.org/10.1109/tnnls.2013.2292894
5. Fürnkranz, J.: Noise-tolerant windowing. IJCAI (1998)
6. Hickey, R.J.: Noise modelling and evaluating learning from examples. Artif. Intell. 82, 157–179 (1996). https://doi.org/10.1016/0004-3702(94)00094-8
7. Kim, M.-J., Han, I.: The discovery of experts’ decision rules from qualitative bankruptcy data using genetic algorithms. Expert Syst. Appl. 25, 637–646 (2003). https://doi.org/10.1016/s0957-4174(03)00102-7
8. Limón, X., Guerra-Hernández, A., Cruz-Ramírez, N., Acosta-Mesa, H.-G., Grimaldo, F.: A windowing strategy for distributed data mining optimized through GPUs. Pattern Recogn. Lett. 93, 23–30 (2017). https://doi.org/10.1016/j.patrec.2016.11.00
9. Limón, X., Guerra-Hernández, A., Cruz-Ramírez, N., Grimaldo, F.: Modeling and implementing distributed data mining strategies in JaCa-DDM. Knowl. Inf. Syst. 60, 99–143 (2018). https://doi.org/10.1007/s10115-018-1222-x
10. Martínez-Galicia, D., Guerra-Hernández, A., Cruz-Ramírez, N., Limón, X., Grimaldo, F.: Windowing as a sub-sampling method for distributed data mining. Math. Comput. Appl. 25, 39 (2020). https://doi.org/10.3390/mca25030039
11. Martínez-Galicia, D., Guerra-Hernández, A., Cruz-Ramírez, N., Limón, X., Grimaldo, F.: Towards windowing as a sub-sampling method for distributed data mining. Res. Comput. Sci. 149, 3 (2020)
12. Nettleton, D.F., Orriols-Puig, A., Fornells, A.: A study of the effect of different types of noise on the precision of supervised learning techniques. Artif. Intell. Rev. 33, 275–306 (2010). https://doi.org/10.1007/s10462-010-9156-z
13. Provost, F.: Tree induction for probability-based ranking. Mach. Learn. 52, 199–215 (2003). https://doi.org/10.1023/a:1024099825458
14. Quinlan, J.: Induction over large data bases. Stanford University (1979)
15. Quinlan, J.: Learning efficient classification procedures and their application to chess end games. Mach. Learn. 1, 463–482 (1983). https://doi.org/10.1007/978-3-662-12405-5_15
16. Rubin, D.B.: Inference and missing data. Biometrika 63, 581–592 (1976). https://doi.org/10.1093/biomet/63.3.581
17. Schafer, J.L., Graham, J.W.: Missing data: our view of the state of the art. Psychol. Methods 7, 147–177 (2002). https://doi.org/10.1037/1082-989x.7.2.147
18. Scheines, R., Spirtes, P., Glymour, C., Meek, C., Richardson, T.: The TETRAD project: constraint based aids to causal model specification. Multivar. Behav. Res. 33, 65–117 (1998). https://doi.org/10.1207/s15327906mbr3301_3
19. Wirth, J., Catlett, J.: Experiments on the costs and benefits of windowing in ID3. In: Machine Learning Proceedings 1988. Elsevier, pp. 87–99 (1988). https://doi.org/10.1016/B978-0-934613-64-4.50015-3
20. Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data mining. Morgan Kaufmann Publisher (2016). https://doi.org/10.1016/C2015-0-02071-8
21. Zhu, X., Wu, X.: Class noise vs. attribute noise: a quantitative study. Artif. Intell. Rev. 22, 177–210 (2004). https://doi.org/10.1007/s10462-004-0751-8
Why Rectified Linear Activation Functions? Why Max-Pooling? A Possible Explanation Julio C. Urenda and Vladik Kreinovich
Abstract At present, the most successful machine learning technique is deep learning, which uses the rectified linear activation function (ReLU) s(x) = max(x, 0) as a nonlinear data processing unit. While this selection was guided by general ideas (which were often imprecise), the selection itself was still largely empirical. This leads to a natural question: are these selections indeed the best, or are there even better selections? A possible way to answer this question would be to provide a theoretical explanation of why these selections are, in some reasonable sense, the best. This paper provides a possible theoretical explanation for this empirical fact.
1 Formulation of the Problem: An Explanation is Needed
Deep learning is the most successful machine learning tool. At present, the most successful machine learning technique is deep learning; see, e.g., [2]. It is a version of neural networks where:
• in contrast to the previously used 3-layer schemes,
• many consecutive layers of neurons are used.
What makes deep learning successful? Simply increasing the number of layers is not sufficient to make deep learning successful. The current success is also largely due to the appropriate selection of other features.
J. C. Urenda Departments of Mathematical Sciences and Computer Science, University of Texas at El Paso, 500 W. University, El Paso, TX 79968, USA e-mail: [email protected] V. Kreinovich (B) Department of Computer Science, University of Texas at El Paso, 500 W. University, El Paso, TX 79968, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_28
One such feature is the selection of the rectified linear activation function (ReLU) s(x) = max(x, 0) as a non-linear data processing unit. This selection was guided by general ideas (which were often imprecise).
At present, the selection of ReLU is mostly empirical. At present, the selection of ReLU is still largely empirical: researchers
• tried different selections corresponding to the original imprecise ideas, and
• chose the selections that worked the best.
A natural question. The empirical nature of these selections leads to the following natural question:
• Are these selections indeed the best?
• Or are there even better selections, which we do not know about since they have never been tried?
What we do in this paper. In this paper, we provide a theoretical explanation of why this (and other) selections are, in some reasonable sense, the best.
2 Why Rectified Linear Neurons: Our Explanation
The main purpose of machine learning: reminder. The main purpose of machine learning is:
• based on examples $(x^{(k)}, y^{(k)})$, $k = 1, \ldots, K$,
• to come up with an algorithm f(x) that fits all these examples, i.e., for which $f(x^{(k)}) \approx y^{(k)}$ for all k.
Once the machine learning tool is trained, we can use it to compute y = f(x) for any given x.
In many applications, decreasing computation time is the main objective. In many practical problems, we need the result y as soon as possible. An example of this need is computing the control values y based on the current state x of a self-driving car. In such situations, it is necessary to minimize the time needed for computing f(x).
How neural networks operate: a brief reminder. In a neural network, computations mean alternately applying:
• linear transformations, and
• nonlinear transformations z = s(x).
So, it is important to select the corresponding function s(x) which is the fastest to compute.
In general, which functions are the fastest to compute? In a computer, every computation is performed as a sequence of hardware supported operations. These are min, max, sum, and product. Among these functions:
• min and max are the fastest,
• sum is somewhat slower, and
• the product is slower than the sum.
Thus, the fastest possible function s(x) is the one that can be computed:
• by a single hardware supported operation,
• and ideally, by the fastest of them.
So, it is reasonable to use min and max.
Which activation functions are the fastest to compute? When we compute s(x):
• we can use the input x, and
• we can use constants $c$, $c'$, ….
For activation functions:
• Computing min(x, x) or max(x, x) does not make sense, since this is simply x.
• Similarly, computing $\min(c, c')$ or $\max(c, c')$ does not make sense: we simply get one of these constants.
So, the only operations that make sense are min(x, c) and max(x, c), for some constant c.
Which constant should we use? Out of all possible constants c, the constant c = 0 is the fastest to generate, since 0 is the default value of each computer-stored variable.
Conclusion of this section. So, we end up with:
• function max(x, 0), which is exactly the rectified linear function, and
• function min(x, 0), which is, in some reasonable sense, equivalent to ReLU.
So, we have indeed provided a simple theoretical explanation for the empirical success of rectified linear activation functions.
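The two candidate functions from the conclusion above can be written down directly. The following is a minimal Python sketch, not taken from the chapter; the function names `relu` and `min_unit` and the sample inputs are illustrative assumptions. It also checks one possible reading of the claimed equivalence, namely the identity min(x, 0) = −max(−x, 0), under which the two functions differ only by sign changes that neighboring linear layers could absorb.

```python
def relu(x: float) -> float:
    """Rectified linear unit: a single max operation with the constant 0."""
    return max(x, 0.0)


def min_unit(x: float) -> float:
    """The companion function min(x, 0) (name chosen here for illustration)."""
    return min(x, 0.0)


# Illustrative sample inputs (an assumption of this sketch).
samples = [-2.5, -1.0, 0.0, 0.5, 3.0]
relu_values = [relu(x) for x in samples]
min_values = [min_unit(x) for x in samples]

# One reading of "equivalent to ReLU": min(x, 0) = -max(-x, 0), so the two
# functions differ only by a sign flip of the input and of the output.
identity_holds = all(min_unit(x) == -relu(-x) for x in samples)
```

In this sketch each activation is one call to a built-in comparison-based operation, which mirrors the "single fastest operation" argument of the text; it says nothing about actual hardware timings.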
3 Why Max-pooling
Need for pooling. One of the main applications of neural networks is to process pictures. In a computer, a picture is represented by storing intensity values for each pixel. For color pictures, we need intensity values corresponding to three basic colors.
There are millions of pixels. Processing all these millions of values would take a lot of time. To save this time, we can use the fact that for most images:
• once we know what is in a given pixel,
• we can expect approximately the same information in the neighboring pixels.
Thus, to save time:
• instead of processing each pixel one by one,
• we can combine (“pool”) values from several neighboring pixels into a single value.
Which pooling operation should we select? The whole objective of pooling is to speed up data processing. From this viewpoint, we need to select a pooling operation which is the fastest to perform. This means that we need to select a pooling operation which is performed:
• by using the smallest possible number of hardware supported computer operations, and
• these operations should be the fastest.
If we use only one hardware supported operation, we get min(a, b), max(a, b) (and a + b). This is exactly what works well in deep learning; see, e.g., [2].
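To make the pooling idea concrete, here is a small Python sketch of max-pooling over non-overlapping 2×2 blocks of a grayscale image stored as a list of rows. The block size, the function name `max_pool_2x2`, and the sample image are assumptions made for illustration; the chapter itself does not fix any of them.

```python
def max_pool_2x2(image):
    """Pool non-overlapping 2x2 blocks, keeping the maximum of each block.

    `image` is a list of equally long rows of numbers. A trailing row or
    column that does not fill a complete 2x2 block is ignored.
    """
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            # Only the two-argument max operation is used, three times per block.
            top = max(image[r][c], image[r][c + 1])
            bottom = max(image[r + 1][c], image[r + 1][c + 1])
            pooled_row.append(max(top, bottom))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 intensity values.
image = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2(image)  # [[4, 2], [2, 8]]
```

Replacing `max` by `min` in the three calls would give min-pooling with the same number of operations, which is the symmetry the text points out.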
4 Which Fuzzy Operations? We can apply the same ideas to selecting “and”- and “or”-operations (t-norms and t-conorms) in fuzzy logic; see, e.g., [1, 3–7]. We can then conclude that: • Among all possible “and”-operations, the fastest is min(a, b). • Among all possible “or”-operations, the fastest is max(a, b). So, we get yet another explanation of why min(a, b) and max(a, b) are empirically successful. Acknowledgements This work was supported in part by the National Science Foundation grants 1623190 (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science), HRD-1834620 and HRD-2034030 (CAHSI Includes). It was also supported by the program of the development of the Scientific-Educational Mathematical Center of Volga Federal District No. 075-02-2020-1478. The authors are thankful to all the participants of the International Seminar on Computational Intelligence ISCI’2021 (Tijuana, Mexico, August 17–19, 2021), especially to Oscar Castillo and Patricia Melin, for valuable discussions.
References
1. Belohlavek, R., Dauben, J.W., Klir, G.J.: Fuzzy Logic and Mathematics: A Historical Perspective. Oxford University Press, New York (2017)
2. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge, Massachusetts (2016)
3. Klir, G., Yuan, B.: Fuzzy Sets and Fuzzy Logic. Prentice Hall, Upper Saddle River, New Jersey (1995)
4. Mendel, J.M.: Uncertain Rule-Based Fuzzy Systems: Introduction and New Directions. Springer, Cham, Switzerland (2017)
5. Nguyen, H.T., Walker, C.L., Walker, E.A.: A First Course in Fuzzy Logic. Chapman and Hall/CRC, Boca Raton, Florida (2019)
6. Novák, V., Perfilieva, I., Močkoř, J.: Mathematical Principles of Fuzzy Logic. Kluwer, Boston, Dordrecht (1999)
7. Zadeh, L.A.: Fuzzy sets. Information and Control 8, 338–353 (1965)
Localized Learning: A Possible Alternative to Current Deep Learning Techniques Javier Viaña, Kelly Cohen, Anca Ralescu, Stephan Ralescu, and Vladik Kreinovich
Abstract At present, the most efficient deep learning technique is the use of deep neural networks. However, recent empirical results show that in some situations, it is even more efficient to use “localized” learning, i.e., to divide the domain of inputs into sub-domains, learn the desired dependence separately on each sub-domain, and then “smooth” the resulting dependencies into a single algorithm. In this paper, we provide a theoretical explanation for these empirical successes.
1 Formulation of the Problem Deep learning is successful but it is not a panacea: sometimes it cannot be used. In many problems, deep learning techniques—see, e.g., [4]—provide the most efficient learning: they lead to the most accurate approximation to the real-life phenomena. This does not mean, of course, that no other machine learning techniques are needed: in some other situations, alternative machine learning techniques are needed. For example, it is known that deep learning requires a large amount of data and a lot of computation time. So, if we do not have enough data and/or we do not have enough time to train the network, we have to use other machine learning tools. J. Viaña · K. Cohen · A. Ralescu · S. Ralescu University of Cincinnati, Cincinnati, Ohio 45219, USA e-mail: [email protected] K. Cohen e-mail: [email protected] A. Ralescu e-mail: [email protected] S. Ralescu e-mail: [email protected] V. Kreinovich (B) Department of Computer Science, University of Texas at El Paso, 500 W. University, El Paso, TX 79968, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_29
Even when deep learning can be used, other methods are sometimes better. Interestingly, it turns out that in some situations, even when there is enough data and enough time to use deep learning, alternative methods still lead to more accurate results; see, e.g., [12, 13]. Specifically, these papers use the following “localized” learning idea:
• we divide the area of possible values of the input into several sub-domains,
• we learn the dependence “locally”, i.e., separately on each sub-domain, and then
• we combine the resulting dependencies by making smooth transitions between them; this can be naturally done, e.g., by using fuzzy techniques (see, e.g., [1, 5, 7, 9, 10, 14]).
Comment. Of course, it is important to make sure that the comparison between different techniques is fair. For each machine learning technique, the more parameters we use, the more accurate results we get. So, the only way to claim that one technique is more accurate is:
• either to compare variants of these two methods that use the same number of parameters,
• or, alternatively, to show that one of the techniques requires fewer parameters to reach the same approximation accuracy.
The second alternative is exactly how the comparison was performed in [12, 13].
A natural question. Empirically, the results from [12, 13] are interesting, but in view of the current successes of deep learning, these results are somewhat unexpected. So, a question naturally arises: Why are these localized techniques so much more accurate than deep learning?
What we do in this paper. To answer this question, we first ask a related question: Why are deep learning techniques so successful? To answer this auxiliary question, we first go even further and ask: Why are neural networks so successful in the first place?
Once we answer these two questions and find out what are the strong points of neural networks (including deep ones), it will also become clear what are the limitations of neural networks, and why shallow localized networks can, in some important practical situations, overcome these limitations. In line with this plan: • in Sect. 2, we analyze why neural networks are successful in the first place, • in Sect. 3, we focus on successes of deep learning, and • finally, in Sect. 4, we provide a possible explanation of why localized methods are sometimes better.
Comment. Most main ideas described in Sects. 2 and 3 first appeared in [6]; ideas described in Sect. 4 are completely new.
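The three steps of the “localized” learning idea described in this section (divide the input domain, learn locally, combine with smooth transitions) can be illustrated by a deliberately simple one-dimensional Python sketch. This is not the algorithm of [12, 13]: the equal-width sub-domains, the least-squares line fitted on each sub-domain, the triangular membership functions used for blending, and all function names are assumptions chosen only to keep the example short. The sketch also assumes that every sub-domain contains at least two data points and that predictions are requested inside the interval [lo, hi].

```python
def fit_line(points):
    """Least-squares fit y = a + b*x to a non-empty list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx if sxx > 0 else 0.0
    return mean_y - b * mean_x, b


def train_localized(points, lo, hi, n_sub):
    """Split [lo, hi] into n_sub equal sub-domains and fit one line on each."""
    width = (hi - lo) / n_sub
    models = []
    for j in range(n_sub):
        left, right = lo + j * width, lo + (j + 1) * width
        local = [(x, y) for x, y in points if left <= x <= right]
        a, b = fit_line(local)
        models.append(((left + right) / 2, a, b))  # (center, intercept, slope)
    return models, width


def predict(models, width, x):
    """Blend the local lines using triangular membership functions."""
    weighted_sum, weight_total = 0.0, 0.0
    for center, a, b in models:
        # Membership is 1 at the sub-domain center and 0 one width away.
        weight = max(0.0, 1.0 - abs(x - center) / width)
        weighted_sum += weight * (a + b * x)
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical training data: y = |x| sampled on [-1, 1].
data = [(i / 10, abs(i / 10)) for i in range(-10, 11)]
models, width = train_localized(data, -1.0, 1.0, 2)
at_left_center = predict(models, width, -0.5)
at_right_center = predict(models, width, 0.5)
```

At the center of a sub-domain only that sub-domain's model has non-zero weight, so each prediction there depends on just the two parameters of one local line; between centers, the neighboring models are mixed, which gives the smooth transition.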
2 Why Neural Networks: A Theoretical Explanation
Why do we need a theoretical explanation. Neural networks appeared as a way to simulate how we humans solve problems: in our brains, signals are processed by neurons. The fact that we are the result of billions of years of improving evolution makes researchers believe that many biological processes are optimal (or at least close to optimal), so simulating them makes perfect sense. On the other hand, there is a difference between computers and us: computers are built from different materials than our brains, they operate on different size and time scales, so an optimal solution for a computer may be different from the optimal solution for a brain. For example, birds have wings to fly, and airplanes (which, to some extent, simulate the birds) also have wings, but while for the birds, flapping the wings is the optimal flying strategy, airplane wings do not flap at all. To design an airplane, it is not enough to copy the birds, we also need to perform some theoretical analysis. Similarly, to decide which features should be used in computing, it is desirable to provide a theoretical analysis.
The main objective of the original neural network: computation speed. Artificial neural networks appeared when the computation speed of computers was several orders of magnitude lower than now. This relatively slow speed was the main bottleneck, preventing computers from solving many practical problems. So, the question arose: how can we make computers faster?
Main idea: parallelism. If a person has a task that takes too long, e.g., cleaning several offices, then to speed it up, a natural idea is to ask for help. If several people work on the same task, this task gets performed much faster.
Similarly, if it takes too long for one processor to solve a problem, a natural idea is to have many processors working in parallel:
• on the first stage, all the processors perform some tasks,
• after that, on the second stage, processors use the results of the first stage to perform additional computations, etc.
To decrease the overall computation time, we need to minimize the number of stages (also called layers), and to make each stage as fast as possible.
Linear versus nonlinear functions. We consider deterministic computers, where the result is uniquely determined by the inputs. In mathematical terms, this means that on each stage, what each processor computes is a function of the inputs. In these terms, to select fast stages, we need to decide which functions are the fastest to compute.
Functions can be linear or nonlinear. Of course, linear functions are easier, and thus faster, to compute. However, we cannot have processors computing only linear functions, because in this case, all the computer will compute will be compositions of linear functions, and such compositions are themselves linear. On the other hand, many real-life dependencies are nonlinear. Thus, in addition to processors computing linear functions, we also need processors computing nonlinear functions.
Which non-linear functions are the fastest to compute? In general, the fewer variables the function has, the faster it is to compute. Thus, the fastest to compute are nonlinear functions with the smallest possible number of inputs: namely, a single input.
Resulting computation scheme. We thus arrive at a scheme in which at each stage, processors compute:
• either a linear function $y = a_0 + \sum_{i=1}^{n} a_i \cdot x_i$,
• or a nonlinear function of one variable y = s(x); such a function is known as an activation function.
From the viewpoint of minimizing computation time, it makes no sense to have two linear stages one after another, since the composition of two linear functions is also linear, so we can replace these two stages by a single stage. Similarly, it makes no sense to have two nonlinear stages one after another, since a composition $s_1(s_2(x))$ of two functions of one variable is also a function of one variable. Thus, linear and nonlinear stages must interleave.
It turns out that both 2-stage schemes:
• a linear (L) stage followed by a nonlinear (NL) stage, i.e., a sequence L–NL, and
• a nonlinear stage followed by a linear stage: NL–L
cannot accurately represent general continuous functions on a bounded domain: neither of them can even represent the function $f(x_1, x_2) = x_1 \cdot x_2$ with sufficient accuracy. However, both 3-stage schemes L–NL–L and NL–L–NL can approximate any function. Since NL takes longer than L, the scheme L–NL–L is clearly the fastest. In this scheme:
• First, each processor k (k = 1, …, K) transforms the inputs $x_i$ into their linear combination $y_k = w_{k1} \cdot x_1 + \ldots + w_{kn} \cdot x_n - w_{k0}$.
• On the next stage, a nonlinear transformation is applied to each $y_k$, so we compute the values $z_k = s_k(y_k)$.
• Finally, in the final third stage, we compute a linear combination $y = W_1 \cdot z_1 + \ldots + W_K \cdot z_K - W_0$,
i.e.,
$$y = \sum_{k=1}^{K} W_k \cdot s_k\left(\sum_{i=1}^{n} w_{ki} \cdot x_i - w_{k0}\right) - W_0. \qquad (1)$$
This is exactly the formula for the traditional 3-layer neural network. Usually, in this network, all the processors (called neurons) use the same activation function: $s_1(x) = \ldots = s_K(x) = s(x)$. Then, the formula (1) takes the following form:
$$y = \sum_{k=1}^{K} W_k \cdot s\left(\sum_{i=1}^{n} w_{ki} \cdot x_i - w_{k0}\right) - W_0. \qquad (2)$$
Comment. In most cases, traditional neural networks use a so-called sigmoid activation function $s(x) = \dfrac{1}{1 + \exp(-x)}$.
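The L–NL–L scheme of formula (2), with the sigmoid as the common activation function, translates almost literally into code. The sketch below is only an illustration: the function name `three_layer`, the argument layout, and the numerical weights are hypothetical, and no numerical safeguards (e.g., against overflow in `exp` for very large negative arguments) are included.

```python
import math


def sigmoid(x: float) -> float:
    """Sigmoid activation s(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def three_layer(x, w, w0, W, W0, s=sigmoid):
    """Evaluate y = sum_k W[k] * s(sum_i w[k][i] * x[i] - w0[k]) - W0."""
    y = -W0
    for k in range(len(W)):
        # First (linear) stage: y_k = w_k1*x_1 + ... + w_kn*x_n - w_k0.
        y_k = sum(w_ki * x_i for w_ki, x_i in zip(w[k], x)) - w0[k]
        # Second (nonlinear) stage z_k = s(y_k), then the final linear stage.
        y += W[k] * s(y_k)
    return y


# Hypothetical weights for K = 2 neurons and n = 2 inputs.
w = [[1.0, -1.0], [0.5, 0.5]]
w0 = [0.0, 1.0]
W = [2.0, -1.0]
W0 = 0.5
y = three_layer([1.0, 1.0], w, w0, W, W0)
```

For these weights and the input (1, 1), both neurons receive the value 0, so each sigmoid returns 0.5 and the output is 2 · 0.5 − 1 · 0.5 − 0.5 = 0. Passing a different function as `s` gives the same scheme with another activation function.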
3 Need to Go Beyond Traditional Neural Networks and Deep Learning
At present, computation speed is no longer the major concern, accuracy is. While several decades ago, when neural networks were invented, computational speed was the main bottleneck, at present, computers are much faster. The main concern now is not the speed, but how accurately we can perform the computations: how accurately we can predict weather, how accurately we can estimate the amount of oil in a given oilfield, etc.
How to increase accuracy. The more parameters we use, the better we can fit the data and thus, the more accurate the model. From this viewpoint, the larger the number K of the neurons, the more parameters we have in the formula (2), and thus, the more accurately we can represent any given function.
Limitation. However, there is a serious limitation in this increase of the number of options, caused by the fact that any permutation of the K neurons does not change the expression (2). There are K! such permutations, which, for large K, is a huge number. So, while we have many possible combinations of the coefficients $w_{ki}$ and $W_k$, there are much fewer (K! times fewer) different functions represented by these combinations.
How to overcome this limitation. To overcome this limitation, a natural idea is to decrease the number K of neurons in each layer. To preserve the same number of
parameters, we therefore need to place some neurons in other layers. Thus, instead of the original 3-layer configuration L–NL–L, we get a multi-layer configuration L−NL−L−NL− . . . This is what a deep neural network is about. In a nutshell, this multi-layer scheme is exactly what is known as a deep neural network. There are also some other differences: e.g., deep learning mostly uses a different activation function s(x) = max(x, 0) known as rectified linear unit (ReLU, for short).
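The permutation limitation discussed in this section can be checked numerically: reordering the hidden neurons of a 3-layer network changes the list of coefficients but not the computed function. The following Python sketch uses hypothetical weights and an assumed representation of each neuron as a triple (W_k, w_k, w_k0); it evaluates the network for all K! orderings of K = 3 neurons and measures how much the outputs differ.

```python
import itertools
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def network(x, neurons, W0):
    """Formula (2); `neurons` is a list of (W_k, w_k, w_k0) triples."""
    total = 0.0
    for W_k, w_k, w_k0 in neurons:
        y_k = sum(w_ki * x_i for w_ki, x_i in zip(w_k, x)) - w_k0
        total += W_k * sigmoid(y_k)
    return total - W0


# Hypothetical network with K = 3 neurons and n = 2 inputs.
neurons = [
    (2.0, [1.0, -1.0], 0.0),
    (-1.0, [0.5, 0.5], 1.0),
    (0.7, [-0.3, 0.8], 0.2),
]
x = [0.4, -1.2]

# Evaluate the same input for every ordering of the neurons.
outputs = [network(x, list(order), 0.5)
           for order in itertools.permutations(neurons)]
spread = max(outputs) - min(outputs)  # differs from 0 only by rounding
```

Here `len(outputs)` equals 3! = 6, and the spread is zero up to floating-point rounding, so these six different parameter settings describe one and the same function.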
4 Beyond Deep Learning, Towards Localization
Let us get back to estimating accuracy. As we have mentioned, at present, the main objective in most computational problems is accuracy. In the previous section, we explained how transition to deep neural networks helps increase accuracy. But maybe there are other ways to do it? To answer this question, let us analyze the problem of increasing accuracy somewhat more deeply.
How to estimate accuracy with which we know the parameters: general reminder. Suppose that to “train” the system, i.e., to find the values of the corresponding parameters, we can use M measurement results, with an average accuracy ε. Let us denote by P the overall number of parameters in the model. Each measurement result means one equation in which the parameters are unknowns. So, if we take all measurement results into account, we have M equations with P unknowns. In general, if we have fewer equations than unknowns, then we cannot uniquely determine all the unknowns, some of them may be set arbitrarily, so we do not need all P parameters. Thus, we must have $M \geq P$, and usually, we have $M \gg P$. If we had exactly P observations, then we could determine each parameter with accuracy proportional to ε. Since we have a duplication, i.e., we have more observations than unknowns, we can use this duplication to make the results more accurate. In general, according to statistics, if we have d measurements to determine a parameter, the accuracy increases by a factor $\sqrt{d}$; see, e.g., [11]. In our case, we have M measurements for P parameters, so, on average, we have d = M/P measurements per parameter. Thus, we can determine each parameter with accuracy proportional to
$$\delta = \frac{\varepsilon}{\sqrt{M/P}} = \sqrt{\frac{P}{M}} \cdot \varepsilon.$$
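The estimate for δ rests on the statistical fact that averaging d independent measurements reduces the error by a factor of √d. The Python sketch below evaluates the formula for hypothetical values ε = 1, M = 2500, and P = 100, and compares it with a small simulation in which each "parameter" is estimated as the average of d = M/P Gaussian measurements. The Gaussian noise model, the number of trials, and the function names are assumptions of this illustration, not part of the chapter's argument.

```python
import math
import random
import statistics


def parameter_accuracy(eps: float, M: int, P: int) -> float:
    """delta = eps / sqrt(M / P) = sqrt(P / M) * eps."""
    return eps * math.sqrt(P / M)


def empirical_accuracy(eps: float, d: int, trials: int = 2000, seed: int = 0) -> float:
    """Spread of the average of d noisy measurements (true value is 0)."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        measurements = [rng.gauss(0.0, eps) for _ in range(d)]
        means.append(sum(measurements) / d)
    return statistics.pstdev(means)


# Hypothetical setting: M = 2500 measurements, P = 100 parameters, eps = 1.
predicted = parameter_accuracy(1.0, M=2500, P=100)  # sqrt(100/2500) = 0.2
simulated = empirical_accuracy(1.0, d=2500 // 100)
```

With these values the formula gives δ = 0.2, and the simulated spread should come out close to this value, up to sampling fluctuations.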
What is the accuracy of the result of using this model. The value δ describes the accuracy with which we know each parameter. These accuracies affect the accuracy of prediction. In general, each predicted value depends on all P parameters. Each parameter contributes accuracy ∼ δ to the prediction result. It is reasonable to assume that these P contributions are independent. Thus, according to statistics [11], the overall effect of all these contributions is proportional to $\sqrt{P} \cdot \delta$.
How can we make predictions more accurate. To make the results more accurate, we need to decrease the number of parameters on which each predicted value depends. If each predicted value depends only on $P' \ll P$ parameters, then the overall inaccuracy of the prediction is proportional to $\sqrt{P'} \cdot \delta$, which is much smaller than $\sqrt{P} \cdot \delta$.
In other words, we need localization. To make sure that $P' \ll P$, we need to make sure that each predicted value depends only on a few of the parameters. Thus, each predicted value $f(x_1, \ldots, x_n)$ depends only on a few parameters. In other words, the list of all P parameters should be divided into sublists (maybe intersecting), so that in each sub-domain of the domain of all possible inputs $x = (x_1, \ldots, x_n)$ we have an expression depending only on a few parameters. In other words, we have separate few-parametric expressions describing the desired dependence on each sub-domain. This is exactly what is usually meant by localization: the value of the function in each sub-domain is determined only by parameters corresponding to this sub-domain. Such a localization is exactly what is used in [12, 13].
Comments.
• In terms of the function (1), this would mean that instead of using the same activation function $s_k(x) = s(x)$ for all the neurons, as in traditional and in deep neural networks, we use, in effect, different activation functions $s_k(x)$, each of which corresponds to a certain sub-domain of the original domain. When the activation functions are different, there is no K! duplication and thus, no decrease in accuracy, even when we use a traditional (“shallow”) scheme.
• There are additional advantages in a localized approach:
– it is faster to find the parameters: we need to solve a system with fewer unknowns $P' \ll P$, and the computations corresponding to different sub-domains can be performed in parallel, and
– it is easier to modify the solution when the values change in some sub-domain.
• Similar arguments explain why an approximation by splines (where we have polynomial approximation on each sub-domain and then smooth them out) leads, in general, to a much better accuracy than a “global” (on the whole domain) approximation by a polynomial; see, e.g., [2, 3, 8].
Corollary: shallow or deep. According to the localization idea, each value f(x) depends only on the parameters corresponding to this point x and neighboring points $x' \approx x$. In a multi-layer scheme, this means that:
• the signal produced by the last layer depends only on the values from the previous layer which are close to x;
• these values, in turn, depend only on coefficients of the pre-previous layer which correspond to locations $x''$ which are close to $x'$, etc.
We have $x' \approx x$, $x'' \approx x'$, etc., i.e., the differences $x' - x$, $x'' - x'$, etc., are small. However:
• the difference $x'' - x$ between $x''$ and $x$ is the sum of two small differences: $x'' - x = (x'' - x') + (x' - x)$;
• if we go back one more layer, then the difference $x''' - x$ is the sum of three small differences, etc.
Thus, the more layers we have, the less localized our system, and therefore, the more parameters we need to take into account to predict each value f(x). So, to achieve the best accuracy, we need to use the smallest possible number of layers, i.e., one non-linear layer, as in traditional neural networks. This is exactly what is used in [12, 13].
Comment. Interestingly, there is some rudimentary localization effect in deep neural networks as well. Indeed, since the corresponding activation function s(x) = max(x, 0) is equal to 0 for half of the inputs (namely, for all negative inputs), this means that, on average, half of the values from the previous layer do not affect the next layer. So, the final value produced by the last layer is determined only by half of the neurons in the previous layer. These values, in turn, depend only on one half of the neurons in the layer before it, etc. So, at least half of the parameters are not used when estimating each value, and this fact decreases the approximation error in comparison with our generic estimates.
Acknowledgements This work was supported in part by the National Science Foundation grants 1623190 (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science), HRD-1834620 and HRD-2034030 (CAHSI Includes). It was also supported by the program of the development of the Scientific-Educational Mathematical Center of Volga Federal District No. 075-02-2020-1478.
The authors are thankful to all the participants of the International Seminar on Computational Intelligence ISCI’2021 (Tijuana, Mexico, August 17–19, 2021), especially to Oscar Castillo and Patricia Melin, for valuable discussions.
References
1. Belohlavek, R., Dauben, J.W., Klir, G.J.: Fuzzy Logic and Mathematics: A Historical Perspective. Oxford University Press, New York (2017)
2. de Boor, C.: A Practical Guide to Splines. Springer, New York (2001)
3. Eilers, P.H.C., Marx, B.D.: Practical Smoothing: The Joys of P-splines. Cambridge University Press, Cambridge, UK (2021)
4. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge, Massachusetts (2016)
5. Klir, G., Yuan, B.: Fuzzy Sets and Fuzzy Logic. Prentice Hall, Upper Saddle River, New Jersey (1995)
6. Kreinovich, V., Kosheleva, O.: Optimization under uncertainty explains empirical success of deep learning heuristics. Machine Learning and No-Free Lunch Theorems. In: Pardalos, P., Rasskazova, V., Vrahatis, M.N. (eds.) Black Box Optimization, pp. 195–220. Springer, Cham, Switzerland (2021)
7. Mendel, J.M.: Uncertain Rule-Based Fuzzy Systems: Introduction and New Directions. Springer, Cham, Switzerland (2017)
8. Micula, G., Micula, S.: Handbook of Splines. Springer, Dordrecht (2008)
9. Nguyen, H.T., Walker, C.L., Walker, E.A.: A First Course in Fuzzy Logic. Chapman and Hall/CRC, Boca Raton, Florida (2019)
10. Novák, V., Perfilieva, I., Močkoř, J.: Mathematical Principles of Fuzzy Logic. Kluwer, Boston, Dordrecht (1999)
11. Sheskin, D.J.: Handbook of Parametric and Non-Parametric Statistical Procedures. Chapman & Hall/CRC, London, UK (2011)
12. Viaña, J., Cohen, K.: Fuzzy-based, noise-resilient, explainable algorithm for regression. In: Proceedings of the 2021 Annual Conference of the North American Fuzzy Information Processing Society NAFIPS’2021. West Lafayette, Indiana, June 7–9 (2021)
13. Viaña, J., Ralescu, S., Cohen, K., Ralescu, A., Kreinovich, V.: Extension to multidimensional problems of a fuzzy-based explainable & noise-resilient algorithm. In: Proceedings of the 2021 International Workshop on Constraint Programming and Decision Making CoProD’2021, Szeged, Hungary, September 12 (2021)
14. Zadeh, L.A.: Fuzzy sets. Inform. Control 8, 338–353 (1965)
What is a Reasonable Way to Make Predictions? Leonardo Orea Amador and Vladik Kreinovich
Abstract Predictions are usually based on what is called laws of nature: many times, we observe the same relation between the states at different moments of time, and we conclude that the same relation will occur in the future. The more times the relation repeats, the more confident we are that the same phenomenon will be repeated again. This is how Newton’s laws and other laws came into being. This is what is called inductive reasoning. However, there are other reasonable approaches. For example, assume that a person speeds and is not caught. This may be repeated two times, three times—but here, the more times this phenomenon is repeated, the more confident we become that next time, he/she will be caught. Let us call this anti-inductive reasoning. So which of the two approaches shall we use? This is an example of a question that we study in this paper.
1 Formulation of the Problem
1.1 Making Predictions is Important
One of the main objectives of science is to predict what will happen in the future. Another important objective is to make the future more beneficial. This second objective also requires predicting how different strategies will affect the future of the world. So, in the long run, prediction is one of the main objectives of all the tools that science uses, including AI tools. To make these tools more efficient, it is therefore important to understand:
• how we make predictions, and
• how we should make predictions.
L. O. Amador · V. Kreinovich (B) Department of Computer Science, University of Texas at El Paso, 500 W. University, El Paso, Texas 79968, USA e-mail: [email protected]
L. O. Amador e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 O. Castillo and P. Melin (eds.), New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, Studies in Computational Intelligence 1050, https://doi.org/10.1007/978-3-031-08266-5_30
1.2 At First Glance, the Answer to These Questions is Straightforward
Predictions are usually based on what are called laws of nature:
• many times, we observe the same relation between the states at different moments of time, and
• we conclude that the same relation will occur in the future.
The more times the relation repeats, the more confident we are that the phenomenon will repeat again. This is how Newton's laws and other laws came into being. This is what is called inductive reasoning; see, e.g., [1–3].
Comment. It is important not to confuse:
• inductive reasoning, where we make a prediction based on a finite number of observations, and
• mathematical induction, where we prove a statement $\forall n\, P(n)$ by proving that $P(0)$ is true and that, for every $n$, $P(n)$ implies $P(n + 1)$.
1.3 Situation is Not so Simple
However, there are other reasonable approaches. For example, assume that a person speeds and is not caught. This may be repeated two times, three times. Here, the more times this phenomenon is repeated, the more confident we become that next time, he/she will be caught. This is why gamblers continue to gamble after losing. This is why entrepreneurs try again after failing several times. Let us call this anti-inductive reasoning; see, e.g., [3]. So which of the two approaches shall we use?
1.4 This Should be Decided by an Experiment
We are accustomed to the fact that everything is decided by experiments. So, a natural way to select one of these two approaches is to compare them with the experimental data. But how do we decide, based on this data, which approach is better? For this decision:
• a scientist will use inductive reasoning, while
• another person will use anti-inductive reasoning.
What will happen? This is one of the questions that we analyze in this paper.
2 Analysis of the Problem on a Simplified Case
2.1 Simplified Case: A Description
For simplicity, let us fix some natural number $n$, and consider the following simplified versions of the two approaches.
• The first approach is that:
  – if something repeats $n$ times (or its negation repeats $n$ times),
  – we predict that this will happen the next time.
• The second approach is that if something happens $n$ times, the opposite will happen the next time.
2.2 Case Study
Suppose that we have a phenomenon, e.g., the Sun rising in the morning, that holds for $2n + 1$ moments of time. In the first approach:
• After the first $n$ cases, we predict that the Sun will rise again, and it does.
• We make a similar prediction for moment $n + 2$, and again, our prediction turns out to be correct.
• For $n$ moments in a row, predictions based on our reasoning are correct.
• So, by applying inductive reasoning to these $n$ cases, we conclude that inductive reasoning is a valid approach.
But what if we use the second approach?
• We predict that at the moment $n + 1$, the Sun will not rise, but it rises!
• This repeats $n$ times, so $n$ times, our predictions are wrong.
• We then apply the same anti-induction to select the approach.
• Since our approach failed $n$ times, we conclude that next time, it will work.
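The case study can be replayed in a few lines of Python. This is only a minimal sketch under our own assumptions: outcomes are encoded as the strings "T" (happened), "F" (did not happen) and "U" (no prediction or no data), and the function names `induction`, `anti_induction` and `track_record` are illustrative, not taken from the paper.

```python
def induction(history, n):
    """First approach: if the last n entries agree, predict the same value."""
    if n >= 1 and len(history) >= n:
        tail = history[-n:]
        if all(x == "T" for x in tail):
            return "T"
        if all(x == "F" for x in tail):
            return "F"
    return "U"  # not enough agreement to predict anything


def anti_induction(history, n):
    """Second approach: predict the opposite of what induction predicts."""
    return {"T": "F", "F": "T", "U": "U"}[induction(history, n)]


def track_record(rule, observations, n):
    """For each moment after the first, record whether the rule's prediction was right."""
    record = []
    for i in range(1, len(observations)):
        predicted = rule(observations[:i], n)
        actual = observations[i]
        if predicted == "U" or actual == "U":
            record.append("U")
        else:
            record.append("T" if predicted == actual else "F")
    return record


n = 3
sunrises = ["T"] * (2 * n + 1)  # the Sun rises at all 2n + 1 moments

for rule in (induction, anti_induction):
    record = track_record(rule, sunrises, n)
    verdict = rule(record, n)  # the rule is applied to its own track record
    print(rule.__name__, record, "-> will it work next time?", verdict)
```

With $n = 3$, the inductive track record ends in a run of "T" values and the anti-inductive one in a run of "F" values, yet both rules answer "T" when asked whether they will work next time.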
2.3 Surprising Conclusion
So, in this case, no matter how many experiments we perform, the proponents of each approach will remain convinced that their approach will work the next time around.
2.4 What We Discuss in this Paper
In this paper, we describe this situation in precise terms. This is still the beginning of this research, so we will present more challenges than results. However, we will formulate these important challenges in precise terms, so that they become:
• not just vague philosophical ideas,
• but precisely formulated mathematical questions.
3 General Case
3.1 Let us Describe the Situation in Precise Terms
We want to predict whether some property $P$ will hold. To make this prediction, we use previous observations. Let us assume that we observed similar situations $N$ times. For each $i$ from 1 to $N$, we define $s_i$ as follows:
• if the property $P$ was satisfied in the $i$-th observation, we take $s_i = T$;
• if the property $P$ was not satisfied in the $i$-th observation, we take $s_i = F$;
• if it is unknown whether $P$ was satisfied, we take $s_i = U$.
The set of all such sequences will be denoted by $\{T, F, U\}^*$. A prediction rule $M$ means that, for each such sequence $s$, we predict:
• either that $P$ will be satisfied at the next moment of time: $M(s) = T$,
• or that $P$ will not be satisfied: $M(s) = F$,
• or that we do not have enough data for a prediction: $M(s) = U$.
So, a prediction rule $M$ is a mapping $M : \{T, F, U\}^* \to \{T, F, U\}$.
3.2 Prediction Rule Must be Fair
A priori, we have no reason to prefer $P$ or its negation $\neg P$. So, the prediction rule should treat $P$ and $\neg P$ in the same way: if we apply it to observations of $\neg P$, we should arrive at the same conclusion. We call this absence of a priori preference fairness. Let us describe fairness in precise terms.
• For each observation sequence $s = (s_1, \ldots, s_n)$ of $P$, the corresponding observation sequence of $\neg P$ is $\neg s \stackrel{\text{def}}{=} (\neg s_1, \ldots, \neg s_n)$, where $\neg U = U$.
• Similarly, the prediction $M(s)$ for $P$ means predicting $\neg M(s)$ for $\neg P$.
Thus, fairness means that $M(\neg s) = \neg M(s)$ for all $s$.
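The fairness condition can be tested mechanically on short sequences. The sketch below is an illustration under our own assumptions: truth values are the strings "T", "F", "U", a rule is any Python function from a tuple of such values to one value, and the two sample rules `repeat_last` and `always_true` are hypothetical examples that do not appear in the paper. A bounded check like this can refute fairness, but it cannot prove it for all sequences.

```python
from itertools import product

VALUES = ("T", "F", "U")
NEG = {"T": "F", "F": "T", "U": "U"}  # negation, with not-U = U


def negate(seq):
    """Element-wise negation of an observation sequence."""
    return tuple(NEG[x] for x in seq)


def is_fair(rule, max_len=6):
    """Check M(not s) == not M(s) for every sequence s of length <= max_len."""
    for length in range(max_len + 1):
        for s in product(VALUES, repeat=length):
            if rule(negate(s)) != NEG[rule(s)]:
                return False
    return True


def repeat_last(s):
    """Hypothetical rule: predict the most recent observation."""
    return s[-1] if s else "U"


def always_true(s):
    """Hypothetical rule: always predict that P holds."""
    return "T"


print(is_fair(repeat_last))  # fair: negating the input negates the output
print(is_fair(always_true))  # not fair: it prefers P over its negation
```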
3.3 Meta-Analysis: Using Prediction Rule to Select Prediction Rule
Let $s = (s_1, \ldots, s_N)$ be a sequence of observations.
• Based on $s_1$, we form a prediction $M(s_1)$ for $s_2$.
• Based on $(s_1, s_2)$, we form a prediction $M(s_1, s_2)$ for $s_3$.
• In general, based on $(s_1, \ldots, s_i)$, we form a prediction $M(s_1, \ldots, s_i)$ for $s_{i+1}$.
• Finally, based on $(s_1, \ldots, s_{N-1})$, we form a prediction $M(s_1, \ldots, s_{N-1})$ for $s_N$.
For each $i$, we check whether the prediction was correct:
• if $s_{i+1}$ and/or $M(s_1, \ldots, s_i)$ are unknown, we take $c_i = U$;
• otherwise, we take $c_i = T$ if $M(s_1, \ldots, s_i) = s_{i+1}$ and $c_i = F$ if $M(s_1, \ldots, s_i) \ne s_{i+1}$.
This way, we get a sequence $c = (c_1, \ldots, c_{N-1})$ of truth values describing how well the prediction rule $M$ worked. We can now apply the rule $M$ to the sequence $c$ to predict whether $M$ will work the next time.
• If $M(c) = F$, this means that our own prediction rule requires us to reject this rule. So, if $M(c) = F$, we say that $M$ is inconsistent with the observations $s$.
• Otherwise, we say that $M$ is consistent with $s$.
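The construction of $c$ and the resulting consistency test translate directly into code. The following is a minimal sketch, assuming the string encoding "T", "F", "U" and 0-based Python tuples; `correctness_sequence`, `is_consistent` and the sample rule `repeat_last` are illustrative names of our own choosing.

```python
def correctness_sequence(rule, s):
    """Return c = (c_1, ..., c_{N-1}): how well `rule` predicted each next value of s."""
    c = []
    for i in range(1, len(s)):
        prediction = rule(s[:i])  # M(s_1, ..., s_i)
        actual = s[i]             # s_{i+1} in the paper's 1-based notation
        if prediction == "U" or actual == "U":
            c.append("U")
        elif prediction == actual:
            c.append("T")
        else:
            c.append("F")
    return tuple(c)


def is_consistent(rule, s):
    """M is inconsistent with s exactly when M(c) == 'F'."""
    return rule(correctness_sequence(rule, s)) != "F"


def repeat_last(s):
    """Hypothetical rule used only for illustration: predict the latest observation."""
    return s[-1] if s else "U"


mixed = ("T", "T", "F", "F", "U", "T")
alternating = ("T", "F", "T", "F")

print(correctness_sequence(repeat_last, mixed), is_consistent(repeat_last, mixed))
print(correctness_sequence(repeat_last, alternating), is_consistent(repeat_last, alternating))
```

On the alternating sequence, this sample rule is wrong every time, so its own verdict on itself is "F" and it is inconsistent with the observations.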
3.4 Induction Versus Anti-induction Revisited
The induction rule $M_I$ means that:
• if the last $n$ elements of $s$ are $T$, then $M_I(s) = T$;
• if the last $n$ elements of $s$ are $F$, then $M_I(s) = F$;
• otherwise, $M_I(s) = U$.
The anti-induction rule $M_A$ means that:
• if the last $n$ elements of $s$ are $T$, then $M_A(s) = F$;
• if the last $n$ elements of $s$ are $F$, then $M_A(s) = T$;
• otherwise, $M_A(s) = U$.
Here, for all $s$, we have $M_A(s) = \neg M_I(s)$, i.e., $M_A = \neg M_I$.
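As a concrete illustration, the two rules can be written as follows. This is a sketch under the assumptions that values are the strings "T", "F", "U", that $n \ge 1$, and that a sequence shorter than $n$ yields "U"; the last point is our reading of "the last $n$ elements", which the text does not spell out.

```python
NEG = {"T": "F", "F": "T", "U": "U"}


def induction(s, n):
    """M_I: if the last n elements of s agree, predict that same value."""
    if n >= 1 and len(s) >= n:
        tail = s[-n:]
        if all(x == "T" for x in tail):
            return "T"
        if all(x == "F" for x in tail):
            return "F"
    return "U"


def anti_induction(s, n):
    """M_A = not M_I: predict the opposite of what induction predicts."""
    return NEG[induction(s, n)]


# A few sample sequences with n = 2
for s in (("F", "T", "T"), ("U", "F", "F"), ("T", "F"), ("T",)):
    print(s, "M_I:", induction(s, 2), "M_A:", anti_induction(s, 2))
```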
3.5 General Result
We had an example of a sequence $s$ with which both $M_I$ and $M_A$ were consistent. What happens in the general case? Was it a weird example or is it a general phenomenon?
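We prove that this is a general phenomenon.
Proposition 1 For each fair prediction rule $M$ and for each sequence $s$: $M$ is consistent with $s$ $\Leftrightarrow$ $\neg M$ is consistent with $s$.
Proof Let us denote the sequences $c$ corresponding to $M$ and $\neg M$ by $c^M$ and $c^{\neg M}$. Here, $c^M_i = T$ if $M(s_1, \ldots, s_i) = s_{i+1}$. This is exactly when we have $\neg M(s_1, \ldots, s_i) \ne s_{i+1}$, i.e., when $c^{\neg M}_i = F$. Similarly, $c^M_i = F$ exactly when $c^{\neg M}_i = T$, and $c^M_i = U$ exactly when $c^{\neg M}_i = U$. Thus, $c^{\neg M}_i = \neg c^M_i$ for all $i$, i.e., $c^{\neg M} = \neg c^M$. Based on the sequence $c^{\neg M}$, the rule $\neg M$ will predict $(\neg M)(c^{\neg M}) = \neg M(\neg c^M)$. Since $M$ is fair, we have $M(\neg c^M) = \neg M(c^M)$. Thus:
$$(\neg M)(c^{\neg M}) = \neg M(\neg c^M) = \neg\neg M(c^M) = M(c^M).$$
So, indeed, $M$ and $\neg M$ are consistent or inconsistent simultaneously. The proposition is proven.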
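Proposition 1 can also be sanity-checked by brute force on short sequences. The sketch below is an illustration under our own assumptions (string encoding "T", "F", "U"; sequences of length at most 7; two sample fair rules, induction with $n = 2$ and a majority vote). It verifies the key identity from the proof, that $M$ and $\neg M$ reach the same verdict about themselves; it is a finite check, not a substitute for the proof.

```python
from itertools import product

NEG = {"T": "F", "F": "T", "U": "U"}


def negated(rule):
    """Return the rule not-M, which predicts the opposite of M."""
    return lambda s: NEG[rule(s)]


def correctness_sequence(rule, s):
    """c_i records whether rule(s_1..s_i) matched s_{i+1} ('U' if either is unknown)."""
    c = []
    for i in range(1, len(s)):
        p, actual = rule(s[:i]), s[i]
        c.append("U" if "U" in (p, actual) else ("T" if p == actual else "F"))
    return tuple(c)


def meta_prediction(rule, s):
    """M(c): what the rule predicts about its own next success."""
    return rule(correctness_sequence(rule, s))


def induction_2(s):
    """Induction rule M_I with n = 2."""
    if len(s) >= 2 and s[-1] == s[-2] and s[-1] != "U":
        return s[-1]
    return "U"


def majority(s):
    """Fair sample rule: predict whichever of T and F occurred more often."""
    t, f = s.count("T"), s.count("F")
    return "T" if t > f else "F" if f > t else "U"


checked = 0
for rule in (induction_2, majority):
    for length in range(1, 8):
        for s in product("TFU", repeat=length):
            assert meta_prediction(rule, s) == meta_prediction(negated(rule), s)
            checked += 1
print("identity M(c^M) == (not M)(c^(not M)) held in", checked, "cases")
```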
4 Rules Must be Falsifiable
4.1 An Example Where a Reasonable Prediction Rule is Inconsistent
The fact that both induction and anti-induction rules are consistent with the same observations makes one think that maybe all reasonable rules are always consistent with all the observations. That would be bad, because if something cannot be disproved by experiment, this does not sound very scientific. Let us show that this is not the case. Indeed, a natural rule $M_m$ is to go by majority:
• if in $s$, we had more $T$ than $F$, we predict $T$;
• if in $s$, we had more $F$ than $T$, we predict $F$;
• otherwise, we predict $U$.
What happens if we apply this rule to a periodic sequence $s = (T, F, T, F, \ldots)$ for which:
• we have $s_{2k} = F$ for all $k$, and
• we have $s_{2k+1} = T$ for all $k$.
Here:
• For even $i = 2k$, we have equally many $T$s and $F$s, so $M_m(s_1, \ldots, s_i) = U$, thus $c_i = U$.
• For odd $i = 2k + 1$, we have more $T$s than $F$s, so $M_m(s_1, \ldots, s_i) = T$. For $i = 2k + 1$, we have $s_{i+1} = s_{2k+2} = F$, so $c_{2k+1} = F$.
So, the sequence $c$ has only $F$s and $U$s. Thus $M_m(c) = F$. In other words, the majority rule is inconsistent with this sequence.
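This example is easy to reproduce numerically. The sketch below assumes the string encoding "T", "F", "U" and an alternating sequence of length 8; the function names are illustrative.

```python
def majority(s):
    """Majority rule: predict whichever of T and F occurred more often, else U."""
    t, f = s.count("T"), s.count("F")
    if t > f:
        return "T"
    if f > t:
        return "F"
    return "U"


def correctness_sequence(rule, s):
    """c_i records whether rule(s_1..s_i) matched s_{i+1} ('U' if either is unknown)."""
    c = []
    for i in range(1, len(s)):
        prediction, actual = rule(s[:i]), s[i]
        if prediction == "U" or actual == "U":
            c.append("U")
        else:
            c.append("T" if prediction == actual else "F")
    return tuple(c)


s = ("T", "F") * 4                      # T, F, T, F, T, F, T, F
c = correctness_sequence(majority, s)
print(c)                                # alternates F, U, F, U, ...
print(majority(c))                      # the rule's verdict about itself
```

Since $c$ contains only "F" and "U" values, the majority verdict on $c$ is "F", i.e., the rule rejects itself on this sequence.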
4.2 A Problem with Simple Induction
Let us show that, somewhat unexpectedly, the simple induction rule $M_I$, as described in the previous sections, cannot be falsified and is, thus, not a very scientific approach.
Proposition 2 For $n > 1$, no sequence $s$ is inconsistent with the prediction rule $M_I$.
Proof The only way for an observation sequence $s$ to be inconsistent with $M_I$ is for the corresponding sequence $c$ to contain $n$ false values in a row, i.e., for the prediction rule $M_I$ to fail $n$ times in a row: $c_N = \ldots = c_{N+n-1} = F$. That it did not work the first time means that $s_{N+1} \ne M_I(s_{N-n+1}, \ldots, s_N)$. By definition of the simple induction rule $M_I$, this can happen in two possible situations:
• either we have $s_{N+1} = F$ and $s_{N-n+1} = \ldots = s_N = T$,
• or we have $s_{N+1} = T$ and $s_{N-n+1} = \ldots = s_N = F$.
Let us first consider the first situation. In this case, at the moment $N + 1$, the last $n$ values of the sequence $s$ are:
• several (namely, $n - 1$) $T$-values $s_{N-n+2} = \ldots = s_N = T$,
• followed by an $F$-value $s_{N+1} = F$.
Since $n > 1$, these last $n$ values contain both $T$ and $F$. So, in this situation, the simple induction rule $M_I$ does not predict anything at all, we have $c_{N+1} = U$ ("unknown"), and we cannot have $c_{N+1} = F$. Similarly, in the second situation, the last $n$ values of the sequence $s$ are:
• several (namely, $n - 1$) $F$-values $s_{N-n+2} = \ldots = s_N = F$,
• followed by a $T$-value $s_{N+1} = T$.
In this situation, the simple induction rule $M_I$ also does not predict anything at all, so we have $c_{N+1} = U$ ("unknown"), and we cannot have $c_{N+1} = F$. In both situations, we cannot have $c_N = c_{N+1} = \ldots = F$ and thus, the simple prediction rule indeed cannot be falsified. The proposition is proven.
The situation is not better for the simple anti-induction principle $M_A$ either:
Corollary 1 For $n > 1$, no sequence $s$ is inconsistent with $M_A$.
Proof This immediately follows from Proposition 2 if we take into account Proposition 1, according to which any sequence $s$ is consistent with the prediction rule $M_I$ if and only if it is consistent with its negation $M_A = \neg M_I$.
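Proposition 2 and Corollary 1 can be checked exhaustively for short sequences. The sketch below is a finite experiment under our own assumptions (string encoding "T", "F", "U"; sequences of length at most 8; $n = 2$ and $n = 3$), not a proof. It also runs the case $n = 1$, which the statements exclude: there, a sequence as short as $(T, F)$ gives $c = (F)$ and hence $M_I(c) = F$.

```python
from itertools import product

NEG = {"T": "F", "F": "T", "U": "U"}


def induction(s, n):
    """M_I: if the last n elements of s agree on T or on F, predict that value."""
    if n >= 1 and len(s) >= n:
        tail = s[-n:]
        if all(x == "T" for x in tail):
            return "T"
        if all(x == "F" for x in tail):
            return "F"
    return "U"


def correctness_sequence(rule, s):
    """c_i records whether rule(s_1..s_i) matched s_{i+1} ('U' if either is unknown)."""
    c = []
    for i in range(1, len(s)):
        p, actual = rule(s[:i]), s[i]
        c.append("U" if "U" in (p, actual) else ("T" if p == actual else "F"))
    return tuple(c)


def is_consistent(rule, s):
    """A rule is inconsistent with s exactly when it predicts its own failure."""
    return rule(correctness_sequence(rule, s)) != "F"


def all_consistent(n, max_len):
    """True if M_I and M_A are consistent with every sequence of length <= max_len."""
    def m_i(s):
        return induction(s, n)

    def m_a(s):
        return NEG[induction(s, n)]

    for length in range(1, max_len + 1):
        for s in product("TFU", repeat=length):
            if not (is_consistent(m_i, s) and is_consistent(m_a, s)):
                return False
    return True


print(all_consistent(n=2, max_len=8))  # no falsifying sequence found
print(all_consistent(n=3, max_len=8))  # no falsifying sequence found
print(all_consistent(n=1, max_len=3))  # n = 1 is excluded by the proposition
```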
5 Conclusions and Future Work
5.1 Predictions: Naive Idea
How do we make predictions? At first glance, the situation sounds straightforward: if we observe some phenomenon sufficiently many ($n$) times, then we naturally conclude that the same phenomenon (e.g., rising of the sun) will happen again the next time. This argument is known as inductive reasoning.
5.2 What We Show: Situation is More Complex Than it May Appear
However, in principle, we can consider the opposite rule: if something happens sufficiently many times, then we expect that the opposite will happen the next time. For example, if someone was speeding many times and never got caught, we expect that he/she will eventually get caught by the police. So, which principle should we use for prediction: inductive reasoning or the above-described "anti-inductive" reasoning?
A natural idea is to use the same principle to select the prediction principle itself. For example, if we believe in inductive reasoning, then if this principle led to good predictions $n$ times, we expect it to work the next time as well. Similarly, if we believe in anti-inductive reasoning, then:
• if this principle did not lead to good predictions $n$ times in a row, we expect it to work the next time, and,
• vice versa, if anti-inductive reasoning led to good predictions $n$ times in a row, we expect this principle to fail the next time.
This seems to provide an experimental way to test which principle better suits the observations. Somewhat unexpectedly, we show that it is not possible to experimentally distinguish between the two principles: each sequence of observations which is consistent with induction is also consistent with anti-induction, and vice versa. Moreover, we show that neither of these two principles can be falsified at all. So, both principles are dubious from the scientific viewpoint, according to which scientific laws and techniques must be, in principle, falsifiable by experiments.
5.3 Future Work
The above results are just the beginning. We need to analyze more realistic formulations of the induction rule this way, as well as other possible rules. We need some experiments: what will happen if we apply different rules to different sequences of observations? From the more theoretical viewpoint: can we algorithmically (and feasibly algorithmically) check whether a given prediction rule is falsifiable? We hope that our work will inspire others to analyze these important methodological questions.
Acknowledgements This work was supported in part by the National Science Foundation grants:
• 1623190 (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science);
• HRD-1834620 and HRD-2034030 (CAHSI Includes).
It was also supported by the program of the development of the Scientific-Educational Mathematical Center of Volga Federal District No. 075-02-2020-1478. The authors are thankful to all the participants of the International Seminar on Computational Intelligence ISCI'2021 (Tijuana, Mexico, August 17–19, 2021), especially to Oscar Castillo and Patricia Melin, for valuable discussions.
References
1. Holland, J.H., Holyoak, K.J., Nisbett, R.E., Thagard, P.R.: Induction: Processes of Inference, Learning, and Discovery. MIT Press, Cambridge (1989)
2. Holyoak, K., Morrison, R.: The Cambridge Handbook of Thinking and Reasoning. Cambridge University Press, New York (2005)
3. Kosheleva, O.M., Kreinovich, V.Y., Zakharevich, M.I.: Why induction, not antiinduction? Notices Am. Math. Soc. 26(7), A-619 (1979)