Advances in Intelligent Systems and Computing Volume 1222
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered.

The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, perception and vision, DNA and immune based systems, self-organizing and adaptive systems, e-learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia.

The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results.

** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **
More information about this series at http://www.springer.com/series/11156
Valentina Emilia Balas · Lakhmi C. Jain · Marius Mircea Balas · Shahnaz N. Shahbazova
Editors

Soft Computing Applications: Proceedings of the 8th International Workshop Soft Computing Applications (SOFA 2018), Vol. II
Editors
Valentina Emilia Balas, Faculty of Engineering, “Aurel Vlaicu” University of Arad, Arad, Romania
Marius Mircea Balas, Faculty of Engineering, “Aurel Vlaicu” University of Arad, Arad, Romania
Lakhmi C. Jain, Faculty of Engineering and Information Technology, Centre for Artificial Intelligence, University of Technology Sydney, Sydney, Australia; Faculty of Science, Liverpool Hope University, Liverpool, UK; KES International, Shoreham-by-Sea, UK
Shahnaz N. Shahbazova, Department of Information Technology and Programming, Azerbaijan Technical University, Baku, Azerbaijan
ISSN 2194-5357    ISSN 2194-5365 (electronic)
Advances in Intelligent Systems and Computing
ISBN 978-3-030-52189-9    ISBN 978-3-030-52190-5 (eBook)
https://doi.org/10.1007/978-3-030-52190-5

© Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Preface
These two volumes constitute the Proceedings of the 8th International Workshop on Soft Computing Applications (SOFA 2018), held on September 13–15, 2018, in Arad, Romania. This edition was organized by “Aurel Vlaicu” University of Arad, Romania, in conjunction with the Institute of Computer Science, Iasi Branch of the Romanian Academy; the IEEE Romanian Section; the Romanian Society of Control Engineering and Technical Informatics (SRAIT), Arad Section; the General Association of Engineers in Romania, Arad Section; and BTM Resources Arad.

The concept of soft computing was introduced by Lotfi Zadeh in 1991 to highlight the emergence of computing methodologies whose accent is on exploiting the tolerance for imprecision and uncertainty to achieve tractability, robustness and low solution cost. Soft computing facilitates the combined use of fuzzy logic, neurocomputing, evolutionary computing and probabilistic computing, leading to the concept of hybrid intelligent systems. The combination of such intelligent-systems tools with a large number of applications demonstrates the great potential of soft computing in all domains.

The volumes cover a broad spectrum of soft computing techniques and of theoretical and practical applications that find solutions to industrial, economic and medical problems. The conference papers included in these proceedings, published post-conference, were grouped into the following areas of research:

• Soft computing and conventional techniques in power engineering; methods and applications in electrical engineering
• Modeling, algorithms, optimization, reliability and applications
• Machine learning, NLP and applications
• Business process management
• Knowledge-based technologies for Web applications, cloud computing, security algorithms and computer networks, smart city
• Fuzzy applications, theory, expert systems and fuzzy control
• Biomedical applications
• Image, text and signal processing
• Computational intelligence techniques, machine learning and optimization methods in recent applications
• Methods and applications in engineering and games
• Wireless sensor networks, cloud computing, IoT

In SOFA 2018, we had five eminent keynote speakers: Professor Michio Sugeno (Japan), Professor Oscar Castillo (Mexico), Academician Florin G. Filip (Romania), Professor Valeriu Beiu (Romania) and Professor Jeng-Shyang Pan (China). Their summary talks are included in this book. We especially thank the honorary chair of SOFA 2018, Prof. Michio Sugeno, who encouraged and motivated us, as he has at all the other SOFA editions.

A special keynote, “In memoriam Lotfi A. Zadeh,” was presented by Professor Shahnaz Shahbazova (Azerbaijan) and dedicated to the renowned founder of fuzzy set theory, at the same time the honorary chair of the SOFA conferences, who passed away in September 2017. In fact, the whole conference was dedicated to the memory of Professor Zadeh; in our presentations and discussions, we remembered his great personality and how he influenced our lives.

We are thankful to all the authors who submitted papers for keeping the quality of the SOFA 2018 conference at a high level. The editors of this book would like to acknowledge all the authors for their contributions, as well as the reviewers. We received invaluable help from the members of the International Program Committee and from the chairs responsible for different aspects of the workshop. We also appreciate the role of the special sessions’ organizers. Thanks to all of them, we were able to collect many papers on interesting topics, and during the workshop we had remarkably interesting presentations and stimulating discussions.

For their help with the organizational issues of all SOFA editions, we express our thanks to the TRIVENT Company, Mónika Jetzin and Teodora Artimon, for having customized the Software Conference Manager, handled the registration of conference participants and made all local arrangements.
Our special thanks go to Janusz Kacprzyk (Editor-in-Chief, Springer, Advances in Intelligent Systems and Computing Series) for the opportunity to organize this guest-edited volume. We are grateful to Springer, especially to Dr. Thomas Ditzinger (Senior Editor, Applied Sciences & Engineering, Springer-Verlag), for the excellent collaboration, patience and help during the evolvement of this volume.

We hope that the volumes will provide useful information to professors, researchers and graduate students in the area of soft computing techniques and applications, and that all will find this collection of papers inspiring, informative and useful. We also hope to see you at a future SOFA event.

Valentina Emilia Balas
Lakhmi C. Jain
Marius Mircea Balas
Shahnaz N. Shahbazova
Invited Speakers
DSS, Classifications, Trends and Enabling Modern Information and Communication Technologies

Florin Gheorghe Filip
The Romanian Academy and INCE
ffi[email protected]
Abstract. A decision support system (DSS) can be defined as an anthropocentric and evolving information system which is meant to implement the functions of a human support system that would otherwise be necessary to help the decision-maker overcome the limits and constraints he/she may encounter when trying to solve complex and complicated decision problems that count (Filip, 2008). The purpose of the talk is to present the impact of modern information and communication technologies (I&CT) on the DSS domain, with emphasis on systems that support collaborative decision-making activities. Consequently, the talk is composed of three parts, as follows.

In the first part, several basic aspects concerning decisions and decision-makers are reviewed in the context of modern business models and process and management automation solutions, including Intelligent Process Automation (IPA), which is meant to liberate humans from “robot-type” operations. The evolution of models of human–automation device systems, from “either/or automation” to “shared and cooperative” control solutions (Flemisch et al., 2012), receives particular attention, together with an explanation of the causes of wrong decisions (Power, Mitra, 2016).

The second part of the talk addresses several aspects of the DSS domain, such as basic concepts, classifications and evolutions. Several classifications made in accordance with attributes such as purpose, dominant technology, number of users and real-time usage in crisis situations are presented. Collaborative systems (Nof, 2017; Filip et al., 2017) and “mixt knowledge” (Filip, 2008) solutions are described in detail.

In the third part of the talk, several I&C technologies, such as big data (Shi, 2015), cloud and mobile computing, and cognitive systems (High, 2012; Tecuci et al., 2016), are presented from the perspective of their relevance to modern computer-supported collaborative decision-making.
Two application examples are presented with a view to illustrating the usage of big data, and of cloud computing and service-oriented architectures, respectively. A list of concerns and open problems regarding the impact of new I&C technologies on human beings’ personal and professional life is eventually evoked.

Selected References

Filip F.G. (2008) Decision support and control for large-scale systems. Annual Reviews in Control, 32(1), p. 62–70.
Filip F.G., Zamfirescu C.B., Ciurea C. (2017) Computer Supported Collaborative Decision-Making. Springer.
Flemisch F., Heesen M., Hesse T. et al. (2012) Towards a dynamic balance between humans and automation: authority, ability, responsibility and control in shared and cooperative control situations. Cognition, Technology & Work, 14(1), p. 3–8.
High R. (2012) The Era of Cognitive Systems: An Inside Look at IBM Watson and How It Works.
Nof S.Y. (2017) Collaborative control theory and decision support systems. Computer Science Journal of Moldova, 25(2), p. 15–144.
Power D.J., Mitra A. (2016) Reducing “Bad” Strategic Business Decisions. Drake Management Review, 5(1/2), p. 15–21.
Shi (2015) Challenges to Engineering Management in the Big Data Era. Frontiers of Engineering Management, p. 293–303.
Tecuci G., Marcu D., Boicu M., Schum D.A. (2016) Knowledge Engineering: Building Cognitive Assistants for Evidence-based Reasoning. Cambridge University Press.
Florin Gheorghe Filip
Brief Bio Sketch: Florin Gheorghe Filip was born on July 25, 1947. He became a corresponding member of the Romanian Academy in 1991, when he was only 44 years old, and a full member of this highest cultural and scientific forum of Romania in 1999, at 52. For ten years, during 2000–2010, he was Vice President of the Romanian Academy (the national academy of sciences), and in 2010 he was elected President of its 14th Section, “Information Science and Technology” (re-elected in 2015). He was the managing director of the National Institute for R&D in Informatics (ICI) during 1991–1997, and he has been a part-time researcher and member of the Scientific Council of INCE (the National Institute for Economic Research) of the Academy since 2004. His main scientific interests are optimization and control of complex systems, decision support systems, technology management and foresight, and IT applications in the cultural sector. He has authored/coauthored over 300 papers published in international journals (IFAC J. Automatica, IFAC J. Control Engineering Practice, Annual Reviews in Control, Computers in Industry, System Analysis Modeling Simulation, Large Scale Systems, Technological and Economic Development of Economy and so on) and in contributed volumes printed by international publishing houses (Pergamon Press, North-Holland, Elsevier, Kluwer, Chapman & Hall, etc.). He is also the author/coauthor of thirteen monographs (published by Editura Tehnica,
Bucuresti, Hermes-Lavoisier Paris, J. Wiley & Sons, London, and Springer) and editor/coeditor of 25 volumes of contributions (published by Editura Academiei Romane, Elsevier Science, Institute of Physics, Melville, USA, IEEE Computer Society, Los Alamitos, USA). He was an IPC member of more than 50 international conferences held in Europe, the USA, South America, Asia and Africa, and gave plenary papers at scientific conferences held in Brazil, Chile, China, France, Germany, Lithuania, Poland, Portugal, the Republic of Moldova, Spain, Sweden, Tunisia and the UK. F.G. Filip was the chairman of the IFAC (International Federation of Automatic Control) Technical Committee “Large Scale Complex Systems” (1991–1997). He is founder and Editor-in-Chief of the Studies in Informatics and Control journal (1991) and cofounder and Editor-in-Chief of the International Journal of Computers Communications & Control (2006). He has received the Doctor Honoris Causa title from “Lucian Blaga” University of Sibiu (2000), “Valahia” University, Targoviste (2007), “Ovidius” University, Constanta (2007), École Centrale de Lille, France (2007), “Traian Vuia” Technical University, Timisoara (2009), “Agora” University of Oradea (2012), the Academy of Economic Studies, Bucharest (2014), the University of Pitesti (2017) and the “Petrol-Gaz” University of Ploiesti (2017). He is an honorary member of the Academy of Sciences of the Republic of Moldova (2007) and of the Romanian Academy of Technical Sciences (2007). More details can be found at: http://www.academiaromana.ro/sectii/sectia14_informatica/sti_FFilip.htm and http://univagora.ro/jour/index.php/ijccc/article/view/2960/1125.
Distorted Statistics Based on Choquet Calculus

Michio Sugeno
Tokyo Institute of Technology
[email protected]
Abstract. In this study, we discuss statistics with distorted probabilities, obtained by applying Choquet calculus, which we call “distorted statistics.” To deal with distorted statistics, we consider a distorted probability space on the non-negative real line. A (non-additive) distorted probability is derived from an ordinary additive probability by a monotone transformation with a generator. First, we explore some properties of Choquet integrals of non-negative, continuous and differentiable functions with respect to distorted probabilities. Next, we calculate elementary statistics, such as the distorted mean and variance of a random variable, for exponential and Gamma distributions. In addition, we introduce the concept of a density function for the distorted exponential distribution. Further, we deal with Choquet calculus of real-valued functions on the real line and explore its basic properties. Then, we consider a distorted probability space on the real line and calculate elementary distorted statistics for uniform and normal distributions. Finally, we compare distorted statistics with conventional skew statistics.
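For orientation, the two objects at the core of the abstract can be written in standard form (the notation here is generic; the talk’s own symbols may differ). A distorted probability composes an additive probability P with a monotone generator g, and distorted statistics are then Choquet integrals with respect to the resulting non-additive measure:

```latex
% Distorted probability via a monotone generator g with g(0)=0, g(1)=1:
\mu_g(A) \;=\; g\bigl(P(A)\bigr)

% Choquet integral of a non-negative measurable function f w.r.t. \mu_g:
(C)\!\int f \,\mathrm{d}\mu_g \;=\; \int_0^{\infty} \mu_g\bigl(\{x : f(x) \ge t\}\bigr)\,\mathrm{d}t

% e.g. the distorted mean of a non-negative random variable X:
\mathbb{E}_g[X] \;=\; (C)\!\int X \,\mathrm{d}\mu_g
```

When g is the identity, μ_g = P and the Choquet integral reduces to the ordinary Lebesgue expectation, so conventional statistics are recovered as a special case.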
Michio Sugeno
Biography: After graduating from the Department of Physics of the University of Tokyo, he worked at a company for three years. He then served at the Tokyo Institute of Technology as Research Associate, Associate Professor and Professor from 1965 to 2000. After retiring from the Tokyo Institute of Technology, he worked as Laboratory Head at the Brain Science Institute, RIKEN, from 2000 to 2005, then as Distinguished Visiting Professor at Doshisha University from 2005 to 2010, and finally as Emeritus Researcher at the European Centre for Soft Computing in Spain from 2010 to 2015. He is Emeritus Professor at the Tokyo Institute of Technology. He was
President of the Japan Society for Fuzzy Theory and Systems from 1991 to 1993 and President of the International Fuzzy Systems Association from 1997 to 1999. He is the first recipient, together with Zadeh, of the IEEE Pioneer Award in Fuzzy Systems (2000). He also received the 2010 IEEE Frank Rosenblatt Award and the Kampé de Fériet Award in 2012.
Overview of QUasi-Affine TRansformation Evolutionary (QUATRE) Algorithm

Jeng-Shyang Pan
Fujian University of Technology; Harbin Institute of Technology
[email protected]
Abstract. The QUasi-Affine TRansformation Evolutionary (QUATRE) algorithm is a swarm-based algorithm that uses a quasi-affine transformation approach for evolution. This talk discusses the relation between the QUATRE algorithm and other kinds of swarm-based algorithms, including particle swarm optimization (PSO) variants and differential evolution (DE) variants. Several QUATRE variants are described, and comparisons are made among the proposed QUATRE algorithm, state-of-the-art PSO variants and DE variants on several test functions. Experimental results show the usefulness of the QUATRE algorithm not only for real-parameter optimization but also for large-scale optimization. In particular, the QUATRE algorithm can reduce time complexity and shows excellent performance on uni-modal as well as multi-modal functions, even on higher-dimensional optimization problems.
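The quasi-affine update described in the abstract can be made concrete with a small sketch. This is a simplified reconstruction under stated assumptions, not the speaker’s reference implementation: the construction of the binary evolution matrix M and the donor scheme B = gbest + F·(Xr1 − Xr2) follow one common QUATRE variant from the literature, and the population size, scale factor F, test function and function names are illustrative.

```python
import numpy as np

def sphere(x):
    """Uni-modal test function f(x) = sum(x_i^2), minimum 0 at the origin."""
    return float(np.sum(x ** 2))

def evolution_matrix(pop, dim, rng):
    """Binary evolution matrix M: tile a lower-triangular ones pattern to
    `pop` rows, then shuffle the entries of each row and the row order
    (one common construction; QUATRE variants differ in this step)."""
    base = np.tril(np.ones((dim, dim)))
    reps = -(-pop // dim)  # ceiling division
    M = np.tile(base, (reps, 1))[:pop].copy()
    for row in M:
        rng.shuffle(row)   # permute entries within each row
    rng.shuffle(M)         # permute the rows themselves
    return M

def quatre_sketch(f, dim=10, pop=30, F=0.7, iters=200, seed=0):
    """Minimal QUATRE-style loop with greedy (DE-like) selection."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5.0, 5.0, size=(pop, dim))
    fit = np.array([f(x) for x in X])
    for _ in range(iters):
        M = evolution_matrix(pop, dim, rng)
        gbest = X[np.argmin(fit)]
        r1 = rng.permutation(pop)
        r2 = rng.permutation(pop)
        # Donor matrix B: global best plus a scaled difference of two
        # randomly permuted rows (a DE/best-like scheme).
        B = gbest + F * (X[r1] - X[r2])
        # Quasi-affine update: keep a coordinate where M is 1,
        # take the donor's coordinate where M is 0.
        U = M * X + (1.0 - M) * B
        fu = np.array([f(u) for u in U])
        better = fu < fit
        X[better] = U[better]
        fit[better] = fu[better]
    i = np.argmin(fit)
    return X[i], float(fit[i])
```

Each row of M selects which coordinates survive from the parent and which come from the donor; this matrix-based update is what distinguishes QUATRE’s “quasi-affine transformation” from DE’s per-coordinate binomial crossover.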
Jeng-Shyang Pan
Biography: Jeng-Shyang Pan is Assistant President of the Fujian University of Technology and Professor at the Harbin Institute of Technology. He received the B.S. degree in Electronic Engineering from the National Taiwan University of Science and Technology in 1986, the M.S. degree in Communication Engineering from the National Chiao Tung University, Taiwan, in 1988, and the PhD degree in Electrical Engineering from the University of Edinburgh, UK, in 1996. Currently, he is Assistant President and Dean of the College of Information Science and Engineering at the Fujian University of Technology, and also Professor at the Harbin Institute of Technology. He has published more than 600 papers, of which 250 papers are indexed by SCI,
his H-index is 41, and his papers have been cited more than 7,900 times. He is an IET Fellow, UK, and has been Vice Chair of the IEEE Tainan Section. He was awarded the Gold Prize at the International Micro Mechanisms Contest held in Tokyo, Japan, in 2010; the Gold Medal at the Pittsburgh Invention & New Product Exposition (INPEX) in 2010; the Gold Medal at the International Exhibition of Geneva Inventions in 2011; and the Gold Medal of the IENA International “Ideas–Inventions–New Products” exhibition, Nuremberg, Germany. He was selected for the Thousand Talents Program in China in 2010. He is on the editorial boards of the Journal of Information Hiding and Multimedia Signal Processing and the Chinese Journal of Electronics. His current research interests include soft computing, robot vision and big data mining.
Nature-Inspired Optimization of Type-2 Fuzzy Logic Controllers

Oscar Castillo
Tijuana Institute of Technology, Tijuana, Mexico
[email protected]
Abstract. The design of type-2 fuzzy logic systems is a complex task, and in general, achieving an optimal configuration of structure and parameters is time-consuming and rarely found in practice. For this reason, the use of nature-inspired meta-heuristics offers a good hybrid solution to find near-optimal designs of type-2 fuzzy logic systems in real-world applications. Type-2 fuzzy control offers a real challenge because the problems in this area require very efficient and accurate solutions; in particular, this is the case for robotic applications. In this talk, we present a general scheme for optimizing type-2 fuzzy controllers with nature-inspired optimization techniques, like ant colony optimization, the chemical reaction algorithm, bee colony optimization and others.
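As a toy instance of the general scheme the abstract describes, the sketch below parameterizes the footprint of uncertainty of an interval type-2 Gaussian membership function (uncertain mean with fixed spread, a standard construction) and tunes its parameters with a tiny (1+1) evolution strategy. The talk’s optimizers are ACO, the chemical reaction algorithm, bee colony optimization and others, and its cost would be a control error rather than this curve-fitting surrogate; all names, values and the optimizer choice here are illustrative assumptions.

```python
import numpy as np

def it2_gaussian(x, m1, m2, sigma):
    """Interval type-2 Gaussian membership with uncertain mean in [m1, m2].
    Returns (lower, upper) membership grades at input x."""
    g = lambda m: np.exp(-0.5 * ((x - m) / sigma) ** 2)
    # Upper bound: 1 between the two means, nearest-mean Gaussian outside.
    upper = np.where(x < m1, g(m1), np.where(x > m2, g(m2), 1.0))
    # Lower bound: the smaller of the two extreme-mean Gaussians.
    lower = np.minimum(g(m1), g(m2))
    return lower, upper

def fou_cost(params, xs, target_lo, target_hi):
    """Mean squared mismatch between the footprint of uncertainty
    and target lower/upper envelopes."""
    m1, m2, sigma = params
    if not (m1 <= m2) or sigma <= 0:
        return np.inf  # reject infeasible parameter vectors
    lo, hi = it2_gaussian(xs, m1, m2, sigma)
    return float(np.mean((lo - target_lo) ** 2 + (hi - target_hi) ** 2))

def one_plus_one_es(cost, x0, iters=2000, step=0.2, seed=0):
    """Tiny (1+1) evolution strategy, standing in for the nature-inspired
    optimizers named in the talk."""
    rng = np.random.default_rng(seed)
    best = np.asarray(x0, dtype=float)
    best_c = cost(best)
    for _ in range(iters):
        cand = best + step * rng.standard_normal(best.size)
        c = cost(cand)
        if c <= best_c:  # greedy acceptance
            best, best_c = cand, c
    return best, best_c
```

In a full controller-design setting, the parameter vector would collect the antecedent and consequent membership parameters of all rules, and `cost` would simulate the closed loop and return a tracking-error measure.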
Oscar Castillo
Biography: Oscar Castillo holds the Doctor in Science degree (Doctor Habilitatus) in Computer Science from the Polish Academy of Sciences (with the Dissertation “Soft Computing and Fractal Theory for Intelligent Manufacturing”). He is Professor of computer science in the Graduate Division, Tijuana Institute of Technology, Tijuana, Mexico. In addition, he is serving as Research Director of Computer Science and Head of the research group on Hybrid Fuzzy Intelligent Systems. Currently, he is President of HAFSA (Hispanic American Fuzzy Systems Association) and Past President of IFSA (International Fuzzy Systems Association). Prof. Castillo is also Chair of the Mexican Chapter of the Computational Intelligence Society (IEEE). He also belongs to the Technical Committee on Fuzzy Systems of IEEE and to the Task Force on “Extensions to Type-1 Fuzzy Systems.” He is
also a member of NAFIPS, IFSA and IEEE, and belongs to the Mexican Research System (SNI Level 3). His research interests are in type-2 fuzzy logic, fuzzy control, and neuro-fuzzy and genetic-fuzzy hybrid approaches. He has published over 300 journal papers, 7 authored books, 30 edited books, 200 papers in conference proceedings and more than 300 chapters in edited books, for a total of more than 740 publications according to Scopus and more than 840 according to ResearchGate. He has been Guest Editor of several successful special issues in the past, in journals such as Applied Soft Computing, Intelligent Systems, Information Sciences, Non-Linear Studies, Fuzzy Sets and Systems, JAMRIS and Engineering Letters. He is currently Associate Editor of the Information Sciences Journal, the Applied Soft Computing Journal, the Granular Computing Journal and the IEEE Transactions on Fuzzy Systems. He was elected an IFSA Fellow and a MICAI Fellow last year, and was recognized as a Highly Cited Researcher in 2017 by Clarivate Analytics for having multiple highly cited papers in the Web of Science.
Seeing Is Believing

“It is very easy to answer many of these fundamental biological questions; you just look at the thing!” (Richard P. Feynman, “There’s Plenty of Room at the Bottom,” Caltech, December 29, 1959)

Valeriu Beiu
Aurel Vlaicu University of Arad, Romania
[email protected]
Abstract. This presentation is geared toward the latest developments in imaging platforms that are able to tackle biological samples. Visualizing living cells, single molecules and even atoms is crucially important, but unfortunately excruciatingly difficult. Still, recent progress reveals that a wide variety of novel imaging techniques have reached maturity. We will recap here the principles behind techniques that allow imaging beyond the diffraction limit and highlight both historical and fresh advances in the field of neuroscience that result from such imaging technologies. As an example, single-particle tracking is one of several tools able to study single molecules inside cells and reveal the dynamics of biological processes (receptor trafficking, signaling and cargo transport).

Historically, the first venture outside classical optics was represented by X-ray and electron-based techniques. Of these, electron microscopy allows by far the higher resolution. In time, it has diverged into transmission electron microscopy (TEM), scanning electron microscopy (SEM), reflection electron microscopy (REM) and scanning transmission electron microscopy (STEM), while lately these have started to merge with digital holography (scanning transmission electron holography, atomic-resolution holography and low-energy electron holography). Electron microscopy allows resolutions down to 40 pm, although it is not trivial to use such techniques on biological samples.

The second departure from classical optics was represented by scanning probe techniques such as the atomic force microscope (AFM), scanning tunneling microscope (STM), photonic force microscope (PFM) and recurrence tracking microscope (RTM). All of these rely on the physical contact of a solid probe tip which scans the surface of an object (which is supposed to be quite flat). The third attempt has come full circle and is represented by super-resolution microscopy, which won the Nobel Prize in 2014.
The presentation will start from basic principles, emphasizing the advantages and disadvantages of different bio-imaging techniques. The development of super-resolution microscopy techniques in the 1990s and 2000s (https://en.wikipedia.org/wiki/Superresolution_microscopy) has allowed researchers to image fluorescent molecules at unprecedentedly small scales. This significant boost was properly acknowledged by replacing the term “microscopy” with “nanoscopy,” a term coined by Stefan Walter Hell in 2007; it distinguishes novel diffraction-unlimited techniques from conventional approaches, e.g., confocal or wide-field microscopy. An incomplete list includes (among others): binding-activated localization microscopy (BALM), cryogenic optical localization in 3D (COLD), fluctuation-assisted BALM (fBALM), fluorescence photo-activation localization microscopy (FPALM), ground-state depletion microscopy (GSDIM), light sheet fluorescence microscopy (LSFM), photo-activated localization microscopy (PALM), structured illumination microscopy (SIM), both linear and nonlinear, stimulated emission depletion (STED), stochastic optical reconstruction microscopy (STORM), single-molecule localization microscopy (SMLM), scanning near-field microscopy (SNOM) and total internal reflection fluorescence (TIRF). Obviously, with such improvements in resolving power, new avenues for studying synapses, and neurons more generally, are being opened; a few of the latest experiments that highlight unique capabilities will be enumerated, briefly reviewed and compared.
Valeriu Beiu
Biography: Valeriu Beiu (S’92–M’95–SM’96) received the MSc in computer engineering from the University “Politehnica” Bucharest in 1980 and the PhD summa cum laude in electrical engineering from the Katholieke Universiteit Leuven in 1994. Since graduating in 1980, he has been with the Research Institute for Computer Techniques, University “Politehnica” Bucharest, Katholieke Universiteit Leuven, King’s College London, Los Alamos National Laboratory, Rose Research, Washington State University and United Arab Emirates University, and is currently with “Aurel Vlaicu” University of Arad. His research interests have constantly been on biologically inspired nano-circuits and brain-inspired nano-architectures for VLSI-efficient designs
(ultra-low power and highly reliable), with funding of over US$ 51M. On such topics, he has given over 200 invited talks, organized over 120 conferences, chaired over 60 sessions, edited two books, and authored over 230 journal/conference articles (30 invited), as well as 8 chapters and 11 patents. Dr. Beiu has received five fellowships and seven best-paper awards, and is a Senior Member of the IEEE as well as a member of ACM, INNS, ENNS and MCFA. He was a member of the SRC-NNI Working Group on Novel Nano-architectures, the IEEE CS Task Force on Nano-architectures and the IEEE Emerging Technologies Group on Nanoscale Communications, and has been an Associate Editor of the IEEE Transactions on Neural Networks (2005–2008), the IEEE Transactions on Very Large Scale Integration Systems (2011–2015) and Nano Communication Networks (2010–2015).
Contents
Biomedical Applications

An Overview on Computer Processing for Endoscopy and Colonoscopy Videos . . . 3
Mihaela Luca, Tudor Barbu, and Adrian Ciobanu

Fuzzy System for Classification of Nocturnal Blood Pressure Profile and Its Optimization with the Crow Search Algorithm . . . 23
Ivette Miramontes, Patricia Melin, and German Prado-Arechiga

A Survey of Modern Gene Expression Based Techniques for Cancer Detection and Diagnosis . . . 35
Hafiz ur Rahman, Muhammad Arif, Sadam Al-Azani, Emad Ramadan, Guojun Wang, Jianer Chen, Teodora Olariu, and Iustin Olariu

Recognition of Skin Diseases Using Curvelet Transforms and Law's Texture Energy Measures . . . 51
Jyotismita Chaki, Nilanjan Dey, V. Rajinikanth, Amira S. Ashour, and Fuqian Shi

Fuzzy Applications, Theory, Expert Systems and Control

Fuzzy and Intelligent Roof-Top Greenhouse Buildings . . . 65
Marius Mircea Balas, Mihaela Popa, Emanuela Valentina Muller, Daniel Alexuta, and Luana Muresan

Human-Plant Symbiosis by Integrated Roof-Top Greenhouses . . . 76
Marius M. Balas, Ramona Lile, Lucian Copolovici, Anca Dicu, and Kristijan Cincar

Fuzzy Scoring Theory Applied to Team-Peer Assessment: Additive vs. Multiplicative Scoring Models on the Signed or Unsigned Unit Interval . . . 84
Paul Hubert Vossen and Suraj Ajit

On Image Compression for Mobile Robots Using Feed-Forward Neural Networks . . . 112
Viorel Nicolau and Mihaela Andrei

Image, Text and Signal Processing

LeapGestureDB: A Public Leap Motion Database Applied for Dynamic Hand Gesture Recognition in Surgical Procedures . . . 125
Safa Ameur, Anouar Ben Khalifa, and Med Salim Bouhlel

Experiments on Phonetic Transcription of Audio Vowels . . . 139
Ioan Păvăloi and Anca Ignat

Experiments on Iris Recognition Using Partially Occluded Images . . . 153
Ioan Păvăloi and Anca Ignat

Feature Extraction Techniques for Hyperspectral Images Classification . . . 174
Asma Fejjari, Karim Saheb Ettabaa, and Ouajdi Korbaa

Multi-neural Networks Object Identification . . . 189
Nicolas Park, Daniela López De Luise, Daniel Rivera, Leonardo M. Bustamante, Jude Hemanth, Teodora Olariu, and Iustin Olariu

Development of a Testbed for Automatic Target Recognition and Object Change Tracking . . . 197
Gangavarapu Vigneswara Ihita and Vijay Rao Duddu

Comparative Analysis of Various Image Splicing Algorithms . . . 211
Hafiz ur Rahman, Muhammad Arif, Anwar Ullah, Sadam Al-Azani, Valentina Emilia Balas, Oana Geman, Muhammad Jalal Khan, and Umar Islam

Classification of Plants Leave Using Image Processing and Neural Network . . . 229
Hashem Bagherinezhad, Marjan Kuchaki Rafsanjani, Valentina Emilia Balas, and Ioan E. Koles

Computational Intelligence Techniques, Machine Learning and Optimization Methods in Recent Applications

A Comparative Study of Audio Encryption Analysis Using Dynamic AES and Standard AES Algorithms . . . 241
Amandeep Singh, Praveen Agarwal, and Mehar Chand

Feasibility Study of CSP Tower with Receiver Installed at Base . . . 250
Chirag Rawat, Sarth Dubey, Dipankar Deb, and Ajit Kumar Parwani

Bi-Level Optimization Using Improved Bacteria Foraging Optimization Algorithm . . . 263
Gautam Mahapatra, Soumya Banerjee, and Ranjan Chattaraj

Cubic Hesitant Fuzzy Heronian Mean Operators and Their Application in Multi Criteria Decision Making . . . 276
Faisal Mehmood, Khizar Hayat, Tahir Mahmood, and Muhammad Arif

A Comparative Study of Fuzzy Logic Regression and ARIMA Models for Prediction of Gram Production . . . 289
Shafqat Iqbal, Chongqi Zhang, Muhammad Arif, Yining Wang, and Anca Mihaela Dicu

Identification of Spatial Relationships in Arabic Handwritten Expressions Using Multiple Fusion Strategies . . . 300
Ibtissem Hadj Ali and Mohamed Ali Mahjoub

Information Retrieval in Restricted Domain for ChatterBots . . . 312
Daniela López De Luise, Andrés Pascal, Claudia Alvarez, Marcos Tournoud, Carlos Pankrac, and Juan Manuel Santa Cruz

Application of Single-Valued Neutrosophic Power Maclaurin Symmetric Mean Operators in MADM . . . 328
Qaisar Khan, Tahir Mahmood, Khizar Hayat, Muhammad Arif, Valentina Emilia Balas, and Oana Geman

Methods and Applications in Engineering and Games

PI Plus Feed-Forward Control of Water Submersible Pump Specially Used in Ground Water Shortage Areas . . . 357
Anil Gojiya, Ravi Patel, and Dipankar Deb

Slip Effects on Fe3O4-Nanoparticles in a Nanofluid Past a Nonlinear Stretching Surface . . . 366
Anwar Shahid, Zhan Zhou, Muhammad Mubashir Bhatti, Muhammad Arif, and Muhammad Faizan Khan

A Learners Experience with the Games Education in Software Engineering . . . 379
Muhammad Imran Tariq, Jorge Diaz-Martinez, Shariq Aziz Butt, Muhammad Adeel, Emiro De-la-Hoz-Franco, and Anca Mihaela Dicu

Wireless Sensor Networks, Cloud Computing, IoT

An Anomaly Detection System Based on Clustering and Fuzzy Set Theory in VANETs . . . 399
Marjan Kuchaki Rafsanjani, Hamideh Fatemidokht, Valentina Emilia Balas, and Ranbir Singh Batth

A Framework for Artificial Intelligence Assisted Smart Agriculture Utilizing LoRaWAN Wireless Sensor Networks . . . 408
Ala' Khalifeh, Abdullah AlQammaz, Khalid A. Darabkh, Bashar Abu Sha'ar, and Omar Ghatasheh

Multi-task Scheduling Algorithm Based on Self-adaptive Hybrid ICA–PSO Algorithm in Cloud Environment . . . 422
Hamed Tabrizchi, Marjan Kuchaki Rafsanjani, and Valentina Emilia Balas

Fuzzy Graph Modelling of Anonymous Networks . . . 432
Vasisht Duddu, Debasis Samanta, and D. Vijay Rao

A Backstepping Direct Power Control of Three Phase Pulse Width Modulated Rectifier . . . 445
Arezki Fekik, Ahmad Taher Azar, Hakim Denoun, Nashwa Ahmad Kamal, Mohamed Lamine Hamida, Dyhia Kais, and Karima Amara

Portable Non-invasive Device for Measuring Saturated Oxygen of the Blood . . . 457
Oana Geman, Iuliana Chiuchisan, Valentina Balas, Guojun Wang, Muhammad Arif, Haroon Elahi, and Peng Tao

"Smart" Footwear for the Visually-Impaired People Based on Arduino Platform and Ultrasonic Sensors . . . 463
Iuliana Chiuchisan, Oana Geman, Valentina Balas, Guojun Wang, Muhammad Arif, Haroon Elahi, and Peng Tao

Gain Time Based Optimization of Small Platforms for IoT Solutions: A Case of 8-Bit AVR Platform . . . 469
Hamza Ahmed, Muhammad Naeem Shehzad, and Muhammad Naeem Awais

Author Index . . . 479
Short CVs of Guest Editors
Valentina Emilia Balas is currently Full Professor in the Department of Automatics and Applied Software at the Faculty of Engineering, “Aurel Vlaicu” University of Arad, Romania. She holds a PhD in applied electronics and telecommunications from the Polytechnic University of Timisoara. Dr. Balas is the author of more than 300 research papers in refereed journals and international conferences. Her research interests are in intelligent systems, fuzzy control, soft computing, smart sensors, information fusion, modeling and simulation. She is Editor-in-Chief of the International Journal of Advanced Intelligence Paradigms (IJAIP) and of the International Journal of Computational Systems Engineering (IJCSysE), a member of the editorial boards of several national and international journals, and an expert evaluator for national and international projects and PhD theses. Dr. Balas is Director of the Intelligent Systems Research Centre at Aurel Vlaicu University of Arad and Director of the Department of International Relations, Programs and Projects at the same university. She served as General Chair of the International Workshop Soft Computing and Applications (SOFA) in its eight editions (2005-2020) held in Romania and Hungary. Dr. Balas has participated in many international conferences as Organizer, Honorary Chair, Session Chair and member of Steering, Advisory or International Program Committees.
She is a member of EUSFLAT and SIAM, a Senior Member of IEEE, a member of the Technical Committee on Fuzzy Systems (IEEE CIS), chair of Task Force 14 in the Technical Committee on Emergent Technologies (IEEE CIS), and a member of the Technical Committee on Soft Computing (IEEE SMCS). Dr. Balas is a past Vice-President (Awards) of the IFSA International Fuzzy Systems Association Council (2013-2015) and is a Joint Secretary of the Governing Council of the Forum for Interdisciplinary Mathematics (FIM), a multidisciplinary academic body, India. Professor Valentina Emilia Balas, PhD Faculty of Engineering Aurel Vlaicu University of Arad B-dul Revolutiei 77 310130 Arad, Romania [email protected] Lakhmi C. Jain, BE(Hons), ME, PhD, Fellow (Engineers Australia), served as Visiting Professor at Bournemouth University, UK, until July 2018 and presently serves the University of Technology Sydney, Australia, and Liverpool Hope University, UK. Dr. Jain founded KES International to provide the professional community with opportunities for publications, knowledge exchange, cooperation and teaming. Involving around 5,000 researchers drawn from universities and companies worldwide, KES facilitates international cooperation and generates synergy in teaching and research. KES regularly provides networking opportunities for the professional community through one of the largest conferences of its kind. http://www.kesinternational.org/organisation.php His interests focus on artificial intelligence paradigms and their applications in complex systems, security, e-education, e-healthcare, unmanned air vehicles and intelligent systems design.
Professor Lakhmi C. Jain PhD|ME|BE (Hons)| Fellow (Engineers Aust), Founder KES International | http://www. kesinternational.org/organisation.php Visiting Professor | Liverpool Hope University, UK University of Technology Sydney, Australia Email – [email protected] Email – [email protected] Professor Lakhmi C. Jain, PhD Faculty of Engineering and Information Technology Centre for Artificial Intelligence University of Technology Sydney Broadway, NSW 2007 Australia [email protected] and Faculty of Science Liverpool Hope University Hope Park Liverpool, L16 9JD UK [email protected] and KES International PO Box 2115 Shoreham-by-Sea, BN43 9AF UK [email protected]
Marius Mircea Balas is currently Full Professor in the Department of Automatics and Applied Software at the Faculty of Engineering, “Aurel Vlaicu” University of Arad (Romania). He holds a Doctorate in Applied Electronics and Telecommunications from the Politehnica University of Timisoara. Dr. Balas is an IEEE Senior Member.
He is the author of more than 150 papers in journals and conference proceedings and holds 7 invention patents. His research interests are in intelligent and fuzzy systems, soft computing, electronic circuits, modeling and simulation, adaptive control and intelligent transportation. The main original concepts introduced by Prof. Marius M. Balas are: the fuzzy-interpolative systems, the passive greenhouse, the constant-time-to-collision optimization of traffic, the imposed distance braking, the internal model bronze casting, the PWM inverter for railway coaches in tropical environments, the rejection of the switching controllers' effect by phase trajectory analysis, the Fermat neuron, etc. He has been a mentor for many student research teams in challenges awarded by Microsoft Imagine Cup, GDF Suez, etc. Professor Marius Mircea Balas, PhD Faculty of Engineering Aurel Vlaicu University of Arad B-dul Revolutiei 77 310130 Arad, Romania [email protected] Shahnaz N. Shahbazova received her Candidate of Technical Sciences degree in 1995 and has been an Associate Professor since 1996. She has served for more than 34 years in the “Information Technology and Programming” Department of Azerbaijan Technical University. She has been an academician of the International Academy of Sciences named after Lotfi A. Zadeh since 2002 and Vice-President of the same academy since 2014. Since 2011, she has served as General Chair and Organizer of the World Conference on Soft Computing (WCSC), dedicated to preserving and advancing the scientific heritage of Professor Lotfi A. Zadeh. She is an Honorary Professor of Obuda University, Hungary, and of Arad University in Romania, and an Honorary Doctor of Philosophy in Engineering Sciences of the International Personnel Academy of UNESCO. She is an International Expert of UNESCO for the implementation of Information Communication Technology (ICT) in educational environments in
Azerbaijan. She is a member of the Editorial Boards of the International Journal of Advanced Intelligence Paradigms (Romania), the Journal of Pure and Applied Mathematics (Baku) and the Journal of Problems of Information Technology (Baku). She has been invited to serve as a Program Committee member for over 30 international conferences and as a reviewer for nearly 40 international journals. Her fellowships and awards are as follows: India, 3 months (1998); Germany (DAAD), 3 months (1999); Germany (DAAD), 3 months (2003); USA, California (Fulbright), 10 months (2007-2008); Germany (DAAD), 3 months (2010); USA, UC Berkeley, 6 months (2012, 2015, 2016, 2017); and USA, UC Berkeley, 12 months (2018). She is the author of more than 152 scientific articles, 8 method manuals, 6 manuals, 1 monograph and 5 Springer publications. Her research interests include artificial intelligence, soft computing, intelligent systems, machine learning techniques for decision making and fuzzy neural networks. Dr. Shahbazova is a member of the Board of Directors of the North American Fuzzy Information Processing Society (NAFIPS), the Berkeley Initiative in Soft Computing group (BISC), the New York Academy of Sciences, and the Defined Candidate Dissertation Society at the Institute of Applied Mathematics, Baku State University. She is also a member of the Defined Candidate Dissertation Society at the Institute of Management Systems of the National Academy of Sciences (NAS), Azerbaijan, a member of IEEE, and of the International Women Club, Azerbaijan. Professor Shahnaz N. Shahbazova, PhD Department of Information Technology and Programming Azerbaijan Technical University Baku, Azerbaijan [email protected]
Biomedical Applications
An Overview on Computer Processing for Endoscopy and Colonoscopy Videos Mihaela Luca(&), Tudor Barbu, and Adrian Ciobanu Institute of Computer Science, Romanian Academy, Iaşi Branch, Str. Codrescu Nr. 2, 700481 Iaşi, Romania {mihaela.luca,tudor.barbu, adrian.ciobanu}@iit.academiaromana-is.ro
Abstract. A state of the art of the current methods for automatic video processing is necessary for the further development of research methods in computer-assisted diagnosis and image analysis regarding the pathology of the lower gastrointestinal tract. Automatic analysis of these types of video frames may be useful in evaluating the correctness of the procedure, in diagnosis confirmation or verification, in e-learning, and in computing statistics regarding the recurrence in time of malignant polyps after colonoscopy investigations and, eventually, polyp or adenoma resection. New technologies have been developed, such as autofluorescence imaging, chromoendoscopy, narrow band imaging, etc., based on the different properties of hemoglobin, blood vessels or conspicuous membrane textures in reflecting certain wavelengths of light. A discussion of the new implementations in this domain is necessary before any new research attempt. Keywords: Colonoscopy · Chromoendoscopy · CT colonography · Narrow band imaging · Automatic analysis · Polyp detection · Image processing · Deep learning
1 Introduction
According to the World Health Organization (WHO), colorectal cancer (CRC) is the third most commonly diagnosed malignancy and the fourth leading cause of cancer deaths in the world (774,000 deaths in 2015) [1], with its incidence expected to increase by 60% until 2030 [2]. It is the second most commonly occurring cancer in women and the third most commonly occurring cancer in men [3]. There were almost 1.4 million new cases of colorectal cancer diagnosed worldwide in 2012 [4]. “In 2014, almost 153 000 people died from colorectal cancer in the EU-28, equivalent to 11.3% of all deaths from cancer” [5]. In Romania, 5869 deaths from malignant colorectal neoplasms were declared for 2014, with a death rate of 32.4/100,000 inhabitants, as reported in [5] in 2017. Human expert survey of endoscopy and colonoscopy videos is time-consuming and requires intense labor and well-trained physicians, and the diagnosis is subjective, depending on each expert’s experience. Comparative evaluation is difficult, leading to interpretational variations [6–8].
© Springer Nature Switzerland AG 2021 V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 3–22, 2021. https://doi.org/10.1007/978-3-030-52190-5_1
Therefore, automatic analysis of these videos [6–8] is meant to evaluate the accuracy of the procedure. It might be useful for students and young physicians learning to assess the diagnosis, and it can help physicians skip the time-consuming frames containing little or no information by detecting the key frames with details relevant for diagnosis. Computer-aided diagnosis might improve the quantitative characterization of abnormalities, objectively comparing images that document the evolution in time of a certain lesion and characterizing the abnormalities in a quantifiable manner.
2 Necessity of Objective Assessment
Colorectal polyps have complex shapes and different dimensions and are managed accordingly. The so-called “Paris classification” is a standardized system used to describe polyps. Characteristics and size are decisive for curative endoscopic resection or surgery. In [9], an online survey based on the same videos was conducted to assess the divergence of opinions among different gastrointestinal medical experts, from surgeons to young specialists of academic centers. Evaluation, characteristics, dimensions, malignancy suspicion and indications for surgery or endoscopic resection of complex polyps were questioned. The survey addressed a group formed of “78% attending physicians and 22% GI trainees”, with 16% specialists in complex polypectomy [9]. Most of the specialists in complex polypectomy identified the malignant lesions more accurately. Yet, “accurate estimation of polyp size was poor… with moderate inter-observer agreement… Accuracy for Paris classification was 47.5%, with moderate inter-observer agreement” [9]. In this study, the physician’s specialty was closely associated with accurate polyp characterization and recommendations. Surgeons mainly recommended surgical resection of complex nonmalignant colorectal polyps, while specialists in complex polypectomy opted for endoscopic procedures. Thus, subjectivity and less experienced opinions make teaching with objective methods more than necessary to ensure optimal solutions for lesions, especially in case of nonmalignancy and mainly for new, attending physicians [9]. In a similar study [10], the risk factors and the frequency of surgical resection of nonmalignant colorectal polyps were evaluated on more than 4000 patients with colorectal polyps, of whom 4.1% had colorectal surgery. The main risk factors for surgery were considered to be size, proximal polyp location and advanced histology (villous or high-grade dysplasia).
Surprisingly, the endoscopist was identified as an additional risk factor. “Referrals to surgery ranged from 0 to 46.6% per endoscopist for polyps 20 mm… with 24% surgical complications”; one patient died after surgery [10]. We note an important remark: in order to decrease the rate of inappropriate surgeries, “endoscopists should refer their patients with large or difficult polyps to expert endoscopists prior to surgery” [10]. The limit between too little and “too much colonoscopy” is assessed in several national surveys in the US [11], Korea [12], Japan [13], etc.
The survey studies conducted by the National Cancer Institute concluded that surveillance colonoscopy is sometimes performed inappropriately, in excess for hyperplastic polyps and low-risk lesions such as small adenomas. A 30-day postoperative study, part of the US National Surgical Quality Improvement Program, conducted in [14], states that despite the evidence that most nonmalignant colorectal polyps can be managed endoscopically, such patients are often sent to surgery. Morbidity, mortality and risk factors in patients undergoing surgical resection for nonmalignant colorectal polyps are evaluated in [15]. Due to the importance of this recent study, conducted on “12,732 patients who underwent elective surgery for a nonmalignant colorectal polyp from 2011 through 2014” [14], we will cite selectively from this paper: “Thirty-day mortality was 7%. The risk of a major postoperative adverse event was 14%… and 7.8% of patients were readmitted while 3.6% of patients had a second major surgery within the 30 days…”, and “surgery for a nonmalignant colorectal polyp is associated with significant morbidity and mortality” [14]. In Japan, due to improvements in endoscopic diagnosis and new techniques, no standard post-polypectomy colonoscopic surveillance is imposed, this being entrusted to each institute or each gastroenterologist [13]. An evaluation of the risks and benefits of surgery for nonmalignant colorectal polyps is desirable. This might be realized through further discussion and analysis with skilled experts, or by using improved colorectal imaging techniques and developing tools for computer-assisted diagnosis in order to diminish the subjectivity of the physicians’ opinions, as we have already stated in [15].
3 New Image Techniques in Colonoscopy
Less than half a century ago, in 1971, the first “colonfiberoscopy” was reported by its authors, Watanabe, Narasaka and Uezu [16]. Colonoscopy, still considered the gold standard investigation for the colon, presents a “certain amount of risks and contains an inherent miss rate of up to 25% for the detection of polyps and cancer” [6, 7]. It needs precise indications, mainly when the persons examined do not bear suspicious polyps. Knowing the place, the size and the characteristics, and not missing small or flat adenomas that could subsequently develop into colorectal cancer, is a challenge. In order to increase the colorectal adenoma detection rate, new imaging techniques emerged: narrow band imaging (NBI), autofluorescence imaging (AFI), chromoendoscopy and virtual chromoendoscopy, computed tomography colonography (CTC), the colon capsule endoscopy and the dual capsule colon endoscopy, which are compared to classic colonoscopy and its improved versions using additional devices attached to the colonoscope. NBI relies on light penetration properties, which are directly proportional to the wavelength. “Blue light (415 nm) enhances the visualization of superficial mucosal capillaries while green light (540 nm) increases the visibility of submucosal and mucosal vessels” [17]. “NBI filter placed in front of the xenon arc lamp produces the two narrow bands of light centered at the specific wavelengths of blue and green. These two wavelengths correspond to the primary and secondary light absorption peaks of
hemoglobin, respectively” [18, 19]. “Capillaries in the superficial mucosa appear brown in 415 nm wavelength. The longer 540-nm wavelength penetrates slightly more deeply into the mucosa and submucosa and makes the deeper veins appear blue-green (cyan). Because most of the NBI light is absorbed by the blood vessels in the mucosa, the resulting images emphasize the blood vessels in sharp contrast with the nonvascular structures in the mucosa” [19, 20]. Noise reduction, light and color adjustment were improved in the system developed by Olympus Medical Systems Corp., Tokyo, Japan (2006) [21–23]. It has mainly been stated that NBI is better than standard colonoscopy for improving detection rates in average-risk populations, but that it gives similar results when compared to high-definition colonoscopy, experts’ opinions still diverging [17, 24–26]. Autofluorescence (AFI) “is the natural emission of light by biological structures such as mitochondria and lysosomes when they have absorbed light, and is used to distinguish the light originating from artificially added fluorescent markers (fluorophores)” [27, 28]. “Autofluorescence imaging (AFI, Olympus, Tokyo, Japan), produces real-time pseudo-colour images by a rotating filter that produces short-wavelength light. Tissue exposure to this light leads to excitation of endogenous substances and subsequent emission of fluorescent light”, says Wee Sing Ngu in [17]. AFI raises the adenoma detection rate (ADR), this technique being very suitable for training attending physicians [17, 28]. Chromoendoscopy, also known as chromoscopy and chromocolonoscopy, consists in coloring the surface of the colon (usually by spraying dye on it), making polyps more visible. It enhances the ability of colonoscopy to detect small polyps and flat or depressed lesions, otherwise difficult to detect [29–36]. In 2006 Su said: “The NBI system identified morphological details that correlate well with polyp histology by chromoendoscopy” [30].
“Comparing chromoendoscopy with standard colonoscopy in high-risk patients (excluding inflammatory bowel disease), a Cochrane review found significantly higher rates of adenoma detection and rates of 3 or more adenomas with chromoendoscopy than with standard colonoscopy”, stipulates the Medical Policy 904, Blue Cross Massachusetts [34]. Adjunct endoscopic techniques, such as computed virtual chromoendoscopy [35], enhance the sensitivity of colonoscopy. Computed virtual chromoendoscopy with the Fujinon intelligent colour enhancement (FICE) system is a new dyeless imaging technique that might allow higher rates of adenoma detection [35, 36]. Flexible spectral imaging color enhancement (FICE) [37] post-processes white-light endoscopic images to enhance certain wavelengths. “The three single-wavelength images are selected and assigned to the red, green, blue monitor inputs, respectively, to display a composite color-enhanced image in real time” [19]. Computed tomography (CT) colonography (CTC) [38–40] is suitable for patients wishing or being advised to avoid colonoscopy, for the surveillance of small colorectal polyps with no immediate risks. It is important to remark that CTC detects more cancers among aged persons with no symptoms compared to younger ones [39]. A registration algorithm was designed to co-register the coordinates of endoluminal colonic surfaces on images from prone and supine CT colonographic acquisitions in
order to match polyps in sequential studies [40]. CTC is safe, accurate and better tolerated than a Barium enema [41]. “i-SCAN™ (Pentax, Tokyo, Japan), is another virtual chromoendoscopy” [17]. It enhances tissue texture and vascular pattern for better diagnosis. “It has three modes of image enhancement: surface enhancement, contrast enhancement and tone enhancement” [17] and might improve the adenoma detection rate (ADR) by up to 25% compared to colonoscopy. “Endoscopic trimodal imaging (ETMI) (Olympus, Tokyo, Japan) combines the use of high-definition endoscopy, autofluorescence imaging and narrow-band imaging during colonoscopy” [17]. There are different types of devices for colonoscopies. Thus, the Full Spectrum Endoscopy® (FUSE) (EndoChoice Inc., Alpharetta, GA, USA) [17] permits a 330° high-resolution view of the colonic lumen. The video colonoscope contains three imagers and LED groups located at the front and both sides of the flexible tip, and the three cameras transmit images to three monitors continuously. The Third-eye® Retroscope® (TEC) (Avantis Medical Systems, Inc., Sunnyvale, CA, USA) enhances the images of proximal colonic folds and provides a 135° retrograde view of the colon, giving an additional detection rate of 30% for polyps and 23% for adenomas [17]. In the Aer-O-Scope™ colonoscope (GI-View Ltd, Ramat Gan, Israel) [17], the lens head enables 360° panoramic, omnidirectional visualization displayed on a single screen. The NaviAid™ G-EYE™ Balloon Colonoscope (SMART Medical Systems, Ra’anana, Israel) [17] comprises a standard colonoscope with a permanently integrated, reusable balloon at the distal end of the colonoscope. Among the additional devices that attach to the colonoscope, we can mention: Cap-assisted colonoscopy, Endocuff™ and Endocuff Vision™ (Arc Medical Design Ltd, Leeds, UK), and EndoRings™ (EndoAid Ltd, Caesarea, Israel) [17].
Colon capsule endoscopy (CCE) was introduced to complement colonoscopies, in order to diminish the miss rate for different lesions and small polyps [41]. The advantages of wireless video capsules for endoscopy are flexibility, accuracy, absence of pain and reasonable cost [41]. Initially the capsule was produced by Given Imaging (Israel, 2000) and it was later upgraded to the PillCam™ capsule endoscopy platform by Medtronic [42]. This noninvasive technique produces high-quality images of the GI tract (esophagus, stomach, small bowel and colon), revealing and assessing the evolution of GI abnormalities. A dedicated recorder and special software are associated with it. Reviews on capsule colonoscopy and its current status were published in 2016 in [43], CE being evaluated by several groups [44]. IT and robotics researchers [45, 46] try to solve problems such as “detecting hemorrhage and lesions, reducing review time, localizing the capsule or lesion, assessing intestinal motility, enhancing video quality, imaging, power source, energy management, magnetic control, localization and locomotion mechanism, drug delivery, therapy, biopsy” [47], moving towards robotics in colonoscopy [48]. A pioneering approach is the “EndoVESPA project (Endoscopic Versatile robotic guidancE, diagnoSis and theraPy of magnetic-driven soft-tethered endoluminAl robots)” [45], intending to develop an “integrated robotic platform for the navigation
of a soft-tethered colonoscope capable of performing painless diagnosis and treatment… having a “front-wheel” magnetic-driven approach for active and smooth navigation” [46]. With the new techniques for acquiring colorectal images, their quality has increased considerably, facilitating different automatic processing approaches. Diagnoses are thoroughly explained in several Gastrointestinal or Colon Atlases [49–51], and representative images of colon diseases are available on sites showing impressive instances [52] https://www.endoscopy-campus.com/en/image-of-the-week/. The most recent papers published in this domain, using computer processing of colorectal video frames in order to assist gastrointestinal diagnosis, are commented on in the next section.
4 Automatic Video Analysis
Requiring long, constant, focused attention, colon investigations might especially benefit from on-line and off-line software assistance. Many dilemmas encountered in gastrointestinal investigations might be approached using computer-assisted evaluation: on-line and off-line polyp detection and automatic classification by size, shape and structure, comparison with the expert’s diagnosis, texture detection of GI lesions, optical cancer/non-cancer detection in vivo before resection, and the degree of colon cleansing in order to validate or not the whole procedure. The techniques used to analyze the videos depend on the physical modality of acquiring the images, the artificial intelligence methods employed being specially designed for each case. Research in this domain has developed considerably in the last two decades. Our review will mainly focus on automatic analysis of different aspects of colonoscopy and endoscopy videos in the last decade. An interesting previous study, describing different methods for automatic polyp detection in colonoscopy videos, was published in 2008 [6, 7]. More recent overviews focus on the problems posed by capsule endoscopy. Papers such as “Frontiers of robotic endoscopic capsules: a review” by Ciuti G. et al. [47] and “Polyp Detection and Segmentation from Video Capsule Endoscopy: A Review” by Surya Prasath V. B. [53] point to a vast recent literature to browse, as the problems encountered when using capsule endoscopy are similar to those arising in optical colonoscopy. • Knowing that hyperplastic polyps are hardly detectable compared to adenomatous polyps in CTC, in 2009 Ronald M. Summers and his colleagues published a paper centered on polyp size detection [54]. He focused on the automated measurement of colorectal polyp height, as hyperplastic polyps are flatter than adenomatous polyps of comparable width.
“To assess flatness, the heights of the polyps at CTC were measured using a validated automated software program. Heights and height-to-width ratios of the hyperplastic polyps were compared to those of the adenomatous polyps using a t-test (two-tailed, unpaired, unequal variance).” [54]. The results, displayed in several tables of polyp sizes, and the discussion of shapes are useful for further research.
An Overview on Computer Processing for Endoscopy and Colonoscopy Videos
• Another paper containing a section on previously used computing methods is “Computer-Aided Detection of Polyps in CT Colonography Using Logistic Regression” [55], published in 2010. Dealing with CTC, the authors employ a linear logistic classifier (logistic regression) for computer-aided detection (CAD) of polyps, both to detect the candidates and to rank them. The system is based on carefully chosen features, two of them being “the protrusion of the colon wall, and the mean internal intensity” [55]. Features related to lesion size are mapped with the Mahalanobis distance to the target class mean, the classification task being approached as a regression problem. “Features are ordered according to relevance. A mechanism is introduced to map features that are not ordered as such, into features that do have the ordering property” [55].
• In 2010, Bashar et al. [56] proposed a method to reduce the size of videos by detecting informative frames in the original videos using color and texture. The program discards frames contaminated by fluids, stool and residues, taking into account their specific range of colors and bubble-like texture patterns. The authors use an SVM classifier, a multi-step classifier and a “Gauss Laguerre transform (GLT) based multiresolution texture feature… this mixing reducing computation costs, producing better detection accuracies by minimizing visual-selection errors, especially when processing large numbers of WCE videos” [56].
• Automatic detection and segmentation of polyps and lesions in the video frames acquired during colon inspection continues to be a challenge. In 2010, van Wijk et al. proposed several solutions for automatic “detection and segmentation of colonic polyps on implicit isosurfaces” using second principal curvature flow [57]. In this approach, the detection of a candidate object depends only on the amount of protrusion.
“Additionally, the method yields correct polyp segmentation, without the need of an additional segmentation step” [57]. Features regarding size, shape and intensity are used for supervised pattern recognition. Object ranking is accomplished with the Mahalanobis transformation and a logistic classifier.
• Even if detecting polyps and adenomas is the most important aspect of colonoscopy, it is not possible without correct bowel preparation. If more than 50% of the video frames are covered by stool, intestinal residue or unclear fluid, the colon investigation cannot be validated for diagnosis. It is therefore important to evaluate at first sight the quality of the medical exploration. This was the subject of several papers published by Muthukudage J.K. in 2011 and 2013 [58, 59]. Characteristic colors in the RGB cube were classified using SVM. Images were first manually selected in order to correctly identify the objects present on the colon surface, and all these research results were developed and discussed in the doctoral thesis “Automated real-time objects detection in colonoscopy videos for quality measurements” [59].
• “One of the main goals of automated polyp detection is to drastically decrease the amount of video frames that require manual inspection” [60]. Colon capsule endoscopy (CCE) is a safe modality for obtaining images from digital cameras mounted on a small capsule ingested by the patient. It is minimally invasive, avoiding pain or discomfort. Colorectal video analysis is a good subject for computer-aided identification of polyps and lesions on the gastro-intestinal tissue, as early detection
M. Luca et al.
of the polyps is essential in preventing colon cancer. Yet, even with an algorithm that perfectly detects polyps in a video, the actual position of a polyp in the colon still has to be estimated (all the more when dealing with the irregular motion of the capsule). Mamonov proposes either to reconstruct the capsule's motion from the changes in subsequent video frames or to devise a co-alignment of the optical colonoscopy with another functional exploration, for example CTC. In 2014, Mamonov [60] described a binary classifier with pre-selection meant to automatically analyze the video sequence frames, converted to grayscale before processing. The non-informative frames are filtered out. The remaining frames are labeled as containing polyps or not, the automatic algorithm using the “geometrical analysis and the texture content of the frame” [60]. The image is segmented with a mid-pass filter, and the extracted features are classified on the assumption that polyps are mainly round protrusions. The decision parameter of the binary classifier is the best-fit ball radius. The problems that arise are “the presence of trash liquids and bubbles, vignetting due to the use of a non-uniform light source, high variability of possible polyp shapes and the lack of a clear cut between the geometry of the polyps and the folds of a healthy mucosal tissue” [60]. The procedure was applied on “18900 frames from the endoscopic video sequences of five adult patients” [60]. Finally, on a video sequence, almost 10% false-positive frames need to be inspected by a human operator. “The algorithm demonstrates high per polyp sensitivity and, equally important, displays a high per patient specificity, i.e. a consistently low false positive rate per individual patient” [60].
The authors plan to develop their approach using color information and improving the pre-selection criterion with specular reflection detection (bubbles typically produce strong specular reflections).
• Even though colonoscopy, by finding and removing colonic polyps, is still the gold standard for colon cancer screening and prevention, recent clinical studies report a significant polyp miss rate due to inadequate colonoscopy quality. The guidelines for a “good” colonoscopy require a minimum withdrawal time of 6 min, which influences the quality of the examination. To measure this quality objectively, a procedure was proposed in 2014 that assesses the informativeness of colonoscopy images, assigning a normalized quality score to each video colonoscopy frame [61]. The averaged computed scores give the overall quality of the colonoscopy [61].
• The same team [62] worked on automated polyp detection in colonoscopy videos, using shape to reliably localize polyps and context information to remove non-polyp structures [62]. On the edge map selected for each frame, using a feature extractor and an edge classifier, the edges not belonging to a polyp are removed. Polyp candidates are identified with a voting scheme and a probabilistic confidence weight [62]. The polyp image databases CVC-ColonDB [63] and their specially built ASU-Mayo database [64] were used in [62].
• Different pattern recognition procedures are used for computer-aided detection of colorectal polyps and incipient (precursor) lesions in colonoscopy, such as Local Binary Patterns (LBPs), which are perceived as “strong illumination invariant texture
primitives” [65] and as “histograms of binary patterns computed across regions, to describe textures” [65]. In this paper, color, the discrete cosine transform (DCT) and LBP were used for feature selection. Colorectal polyps have either a pedunculated or a sessile shape and typically protrude from folds. “Curvature techniques may be employed for singling out the polyp” [65]. For every pixel, the difference (contrast) relative to the gray level of each neighbor is computed. In Geetha's study, polyps are detected using “classification via J48 and Fuzzy” [65].
• The reduction of video review time (analyzing thousands of frames for each video) and the selection of a bag of best features for automatic polyp detection are also issues approached by Yuan Y. et al. in [66], and earlier in [67]. An “improved bag of feature (BoF) method is proposed to assist classification of polyps in WCE images… Different textural features are selected from the neighborhoods of the key points and integrate them together as synthetic descriptors to carry out classification tasks” [66]. They discuss the importance of the “number of visual words, the patch size and different classification methods in terms of classification performance” [66]. A high classification accuracy (over 93%) is obtained by taking features around the neighborhoods of the key points and integrating them, using SIFT, the complete local binary pattern (CLBP), a visual-word length of 120, a patch size of 8 * 8, and SVM [66].
• Feature extraction for the detection of lesions and mucosal inflammations is approached by Charisis in 2016 with the HAFDLac structure [68]. “A Hybrid Adaptive Filtering (HAF) extract lesion-related structural/textural features, employing Genetic Algorithms to the Curvelet-based representation of image. Differential Lacunarity (DLac) analysis was applied for feature extraction from the HAF-filtered images” [68]. SVM is used for lesion recognition. HAFDLac was trained and tested on an 800-image database, abnormal lesion patterns being grouped into mild and severe, and it outperformed other systems in automated lesion detection. An earlier paper with the same first author, Charisis, in 2013 [69], focuses on color rotation and texture features for ulcer region detection: “WCE data are color-rotated in order to boost the chromatic attributes of ulcer regions. Then, texture information is extracted by utilizing the LBP operator” [68]. They obtain a classification accuracy of about 91%.
• A comprehensive recent doctoral thesis, defended by Manivannan S. in 2015 [70], develops current problems to be solved in medical image processing. Normal/abnormal frame classification, different feature selection methods, classification using multi-resolution local patterns and SVM, and image processing on frames selected from colonoscopy videos are only some of the subjects developed in [70].
• Interesting mainly for the visual feature descriptors compared in an empirical assessment of ulcer recognition in wireless capsule endoscopy (WCE) video, this very recent study [71] uses a Support Vector Machine (SVM) classifier. WCE video frames are analyzed to detect the presence of areas corresponding to the selected descriptors: Local Binary Pattern (LBP), Curvelet Transform (CT), Chromaticity Moments Color (CMC), Color Coherence Vector (CCV), Homogenous Texture Descriptor (HTD), Scalable Color Descriptor (SCD), YCbCr Color Histogram, CIE_Lab Color Histogram, HSV Color Histogram.
The comparison aims to determine which visual descriptor best represents WCE frames for detecting the target areas.
• The polyp miss rate and the challenge of visually estimating polyp malignancy are issues approached in [72], which designs a decision support system (DSS) providing endoluminal scene segmentation. The authors train standard fully convolutional networks (FCNs) for semantic segmentation on a dataset consisting of 4 relevant classes for the endoluminal scene, thus introducing an extended benchmark for colonoscopy image segmentation [72]. This paper [72] also contains a state-of-the-art review of colonoscopy image segmentation. The authors combine CVC-ColonDB and CVC-ClinicDB into a new dataset (CVC-EndoSceneStill) composed of 912 images obtained from 44 video sequences acquired from 36 patients. We have to underline that this team of researchers (Vazquez D., Bernal J., Sanchez F.J.) has a remarkable number of papers [72–76] proposing original solutions to a vast number of problems arising in colorectal image processing, towards real-time computer-assisted diagnosis.
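Since several of the systems surveyed above ([65, 69–71]) build on Local Binary Patterns, a minimal sketch of the basic 3 × 3 LBP operator may help fix ideas; this is the textbook formulation, not any particular paper's refined variant (uniform, multi-resolution, or CLBP):

```python
# Neighbour offsets (dy, dx), clockwise from the top-left pixel
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, y, x):
    """Threshold the 8 neighbours of pixel (y, x) against the centre
    pixel and pack the comparison bits into one byte (the texture code)."""
    centre = img[y][x]
    code = 0
    for i, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << i
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over the interior pixels: the
    region-level texture descriptor that such systems feed to an SVM
    or fuzzy classifier."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist

# Toy grayscale patch: a bright "protrusion" on a dark background
patch = [[10, 10, 10, 10],
         [10, 50, 50, 10],
         [10, 50, 50, 10],
         [10, 10, 10, 10]]
hist = lbp_histogram(patch)   # 4 interior pixels -> 4 codes in total
```

The histogram, not the raw codes, is what gets compared between image regions, which is what makes the descriptor largely invariant to monotonic illumination changes.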
5 Deep Learning in Polyp Detection

Facing the challenge of real-time differentiation of “adenomatous and hyperplastic diminutive colorectal polyps”, colon video processing could eliminate the obstacle of interobserver variability in endoscopic polyp interpretation and enable widespread acceptance of the “resect and discard” paradigm [77]. An artificial intelligence (AI) model for real-time assessment of NBI colorectal video frames using a deep convolutional neural network is described in [77]. The different patterns to be revealed were equally split between the relevant classes. “The model was trained on unaltered videos from routine exams… was tested on a separate series of 125 videos of consecutively encountered diminutive polyps that were proven to be adenomas or hyperplastic polyps” [77]. The authors plan to pursue “additional study of this programme in a live patient clinical trial setting” [77].
• More communications on this subject were made recently, referring to deep-learning algorithms that might reach, in a real-time automatic system, endoscopist-level polyp detection [78, 79], said Pu Wang, MD, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital in China, in a presentation at United European Gastroenterology (UEG) Week. “Endoscopists annotated images in the development data set, outlining the boundaries of each polyp” [78, 79].
• In an article on the Healio Gastroenterology website, the journalist Adam Leitenberger [80] reported on the opening speech of Yuichi Mori, MD, PhD, from Showa University in Yokohama, Japan, at UEG Week [81, 82]. “Artificial intelligence enables real-time optical biopsy of colorectal polyps during colonoscopy, regardless of the endoscopists’ skill” [82]. “This allows the complete resection of adenomatous polyps and prevents unnecessary polypectomy of non-neoplastic polyps” [82]. On ultra-magnified endocytoscopies (provided by an Olympus system), almost “300 polyp features obtained by using narrow-band imaging (NBI) or
methylene blue dye” are used to predict a “lesion’s pathology comparing it with more than 30,000 other endocytoscopy images that were used for machine learning” [80, 82], in less than a second. Mori and colleagues compared the AI system's predictions with the experts' results for 250 patients with colorectal polyps, with very good results. Yuichi Mori and colleagues will develop an automatic polyp detection system in a multicenter study [80] (Table 1).

Table 1. Studies on colon video frames automatic processing published in the last decade. For each study, the entries give: year, first author, title and reference; available web address; methods; overall number of patients / number of tests; and results.

2008 Ameling S. [7]
Methods: overview.

2009 Summers R.M. [54], Automated Measurement of Colorectal Polyp Height at CT Colonography: Hyperplastic Polyps are Flatter than Adenomatous Polyps. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3412299/
Methods: software for geometric size evaluation (height) in CTC.
Patients/tests: 1186 patients; 185 patients had a hyperplastic or adenomatous polyp 6–10 mm in size (CTC).
Results: automated height and width measurements for 231 (97.1%) of 238 polyps visible at CTC.

2010 van Ravesteijn V.F. [55], Computer-Aided Detection of Polyps in CT Colonography Using Logistic Regression, IEEE Trans. on Medical Imaging. https://www.researchgate.net/publication/224574408_Computer-Aided_Detection_of_Polyps_in_CT_Colonography_Using_Logistic_Regression
Methods: CAD on CTC, candidate detection + supervised classification; logistic regression on 3 features: colon wall protrusion, mean internal intensity, features discarding enema tube detection.
Patients/tests: 287.
Results: polyps > 6 mm, sensitivities 95%, 85%, 85%, and 100% with 5, 4, 5, and 6 false positives per scan over 86, 48, 141, and 32 patients.

2010 Bashar M.K. [56], Automatic detection of informative frames from wireless capsule endoscopy images. https://www.ncbi.nlm.nih.gov/pubmed/20137998
Methods: CTC, CAD.
Patients/tests: 14841 and 37100 frames from three videos and 66582 frames from six videos.
Results: average detection accuracies of 86.42% and 84.45%, versus 78.18% and 76.29% for the same color features with conventional Gabor-based texture, and 65.43% and 63.83% with discrete wavelet-based texture.

2010 van Wijk C. [57], Detection and segmentation of colonic polyps on implicit isosurfaces by second principal curvature flow. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.918.7236&rep=rep1&type=pdf
Methods: CTC supervised pattern recognition; size, shape and intensity features; Mahalanobis transform for object ranking and a logistic classifier.
Patients/tests: 84 patients with 57 polyps > 6 mm.
Results: 95% sensitivity at four false positives per scan for polyps larger than or equal to 6 mm.

2011 Muthukudage J.K. [58], Color Based Stool Region Detection in Colonoscopy Videos for Quality Measurements. https://link.springer.com/content/pdf/10.1007%2F978-3-642-25367-6_6.pdf
Methods: images stored in MPEG-2; color-based classification of images in the RGB cube; SVM, KNN.
Patients/tests: 58 videos from 58 different patients; BBPS (Boston Bowel Preparation Scale).
Results: automatic stool detection with high accuracy: 93% sensitivity, 95% specificity.

2013 Muthukudage J.K. [59], Automated Real-Time Objects Detection in Colonoscopy Videos for Quality Measurements. https://pdfs.semanticscholar.org/3344/bb2977efbaf5a541686427f436d3ba5663ee.pdf
Methods: doctoral thesis.

2014 Mamonov A.V. [60], Automated Polyp Detection in Colon Capsule Endoscopy. https://arxiv.org/pdf/1305.1912.pdf
Patients/tests: 18900 frames from 5 CCE videos.
Results: from 3747 frames, 367 were selected for examination.

2015 Tajbakhsh N. [61, 62], Automated Polyp Detection in Colonoscopy Videos Using Shape and Context Information. https://www.ncbi.nlm.nih.gov/pubmed/26462083, https://ieeexplore.ieee.org/document/7294676/
Methods: shape and context feature selection on CVC-ColonDB and the ASU-Mayo DB.
Patients/tests: 300 polyp color images of 15 polyps; 19,400 frames, 5,200 polyp instances of 10 polyps.
Results: at 0.1 false positives per frame, 88.0% sensitivity on CVC-ColonDB, 48% sensitivity on the ASU-Mayo DB.

2016 Geetha K. [65], Automatic Colorectal Polyp Detection in Colonoscopy Video Frames. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5454689/
Methods: Fuzzy-Color + DCT + LBP.
Patients/tests: 235 colon images with polyps and 468 normal colon images.
Results: classification accuracy 96.2%.

2016 Yuan Y. [66], Improved bag of feature for automatic polyp detection in wireless capsule endoscopy images. https://ieeexplore.ieee.org/document/7052426/
Methods: SIFT + CLBP features with SVM; 120 visual words for polyp classification; suitable patch size: 8 * 8.
Results: polyp detection accuracy of 93.20%.

2015 Manivannan S. [8, 70], Visual feature learning with application to medical image classification (doctoral thesis). https://discovery.dundee.ac.uk/ws/files/7584173/Thesis.pdf
Methods: LBP, uniLBP, CLBP, HOG.

2016 Charisis V.S. [68, 69], Potential of hybrid adaptive filtering in inflammatory lesion detection from capsule endoscopy images. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5075542/pdf/WJG-22-8641.pdf
Methods: HAFDLac (Hybrid Adaptive Filtering + Differential Lacunarity), genetic algorithms on the Curvelet image representation, SVM.
Patients/tests: WCE from 13 patients; 800-image database.
Results: lesion classification: 93.8% accuracy, 95.2% sensitivity, 92.4% specificity, 92.6% precision.

2016 Vazquez D., Bernal J. [72], A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images, Journal of Healthcare Engineering. http://refbase.cvc.uab.es/files/vbs2017b.pdf, https://www.hindawi.com/journals/jhe/2017/4037190/
Methods: FCN (fully convolutional network) semantic segmentation of colonoscopy data.
Patients/tests: CVC-ColonDB, 300 images with polyp and background (mucosa and lumen) segmentation masks from 13 polyp video sequences, 13 patients; CVC-ClinicDB, 612 images with polyp and background (mucosa and lumen) segmentation masks from 31 polyp videos, 23 patients. Training set: 20 patients, 547 frames; validation set: 8 patients, 183 frames; test set: 8 patients, 182 frames.

2017 Byrne M. [77], Real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps during analysis of unaltered videos of standard colonoscopy using a deep learning model. https://gut.bmj.com/content/early/2017/11/09/gutjnl-2017-314547?int_source=trendmd&int_medium=trendmd&int_campaign=trendmd
Methods: NBI, deep learning.
Patients/tests: 125 videos; for 19 (15%) polyps there was not sufficient confidence for a histologic prediction.
Results: for 106 diminutive polyps, 94% accuracy, 98% sensitivity for adenoma identification, 83% specificity.

2018 Bchir O. [71], Empirical Comparison of Visual Descriptors for Ulcer Recognition in Wireless Capsule Endoscopy Video. http://aircconline.com/csit/csit885.pdf
Methods: LBP, Curvelet Transform, CMC, CCV, HTD, SCD, YCbCr color histogram, CIE_Lab color histogram, HSV color histogram.
Patients/tests: 220 images of bleeding, 159 images of ulcers and 228 images of non-bleeding ulcers.
Results: accuracy: LBP 98.85%, CIE_Lab color histogram 98.95%, Color Coherence Vector 82.87%, Chromaticity Moments Color 77.42%.

Reports: 2017 Pu Wang [79], Automatic polyp detection shows promise for assisting colonoscopy, Abstract 4, World Congress of Gastroenterology at American College of Gastroenterology Annual Scientific Meeting, Oct. 13–18, 2017, Orlando, FL. https://www.healio.com/gastroenterology/interventional-endoscopy/news/online/%7B53d5a435-2d7c-4ab7-ac8af0ef943a2b40%7D/automatic-polypdetection-shows-promise-for-assistingcolonoscopy?nc=1
Methods: deep learning for auto-validation during colonoscopy.
Patients/tests: 5,545 colonoscopy images collected from 1,290 patients in 2007–2015; 27,113 colonoscopy images in 2016 (5,541 with polyps) from 1,138 patients, and 289 colonoscopy videos in 2017.
Results: 91.64% sensitivity, 96.3% specificity, 100% detection rate.
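The deep-learning systems in this section ([77–79]) all build on stacked convolutional layers. As a pure illustration of that basic building block, not a reimplementation of any of these models, the sketch below applies one “valid” convolution followed by a ReLU to a toy one-channel image; the hand-set vertical-edge kernel is a hypothetical stand-in for a learned filter:

```python
def conv2d_relu(image, kernel, bias=0.0):
    """Valid 2-D convolution (cross-correlation, as in CNN frameworks)
    followed by a ReLU: one building block of the deep polyp-detection
    models surveyed in Table 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            s = bias
            for i in range(kh):
                for j in range(kw):
                    s += image[y + i][x + j] * kernel[i][j]
            row.append(max(0.0, s))  # ReLU non-linearity
        out.append(row)
    return out

# A vertical-edge kernel responds where intensity changes left-to-right,
# e.g. at the boundary of a bright structure (toy one-channel example)
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
edge_map = conv2d_relu(img, [[-1, 1], [-1, 1]])
```

Real systems stack dozens of such layers with learned kernels, pooling and millions of parameters; the kernels here are fixed only so the output is easy to inspect.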
6 Discussion

Manual review of colonoscopy and endoscopy videos is time-consuming and demands intense attention; automatically analysed, these videos give important information and represent an objective tool for physicians.
An impressive number of papers in this domain propose different methods for image pre-processing, the elimination of spots and speckles, automatic segmentation of pathologic areas, and feature selection using color, texture and shape, the features then being processed by different classifiers. Neural networks [83] and deep learning [77] have also been applied, with exciting results. Important information for surgical interventions is the exact estimation of polyp or adenoma dimensions, before deciding on their resection or simply in order to monitor their growth. Scientists are focusing on this aspect too. Another important estimation concerns the correctness of the procedure regarding bowel preparation; dedicated software may objectively evaluate this aspect. Artificial intelligence might therefore help a great deal by offering objective parameters, making available the exact position, number and size of the structures developing abnormally on different parts of the gastrointestinal tract.
7 Conclusions

Computer-aided detection of polyps in colon functional exploration, before or after polyp or adenoma resection, gives objective information to the physician for monitoring the evolution and preventing further complications. Reducing the number of video frames, by automatically discarding non-informative instances from the tens of thousands of frames in colonoscopy videos, alleviates the examination burden. Diagnosis systems may reduce image data reviewing time and serve as an important assisting tool in the training of new physicians. Assuming a correct initial labeling of suspicious colon areas, training the systems with stratified classification procedures, using appropriate features in sufficient amount, might lead to correct results, and numerous groups of researchers worldwide are trying to improve these results. Competitions [84] are meant to speed up obtaining practical results that help prevent the development of this type of cancer, which continues to be a major cause of death in world statistics.
References 1. World Health Organization: Fact Sheets: Cancer, Key Facts. http://www.who.int/newsroom/fact-sheets/detail/cancer. Accessed 5 June 2018 2. Arnold, M., Sierra, M.S., Laversanne, M., Soerjomataram, I., Jemal, A., Bray, F.: Global patterns and trends in colorectal cancer incidence and mortality, http://www.dep.iarc.fr/ includes/Gut-2016-Arnold-gutjnl-2015-310912.pdf. Accessed 10 June 2018 3. Colorectal Cancer Facts & Figures, 2017–2019, American Cancer Society. https://www. cancer.org/content/dam/cancer-org/research/cancer-facts-and-statistics/colorectal-cancerfacts-and-figures/colorectal-cancer-facts-and-figures-2017-2019.pdf. Accessed 11 June 2018 4. World Cancer Research Fund International: Colorectal cancer statistic (2015). https://www. wcrf.org/int/cancer-facts-figures/data-specific-cancers/colorectal-cancer-statistics. Accessed 5 June 2018
5. Cancer statistics - specific cancers, statistics explained. http://ec.europa.eu/eurostat/statisticsexplained/pdfscache/39738.pdf. http://ec.europa.eu/eurostat/statistics-explained/index.php/ Cancer_statistics. Accessed 11 June 2018 6. Ameling, S., Wirth, S., Shevchenko, N., Wittenberg, T., Paulus, D., Münzenmayer, C.,: Detection of lesions in colonoscopic images: a review. In: IFMBE 2010 Proceedings, vol. 25, no. 4, pp. 995–998 (2010) 7. Ameling, S., Wirth, D., Paulus, D.: Methods for polyp detection in colonoscopy videos: a review. Fachbereich Informatik Nr. 14/2008, Technical report, 1 December 2008. https:// pdfs.semanticscholar.org/6a32/0b42a67ec6f3997a8e7d837acf2d595f95b5.pdf. Accessed 11 May 2018 8. Manivannan, S., Wang, R., Trucco, E., Hood, A.: Automatic normal-abnormal video frame classification for colonoscopy. In: 2013 IEEE 10th International Symposium on Biomedical Imaging: From Nano to Macro (ISBI), San Francisco, CA, USA, 7–11 (2013). https://pdfs. semanticscholar.org/696c/ef94b8656a86b01cda1c580ba586adef3265.pdf 9. Aziz Aadam, A., Wani, S., Kahi, C., Kaltenbach, T., Oh, Y., Edmundowicz, S., Peng, J., Rademaker, A., Patel, S., Kushnir, V., Venu, M., Soetikno, R., Keswani, R.N.: Physician assessment and management of complex colon polyps: a multicenter video-based survey study. Am. J. Gastroenterol. 109(9) (2014). https://www.ncbi.nlm.nih.gov/pubmed/ 25001256. Accessed 10 May 2018 10. Le Roy, F., et al.: Frequency of and risk factors for the surgical resection of nonmalignant colorectal polyps: a population-based study. Endoscopy 48(3), 263–270 (2015). https:// www.ncbi.nlm.nih.gov/pubmed/26340603. Accessed 9 May 2018 11. Mysliwiec, P.A., Brown, M.L., Klabunde, C.N., Ransohoff, D.F.: Are physicians doing too much colonoscopy? A national survey of colorectal surveillance after polypectomy. Ann. Internal Med. 141(4), 264–271 (2004). https://www.ncbi.nlm.nih.gov/pubmed/15313742. Accessed 7 May 2018 12. 
Dae, K.S., Colonoscopy Study Group of the Korean Society of Coloproctology: A survey of colonoscopic surveillance after polypectomy. Ann. Coloproctol. 30(2), 88–92 (2014). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4022758/. Accessed 7 May 2018 13. Tanaka, S., et al.: Surveillance after colorectal polypectomy; comparison between Japan and U.S. Kobe J. Med Sci. 56, E204–E213 (2011). https://www.ncbi.nlm.nih.gov/pubmed/ 21937868. Annals of Coloproctology, Accessed 11 May 2018 14. Peery, A.F., et al.: Morbidity and mortality after surgery for nonmalignant colorectal polyps. Gastrointest. Endosc. 87(1), 243–250. https://www.ncbi.nlm.nih.gov/pubmed/28408327. Accessed 11 May 2018 15. Ciobanu, A., Luca (Costin), M., Drug, V., Tulceanu, V.: Steps towards computer-assisted classification of colonoscopy video frames. In: 6th IEEE International Conference on EHealth and Bioengineering - EHB 2017, Sinaia, Romania, 22–24 June 2017 (2017) 16. Watanabe, H., Narasaka, T., Uezu, T.: Colonfiberoscopy. Stomach Intestine 6, 1333–1336 (1971) 17. Ngu, W.S., Rees, C.: Can technology increase adenoma detection rate?. Ther. Adv. Gastroenterol. 11, 1–18 (2018). Creative Common Attr. http://journals.sagepub.com/doi/full/ 10.1177/1756283X17746311, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5784538/ #bibr60-1756283X17746311. Accessed 11 June 2018 18. Kuznetsov, K., Lambert, R., Rey, J.F.: Narrow-band imaging: potential and limitations. Endoscopy 38, 76–81 (2006) 19. Manfred, M.A., ASGE American Society for Gastroenterology, Technology Committee, et al.: Electronic chromoendoscopy. Gastrointest. Endosc. 81(2), 249–261, (2015). https:// www.giejournal.org/article/S0016-5107(14)01855-0/pdf. Accessed 11 June 2018
Fuzzy System for Classification of Nocturnal Blood Pressure Profile and Its Optimization with the Crow Search Algorithm

Ivette Miramontes, Patricia Melin, and German Prado-Arechiga

Tijuana Institute of Technology, Tijuana, Mexico
[email protected]
Abstract. Over time, different metaheuristics have been used for optimization in soft computing techniques such as fuzzy systems and neural networks. In this work we focus on the optimization of fuzzy systems with the Crow Search Algorithm. The fuzzy systems are designed to classify the patient's nocturnal blood pressure profile. For this goal, two fuzzy systems are designed, one with trapezoidal membership functions and the other with Gaussian membership functions, and their corresponding performances are compared. Based on the results of this comparison, an optimization was carried out to improve the classification of the nocturnal blood pressure profile of the patients and thereby provide a more accurate diagnosis. After the experimentation, and once the different optimized fuzzy systems had been tested, it was concluded that the fuzzy system with Gaussian membership functions provides the better classification in a sample of 30 patients.

Keywords: BP (blood pressure) · ABPM (ambulatory blood pressure monitoring) · Nocturnal blood pressure profile · Systolic pressure · Diastolic pressure · Fuzzy system · Bio-inspired algorithm
1 Introduction

Optimization is the mathematical or experimental process of finding the best solution to a problem. An optimization process has the following three parts: the inputs, consisting of variables; the process or function, known as the objective or fitness function; and the output, which is the cost or the corresponding physical form [1]. Bio-inspired algorithms, as the name says, are metaheuristics inspired by nature, which try to imitate the behavior of biological systems; examples include the Grey Wolf Optimizer [2], the Firefly Algorithm [3], the Flower Pollination Algorithm [4], Chicken Swarm Optimization [5], and Monarch Butterfly Optimization [6], just to mention a few. Bio-inspired algorithms are very useful for the optimization of different problems, whether of maximization or minimization. Fuzzy inference systems are a very useful intelligent computing technique for dealing with uncertainty, and for this reason they have been used in different areas, such as control [7, 8], dynamic adaptation of parameters in algorithms [9, 10], and response integration in modular neural networks [11–13], among others.

© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 23–34, 2021.
https://doi.org/10.1007/978-3-030-52190-5_2
In this work, we use the Crow Search Algorithm (CSA) to find the best parameters of a fuzzy system for achieving optimal classification of the nocturnal blood pressure profile. This paper is organized as follows: Sect. 2 describes the literature review, Sect. 3 presents the proposed method, Sect. 4 describes the methodology, Sect. 5 presents the results and discussion, and Sect. 6 outlines the conclusions and future work.
2 Literature Review

This section presents the basic concepts necessary to understand the proposed method.

2.1 Crow Search Algorithm
Askarzadeh proposed the CSA in 2016 [14]. It is based on the intelligence of crows: they observe other birds to learn where they hide their food and steal it once the owner leaves. In addition, once it has stolen, a crow takes precautions, such as moving secretly, to avoid becoming a future victim; it also uses its own experience as a thief to predict the behavior of other thieves and to determine the safest way to protect its hiding places from being robbed. The main principles of the CSA are the following:

– Crows live in a flock.
– Crows memorize the position of their hiding places.
– Crows chase each other to steal.
– Crows protect their hiding places from being robbed with a certain probability.

The parameters used by the algorithm are:

– N, the number of crows.
– Iter, the number of iterations.
– fl, the flight length. Small values of fl lead to a local search, while large values lead to a global search.
– AP, the awareness probability, which favors diversification when large values are used and intensification when small values are used. This means that when the AP value decreases, the algorithm performs a local search around the good solutions found so far, and when the AP value increases, the probability of searching near good solutions decreases and the CSA tends to perform a global search.
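The position-update rule described above can be sketched as follows. This is a generic implementation of Askarzadeh's CSA applied to a toy objective, not the exact code used by the authors; the function and parameter names are illustrative.

```python
import numpy as np

def crow_search(fitness, dim, n_crows=20, iters=100, fl=2.0, ap=0.1,
                lower=-1.0, upper=1.0, seed=0):
    """Minimal sketch of the Crow Search Algorithm (Askarzadeh, 2016)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lower, upper, size=(n_crows, dim))   # flock positions
    mem = x.copy()                                       # hiding places (memories)
    mem_fit = np.array([fitness(m) for m in mem])
    for _ in range(iters):
        for i in range(n_crows):
            j = rng.integers(n_crows)        # crow i follows a random crow j
            if rng.random() >= ap:
                # Crow j is unaware: move toward j's remembered hiding place.
                new = x[i] + rng.random() * fl * (mem[j] - x[i])
            else:
                # Crow j is aware: crow i flies to a random position instead.
                new = rng.uniform(lower, upper, size=dim)
            x[i] = np.clip(new, lower, upper)
            f = fitness(x[i])
            if f < mem_fit[i]:               # update memory only on improvement
                mem[i], mem_fit[i] = x[i].copy(), f
    best = int(mem_fit.argmin())
    return mem[best], mem_fit[best]

# Example: minimize the 5-dimensional sphere function.
pos, val = crow_search(lambda v: float(np.sum(v * v)), dim=5)
```

Note how small fl keeps the move close to the current position (local search) while large fl overshoots the followed memory (global search), matching the parameter roles listed above.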
2.2 Fuzzy Logic
Fuzzy logic was proposed by L. A. Zadeh in 1965 as an alternative to classical logic. It is a non-probabilistic approximate reasoning method that greatly facilitates modeling qualitative information, has the ability to work with inaccurate or incomplete data, and simulates the way in which humans think [15].
2.3 Blood Pressure and Hypertension
Blood pressure is defined as the force applied against the walls of the blood vessels as the heart pumps blood, carrying oxygen and nutrients throughout the body. The components of blood pressure are the systolic pressure, produced when the heart contracts, and the diastolic pressure, produced when the heart relaxes between beats; both are measured in millimeters of mercury (mmHg) [16]. High blood pressure is caused by a narrowing of very small arteries called arterioles, which regulate blood flow in the body. As these arterioles narrow (or contract), the heart has to work harder to pump blood through a smaller space, and the pressure inside the blood vessels increases [17]. Normal blood pressure in an adult is below 120/80 mmHg; blood pressure equal to or greater than 140/90 mmHg is considered high blood pressure, or hypertension [18, 19].
2.4 Nocturnal Blood Pressure Profile
As the body relaxes and enters a state of rest, blood pressure tends to fall; this is normal behavior and happens regularly during the night period. This behavior corresponds to a decrease of between 10% and 20% in the nighttime blood pressure compared with the daytime, and is known as the dipper profile [20]. When the decrease in nocturnal blood pressure is less than 10%, a non-dipper pattern is considered. There is also another behavior, characterized by a decrease of more than 20% in the nighttime blood pressure compared with the daytime; this is known as the extreme dipper pattern [20]. Finally, when the resting blood pressure is higher than in the daytime period, a riser pattern is considered. Feria et al. [21] define the dipper/non-dipper pattern by means of the night/day quotient, and Table 1 presents the ranges for the different nocturnal profiles (the ranges follow from the percentage definitions above).

Table 1. Ranges of nocturnal profile

Profile          Night/day quotient
Extreme Dipper   < 0.8
Dipper           0.8 – 0.9
Non Dipper       0.9 – 1.0
Riser            > 1.0

The absence of nocturnal BP descent, the non-dipper pattern, has classically been associated with greater risk and a worse cardiovascular prognosis than the dipper pattern [21].
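The crisp ranges implied by the 10% and 20% fall definitions above can be written down directly. The sketch below is illustrative: the function name is invented, and the handling of the exact boundaries 0.8, 0.9 and 1.0 is an assumption.

```python
def nocturnal_profile(night_day_ratio):
    """Classify the nocturnal profile from the night/day blood pressure ratio."""
    if night_day_ratio < 0.8:      # fall greater than 20%
        return "Extreme Dipper"
    elif night_day_ratio < 0.9:    # fall between 10% and 20%
        return "Dipper"
    elif night_day_ratio <= 1.0:   # fall smaller than 10%
        return "Non Dipper"
    else:                          # nocturnal pressure above daytime pressure
        return "Riser"

print(nocturnal_profile(0.85))  # Dipper
print(nocturnal_profile(1.03))  # Riser
```

The fuzzy system described later softens these sharp thresholds with overlapping membership functions.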
3 Proposed Method

A computational model for obtaining the risk of developing hypertension or a cardiovascular event was created in [22–24], combining different soft computing techniques, such as fuzzy inference systems (FIS) and modular neural networks (MNN). In particular, this work focuses on the fuzzy system for the classification of the patients' nocturnal blood pressure profile and its optimization. In Fig. 1, the proposed model is presented and the fuzzy system to be optimized is highlighted. To obtain the patient's nocturnal profile, blood pressure measurements are collected with ambulatory blood pressure monitoring (ABPM) [25], a sphygmomanometer connected to a device that records measurements every 15–20 min in the daytime and every 30 min in the nighttime over a period of 24 h. Until now, a database of 300 cases has been collected from ABPMs of patients of the Cardio Diagnostic Center of Tijuana and students of the doctorate and master in computer science of the Tijuana Institute of Technology.
Fig. 1. Proposed method for obtaining a medical diagnosis
Once the information is obtained, the daytime and nighttime blood pressure readings are separated to obtain the quotients of the systolic and diastolic pressures, which are the inputs to the fuzzy system; its output is the patient's nocturnal blood pressure profile.
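As a sketch of this preprocessing step, the night/day quotients can be computed from averaged ABPM readings. The helper below and its sample data are illustrative, not taken from the paper's database.

```python
def night_day_quotients(day_readings, night_readings):
    """Compute the night/day quotients of the systolic and diastolic pressure.

    Each reading is a (systolic, diastolic) pair in mmHg.
    """
    def mean(vals):
        return sum(vals) / len(vals)

    sys_q = mean([s for s, _ in night_readings]) / mean([s for s, _ in day_readings])
    dia_q = mean([d for _, d in night_readings]) / mean([d for _, d in day_readings])
    return sys_q, dia_q

# Example: a mild nocturnal fall in blood pressure (fabricated sample values).
day = [(120, 80), (124, 82), (118, 78)]
night = [(106, 70), (102, 68)]
sq, dq = night_day_quotients(day, night)  # both quotients fall below 1.0
```

In this example both quotients come out near 0.86, i.e. a nocturnal fall of roughly 14%, which would feed the fuzzy system as a dipper-range input.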
4 Methodology

A Mamdani fuzzy inference system to classify the nocturnal blood pressure profile is designed, as presented in Fig. 2; it was designed based on [21] and on the expert's experience. It has two inputs, the first being the quotient of the systolic pressure and the second the quotient of the diastolic pressure, and as output the patient's nocturnal profile is obtained. The linguistic values used for the inputs are the terms "GreaterFall", "Fall", "Increase" and "GreaterIncrease". For the output, the linguistic values used are "Extreme Dipper", "Dipper", "Non Dipper" and "Riser".
Fig. 2. Nocturnal blood pressure profile fuzzy system
It should be noted that the fuzzy system was tested with both trapezoidal and Gaussian membership functions.
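For reference, the two families of membership functions being compared can be evaluated as follows. The parameter values in the example are illustrative: a Gaussian term is defined by a center and a width, and a trapezoidal term by four breakpoints a < b <= c < d.

```python
import math

def gaussmf(x, center, sigma):
    """Gaussian membership function, as used in the second fuzzy system."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def trapmf(x, a, b, c, d):
    """Trapezoidal membership function (assumes a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    # Rising edge on [a, b], falling edge on [c, d].
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Degree of membership of a night/day quotient of 0.85 in a "Fall" term
# centered at 0.85 (parameter values are illustrative).
print(gaussmf(0.85, center=0.85, sigma=0.05))  # 1.0
```

The optimization described in Sect. 4.1 searches over exactly these kinds of parameters: four breakpoints per trapezoidal term versus two parameters per Gaussian term.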
Fig. 3. (a) Input for systolic, (b) input for diastolic, (c) output for profile.
In Fig. 3, the input and output variables with Gaussian membership functions are presented; for both inputs and the output, the range is from 0.4 to 1.3, since this is the range indicated by the literature on which this work is based. The rule set of the fuzzy system contains four rules, which are listed below:

1. If systolic is Greater Fall and diastolic is Greater Fall then Profile is Extreme Dipper.
2. If systolic is Fall and diastolic is Fall then Profile is Dipper.
3. If systolic is Increase and diastolic is Increase then Profile is Non Dipper.
4. If systolic is Greater Increase and diastolic is Greater Increase then Profile is Riser.
4.1 Fuzzy System Optimization
The parameters of the membership functions of the fuzzy systems are optimized to improve their performance; for this, the CSA was used. In this case study, as mentioned above, the FIS is tested with Gaussian and trapezoidal membership functions, which are optimized in order to observe which of them achieves a better classification. This is motivated by the fact that, in different experiments, there were certain cases in which a correct classification was not obtained.
Fig. 4. Individual representation of the trapezoidal fuzzy system
In Fig. 4, the representation of individuals for the fuzzy system with trapezoidal membership functions is presented. Positions 1 to 16 represent the systolic input, positions 17 to 32 the diastolic input, and positions 33 to 48 the nocturnal blood pressure profile output. In Fig. 5, the representation of individuals for the fuzzy system with Gaussian membership functions is presented. In this case, positions 1 to 8 represent the systolic input, positions 9 to 16 the diastolic input, and positions 17 to 24 the blood pressure profile output. These positions are adjusted by the CSA to improve the classification and provide an accurate diagnosis.
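The Gaussian encoding above (8 positions per variable, four linguistic terms per variable) can be decoded as follows. The grouping of each variable's 8 values into four (sigma, center) pairs is an assumption for illustration; the paper does not state the within-variable ordering.

```python
import numpy as np

def decode_gaussian_individual(vec):
    """Split a 24-position CSA individual into per-variable Gaussian
    membership function parameters: 8 values per variable, assumed here
    to be four (sigma, center) pairs, one per linguistic term."""
    vec = np.asarray(vec, dtype=float)
    assert vec.shape == (24,), "Gaussian individuals have 24 positions"
    parts = [("systolic", vec[:8]), ("diastolic", vec[8:16]),
             ("profile", vec[16:24])]
    # Reshape each variable's slice into four (sigma, center) rows.
    return {name: part.reshape(4, 2) for name, part in parts}

params = decode_gaussian_individual(np.linspace(0.4, 1.3, 24))
# params["systolic"] has shape (4, 2): one row per linguistic term
```

The trapezoidal encoding would decode analogously, with 16 positions per variable reshaped into four rows of four breakpoints each.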
Fig. 5. Individual representation in Gaussian fuzzy system
Different experiments were carried out with the CSA; the following parameters gave the best results:

a) Dimensions: 24
b) Flock size: 85
c) Iterations: 235
d) Awareness probability: 0.2
e) Flight length: 2
The mean squared error (MSE) was used as the fitness function, with the aim of minimizing the classification error in order to obtain the best solution:

MSE = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{Y}_i - Y_i \right)^2 \quad (1)
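Equation (1) translates directly into a fitness function for the CSA; a minimal sketch, with the real quotients Y and the fuzzy system outputs Ŷ passed as plain lists (the sample values are illustrative):

```python
def mse(predicted, real):
    """Mean squared error of Eq. (1), used as the CSA fitness function."""
    n = len(real)
    return sum((p - r) ** 2 for p, r in zip(predicted, real)) / n

# Example with three patients: fuzzy-system outputs vs. real night/day quotients.
error = mse([0.61, 0.86, 0.86], [0.76, 0.89, 0.81])
```

During optimization, each CSA individual is decoded into membership function parameters, the fuzzy system is evaluated on all patients, and this MSE is the value the algorithm minimizes.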
5 Results and Discussion

For the optimization process, 30 experiments were carried out with the two proposed fuzzy systems, varying the parameters of the CSA; they are presented in Table 2, where N is the number of individuals; AP is the awareness probability (random values between 0 and 1 were used); Iter is the number of iterations; Pd is the number of dimensions, which corresponds to the total number of membership function points of the fuzzy system: 48 for the trapezoidal membership functions (column PdT) and 24 for the Gaussian membership functions (column PdG); and fl is the flight length (random values between 0 and 2 were used).
Table 2. CSA parameters for each experiment and percent of success

No  N    AP   Iter  PdT  PdG  fl    % Trap.  % Gauss.
1   5    0.5  4000  48   24   2     86.25    72.25
2   10   0.3  2000  48   24   0.5   81.25    72.5
3   15   0.7  1333  48   24   0.8   87.5     95
4   18   0.9  1111  48   24   1     87.5     78.75
5   20   0.2  1000  48   24   1.5   81.25    88.75
6   23   0.8  869   48   24   2     88.75    86.25
7   26   0.1  769   48   24   0.4   88.75    75
8   30   0.4  666   48   24   1.3   86.25    83.75
9   33   0.6  606   48   24   1.8   86.25    91.25
10  35   0.7  571   48   24   0.7   86.25    76.25
11  38   0.8  526   48   24   0.3   87.5     78.75
12  40   0.5  500   48   24   1     88.75    90
13  43   0.3  465   48   24   1.2   88.75    80
14  47   0.2  425   48   24   0.1   87.5     80
15  50   0.4  400   48   24   0.9   86.25    92.5
16  52   0.6  384   48   24   1.7   88.75    76.25
17  55   0.2  363   48   24   2     87.5     87.5
18  60   0.1  333   48   24   0.3   87.5     88.75
19  64   0.9  312   48   24   0.6   87.5     90
20  67   0.7  298   48   24   1.6   87.5     83.75
21  70   0.4  285   48   24   1.4   88.75    83.75
22  72   0.5  277   48   24   0.2   88.75    78.75
23  75   0.3  266   48   24   0.7   86.25    75
24  80   0.4  250   48   24   1.1   87.5     88.75
25  85   0.2  235   48   24   2     88.75    95
26  87   0.6  229   48   24   0.5   88.75    91.25
27  90   0.1  222   48   24   0.8   87.5     92.5
28  92   0.8  217   48   24   1     86.25    90
29  96   0.9  208   48   24   0.4   88.75    85
30  100  0.5  200   48   24   0.7   88.75    93.75
The last two columns of Table 2 present the percentage of success for each of the experiments: the Trap. column shows the results of the fuzzy system with trapezoidal membership functions and the Gauss. column the results of the fuzzy system with Gaussian membership functions. It should be noted that the highest percentage for the trapezoidal fuzzy system was 88.75%, obtained in experiments 6, 7, 12, 13, 16, 21 and 22. For the fuzzy system with Gaussian membership functions, the highest percentage was 95%, obtained in experiments 3 and 25. Once the FIS with the highest percentage of success in the optimization tests was identified, an experiment was carried out with 30 patients to observe how many of these are
classified correctly. The FIS with Gaussian membership functions classifies 100% of the patients correctly, while the trapezoidal FIS only classifies 76.6% of the patients correctly. In Table 3, the comparison between the optimized trapezoidal FIS and the optimized Gaussian FIS is presented, where Ex.Dp is the extreme dipper profile, Dp the dipper profile, NonDp the non-dipper profile and Rsr the riser profile; the classifications marked with an asterisk are the errors with respect to the real values (highlighted in red in the original). The real values of the nocturnal blood pressure profile are obtained by means of the night/day quotients of the systolic and diastolic blood pressure readings, respectively, collected over a 24-h period through the ABPM.

Table 3. Comparison of optimized fuzzy systems

Patient  Real  Trapezoidal results  Gaussian results
1        0.76  0.61  Ex.Dp          0.71  Ex.Dp
2        0.89  0.86  Dp             0.89  Dp
3        0.81  0.86  Dp             0.84  Dp
4        0.82  0.86  Dp             0.84  Dp
5        0.91  0.85  Dp*            0.91  NonDp
6        0.87  0.86  Dp             0.87  Dp
7        0.77  0.85  Dp*            0.78  Ex.Dp
8        0.9   0.85  Dp*            0.91  NonDp
9        0.94  0.96  NonDp          0.93  NonDp
10       0.83  0.85  Dp             0.84  Dp
11       0.92  0.85  Dp*            0.93  NonDp
12       1.03  1.1   Rsr            1.03  Rsr
13       0.84  0.86  Dp             0.85  Dp
14       1.07  1.16  Rsr            1.15  Rsr
15       0.91  0.85  Dp*            0.92  NonDp
16       0.82  0.86  Dp             0.84  Dp
17       0.86  0.85  Dp             0.85  Dp
18       0.9   0.85  Dp*            0.91  NonDp
19       0.84  0.85  Dp             0.85  Dp
20       0.93  0.85  Dp*            0.94  NonDp
21       0.93  0.96  NonDp          0.94  NonDp
22       0.83  0.86  Dp             0.84  Dp
23       0.92  0.97  NonDp          0.92  NonDp
24       0.72  0.61  Ex.Dp          0.61  Ex.Dp
25       0.85  0.86  Dp             0.85  Dp
26       0.89  0.85  Dp             0.89  Dp
27       0.89  0.85  Dp             0.89  Dp
28       0.93  0.96  NonDp          0.93  NonDp
29       0.94  0.96  NonDp          0.94  NonDp
30       0.83  0.86  Dp             0.85  Dp

It can be noted that the FIS with trapezoidal membership functions classifies 23 of the 30 cases (76.6%) correctly, while the FIS with Gaussian membership functions classifies 100% of the patients correctly. Table 4 shows the classification percentages of the four FIS tested in the experiments.
Table 4. Classification percentages of the four fuzzy systems

                Non-optimized FIS   Optimized FIS
Trapezoidal     76.6%               76.6%
Gaussian        76.6%               100%
6 Conclusions and Future Work

In this work, the optimization of the membership functions of a fuzzy system for the classification of the nocturnal blood pressure profile of patients was performed. This result is important because different studies relate the elevation of blood pressure at night to the risk of suffering cardiovascular events; hence the intention to provide an accurate diagnosis. To carry out the optimization of the fuzzy systems, the CSA was used, with which 30 different experiments were performed, changing its parameters and obtaining good results; for this reason, this bio-inspired algorithm is considered a good optimization technique. Fuzzy systems with trapezoidal and Gaussian membership functions were tested in order to observe which achieved a better classification, and once the different optimized fuzzy systems had been tested, the FIS with Gaussian membership functions provided 100% correct classification for the group of patients studied. As future work, this FIS will be implemented with interval type-2 fuzzy logic and then optimized, in order to observe its behavior and make the corresponding comparisons with the FIS based on type-1 fuzzy logic.

Acknowledgment. The authors would like to express their thanks to the Consejo Nacional de Ciencia y Tecnologia and Tecnologico Nacional de Mexico/Tijuana Institute of Technology for the facilities and resources granted for the development of this research.
References

1. Haupt, R.L., Haupt, S.E.: Practical Genetic Algorithms, 2nd edn. Wiley, Hoboken (2004)
2. Mirjalili, S., Mirjalili, S.M., Lewis, A.: Grey wolf optimizer. Adv. Eng. Softw. 69, 46–61 (2014)
3. Yang, X.-S.: Firefly algorithm, Lévy flights and global optimization. In: Research and Development in Intelligent Systems XXVI, pp. 209–218 (2010)
4. Yang, X.S., Karamanoglu, M., He, X.: Flower pollination algorithm: a novel approach for multiobjective optimization. Eng. Optim. 46(9), 1222–1237 (2014)
5. Meng, X., Liu, Y., Gao, X., Zhang, H.: A new bio-inspired algorithm: chicken swarm optimization. In: Advances in Swarm Intelligence, pp. 86–94 (2014)
6. Wang, G.G., Deb, S., Cui, Z.: Monarch butterfly optimization. Neural Comput. Appl. 31(7), 1995–2014 (2019)
7. Martínez, R., Castillo, O., Aguilar, L.T.: Optimization of interval type-2 fuzzy logic controllers for a perturbed autonomous wheeled mobile robot using genetic algorithms. In: Castillo, O., Melin, P., Kacprzyk, J., Pedrycz, W. (eds.) Soft Computing for Hybrid Intelligent Systems, pp. 3–18. Springer, Heidelberg (2008)
8. Lagunes, M.L., Castillo, O., Soria, J.: Methodology for the optimization of a fuzzy controller using a bio-inspired algorithm. In: Fuzzy Logic in Intelligent System Design, pp. 131–137 (2018)
9. Méndez, E., Castillo, O., Soria, J., Melin, P., Sadollah, A.: Water cycle algorithm with fuzzy logic for dynamic adaptation of parameters. In: Advances in Computational Intelligence, pp. 250–260 (2017)
10. Bernal, E., Castillo, O., Soria, J.: A fuzzy logic approach for dynamic adaptation of parameters in galactic swarm optimization. In: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–7 (2016)
11. Hidalgo, D., Castillo, O., Melin, P.: Type-1 and type-2 fuzzy inference systems as integration methods in modular neural networks for multimodal biometry and its optimization with genetic algorithms. Inf. Sci. (NY) 179(13), 2123–2145 (2009)
12. Melin, P., Pulido, M.: Optimization of ensemble neural networks with type-2 fuzzy integration of responses for the Dow Jones time series prediction. Intell. Autom. Soft Comput. 20(3), 403–418 (2014)
13. Urias, J., Melin, P., Castillo, O.: A method for response integration in modular neural networks using interval type-2 fuzzy logic.
In: 2007 IEEE International Fuzzy Systems Conference, pp. 1–6 (2007) 14. Askarzadeh, A.: A novel metaheuristic method for solving constrained engineering optimization problems: crow search algorithm. Comput. Struct. 169(Suppl. C), 1–12 (2016) 15. Jang, J.S.R., Sun, C.T., Mizutani, E.: Neuro-fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice Hall, Englewood Cliffs (1997) 16. Wilson, J.M.: Essential cardiology: principles and practice. Tex. Heart Inst. J. 32(4), 616 (2005) 17. Texas Heart Institute: High Blood Pressure (Hypertension) (2017) 18. Guido, M.G.G.: Manual of Hypertension of the European Society of Hypertension. Taylor & Francis, Boca Raton (2008) 19. Wizner, B., Gryglewska, B., Gasowski, J., Kocemba, J., Grodzicki, T.: Normal blood pressure values as perceived by normotensive and hypertensive subjects. J. Hum. Hypertens. 17(2), 87–91 (2003) 20. Friedman, O., Logan, A.G.: Nocturnal blood pressure profiles among normotensive, controlled hypertensive and refractory hypertensive subjects. Can. J. Cardiol. 25(9), e312– e316 (2009) 21. Feria-carot, M.D., Sobrino, J.: Nocturnal hypertension. Hipertens. y riesgo Cardiovasc. 28 (4), 143–148 (2011) 22. Miramontes, I., Martínez, G., Melin, P., Prado-Arechiga, G.: A hybrid intelligent system model for hypertension diagnosis. In: Melin, P., Castillo, O., Kacprzyk, J. (eds.) NatureInspired Design of Hybrid Intelligent Systems, pp. 541–550. Springer, Cham (2017)
34
I. Miramontes et al.
23. Miramontes, I., Martínez, G., Melin, P., Prado-Arechiga, G.: A hybrid intelligent system model for hypertension risk diagnosis. In: Fuzzy Logic in Intelligent System Design, pp. 202–213 (2018) 24. Melin, P., Miramontes, I., Prado-Arechiga, G.: A hybrid model based on modular neural networks and fuzzy systems for classification of blood pressure and hypertension risk diagnosis. Expert Syst. Appl. 107, 146–164 (2018) 25. O’Brien, E., Parati, G., Stergiou, G.: Ambulatory blood pressure measurement. Hypertension 62(6), 988–994 (2013)
A Survey of Modern Gene Expression Based Techniques for Cancer Detection and Diagnosis

Hafiz ur Rahman1, Muhammad Arif1, Sadam Al-Azani2, Emad Ramadan2, Guojun Wang1(B), Jianer Chen1, Teodora Olariu3, and Iustin Olariu3

1 School of Computer Science, Guangzhou University, Guangzhou 510006, China
hafiz [email protected], [email protected], [email protected], [email protected]
2 Department of Information and Computer Science, King Fahd University of Petroleum and Minerals, Dhahran, Kingdom of Saudi Arabia
{g201002580,eramadan}@kfupm.edu.sa
3 Faculty of Medicine, Vasile Goldis Western University of Arad, Arad, Romania
olariu [email protected], iustin [email protected]
Abstract. Cancer is a leading cause of death, and the majority of cancer patients are diagnosed in the late stages by conventional methods. Gene expression microarray technology is applied to detect and diagnose most types of cancer in their early stages. Furthermore, it allows researchers to analyze thousands of genes simultaneously. To acquire knowledge from gene expression data, data mining methods are needed. Due to the rapid evolution of cancer detection and diagnosis techniques, a survey of modern techniques is desirable. This study reviews and provides a detailed description of these techniques. As a result, it helps to enhance existing techniques for cancer detection and diagnosis, as well as guiding researchers in developing new approaches.
1 Introduction
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product [1]. These products are often proteins, which perform different functions, for example as hormones and enzymes. However, non-protein-coding genes such as transfer RNA (tRNA) or small nuclear RNA (snRNA) yield functional RNA products [2,3]. Clinical and pathological information might be incomplete or misleading for cancer detection and diagnosis [4–6]. Cancer detection and diagnosis can be objective and highly accurate using microarrays, which have the ability to support clinicians with the information to choose the most appropriate forms of treatment [7,8]. Treating cancer detection and diagnosis from gene expressions as a classification problem traces back to the work of [9,10], which classified human acute leukemias as a test case. Practically, classification problems with large inputs and small samples are challenging and prone to over-fitting. The major challenge in microarrays is the curse of dimensionality, since they include a large number of genes. To alleviate this limitation, researchers have proposed several feature (gene) selection techniques that select the most significant genes, and have investigated several statistical and machine learning classification methods [11,12].

© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 35–50, 2021. https://doi.org/10.1007/978-3-030-52190-5_3

Several survey studies have been conducted to review techniques of cancer detection and diagnosis. For example, Rathore et al. [13] performed a survey of colon cancer detection techniques, while Lemaitre et al. [14] reviewed prostate cancer detection and diagnosis based on mono- and multi-parametric MRI. Furthermore, Saha et al. [15] presented a review of breast cancer detection based on mammography and thermography techniques. To our knowledge, however, this is the first survey that highlights and reviews gene expression techniques for cancer detection and diagnosis.

In general, the process of cancer detection and diagnosis based on gene expressions includes several phases, as shown in Fig. 1. A brief description of each phase is given below:

1. Representation Method: Gene expressions are formed in a factored representation, i.e., the datasets are represented as vectors of attribute values, and the output is a discrete value.
2. Feature Selection: Various feature selection strategies are used to select the most significant genes.
3. Generating the Model: The genes selected in the previous phase are used as inputs to statistical or machine learning classifiers to generate the model. This step also includes the method of formulating training and testing data; common choices are cross-validation, k-fold cross-validation, and leave-one-out cross-validation.
4. Classification: The model generated in the previous step is used to classify gene-based samples into categories, for example normal and malignant classes.
The aforementioned steps might include pre-processing or post-processing operations such as normalization, eliminating missing values, optimization, etc. In this study, we review the modern techniques for detecting and diagnosing cancers based on gene expressions. The rest of the paper is organized as follows. Section 2 presents gene expression techniques for cancer detection and diagnosis. Section 3 provides evaluation criteria and comparison. We analyze and discuss the evaluation results in Sect. 4. Finally, we discuss future directions and conclude our survey in Sect. 5.
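As a concrete illustration of the phases above — not the method of any single surveyed paper — the sketch below chains a signal-to-noise-ratio gene filter (in the spirit of Golub et al. [9]), a 1-nearest-neighbour classifier, and leave-one-out cross-validation; the toy data and all function names are ours:

```python
import math

def snr_scores(X, y):
    """Signal-to-noise ratio per gene: |mean_1 - mean_0| / (std_1 + std_0)."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        mu_p, mu_n = sum(pos) / len(pos), sum(neg) / len(neg)
        sd_p = math.sqrt(sum((v - mu_p) ** 2 for v in pos) / len(pos))
        sd_n = math.sqrt(sum((v - mu_n) ** 2 for v in neg) / len(neg))
        scores.append(abs(mu_p - mu_n) / (sd_p + sd_n + 1e-12))
    return scores

def nn_predict(train_X, train_y, x):
    """1-nearest-neighbour classifier (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]

def loocv_accuracy(X, y, n_genes=1):
    """Leave-one-out CV over the whole pipeline: rank genes by SNR on the
    training fold, keep the top n_genes, then classify the held-out sample."""
    hits = 0
    for i in range(len(X)):
        tr_X, tr_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        scores = snr_scores(tr_X, tr_y)
        keep = sorted(range(len(scores)), key=lambda g: -scores[g])[:n_genes]
        pred = nn_predict([[r[g] for g in keep] for r in tr_X], tr_y,
                          [X[i][g] for g in keep])
        hits += pred == y[i]
    return hits / len(X)

# Toy data: gene 0 separates the classes, gene 1 is noise.
X = [[0.10, 5.0], [0.20, 3.0], [0.15, 4.1],
     [0.90, 4.9], [1.00, 3.2], [0.95, 4.0]]
y = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(X, y, n_genes=1))  # → 1.0
```

Note that feature selection is repeated inside every fold: selecting genes on the full dataset before cross-validation would leak information from the held-out sample.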
Fig. 1. Flow diagram of cancer detection and diagnosis
2 Gene Expression Techniques for Cancer Detection and Diagnosis
In the following subsections we categorize the techniques into two groups: techniques that address a binary class problem and those that address a multiple class problem. In the binary class case, the techniques are applied to classify two states or stages of cancer, while in the multiple class case they are applied to identify several types or stages of cancer.

2.1 Techniques Addressing Binary Classification Problems
Lee et al. [16] used a neural network to classify leukemia and colon tumors and compared it with popular classifiers including the Multilayer Perceptron (MLP), ELM, and Support Vector Machine (SVM). As the feature selection technique, the Frequency Domain Gene Feature Selection (FGFS) algorithm was used [17]. For the leukemia
dataset, FIR-ELM performs better than the other classifiers in terms of accuracy and standard deviation, with an accuracy rate of 96.53%. It is followed by SVM, which obtains an accuracy rate of 95.50%. For the colon dataset, however, SVM performs better than the other classifiers, achieving an accuracy rate of 79.76% with a standard deviation of 3.57%, followed by the FIR-ELM classification algorithm.

Lotfi and Keshavarz [18] proposed the PCA-BEL approach for gene-expression microarray data classification using PCA (Principal Component Analysis) and a brain emotional learning network. Different cancer datasets (i.e., high grade gliomas (HGG), lung, colon, and breast) are used for the validation of the proposed approach. The advantage of PCA-BEL is its low computational complexity. Average classification accuracies of 96%, 98.32%, 87.40%, and 88%, respectively, are reported on these datasets.

Rathore et al. [19] used two feature selection techniques in sequence. The first technique, Chi-square, takes the whole dataset and selects a discriminative subset of genes, which is used as input to the second technique, which in turn selects the most discriminative subset among them. They then applied SVM classification with a linear kernel to classify the selected genes into normal and malignant classes. They evaluated the proposed approach on three well-known colon cancer datasets: KentRidge, Notterman, and E-GEOD-40966. In terms of computational time and classification accuracy, Rathore et al. [19] reported that the proposed technique achieves classification rates of 91.94%, 88.89%, and 94.29% on these datasets respectively and performs better than the individual techniques (Chi-square and mRMR). In Bouazza et al.
[20], four different selection techniques, namely Fisher, ReliefF, T-Statistics, and SNR (Signal-to-Noise Ratio), are compared for selecting the significant gene expression subset in three standard cancer datasets: leukemia cancer, prostate cancer, and colon cancer. The effectiveness of the gene selection techniques is then evaluated using K-Nearest Neighbors (K-NN) and SVM. It is reported that SNR is the most trusted technique for selecting the genes and SVM is the most accurate classifier in this study.

Paul and Maji [21] presented an integrated approach for colon gene expression and protein–protein interaction (PPI) data to identify disease genes. First, they used a framework based on f-information for maximum relevance-maximum significance to choose a set of differentially expressed genes. Then, the selected genes are employed to build a graph using PPI data. A gene is considered a biomarker gene if it lies on the shortest path in the constructed graph. The experiments reveal that the 97 selected genes are identified as highly associated with colon cancer.

Evolutionary computation requires determining several parameters and requires computational time to be optimized. To address these limitations, Kim and Cho [22] proposed an approach using standard GA-based ensemble optimization with different stages of integration methods and evolutionary search algorithms. Additionally, all the outputs from the M × N classification methods are used as input to a meta-classification rather than an eclectic ensemble approach. Here, M refers to the feature selection techniques while N refers to the classifiers. In addition, they combine three different classifiers (K-NN, SVM, and MLP) to identify the final and best classes of samples. The approach was evaluated using four benchmark microarray datasets for colon, breast, prostate, and lymphoma cancers. It is stated that the developed approach performed better than individual classifiers and other evolutionary computation approaches. However, this approach still needs to be evaluated on multiple class problems.

Nguyen et al. [23] introduced a gene selection method based on a hierarchy process, called MAHP. In this technique they used four different ranking techniques, namely the two-sample t-test, entropy test, Wilcoxon test, and signal-to-noise ratio. As a result, they were able to quantitatively incorporate statistics of individual genes. The selected genes are used as inputs to different classification methods such as LDA (Linear Discriminant Analysis), K-NN, PNN (Probabilistic Neural Network), SVM, and MLP. The technique is evaluated on several microarray datasets (i.e., leukemia, prostate, colon, etc.). Experimental results reveal that the proposed MAHP technique compares favorably with other traditional feature selection techniques, including IG (Information Gain), SU, BD, and ReliefF, in all cases.

Banka and Dara [24] proposed a Hamming distance based binary Particle Swarm Optimization (HDBPSO) algorithm to select the most significant gene subsets. The HDBPSO method is tested on three standard bioinformatics datasets for colon, lymphoma, and leukemia. Several classification methods are employed to evaluate this method, including BLR, BayesNet, NB, LibLinear, SVM, MLP, J48, LMT, and RF.
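A much simpler flavour of the rank-fusion idea behind aggregate gene selection methods such as MAHP [23] — deliberately simplified, not the actual AHP computation — is to average the ranks a gene receives under each criterion. All scores below are invented for illustration:

```python
def rank(scores):
    """Map scores (higher = more discriminative) to ranks (0 = best)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def aggregate_gene_ranking(score_lists):
    """Average each gene's rank across all criteria and return the genes
    ordered from most to least discriminative overall."""
    n = len(score_lists[0])
    rank_lists = [rank(s) for s in score_lists]
    mean_rank = [sum(rl[g] for rl in rank_lists) / len(rank_lists)
                 for g in range(n)]
    return sorted(range(n), key=lambda g: mean_rank[g])

# Invented scores for 4 genes under 3 criteria (e.g. t-test, SNR, entropy):
t_test  = [0.9, 0.1, 0.5, 0.4]
snr     = [0.8, 0.2, 0.6, 0.3]
entropy = [0.7, 0.3, 0.2, 0.6]
print(aggregate_gene_ranking([t_test, snr, entropy]))  # → [0, 2, 3, 1]
```

Averaging ranks rather than raw scores makes criteria with very different scales (a t-statistic vs. an entropy value) directly comparable.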
Simjanoska, Bogdanova, and Popeska [25] analyzed gene expression to classify colon carcinogenic tissue. The Illumina HumanRef-8 v3.0 Expression BeadChip microarray technology, covering 26 colorectal tumors and 26 colorectal healthy tissues, was used for the gene expression profiling. An original methodology comprising several steps was developed for biomarker detection. The obtained biomarkers are then used as inputs to a Bayes classifier. Two other classifiers, SVM and BDT, are also used for comparison with the results obtained using Bayes' theorem. It is reported that Bayes' theorem performs better than SVM and BDT in this study in terms of accuracy, sensitivity, and specificity. These findings are attributed to the realistic modeling of the a priori probability in Bayes' theorem. However, such a classification method is somewhat complicated. The a priori probability model generated in [25] was then employed in Bogdanova, Simjanoska, and Popeska [26] using the Affymetrix Human Genome U133 Plus 2.0 Array; the results revealed poor distinctive capability of the biomarker genes. That means the a priori probability model is platform dependent. This finding confirms what was concluded by Wong, Loh, and Eisenhaber [27], who stated that each platform requires different statistical treatment. Simjanoska, Madevska Bogdanova, and Popeska [28] generated
a statistical approach to gene expression values obtained from Affymetrix using a methodology similar to [25]. The findings revealed that excellent performance in terms of accuracy, sensitivity, and specificity is achieved using Bayes' theorem when an appropriate preprocessing methodology is followed. The results reported in [28] were then improved by Simjanoska and Bogdanova [29]. They proposed a gene filtering method, called the leave-one-out method and based on iterative Bayesian classification, to select the most essential biomarkers. Iteratively, a gene is considered an essential biomarker for classification if, when it is left out, the accuracy rate drops significantly.

For the high-dimensionality problem of gene expression data, in which there is a large number of genes and a very small number of samples, Ibrahim et al. [30] integrated active learning and DBN unsupervised machine learning approaches in order to enhance classification accuracy. Experimental results reveal that this approach improves on classical feature selection methods in F1-measure by 9% for Hepatocellular Carcinoma (HCC) and 6% for lung cancer. To overcome the problem of high dimensionality of gene expression, Wu, Zhang, and Chan [31] also presented a state-of-the-art methodology by introducing a non-linear Maximum A Posteriori Probability and time-varying auto-regressive model (N-MAP-TVAR) for GRN reconstruction. For the experiments, the yeast cell cycle dataset, in which 237 genes are measured at 17 time points taken at 10-minute intervals, was selected to detect the changing regulatory mechanism.

Burton et al. [32] compared seven classification methods (P-SVM, S-SVM, R-SVM, L-SVM, LR, RF, NNE) for predicting metastasis outcome in breast cancer. A voting classifier showed the best performance on the breast cancer datasets. Fawzi et al.
[33] also compared seven different machine learning techniques (SVM, RBF NN, MLP NN, Bayes, J48, RF, Bagging) using a microarray gene expression (MGE) dataset to detect breast cancer. The experimental results reveal that SVM and RBF NN are best for breast and other cancer types; average classification accuracies of 97.5%, 94.6%, 96.5%, 94.6%, 93.3%, 95.8%, and 97.5%, respectively, are reported.

2.2 Techniques Addressing Multiple Classification Problems
Tong et al. [34] introduced a GA based on SVM ensembles and gene pairs (GA-ESP). Different informative gene pairs, selected by the top scoring pair (TSP) criterion, are used to train the SVMs. The applicability of the proposed approach was evaluated on several cancer datasets, both binary-class and multiple-class. Support vector data description (SVDD)-based feature selection algorithms cannot be applied to multi-class datasets and are time consuming [35,36]. To solve these issues, Cao et al. [37] presented a novel and fast feature selection method based on multiple SVDDs, called MSVDD-RF, in which insignificant features are eliminated recursively. It is applied to multiple-class microarray data to detect different cancers. The selected genes are then classified using K-NN and SVM. However, for colon cancer this work only solved the binary class problem.
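Many binary techniques extend to the multi-class setting through a generic one-vs-rest reduction. The sketch below is our own illustration of that wrapper, with a trivial centroid scorer standing in for the binary learner (a real system would plug in, e.g., an SVM); all names and data are hypothetical:

```python
def train_one_vs_rest(classes, fit_binary, X, y):
    """Train one binary scorer per class: class c vs. the rest."""
    return {c: fit_binary(X, [1 if label == c else 0 for label in y])
            for c in classes}

def predict_one_vs_rest(models, x):
    """Predict the class whose binary scorer responds most strongly."""
    return max(models, key=lambda c: models[c](x))

def fit_centroid(X, y01):
    """Toy binary 'learner': score = negative squared distance to the
    mean of the positive-class samples."""
    pos = [row for row, t in zip(X, y01) if t == 1]
    centroid = [sum(col) / len(pos) for col in zip(*pos)]
    return lambda x: -sum((a - b) ** 2 for a, b in zip(x, centroid))

# Toy 3-class problem in a 2-gene space:
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 5.0], [0.1, 5.1]]
y = ["ALL", "ALL", "AML", "AML", "B-ALL", "B-ALL"]
models = train_one_vs_rest({"ALL", "AML", "B-ALL"}, fit_centroid, X, y)
print(predict_one_vs_rest(models, [4.9, 5.2]))  # → AML
```

The reduction trains one model per class, so its cost grows linearly with the number of cancer types or stages being distinguished.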
An approach coupling data dimension reduction and variable selection based on variable clustering of the original data set has also been proposed by [38]. The effectiveness of the presented technique is tested using two microarray datasets: acute leukemia and SRBCT (Table 2).

Table 1. Gene expression microarray platforms and datasets

Dataset | Type of cancer | Platform | Num. of classes | Num. of samples | Num. of genes | Ref.
KentRidge | Colon | Affy | 2 | 62: 40 malignant and 22 normal samples | 6,500; 2,000 used | [16, 18–20, 22, 23, 34, 37, 39, 40]
KentRidge | Leukemia | Affy | 2 | 72: 47 ALL and 25 AML (Acute Myeloid Leukemia) | 7,129 | [9, 16, 20, 23, 37]
BioGPS | Colon | Affy | 2 | 130: 94 malignant and 37 normal | 3 | [39, 41]
Notterman | Colon | Affy | 2 | 36: 18 malignant and 18 normal | 7,457 | [19, 39, 42]
E-GEOD-40966 | Colon | Affy | 4 | 463; only stages I and II used (208 and 142) | 5,851 | [19, 39]
DLBCL | Lymphomas | – | 2 | 77 | 7,070 | [23]
NCBI GEO | Colon | Illumina | 2 | 154: 125 tumors and 29 normal | 24,526 | [21, 43]
– | Prostate | Affy | 2 | 102: 52 tumor and 50 normal | 12,533 | [20, 22, 23, 44]
Leukemia3 | Leukemia | Affy | 3 | 40: 19 B-ALL, 10 T-ALL, 11 AML | – | [9, 37, 45]
– | Leukemia | – | 2 | 72: 47 ALL and 25 AML | – | [37]
Acute Leukemia | Leukemia | cDNA | 2 | 72: 38 ALL and 34 AML | 7,129 | [38]
SRBCT | SRBCT | – | 4 | 83: 11 BL, 29 Ewing family of tumors (EWS), 18 NB and 27 rhabdomyosarcoma (RMS) | 2,308 | [38]
Novartis | Several types | Oligonucleotide (Affymetrix, Santa Clara, CA) | 4 | 103: 26 breast, 26 prostate, 28 lung and 23 colon | 1,000 | [36, 37, 46]
Lung Cancer | Lung | Oligonucleotide (Affymetrix, Santa Clara, CA) | 5 | 203: 139 ADEN, 21 SQUA, 20 COID, 6 SCLC, 17 NORMAL | 12,600 | [37, 47]
– | Breast | cDNA | 4 | 97 | 24,481 | [18, 22, 48]
– | Lymphoma | cDNA | 2 | 45 | 4,026 | [22, 49]
GEO | Colon | Illumina | 2 | 52: 26 tumor and 26 normal | 24,526 | [20, 25]
Affymetrix Human Genome | Colon | Affy | 2 | 64: 32 tumor and 32 normal | 54,675; 21,050 | [26, 28, 29]
BRCA1/BRCA2 | Breast | Methylation-specific polymerase chain reaction | 2 | 14: 7 carriers of the BRCA2 mutation, and 7 patients with sporadic cases of breast cancer | 6,512 | [50]

(continued)
Table 1. (continued)

Dataset | Type of cancer | Platform | Num. of classes | Num. of samples | Num. of genes | Ref.
PubMed, G2SBC | Breast | – | 2 | 49: 25 normal, 24 cancer | 7,129 | [51]
GEO | Breast | Illumina | 2 | 24: 14 sensitive, 10 insensitive | 12,625 | [51]
Amsterdam | Breast | Agilent/Rosetta | 2 | 295, N+, N− | 32,418 | [32]
Amsterdam | Breast | Agilent/Rosetta | 1 | 151, N− | 32,418 | [32]
Rotterdam (RO) | Breast | Affy | 1 | 286, N− | 32,418 | [32]
HUMAC | Breast | Spotted oligonucleotides | 1 | 60, N− | 32,418 | [32]
Huang | Breast | Affy | 1 | 52, N+ | 32,418 | [32]
Sotiriou2003 | Breast | cDNA | 2 | 99, N+, N− | 32,418 | [32]
Sotiriou2006 | Breast | Affy | 2 | 179, N+, N− | 32,418 | [32]
Uppsala | Breast | Affy | 2 | 236, N+, N− | 32,418 | [32]
Stockholm | Breast | Affy | 2 | 159, N+, N− | 32,418 | [32]
TRANSBIG | Breast | Affy | 1 | 147, N− | 32,418 | [32]
Mainz | Breast | Affy | 1 | 200, N− | 32,418 | [32]
WDBC | Breast | – | 2 | 569: 357 benign, 212 malignant | – | [33]
MGE | Breast | – | 2 | 84: 65 tumors, 19 cell lines | 8,102 | [33]
Table 2. Gene expression approaches comparison

Approach | Type of cancer | Feature selection | Classification method
[19] | Colon | A feed-forward gene selection technique | SVM
[39] | Colon | Chi-square, F-score, PCA, mRMR | Ensemble SVM
[16] | Leukemia and colon | FGFS | FIR-ELM, MLP, ELM, SVM
[22] | Colon, breast, prostate and lymphoma | Four similarity-based methods, IG, MI and SNR | K-NN, MLP, SVM
[23] | Diffuse lymphoma, leukemia, prostate and colon | MAHP, (IG, SU, BD, and ReliefF) | LDA, K-NN, PNN, SVM and MLP
[24] | Colon, lymphoma and leukemia | HDBPSO and others | BLR, BayesNet, NB, LibLinear, SVM, MLP, J48, LMT and RF
[37] | Colon, leukemia, lung | Multiple SVDD | K-NN, SVM
[34] | Several + colon | TSP | Ensemble SVM
[21] | Colon | f-MRMS and (maximum relevance, mRMR) | K-NN
[20] | Leukemia, prostate, colon | Fisher, T-Statistics, SNR and ReliefF | SVM, K-NN
[18] | Several | PCA | BEL network
[25] | Colon | Low entropy filter | Bayes' theorem, SVM and BDT
[26, 28] | Colorectal | Low entropy filter | Bayes' theorem
[29] | Colon | Low entropy filter, and leave-one-out method | Bayes' theorem
[30, 31, 51] | Several | T-Statistics, ReliefF, Chi-square | SVM, RF
[33] | Breast | Filtering method | SVM, RBF NN, MLP NN, Bayes, J48, RF and Bagging
[33] | Breast | Filtering method | GMM, MPPCA, MLiT (N), Bayes
3 Evaluation Criteria and Comparison
We compare the literature based on the following criteria: the type of cancer, the platform (microarray dataset), the feature selection techniques, and the classification methods.

1. Type of cancer: This attribute defines the types of cancer considered in each surveyed paper. Several types of cancer have been considered, including colon, leukemia, lymphomas, prostate, lung, and breast. Table 3 illustrates the distribution of surveyed papers over the types of cancer.
2. The platform and microarray dataset: These describe the platform and the dataset used. Table 1 illustrates the type of platforms and the characteristics of the microarray datasets. In addition, it shows the distribution of the surveyed papers over these datasets. For clarification, the KentRidge microarray gene expression data [40] is composed of 62 samples collected from colon-cancer patients (40 malignant and 22 normal) with 6,500 genes. It was preprocessed in a clinical study by Alon et al. [40], who excluded 4,500 genes and considered only 2,000 genes as significant based on that study. The researchers in [16,18–20,22,23,34,37,39] considered this processed version in their studies.
3. Feature selection techniques: This attribute defines the various feature selection strategies used to select the most significant genes (Table 4). Furthermore, researchers compared the effect of platforms on the effectiveness of performance. It is concluded that gene expressions are platform dependent. This finding is based on comparing two platforms, namely Affymetrix and Illumina. However, to generalize these findings it is desirable to compare with other platforms.

Table 3. Type of cancers

Cancer | References
Colon | [16, 18–23, 25, 26, 28–30, 34, 37, 39, 52–54]
Breast | [18, 22, 30–34, 50, 51, 53–64]
SRBCT | [18, 38]
Leukemia | [16, 20, 23, 34, 37, 38]
Lymphomas | [22, 23]
Prostate | [20, 22, 23, 30, 34]
Lung | [18, 30, 34, 37, 54]
Kidney | [53, 54, 59]
Ovarian | [30]
MLL | [30]
HCC | [30]
Table 4. Feature selection techniques

Approach | Feature selection
[19] | A feed-forward gene selection technique
[39] | Chi-square, F-score, PCA, mRMR
[16] | FGFS
[22] | Four similarity-based methods, IG, MI (Mutual Information) and SNR
[23] | MAHP, (IG, SU, BD, and ReliefF)
[24] | HDBPSO and others
[37] | Multiple SVDD
[34] | TSP
[21] | f-MRMS (information-based maximum relevance maximum significance) and (maximum relevance, mRMR)
[20] | Fisher, T-Statistics, SNR and ReliefF
[18] | PCA
[25, 26, 28] | Low entropy filter
[29] | Low entropy filter, and leave-one-out method
[30] | T-Statistics, ReliefF, Chi-square
[31] | N-MAP-TVAR
[51] | Discretization method

Table 5. The classification techniques
Classification method | Ref.
SVM | [16, 19, 20, 22–25, 30, 32, 33, 37]
ANN (MLP, PNN, ELM network, RBF) | [16, 22–24, 31, 33]
K-NN | [20–23, 32, 37]
LDA | [23, 38]
Lib-Linear | [24]
Bayes' theorem and NB | [24–26, 28, 29, 31]
Decision tree (J48, JRip) | [24, 33]
Others (BLR, LMT, RF, BDT) | [24, 25, 30, 31, 33, 51]
Ensemble classifiers | [22, 34, 39]
However, to generalize these findings it is desirable to compare with other platforms. We also suggest comparing the aforementioned findings with datasets generated using other platforms (Table 5).

4. The classification methods: Classification methods are used to classify gene-based samples into categories (normal and malignant classes).
4 Discussion
Reliable cancer detection and diagnosis decisions can be made using fully automated approaches that utilize advances in data mining and machine learning. Gene expression analysis based techniques have been applied to several types of cancer, including colon, breast, leukemia, lymphomas, prostate, lung, SRBCT, kidney, ovarian, MLL, and HCC.

There are still several significant questions that need to be addressed for cancer detection and diagnosis using gene expression techniques. One of the most important issues is to detect and diagnose cancers in their different stages. Most of the reviewed techniques are applied to detect only two states, i.e., whether the disease is present or not. For example, for colon cancer all reviewed papers [16,18–20,22,23,25,26,28,29,34,37,39] were aimed at detecting two cases: malignant or normal. Although there are five stages of colon cancer [13], none of the reviewed studies has considered these different stages. Only two stages, I and II, with 208 and 142 samples respectively, have been considered in [19,39]. Regarding leukemia, three different subtypes, B-ALL, T-ALL, and AML, have been considered in [37]. In this context, Novartis [46] is a multiple class microarray dataset composed of four classes, each representing a different type of cancer (breast, prostate, lung, colon); it is used in [37] to detect the type of cancer.

The researchers compared the effect of the platforms on the effectiveness of performance. It is concluded that gene expressions are platform dependent. This finding is based on comparing two platforms, namely Affymetrix and Illumina. However, to generalize these findings it is desirable to compare with other platforms.

The major drawback of applying microarrays is their curse of dimensionality with a limited number of samples. Researchers have suggested several ways to alleviate this challenge using feature selection techniques.
Nearly all of the surveyed papers focus their contributions on presenting the most effective techniques for selecting the most significant genes; techniques such as Chi-square, F-score, PCA, mRMR, FGFS, similarity-based methods, IG, MI, SNR, MAHP, SU, BD, ReliefF, HDBPSO, etc. are employed to this end. Several classification techniques have been applied, including SVM [16,19,20,22–25,30,32,33,37], ANN (MLP, PNN, ELM network, RBF) [16,22–24,31,33], K-NN [20–23,32,37], LDA [23,38], Lib-Linear [24], Bayes' theorem and NB [24–26,28,29,31], decision trees (J48, JRip) [24,33], etc.

Another question that needs to be discussed is how these classification methods are applied. The input to the classifiers in all studies is the selected genes, except in [39] with the BioGPS dataset [41], where all genes are used because this dataset includes only three genes. Some studies used one classifier, while others compared the performance of several classifiers. Generally, an ensemble of classification algorithms outperforms the individual ones; however, they should be combined in a proper way [65]. Ensemble methods may be homogeneous or heterogeneous. By a homogeneous ensemble classifier we mean that the elements are the same classifier with different properties, for example combining several SVM classifiers with different kernels or several K-NN elements with different distance measures. A heterogeneous ensemble classifier, on the other hand, means that each element is a different classifier, for example an ensemble of K-NN, SVM, ANN, etc. In this context, both homogeneous and heterogeneous ensemble classifiers have been applied to classify cancers using gene expression data. In [34,39], a homogeneous ensemble classifier based on SVM is applied, while Kim et al. [22] employed a heterogeneous ensemble classifier of K-NN, SVM, and MLP.

There are some important issues that need to be addressed in the future in the area of cancer detection and diagnosis using gene expression, including:

• Applying gene expression techniques to address different stages of cancer (the multiple classification problem). This would definitely contribute to reducing mortality rates by diagnosing cancer in its early stages.
• Improving the performance of the current classification methods by employing ensemble methods.
• Aiming not just at highly accurate techniques but also at approaches that are applicable for histopathologists and not computationally expensive.
• To generalize the findings on the effect of platforms on recognition rates, comparing the findings for Affymetrix and Illumina with other platforms.
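The heterogeneous-ensemble idea discussed above reduces, in its simplest form, to majority voting over distinct classifiers. The sketch below is our own illustration; the three threshold lambdas are placeholders standing in for real trained K-NN/SVM/MLP models such as those combined in [22]:

```python
from collections import Counter

def majority_vote(predictors, x):
    """Heterogeneous ensemble: each element is a different classifier;
    the ensemble label is the most common individual prediction."""
    votes = [predict(x) for predict in predictors]
    return Counter(votes).most_common(1)[0][0]

# Three stand-in 'classifiers' (placeholders, not real trained models):
knn_like = lambda x: "malignant" if x[0] > 0.5 else "normal"
svm_like = lambda x: "malignant" if x[1] > 0.5 else "normal"
mlp_like = lambda x: "malignant" if x[0] + x[1] > 1.0 else "normal"

ensemble = [knn_like, svm_like, mlp_like]
print(majority_vote(ensemble, [0.6, 0.2]))  # → normal
```

A homogeneous ensemble would use the same wrapper, just with three variants of one classifier (e.g. SVMs with different kernels) in the predictor list.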
5 Conclusions
Cancer detection and diagnosis using gene expression techniques is considered a classification problem. In this work we surveyed the modern gene expression based techniques to detect and diagnose cancer at an early stage, focusing on the computational requirements instead of the medical aspects. We first summarized the surveyed papers and categorized them into two groups based on the number of classes. Then, we compared the surveyed papers based on several criteria, including the type of cancer, the platform and dataset, the type of feature selection techniques, and the classification methods. This was followed by a discussion of some related issues. Some important issues that need to be addressed in this area in the future include applying gene expression techniques to address different stages of cancer and improving the performance of the current classification methods through ensemble classification methods. Finally, there is a need to compare the findings for Affymetrix and Illumina with other platforms.
References

1. Cullen, B.R., Zeng, Y.: Method of regulating gene expression. US Patent 9,856,476, 2 January 2018
2. Bewick, A.J., Schmitz, R.J.: Gene body DNA methylation in plants. Curr. Opin. Plant Biol. 36, 103–110 (2017)
3. Vijayakumar, P., Vijayalakshmi, V., Rajashree, R.: Increased level of security using DNA steganography. Int. J. Adv. Intell. Paradigms 10(1–2), 74–82 (2018)
4. Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C.-H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J.P., et al.: Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Nat. Acad. Sci. 98(26), 15149–15154 (2001)
5. Javid, Q., Arif, M., Talpur, S.: Segmentation and classification of calcification and hemorrhage in the brain using fuzzy c-mean and adaptive neuro-fuzzy inference system. Mehran Univ. Res. J. Eng. Technol. 15(1), 29 (2016)
6. Muhammad, A., Guojun, W.: Segmentation of calcification and brain hemorrhage with midline detection. In: 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), pp. 1082–1090. IEEE (2017)
7. Javaid, Q., Arif, M., Shah, M.A., Nadeem, M., et al.: A hybrid technique for denoising multi-modality medical images by employing cuckoo's search with curvelet transform. Mehran Univ. Res. J. Eng. Technol. 37(1), 29 (2018)
8. Arif, M., Alam, K.A., Hussain, M.: Application of data mining using artificial neural network: survey. Int. J. Database Theory Appl. 8(1), 245–270 (2015)
9. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439), 531–537 (1999)
10. Hong, S.-S., Kim, D.-W., Han, M.-M.: An improved data pre-processing method for classification and insider information leakage detection. Int. J. Adv. Intell. Paradigms 11(1–2), 143–158 (2018)
11. Patel, S.J., Sanjana, N.E., Kishton, R.J., Eidizadeh, A., Vodnala, S.K., Cam, M., Gartner, J.J., Jia, L., Steinberg, S.M., Yamamoto, T.N., et al.: Identification of essential genes for cancer immunotherapy. Nature 548(7669), 537 (2017)
12. Sudha, V.K., Sudhakar, R., Balas, V.E.: Fuzzy rule-based segmentation of CT brain images of hemorrhage for compression. Int. J. Adv. Intell. Paradigms 4(3–4), 256–267 (2012)
13. Rathore, S., Hussain, M., Ali, A., Khan, A.: A recent survey on colon cancer detection techniques. IEEE/ACM Trans. Comput. Biol. Bioinform. 10(3), 545–563 (2013)
14. Lemaître, G., Martí, R., Freixenet, J., Vilanova, J.C., Walker, P.M., Meriaudeau, F.: Computer-aided detection and diagnosis for prostate cancer based on mono and multi-parametric MRI: a review. Comput. Biol. Med. 60, 8–31 (2015)
15. Saha, D., Bhowmik, M.K., De, B.K., Bhattacharjee, D.: A survey on imaging-based breast cancer detection. In: Proceedings of Fourth International Conference on Soft Computing for Problem Solving, pp. 255–266. Springer (2015)
16. Lee, K., Man, Z., Wang, D., Cao, Z.: Classification of bioinformatics dataset using finite impulse response extreme learning machine for cancer diagnosis. Neural Comput. Appl. 22(3–4), 457–468 (2013)
17. Goswami, S., Chakrabarti, A., Chakraborty, B.: An empirical study of feature selection for classification using genetic algorithm. Int. J. Adv. Intell. Paradigms 10(3), 305–326 (2018)
18. Lotfi, E., Keshavarz, A.: Gene expression microarray classification using PCA-BEL. Comput. Biol. Med. 54, 180–187 (2014)
19. Rathore, S., Iftikhar, M.A., Hussain, M.: A novel approach for automatic gene selection and classification of gene based colon cancer datasets. In: 2014 International Conference on Emerging Technologies (ICET), pp. 42–47. IEEE (2014)
48
H. Rahman et al.
20. Bouazza, S.H., Hamdi, N., Zeroual, A., Auhmani, K.: Gene-expression-based cancer classification through feature selection with KNN and SVM classifiers. In: 2015 Intelligent Systems and Computer Vision (ISCV), pp. 1–6. IEEE (2015) 21. Paul, S., Maji, P.: Gene expression and protein-protein interaction data for identification of colon cancer related genes using f-information measures. Natural Comput. 15, 1–15 (2015) 22. Kim, K.-J., Cho, S.-B.: Meta-classifiers for high-dimensional, small sample classification for gene expression analysis. Pattern Anal. Appl. 18, 1–17 (2014) 23. Nguyen, T., Khosravi, A., Creighton, D., Nahavandi, S.: A novel aggregate gene selection method for microarray data classification. Pattern Recogn. Lett. 60, 16– 23 (2015) 24. Banka, H., Dara, S.: A hamming distance based binary particle swarm optimization (HDBPSO) algorithm for high dimensional feature selection, classification and validation. Pattern Recogn. Lett. 52, 94–100 (2015) 25. Simjanoska, M., Bogdanova, A.M., Popeska, Z.: Recognition of colorectal carcinogenic tissue with gene expression analysis using Bayesian probability. In: ICT Innovations, pp. 305–314 (2012) 26. Bogdanova, A.M., Simjanoska, M., Popeska, Z.: Classification of colorectal carcinogenic tissue with different DNA chip technologies. In: The 6th International Conference on Information Technology, Ser. ICIT (2013) 27. Wong, W.-C., Loh, M., Eisenhaber, F.: On the necessity of different statistical treatment for illumina beadchip and affymetrix genechip data and its significance for biological interpretation. Biol. Direct 3(1), 23 (2008) 28. Simjanoska, M., Bogdanova, A.M., Popeska, Z.: Bayesian posterior probability classification of colorectal cancer probed with Affymetrix microarray technology. In: 2013 36th International Convention on Information & Communication Technology Electronics & Microelectronics (MIPRO), pp. 959–964. IEEE (2013) 29. 
Simjanoska, M., Bogdanova, A.M.: Novel methodology for CRC biomarkers detection with leave-one-out Bayesian classification. In: ICT Innovations 2014, pp. 225– 236. Springer (2015) 30. Ibrahim, R., Yousri, N., Ismail, M.A., El-Makky, N.M., et al.: Multi-level gene/MiRNA feature selection using deep belief nets and active learning. In: 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 3957–3960. IEEE (2014) 31. Wu, H.C., Zhang, L., Chan, S.C.: Reconstruction of gene regulatory networks from short time series high throughput data: review and new findings. In: 2014 19th International Conference on Digital Signal Processing (DSP), pp. 733–738. IEEE (2014) 32. Burton, M., Thomassen, M., Tan, Q., Kruse, T.A.: Gene expression profiles for predicting metastasis in breast cancer: a cross-study comparison of classification methods. Sci. World J. 2012 (2012) 33. Otoom, A.F., Abdallah, E.E., Hammad, M.: Breast cancer classification: comparative performance analysis of image shape-based features and microarray gene expression data. Int. J. Bio-Sci. Bio-Technol. 7(2), 37–46 (2015) 34. Tong, M., Liu, K.-H., Chungui, X., Wenbin, J.: An ensemble of SVM classifiers based on gene pairs. Compute. Biol. Med. 43(6), 729–737 (2013) 35. Li, D., Wang, Z., Cao, C., Liu, Y.: Information entropy based sample reduction for support vector data description. Appl. Soft Comput. 71, 1153–1160 (2018)
A Survey of Modern Gene Expression Based Techniques
49
36. Arif, M., Abdullah, N.A., Phalianakote, S.K., Ramli, N., Elahi, M.: Maximizing information of multimodality brain image fusion using curvelet transform with genetic algorithm. In: 2014 International Conference on Computer Assisted System in Health (CASH), pp. 45–51. IEEE (2014) 37. Cao, J., Zhang, L., Wang, B., Li, F., Yang, J.: A fast gene selection method for multi-cancer classification using multiple support vector data description. J. Biomed. Inform. 53, 381–389 (2015) 38. Karimi, S., Farrokhnia, M.: Leukemia and small round blue-cell tumor cancer detection using microarray gene expression data set: combining data dimension reduction and variable selection technique. Chemometr. Intell. Lab. Syst. 139, 6–14 (2014) 39. Rathore, S., Hussain, M., Khan, A.: GECC: gene expression based ensemble classification of colon samples. IEEE/ACM Trans. Comput. Biol. Bioinf. (TCBB) 11(6), 1131–1145 (2014) 40. Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Nat. Acad. Sci. 96(12), 6745–6750 (1999) 41. Colon cancer data set Biogps (2013). http://biogps.org/dataset/1352/stage-ii-andstage-iii-colorectal-cancer/ 42. Notterman, D.A., Alon, U., Sierk, A.J., Levine, A.J.: Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays. Cancer Res. 61(7), 3124–3130 (2001) 43. Hinoue, T., Weisenberger, D.J., Lange, C.P.E., Shen, H., Byun, H.-M., Van Den Berg, D., Malik, S., Pan, F., Noushmehr, H., van Dijk, C.M., et al.: Genome-scale analysis of aberrant DNA methylation in colorectal cancer. Genome Res. 22(2), 271–282 (2012) 44. Singh, D., Febbo, P.G., Ross, K., Jackson, D.G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A.A., D’Amico, A.V., Richie, J.P., et al.: Gene expression correlates of clinical prostate cancer behavior. 
Cancer Cell 1(2), 203–209 (2002) 45. Petkovi’c, D., Arif, M., Shamshirband, S., Bani-Hani, E.H., Kiakojoori, D.: Sensorless estimation of wind speed by soft computing methodologies: a comparative study. Informatica 26(3), 493–508 (2015) 46. Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R., Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A., et al.: Large-scale analysis of the human and mouse transcriptomes. Proc. Nat. Acad. Sci. 99(7), 4465–4470 (2002) 47. Bhattacharjee, A., Richards, W.G., Staunton, J., Li, C., Monti, S., Vasa, P., Ladd, C., Beheshti, J., Bueno, R., Gillette, M., et al.: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Nat. Acad. Sci. 98(24), 13790–13795 (2001) 48. Van’t Veer, L.J., Dai, H., Van De Vijver, M.J., He, Y.D., Hart, A.A., Mao, M., Peterse, H.L., Van Der Kooy, K., Marton, M.J., Witteveen, A.T., et al.: Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871), 530–536 (2002) 49. Alizadeh, A.A., Eisen, M.B., Davis, R.E., Ma, C., Lossos, I.S., Rosenwald, A., Boldrick, J.C., Sabet, H., Tran, T., Yu, X., et al.: Distinct types of diffuse large Bcell lymphoma identified by gene expression profiling. Nature 403(6769), 503–511 (2000) 50. Hu, Z., Killion, P.J., Iyer, V.R.: Genetic reconstruction of a functional transcriptional regulatory network. Nat. Genet. 39(5), 683–687 (2007)
50
H. Rahman et al.
51. Sathishkumar, E.N., Thangavel, K, Nishama, A: Comparative analysis of discretization methods for gene selection of breast cancer gene expression data. In: Computational Intelligence, Cyber Security and Computational Models, pp. 373– 378. Springer (2014) 52. Marisa, L., de Reyni`es, A., Duval, A., Selves, J., Gaub, M.P., Vescovo, L., EtienneGrimaldi, M.C., Schiappa, R., Guenot, D., Ayadi, M., et al.: Gene expression classification of colon cancer into molecular subtypes: characterization, validation, and prognostic value. PLoS Med. 10, e1001453 (2013) 53. Liu, Y., Ji, Y., Qiu, P.: Identification of thresholds for dichotomizing DNA methylation data. EURASIP J. Bioinform. Syst. Biol. 2013, 8 (2013) 54. Yang, K.-C., Hsu, C.-L., Lin, C.-C., Juan, H.-F., Huang, H.-C.: Mirin: identifying microrna regulatory modules in protein-protein interaction networks. Bioinformatics 30(17), 2527–2528 (2014) 55. Qi, P., Xiang, D.: The long non-coding RNAs, a new cancer diagnostic and therapeutic gold mine. Mod. Pathol. 26(2), 155–165 (2013) 56. Won, J.R., Gao, D., Chow, C., Cheng, J., Lau, S.Y., Ellis, M.J., Perou, C.M., Bernard, P.S., Nielsen, T.O.: A survey of immunohistochemical biomarkers for basal-like breast cancer against a gene expression profile gold standard. Mod. Pathol. 26(11), 1438–1450 (2013) 57. Radha, R., Rajendiran, P.: Using k-means clustering technique to study of breast cancer. In: 2014 World Congress on Computing and Communication Technologies (WCCCT), pp. 211–214. IEEE (2014) 58. Wang, N., Wang, Y., Hao, H., Wang, L., Wang, Z., Wang, J., Wu, R.: A bi-Poisson model for clustering gene expression profiles by RNA-seq. Briefings Bioinform. 15, 534–541 (2013). bbt029bbt029 59. Jun, H., Tzeng, J.-Y.: Integrative gene set analysis of multi-platform data with sample heterogeneity. Bioinformatics 30(11), 1501–1507 (2014) 60. Mahata, K., Sarkar, A.: Cancer gene silencing network analysis using cellular automata. 
In: 2015 Third International Conference on Computer, Communication, Control and Information Technology (C3IT), pp. 1–5. IEEE (2015) 61. Saribudak, A., Gundry, S., Zou, J., Uyar, M.U.: Genomic based personalized chemotherapy analysis to support decision systems for breast cancer. In: 2015 IEEE International Symposium on Medical Measurements and Applications (MeMeA), pp. 495–500. IEEE (2015) 62. Dudoit, S., Fridlyand, J., Speed, T.P.: Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Statis. Assoc. 97(457), 77–87 (2002) 63. Dettling, M., B¨ uhlmann, P.: Boosting for tumor classification with gene expression data. Bioinformatics 19(9), 1061–1069 (2003) 64. Chamard-Jovenin, C., Jung, A.C., Chesnel, A., Abecassis, J., Flament, S., Ledrappier, S., Macabre, C., Boukhobza, T., Dumond, H.: From erα66 to erα36: a generic method for validating a prognosis marker of breast tumor progression. BMC Syst. Biol. 9(1), 28 (2015) 65. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley, Hoboken (2004)
Recognition of Skin Diseases Using Curvelet Transforms and Law's Texture Energy Measures

Jyotismita Chaki (1), Nilanjan Dey (2), V. Rajinikanth (3, corresponding author), Amira S. Ashour (4), and Fuqian Shi (5)

(1) School of Education Technology, Jadavpur University, Kolkata, India; [email protected]
(2) Department of Information Technology, Techno India College of Technology, Kolkata, India; [email protected]
(3) Department of Electronics and Instrumentation Engineering, St. Joseph's College of Engineering, Chennai 600 119, Tamilnadu, India; [email protected]
(4) Department of Electronics and Electrical Communications Engineering, Faculty of Engineering, Tanta University, Tanta, Egypt; [email protected]
(5) College of Information and Engineering, Wenzhou Medical University, Wenzhou, China; [email protected]
Abstract. This work presents an automated system to recognize human skin diseases. In many computer vision and pattern recognition problems, such as ours, a single descriptor mining one sort of feature vector is not enough to capture all the relevant information in the input data. It is therefore necessary to apply more than one descriptor and extract several categories of feature vectors with different dimensions. In this paper, for the purpose of skin disease classification, we propose a new hybrid method that combines two techniques to proficiently classify different types of feature vectors in their original form and dimensionality. The first uses the curvelet transform in both the spatial and frequency viewpoints; the second uses the set of energy measures for describing texture formulated by Law. The minimum Euclidean distance of the Law's texture energy measures between different classes is calculated for discrimination.

Keywords: Skin disease classification · Curvelet · Law's texture · Texture
1 Introduction

Skin diseases are widespread in hot, humid and densely populated areas of the world. If not treated in time, they may lead to severe complications in the body, including the spreading of infection. Recently, computerized methodologies and pattern detection
© Springer Nature Switzerland AG 2021 V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 51–61, 2021. https://doi.org/10.1007/978-3-030-52190-5_4
practices have been applied towards building cheap, automated systems for skin disease detection that can provide early warning. A number of procedures have been proposed to apply pattern recognition techniques and build mathematical models for automated recognition of medical conditions from images [1]. For detection of human skin and its conditions, the proposed techniques have largely depended on physical image properties such as color [2] and texture [3]. Diseases addressed include psoriasis [3], skin cancer [6, 9], virus and bacterial infections [5], and dermatological skin diseases [7]. Feature extraction techniques used include wavelet transforms [4], spectral analysis [9], the gray level co-occurrence matrix (GLCM) [10, 15], the scale invariant feature transform (SIFT) [5], Otsu thresholding [8], and principal component analysis (PCA) [9]. Classification techniques include neural networks [4, 6, 10], support vector classifiers and k-nearest neighbors [5], minimum distance classifiers [7], and convolutional neural networks [13, 14]. In light of previous research, the contributions of the present work are as follows. For the purpose of automatically identifying diseases of the human skin by analyzing the texture information contained in digital images of the skin, a novel method is proposed that competently combines two feature vector types in their original form and dimensionality. This approach does not need any domain normalization or dimension equalization across the different feature vector types. First, features are extracted from the skin disease images based on the curvelet transform in both the spatial and frequency viewpoints, and the maximum correlation between the curvelet features is used for classification. The main reason for using the curvelet transform is its capacity for multiresolution and multidirectional analysis. Its basis elements, with various aspect ratios, are extended in different directions.
Images are decomposed into a sum of components at different locations and scales. The curvelet frequency plane is subdivided into dyadic coronae, and the dyadic coronae are separated into angular wedges. The angular wedges provide a parabolic aspect ratio, which yields the needle-shaped basis elements of the curvelet, with various aspect ratios, oriented in different directions. For a better representation of texture features, Law's texture energy measure is introduced in this study. Typically, the Law's texture energy measure is calculated in a small window that scans the whole image and extracts the texture feature associated with each pixel. Law's texture energy measure can distinguish edge, level, wave, spot and ripple content at a selected vector length of adjacent pixels in the horizontal and vertical directions, a procedure comparable to the human visual process. To embrace these texture properties in skin disease classification, Law's texture energy measure is used to extract texture features from the spatial coefficients obtained from the curvelet transform. Finally, the minimum Euclidean distance of the Law's texture energy measures between different classes is calculated for classification. The arrangement of this work is as follows: Sect. 1 discusses a summary of related works, Sect. 2 defines the proposed method, Sect. 3 provides details of the dataset and the experimental results acquired, Sect. 4 analyzes the current work vis-a-vis other works, and Sect. 5 presents the overall conclusion and future scope.
2 Proposed Approach (Hybrid Method)

This work proposes a system for computerized recognition of three classes of skin diseases by investigating the texture of a collection of images using features based on the curvelet transform in both the spatial and frequency viewpoints. Classification is done by the maximum correlation between the curvelet features. Then, to improve the recognition result, Law's texture energy measures are applied to the spatial coefficients obtained from the curvelet transform. Finally, classification is done by the minimum Euclidean distance of the Law's texture features.

2.1 Curvelet Transform
The curvelet was originally presented in [11]; it elaborated ridgelet analysis and was comparatively slow. A faster formulation was later established, termed the Fast Discrete Curvelet Transform (FDCT), which has two variants: the wrapping function and the Unequally-Spaced Fast Fourier Transform (USFFT). The present paper uses the wrapping function, which includes different sub-bands at different scales, comprising various orientations and positions in the frequency domain. A 2D image of size P × Q is subjected to the FDCT, which produces a set of curvelet coefficients indexed by a scale s, a rotational angle r, and spatial position parameters m and n. Here 0 ≤ p ≤ P, 0 ≤ q ≤ Q, and φ_{s,r,m,n} is the curvelet waveform:

    C(s, r, m, n) = Σ_{p=0}^{P} Σ_{q=0}^{Q} f(p, q) φ_{s,r,m,n}(p, q)    (1)
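The coefficients of Eq. (1) are obtained in practice through the wrapping FDCT steps described below (2D FFT, wedge windowing, wrapping around the origin, inverse FFT). The following NumPy sketch mimics those four steps; the rectangular radial mask is a crude stand-in for the true polar wedge U_{s,r}, and all function names are illustrative assumptions, not a faithful curvelet implementation.

```python
import numpy as np

def mock_wedge_mask(shape, r0, r1):
    """Crude radial band-pass mask standing in for a polar wedge U_{s,r}.
    A real FDCT uses smooth, angle-limited windows; this is illustrative only."""
    P, Q = shape
    fy = np.fft.fftfreq(P)[:, None]
    fx = np.fft.fftfreq(Q)[None, :]
    rad = np.sqrt(fx ** 2 + fy ** 2)
    return ((rad >= r0) & (rad < r1)).astype(float)

def curvelet_like_coeffs(img, r0=0.1, r1=0.25):
    """Sketch of the wrapping FDCT pipeline:
    1. 2D FFT of the image,
    2. multiply by the (mock) wedge window,
    3. 'wrap' the windowed spectrum onto a smaller grid around the origin,
    4. inverse FFT to obtain the coefficients."""
    F = np.fft.fft2(img)                          # step 1
    Fw = mock_wedge_mask(img.shape, r0, r1) * F   # step 2
    P, Q = img.shape
    # step 3: crude wrapping -- fold the spectrum onto a half-size grid
    wrapped = Fw.reshape(2, P // 2, 2, Q // 2).sum(axis=(0, 2))
    return np.fft.ifft2(wrapped)                  # step 4

img = np.random.default_rng(0).random((100, 100))
coeffs = curvelet_like_coeffs(img)
print(coeffs.shape)  # (50, 50)
```

Note that the coefficient grid is smaller than the image, reflecting the parabolic scaling of curvelet wedges.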
A 2D image f(p, q) is first converted to the frequency domain F(p, q) using a 2D FFT, i.e., f(p, q) → F(p, q). For each scale s and rotational angle r, a polar "wedge" U_{s,r} is obtained and multiplied with the frequency-domain signal to form the product U_{s,r}(p, q)F(p, q). The product is then wrapped around the origin using a window function W: F_{s,r}(p, q) = W{U_{s,r}(p, q)F(p, q)}. In the final step, an IFFT is applied to each F_{s,r}(p, q) to generate the discrete curvelet coefficients. The maximum correlation between the curvelet coefficients of a test sample and a set of training samples is used for classification.

2.2 Law's Texture Energy Measures
Laws [16] developed a quantitative approach for measuring the amount of gray-level variation within a fixed-size window of a grayscale image. It involves five 1-D masks: ML (Level), ME (Edge), MS (Spot), MW (Wave) and MR (Ripple):

    ML = [  1   4   6   4   1 ]
    ME = [ -1  -2   0   2   1 ]
    MS = [ -1   0   2   0  -1 ]
    MW = [ -1   2   0  -2   1 ]
    MR = [  1  -4   6  -4   1 ]    (2)
All masks except ML are zero-sum. These are used to generate 25 two-dimensional kernels by the convolution (outer product) of a vertical mask with a horizontal mask: MLL, MEL, MSL, MWL, MRL, MLE, MEE, MSE, MWE, MRE, MLS, MES, MSS, MWS, MRS, MLW, MEW, MSW, MWW, MRW, MLR, MER, MSR, MWR, MRR. For example, MLW is formed from a vertical ML and a horizontal MW mask:

    MLW = [ -1    2    0   -2    1
            -4    8    0   -8    4
            -6   12    0  -12    6
            -4    8    0   -8    4
            -1    2    0   -2    1 ]    (3)
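The 25 kernels can be generated mechanically as outer products of the 1-D masks of Eq. (2). The snippet below is an illustrative sketch (the dictionary names are mine, not the paper's); it also confirms that every kernel except LL is zero-sum, since the sum of an outer product is the product of the two mask sums.

```python
import numpy as np

# The five 1-D Laws masks of Eq. (2)
MASKS = {
    "L": np.array([ 1,  4, 6,  4,  1]),   # Level
    "E": np.array([-1, -2, 0,  2,  1]),   # Edge
    "S": np.array([-1,  0, 2,  0, -1]),   # Spot
    "W": np.array([-1,  2, 0, -2,  1]),   # Wave
    "R": np.array([ 1, -4, 6, -4,  1]),   # Ripple
}

# The 25 2-D kernels: outer product of a vertical and a horizontal 1-D mask
kernels = {v + h: np.outer(MASKS[v], MASKS[h]) for v in MASKS for h in MASKS}

print(kernels["LW"])   # reproduces the MLW matrix of Eq. (3)

# Only LL fails to be zero-sum: sum(outer(a, b)) = sum(a) * sum(b)
nonzero = [k for k in kernels if kernels[k].sum() != 0]
print(nonzero)  # ['LL']
```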
Of these 25 2-D kernels, 24 sum to zero; only MLL does not. Given an image with Q columns and P rows, the image is convolved with each of the 25 2-D kernels. The outcome is a set of 25 P × Q grayscale images, which form the foundation for the textural analysis of the original image. Each pixel in these 25 P × Q grayscale images is then replaced with a Texture Energy Measure (TEM) computed from the pixels around it, by adding the absolute values over the 15 × 15 adjacent pixels:

    m'(x, y) = Σ_{i=-7}^{7} Σ_{j=-7}^{7} |m(x + i, y + j)|    (4)
A new group of images is produced, referred to as the TEM images. The TEM images are named by appending T to the name of each kernel. All convolution kernels used are therefore zero-mean, with the exception of the KLLT kernel, which is used as a normalization image; i.e., all TEM images are normalized pixel-by-pixel by the ILLT image. The ILLT image is then discarded. In the concluding step, similar features are combined to remove a directionality bias from the features. For instance, ILWT is sensitive to vertical edges and IWLT is sensitive to horizontal edges. If these TEM images are added together, a single attribute, sensitive to plain edge content, can be obtained. Following this example, features produced with transposed convolution kernels are added together. These new features are denoted with a suffixed 'R' for 'orientation invariance': FELTR = KELT + KLET, FSLTR = KSLT + KLST, FWLTR = KWLT + KLWT, FRLTR = KRLT + KLRT, FSETR = KSET + KEST, FWETR = KWET + KEWT, FRETR = KRET + KERT, FWSTR = KWST + KSWT, FRSTR = KRST + KSRT, FRWTR = KRWT + KWRT. To keep all features consistent with respect to scale, the remaining features are multiplied by 2: FEETR = KEET * 2, FSSTR = KSST * 2, FWWTR = KWWT * 2, FRRTR = KRRT * 2. The result, once KLLT is removed altogether, is a set of 14 texture features that are orientation invariant. If these images are stacked, a dataset is obtained in which every pixel is characterized by 14 texture features [17–20].
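The full pipeline of this section (convolve with a kernel, take the absolute-sum TEM of Eq. (4) over a 15 × 15 window, combine transposed pairs, normalize by the LL image) can be sketched in pure NumPy as follows. All function and variable names here are illustrative assumptions; a naive loop convolution is used for clarity rather than speed.

```python
import numpy as np

def conv2_same(img, ker):
    """Naive 'same' 2-D convolution with zero padding (kernel flipped)."""
    kh, kw = ker.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * ker[::-1, ::-1])
    return out

def texture_energy(img, ker, win=15):
    """Eq. (4): replace each pixel by the sum of absolute filter responses
    over the surrounding win x win (here 15 x 15) neighborhood."""
    f = np.abs(conv2_same(img, ker))
    h = win // 2
    pad = np.pad(f, h)
    tem = np.zeros(f.shape)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            tem[i, j] = pad[i:i + win, j:j + win].sum()
    return tem

L5 = np.array([1, 4, 6, 4, 1])
E5 = np.array([-1, -2, 0, 2, 1])
img = np.random.default_rng(1).random((24, 24))

t_el = texture_energy(img, np.outer(E5, L5))   # vertical E, horizontal L
t_le = texture_energy(img, np.outer(L5, E5))   # transposed kernel
t_ll = texture_energy(img, np.outer(L5, L5))   # LL acts as normalization image

# Orientation-invariant, LL-normalized EL feature
feat_EL = (t_el + t_le) / np.maximum(t_ll, 1e-12)
print(feat_EL.shape)  # (24, 24)
```

The remaining 13 features follow the same pattern, with the symmetric kernels (EE, SS, WW, RR) doubled instead of paired.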
2.3 Classification
Law’s texure energy measure is used to extract texture features from the spatial coefficient obtained from curvelet transform. Classification is an essentail process in several applications [21–35]. In the present work, image classification is accomplished by distributing the database into a training group T and a testing group S respectively comprising of n samples. The m-th training class Tm is denoted by the average of the feature significance of its complete element samples. The n-th test sample Sn with feature significance SFn is categorized to class c if the absolute difference Dn,m between the n-th test sample and m-th training class is smallest for m = c.
3 Experiments and Results

Experiments are performed with 300 skin samples divided into three classes: A (Acne), B (Eczema) and C (Urticaria). The images are scaled to a standard dimension of 100 × 100 pixels and stored in GIF format. Of the 100 images for each class, 55 are used for training and 45 for testing. Samples are shown below (Fig. 1).
Fig. 1. Training (left) and testing (right) samples of images belonging to three classes
The curvelet transform coefficients are computed in both spatial and frequency domains. Figure 2 demonstrates an example of their pictorial representations.
Fig. 2. (a) Original image (b) Curvelet representation in spatial domain (c) Curvelet representation in frequency domain
The classification is based on the maximum correlation between the curvelet coefficients in the spatial [CS] and frequency [CF] domains. Classification plots for the three classes are shown in Fig. 3. Table 1 lists the classification accuracies; the overall accuracy is 83.7%.

Table 1. Recognition accuracies using curvelet features

Feature    Class A   Class B   Class C   Overall accuracy
Curvelet   57.7%     97.7%     95.5%     83.7%
Fig. 3. Classification plots using curvelet coefficients in spatial and frequency viewpoints
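The maximum-correlation rule behind these plots can be sketched as follows: a test sample is assigned the label of the training sample whose flattened curvelet coefficients correlate most strongly with it. The function name and toy data are illustrative assumptions.

```python
import numpy as np

def max_correlation_label(test_coeffs, train_coeffs, train_labels):
    """Assign the label of the training sample whose (flattened) curvelet
    coefficients have the highest Pearson correlation with the test sample."""
    t = test_coeffs.ravel()
    corrs = [np.corrcoef(t, tr.ravel())[0, 1] for tr in train_coeffs]
    return train_labels[int(np.argmax(corrs))]

rng = np.random.default_rng(2)
train = [rng.random((8, 8)) for _ in range(3)]   # stand-ins for coefficient maps
labels = ["A", "B", "C"]
test = train[1] + 0.01 * rng.random((8, 8))      # nearly identical to sample B
label = max_correlation_label(test, train, labels)
print(label)  # B
```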
Law’s texture energy measure is computed in the spatial coefficient of curvelet transform. Figure 4 provides the pictorial representation of some of the Laws texture features.
Fig. 4. Original image and some of its Laws texture representations
The classification accuracies based on these 14 features are presented below in Table 2 for the three classes, along with the overall accuracy (O).

Table 2. Percentage recognition accuracies using Laws texture features

Feature   Class A   Class B   Class C   Overall
EE        100       97.8      95.5      97.8
EL        86.6      74.4      85.8      82.3
RE        74.8      89.8      71.4      78.7
RL        95.4      97.8      81.2      91.5
RR        91.8      75.2      94.8      87.3
RS        81.4      90.4      94.6      88.8
RW        94.8      96.4      98.6      96.7
SE        97.6      87.8      95.4      93.6
SL        90.4      99.4      93.8      94.5
SS        88.4      86.8      80.4      85.2
WE        93.8      90.4      94.4      92.9
WL        96.8      84.8      90.4      90.1
WS        99.4      98.8      89.2      95.8
WW        94.6      94.4      96.8      95.3
Figure 5 shows the classification scheme for the feature FEE. The illustration displays the absolute difference (D) between the testing images of each class and the training images of the three classes. The codes have the following meaning: Dpq indicates the difference between the testing images of class p and the average of the training images of class q. The topmost sub-figure shows results for class 1 (class A), the central sub-figure for class 2 (class B), and the bottommost sub-figure for class 3 (class C).
Fig. 5. Classification of testing samples by Laws feature EE
4 Analysis

To place the work in the context of the state of the art, two contemporary methods are also implemented on the current dataset for comparison of accuracy values. In [3, 10, 15] skin disease detection is done using GLCM features. The major disadvantage of this technique is that it can capture only limited directional information (0°, 45°, 90°, 135°) due to its reduced direction selectivity. The curvelet transform, in contrast, is effective at capturing image activity along multiple directions, which constitutes most of the content of medical images [36–42]. In [13, 14] a Convolutional Neural Network (CNN) is used for the recognition of skin diseases. The limitation of CNNs lies in the volume of data given to them: with a small amount of data, CNNs perform poorly. CNNs have millions of parameters and, on a small dataset, run into over-fitting because they need massive amounts of data. In this study the dataset contains only 300 images, which is not sufficient for a CNN to provide a good recognition result. Comparisons of accuracies obtained from the state-of-the-art methods and the proposed method are listed in Table 3.
Table 3. Comparison of recognition accuracy on the current dataset

Method                                        Overall accuracy
Combination of color and GLCM features [3]    87.6%
GLCM features [10]                            84.8%
Deep CNN [13, 14]                             90.2%
Morphological and GLCM features [15]          89.4%
Proposed method                               97.8%
5 Conclusion and Future Scope

This work proposes a computerized scheme for recognition of human skin diseases using the combination of the curvelet transform and Law's texture energy measure. The proposed method is compared with other contemporary approaches. An automatic classification system like this can prove beneficial for fast and effective classification of skin disease, at least at a preliminary level, particularly where adequate medical personnel are not available. Future work to refine the scheme will focus on: (i) using a statistical classifier such as a neural network; (ii) combining other texture recognition approaches with this one; (iii) including rotated and skewed images in the dataset.
References

1. Angenent, S., Pichon, E., Tannenbaum, A.: Mathematical methods in medical image processing. Bull. Am. Math. Soc. 43, 365–396 (2006)
2. Fekri-Ershad, S., Saberi, M., Tajeripour, F.: An innovative skin detection approach using color based image retrieval technique. Int. J. Multimedia Its Appl. (IJMA) 4, 57–65 (2012)
3. Al Abbadi, N.K., Dahir, N.S., Al-Dhalimi, M.A., Restom, H.: Psoriasis detection using skin color and texture features. J. Comput. Sci. 6, 648–652 (2010)
4. Saa'd, W.K.: Method for detection and diagnosis of the area of skin disease based on color by wavelet transform and artificial neural network. Al-Qadisiya J. Eng. Sci. 2, 799–829 (2009)
5. Tushabe, F., Mwebaze, E., Kiwanuka, F.N.: An image-based diagnosis of virus and bacterial skin infections. In: ICCIR, pp. 1–7 (2011)
6. Jaleel, J.A., Salim, S., Aswin, R.B.: Artificial neural network based detection of skin cancer. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 1, 200–205 (2012)
7. Arivazhagan, S., Shebiah, R.N., Divya, K., Subadevi, M.P.: Skin disease classification by extracting independent components. J. Emerg. Trends Comput. Inf. Sci. 3, 1379–1382 (2012)
8. Dey, N., Rajinikanth, V., Ashour, A.S., Tavares, J.M.R.: Social group optimization supported segmentation and evaluation of skin melanoma images. Symmetry 10(2), 1–21 (2018)
9. Sigurdsson, S., Philipsen, P.A., Hansen, L.K.: Detection of skin cancer by classification of Raman Spectra. IEEE Trans. Biomed. Eng. 10, 1784–1793 (2004)
10. Islam, M.N., Gallardo-Alvarado, J., Abu, M., Salman, N.A., Rengan, S.P., Said, S.: Skin disease recognition using texture analysis. In: IEEE Control and System Graduate Research Colloquium (ICSGRC), pp. 144–148 (2017)
11. Candès, E., Donoho, D.: Curvelets – a surprisingly effective nonadaptive representation for objects with edges. In: Cohen, A., Rabut, C., Schumaker, L. (eds.) Curves and Surface Fitting: Saint-Malo 1999, pp. 105–120. Vanderbilt University Press, Nashville (2000)
12. Falconer, K.: Fractal Geometry, Mathematical Foundations and Applications, pp. 38–47. Wiley, Hoboken (1990)
13. Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S.: Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639), 115–118 (2017)
14. Zhang, X., Wang, S., Liu, J., Tao, C.: Towards improving diagnosis of skin diseases by combining deep neural network and human knowledge. BMC Med. Inform. Decis. Mak. 18(2), 69–76 (2018)
15. Amarathunga, A.A.L.C., Ellawala, E.P.W.C., Abeysekara, G.N., Amalraj, C.R.J.: Expert system for diagnosis of skin diseases. Int. J. Sci. Technol. Res. 4(01), 174–178 (2015)
16. Laws, K.: Textured image segmentation. Ph.D. dissertation, University of Southern California (1980)
17. Hore, S., Chatterjee, S., Sarkar, S., Dey, N., Ashour, A.S., Balas-Timar, D., Balas, V.E.: Neural-based prediction of structural failure of multistoried RC buildings. Struct. Eng. Mech. 58(3), 459–473 (2016)
18. Malik, S., Khatter, K.: Malicious application detection and classification system for android mobiles. Int. J. Ambient Comput. Intell. 9(1), 95–114 (2018)
19. Saba, L., Dey, N., Ashour, A.S., Samanta, S., Nath, S.S., Chakraborty, S., Sanches, J., Kumar, D., Marinho, R., Suri, J.S.: Automated stratification of liver disease in ultrasound: an online accurate feature classification paradigm. Comput. Methods Programs Biomed. 130, 118–134 (2016)
20. Ahmed, S.S., Dey, N., Ashour, A.S., Sifaki-Pistolla, D., Bălas-Timar, D., Balas, V.E., Tavares, J.M.R.: Effect of fuzzy partitioning in Crohn's disease classification: a neuro-fuzzy-based approach. Med. Biol. Eng. Comput. 55(1), 101–115 (2016)
21. Sharma, K., Virmani, J.: A decision support system for classification of normal and medical renal disease using ultrasound images: a decision support system for medical renal diseases. Int. J. Ambient Comput. Intell. 8(2), 52–69 (2017)
22. Dey, N., Ashour, A.S., Althoupety, A.S.: Thermal imaging in medical science. In: IGI Global Recent Advances in Applied Thermal Imaging for Industrial Applications, pp. 87–117 (2017)
23. Sghaier, S., Farhat, W., Souani, C.: Novel technique for 3D face recognition using anthropometric methodology. Int. J. Ambient Comput. Intell. 9(1), 60–77 (2018)
24. Hemalatha, S., Anouncia, S.M.: Unsupervised segmentation of remote sensing images using FD based texture analysis model and ISODATA. Int. J. Ambient Comput. Intell. 8(3), 58–75 (2017)
25. Trabelsi, I., Bouhlel, M.S.: Feature selection for GUMI Kernel-based SVM in speech emotion recognition. In: IGI Global Artificial Intelligence: Concepts, Methodologies, Tools, and Applications, pp. 941–953 (2017)
26. Li, Z., Shi, K., Dey, N., Ashour, A.S., Wang, D., Balas, V.E., McCauley, P., Shi, F.: Rule-based back propagation neural networks for various precision rough set presented KANSEI knowledge prediction: a case study on shoe product form features extraction. Neural Comput. Appl. 28(3), 613–630 (2016)
27. Sambyal, N., Abrol, P.: Feature based text extraction system using connected component method. Int. J. Synth. Emot. (IJSE) 7(1), 41–57 (2016)
Recognition of Skin Diseases Using Curvelet Transforms
61
28. Azzabi, O., Njima, C.B., Messaoud, H.: New approach of diagnosis by timed automata. Int. J. Ambient Comput. Intell. (IJACI) 8(3), 76–93 (2017) 29. Li, Z., Dey, N., Ashour, A.S., Cao, L., Wang, Y., Wang, D., McCauley, P., Balas, V.E., Shi, K., Shi, F.: Convolutional neural network based clustering and manifold learning method for diabetic plantar pressure imaging dataset. J. Med. Imaging Health Inform. 7(3), 639–652 (2017) 30. Khachane, M.Y.: Organ-based medical image classification using support vector machine. Int. J. Synth. Emot. 8(1), 18–30 (2017) 31. Chaki, J., Dey, N.: Texture Feature Extraction Techniques for Image Recognition. Springer, Singapore (2020) 32. Chaki, J., Dey, N.: A Beginner’s Guide to Image Preprocessing Techniques. CRC Press, Boca Raton (2018) 33. Chaki, J., Dey, N.: Signal processed texture features. In: Texture Feature Extraction Techniques for Image Recognition, Springer, Singapore, pp. 43–65 (2020) 34. Geman, O., et al.: Deep learning tools for human microbiome big data. In: Advances in Intelligent Systems and Computing, vol. 633, pp. 265–275 (2017). https://doi.org/10.1007/ 978-3-319-62521-8_21 35. Chiuchisan, I., et al.: Tremor measurement system for neurological disorders screening. In: Advances in Intelligent Systems and Computing, vol. 633, pp. 339–348 (2017). https://doi. org/10.1007/978-3-319-62521-8_28 36. Mircea, I.-G., et al.: A reinforcement learning based approach to multiple sequence alignment. In: Advances in Intelligent Systems and Computing, vol. 634, pp. 54–70 (2017). https://doi.org/10.1007/978-3-319-62524-9_6 37. Saemi, B., et al.: Nature inspired partitioning clustering algorithms: a review and analysis. In: Advances in Intelligent Systems and Computing, vol. 634, pp. 96–116 (2017). https://doi. org/10.1007/978-3-319-62524-9_9 38. AlShahrani, A.M., Al-Abadi, M.A., Al-Malki, A.S., Ashour, A.S., Dey, N.: Automated system for crops recognition and classification. 
In: IGI Global Applied Video Processing in Surveillance and Monitoring Systems, pp. 54–69 (2017) 39. Wang, D., Li, Z., Cao, L., Balas, V.E., Dey, N., Ashour, A.S., McCauley, P., Dimitra, S.P., Shi, F.: Image fusion incorporating parameter estimation optimized gaussian mixture model and fuzzy weighted evaluation system: a case study in time-series plantar pressure data set. IEEE Sens. J. 17(5), 1407–1420 (2016) 40. Firoze, A., Rahman, R.M.: Critical condition classification of patients from ICCDR, B hospital surveillance data. Int. J. Adv. Intell. Paradigms 9(4), 347–369 (2017) 41. Anami, B.S., Elemmi, M.C.: A rule-based approach for classification of natural and manmade fabric images. Int. J. Adv. Intell. Paradigms 9(4), 402–413 (2017) 42. Hong, S.S., Kim, D.W., Han, M.M.: An improved data pre-processing method for classification and insider information leakage detection. Int. J. Adv. Intell. Paradigms 11(1– 2), 143–158 (2018)
Fuzzy Applications, Theory, Expert Systems and Fuzzy Control
Intelligent Roof-Top Greenhouse Buildings Marius Mircea Balas(&), Mihaela Popa, Emanuela Valentina Muller, Daniel Alexuta, and Luana Muresan “Aurel Vlaicu” University of Arad, Arad, Romania [email protected], [email protected], [email protected], [email protected], [email protected]
Abstract. The Integrated Roof-Top Greenhouse Building (IRTG) is a new concept, derived from the conventional roof-top greenhouse, bringing features that boost technical performance and offering a basic tool to counter global warming. Through specific devices (water-to-water and air-to-air heat pumps, heat exchangers, water tanks, etc.) IRTGs harvest the available renewable energy resources (geothermal, solar, wind, etc.), which are stored and managed in an integrated way (Watergy). A ventilation system conveys oxygen-enriched air from the greenhouse to the building and carbon-dioxide-enriched air from the building to the greenhouse. A generic model of this system is presented, along with illustrative simulations. If applied on a large scale (the Green-Skyline City), the concept creates a strong carbon offset mechanism opposing global warming, overlapping the most active CO2 source, our cities. The IRTG temperature control demands self-adaptivity and intelligence, which in our approach are provided by fuzzy-interpolative controllers. One thus obtains Intelligent RTGs (iRTG).

Keywords: Passive greenhouse · Urban agriculture · Roof-top greenhouse · Carbon offset · Watergy · Fuzzy-interpolative control
1 Introduction

Greenhouse technology is key to our sustainable future, on Earth or in Space. The oxygen we breathe is the exclusive result of the photosynthesis of all the plants that preceded and surround us. Replacing photosynthesis as the main oxygen source is out of the question; however, three centuries ago our civilization launched itself into a technological revolution based on the oxidation of wood and fossil fuels that is now beginning to provoke global effects. Large-scale deforestation and carbon dioxide pollution have practically replaced large amounts of oxygen in the atmosphere with carbon dioxide. This is causing global warming by the greenhouse effect, with incalculable side effects [1]. A global reaction against this phenomenon was agreed at the 2015 United Nations Climate Change Conference UNCCC'15, held in Paris, France, from 30 November to 12 December 2015 [2]. Our goal is to offer this massive political initiative the support of an effective technical solution.

© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 65–75, 2021. https://doi.org/10.1007/978-3-030-52190-5_5
In some previous papers we discussed the Passive Greenhouse (PG) concept [3], etc. The policy of replacing conventional agriculture with PGs is recommended by a set of synergic features that were exposed in Ref. [4]:
– The renewable energy synergy: a PG can use in situ, with no grid conversion, all the significant renewable energy sources: geothermal, wind, solar and biomass;
– The water synergy: PGs need water for watering the plants but also for heat pumps and for energy storage. Greenhouses are at the same time rain water collectors, water consumers and water recyclers;
– The constructive synergy: drilling water wells for heat pumps is synergetic with the watering of the plants, the greenhouses' transparent walls create conditions for collecting solar energy by the greenhouse effect, etc.;
– The carbon offset synergy: besides the direct carbon offset produced by the greenhouse plants, which is comparable to a forest's carbon offset, extending the PG surfaces creates a supplementary carbon offset thanks to the consequent ecological reconstruction of the newly liberated surfaces; simultaneously, plants generate oxygen;
– The trophic synergy: the PG agricultural system directly reinforces the two-level trophic chain (plants/humans) through the produced crops and at the same time increases the quality of the three-level trophic chain (plants/animals/humans) due to the ecological reconstruction, which can offer natural habitats for farm animals;
– The economic synergy: PGs use only available technologies and homologated components, which are creating a fast-growing market; they have the potential to boost the renewable energy market and to generate sustainable economic growth;
– The political synergy: PGs reconcile some political goals that have seemed contradictory so far: economic growth and efficiency, an increasing number of jobs, reducing the carbon footprint while increasing the carbon offset, improving the quality of life by a structural ecological
reconstruction of our environment, and removing many of the agricultural and alimentation risks. However, perhaps for psychological reasons, investors are not looking seriously at greenhouse agriculture as long as flowers, vegetables and fruits are available at low prices on American, Asian or European markets, and tropical climates do not need greenhouses. The 2015 oil price fall did not help investments in renewable energy either. Still, we claim that renewable energy and PGs have not lost the race. UNCCC'15 did not point to any miraculous carbon offset technology; we still have to find it. A water crisis is developing. Closed greenhouses [5] save water by recirculation and perform desalination and grey water treatment. The recent growth of oil prices also pleads for renewable energy investments. That is why we reoriented towards Roof-Top Greenhouses (RTG). If our cities replace the dull conventional roofs with RTGs and the new buildings are designed with RTGs from the very beginning, we will certainly obtain strong carbon offsets. We called this concept the Green Skyline City (GSC). Will the GSC be a successful solution for a sustainable future? To answer this question and to quantitatively describe GSCs we need appropriate mathematical and computer models.
2 Urban Agriculture and Roof-Top Greenhouses

Deforestation, a sequel of our technological development, is a major cause of the rise of the atmospheric CO2 concentration [6]. Given our demographic growth, a wide reforestation is illusory. Under such conditions a compensatory reaction has emerged in recent years: Urban Agriculture [7–9]. We like to consider this concept as a counterstrike of Nature against the polluting modern technology, supported by the ecological postmodern technology. The classical Roof-Top Greenhouse (RTG) is called to play a leading role in this endeavor [10, 11], etc. Figure 1 illustrates this approach. The management of energy E, water W and carbon dioxide G is interconnected, resulting in the Integrated RTG (iRTG). Such a prototype is currently operating at the Universitat Autonoma de Barcelona, Bellaterra, Spain, as a result of the Fertilecity Project [10].
Fig. 1. The Integrated Roof-Top Greenhouse [10]
We conceived a similar system as a development of the PG concept, which was gradually elaborated in our university through student research. Our most successful student teams were The Vlaicu's Greenhouse (GDF Suez Responsible Company Challenge 2014) and Green Skylines (The E-On Energy Challenge 2018). While the Fertilecity prototype is designed for the mild Mediterranean climate, our iRTG is meant to be easily extendable to the scale of a whole city, and to address any climate. We consider RTG-buildings as a whole, with a specific feature: the common carbon dioxide/oxygen management, by means of a two-way ventilation system:
– an RTG-to-building flow of air enriched with oxygen by the plants;
– a building-to-RTG flow of air enriched with carbon dioxide by the people, which is a carbon fertilizer for the plants.
In order to prevent major energy losses and to match weather conditions, the air exchanges with the environment are controlled by ventilation fans. This technique is usual in railway coaches, where one can simultaneously limit the energy losses and
preserve the passengers' comfort by means of a recirculation factor U, a parameter of the balance between recirculated and fresh air. Thanks to the oxygen provided by the RTG plants, when external temperatures become too harsh the building is able to increase U into the range 0.9–0.95. The limit case U = 1, demanded for long space journeys, may be reached by increasing the vegetal mass.
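The role of the recirculation factor can be illustrated with a simple steady-state balance (a sketch in our notation, not part of the paper's model; all numeric values are placeholders): the fresh-air intake is (1 − U)·D, so the indoor CO2 excess over the outdoor level scales as 1/(1 − U).

```python
# Steady-state CO2 balance of a ventilated space with recirculation
# factor U (fraction of recirculated air). Fresh-air intake is
# (1 - U) * D, so the indoor concentration settles at
# c_in = c_ext + q_emit / ((1 - U) * D).

def steady_state_co2(c_ext, q_emit, D, U):
    """Steady-state indoor CO2 concentration [kg/m^3].

    c_ext  -- outdoor CO2 concentration [kg/m^3]
    q_emit -- indoor CO2 emission rate [kg/s]
    D      -- total ventilation air flow [m^3/s]
    U      -- recirculation factor, 0 <= U < 1
    """
    fresh_flow = (1.0 - U) * D
    return c_ext + q_emit / fresh_flow

# Raising U from 0.8 to 0.9 halves the fresh-air flow and therefore
# doubles the CO2 excess over the outdoor level:
low = steady_state_co2(c_ext=7e-4, q_emit=1e-5, D=1.0, U=0.8)
high = steady_state_co2(c_ext=7e-4, q_emit=1e-5, D=1.0, U=0.9)
```

This back-of-the-envelope relation is why the text pairs high recirculation rates with the oxygen supplied by the RTG plants: without that supply, pushing U toward 1 would let indoor concentrations drift far from the outdoor values.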
3 A Mathematical Model for the Integrated Roof-Top Greenhouse

Our iRTG mathematical model consists of a system of six nonlinear first-order equations:

V_S \rho c_a \frac{dT_{IS}(t)}{dt} = \{[1 - U(t)] D_S(t) \rho c_a + \alpha_S S_S\} [T_E(t) - T_{IS}(t)] + N_S(t) P_o + P_{ES}(t) - P_S(t - \tau_S) + D_{RTG}(t - \tau_{RTG}) \rho c_a [T_{IC}(t) - T_{IS}(t)]    (1)

V_S \frac{dC_{O2S}(t)}{dt} = [1 - U(t)] D_S(t) [C_{O2E}(t) - C_{O2S}(t)] + Q_{O2S} + D_{RTG}(t - \tau_{RTG}) [C_{O2C}(t) - C_{O2S}(t)]    (2)

V_S \frac{dC_{CO2S}(t)}{dt} = [1 - U(t)] D_S(t) [C_{CO2E}(t) - C_{CO2S}(t)] + N_S(t) q_{CO2} + Q_{CO2S} + D_{RTG}(t - \tau_{RTG}) [C_{CO2C}(t) - C_{CO2S}(t)]    (3)

V_C \rho c_a \frac{dT_{IC}(t)}{dt} = \{[1 - U(t)] D_C(t) \rho c_a + \alpha_C S_C\} [T_E(t) - T_{IC}(t)] + N_C(t) P_o + P_C(t - \tau_C) + D_{RTG}(t - \tau_{RTG}) \rho c_a [T_{IS}(t) - T_{IC}(t)]    (4)

V_C \frac{dC_{O2C}(t)}{dt} = [1 - U_C(t)] D_C(t) [C_{O2E}(t) - C_{O2C}(t)] + D_{RTG}(t - \tau_{RTG}) [C_{O2S}(t) - C_{O2C}(t)]    (5)

V_C \frac{dC_{CO2C}(t)}{dt} = [1 - U(t)] D_C(t) [C_{CO2E}(t) - C_{CO2C}(t)] + N_C(t) q_{CO2} + Q_{CO2C} + D_{RTG}(t - \tau_{RTG}) [C_{CO2S}(t) - C_{CO2C}(t)]    (6)
with the following parameters: V [m³] volumes, ρ [kg/m³] air density, c_a [J/kg·K] specific heat of the air, T_I [°C] internal temperatures, T_E [°C] external temperature, U recirculation factor, D [m³/s] air flows, α [W/m²·K] mean heat transfer coefficient through the walls, S [m²] radiant surface, N number of persons, P_o [W] mean power emitted by a person, P_ES [W] power of the greenhouse effect, P [W] heating/cooling power, τ [s] delay times, C_CO2 [kg/m³] carbon dioxide concentrations, C_O2 [kg/m³]
oxygen concentrations, Q_CO2 [kg/m³s] CO2 emission flows, q_CO2 [kg/m³s] CO2 emission flow of a person, Q_O2 [kg/m³s] oxygen emission flow. Index S refers to the greenhouse, index C to the building, index RTG to the ventilation system between greenhouse and building, and index E to the environment. For instance, C_CO2E [kg/m³] is the carbon dioxide concentration outside the building, C_O2E [kg/m³] the oxygen concentration outside the building, and U_S the recirculation factor in the greenhouse.
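To illustrate how a single state equation of the model can be simulated, the following Python sketch integrates the greenhouse O2 balance, Eq. (2), by forward Euler with constant inputs. All numeric values are placeholders chosen for the illustration, not validated iRTG parameters.

```python
# Forward-Euler integration of the greenhouse O2 balance, Eq. (2),
# with constant, illustrative inputs (not validated parameters).

def simulate_o2(V_S=1000.0,   # greenhouse volume [m^3]
                D_S=2.0,      # greenhouse ventilation flow [m^3/s]
                U=0.8,        # recirculation factor
                D_RTG=0.5,    # RTG/building exchange flow [m^3/s]
                C_O2E=0.28,   # outdoor O2 concentration [kg/m^3]
                C_O2C=0.27,   # building O2 concentration, held fixed here
                Q_O2S=1e-4,   # plants' O2 emission [kg/s]
                C0=0.28, dt=1.0, steps=86400):
    """Greenhouse O2 concentration [kg/m^3] after `steps` seconds."""
    C = C0
    for _ in range(steps):
        dC = ((1 - U) * D_S * (C_O2E - C) + Q_O2S
              + D_RTG * (C_O2C - C)) / V_S
        C += dt * dC
    return C

final = simulate_o2()
```

With these placeholders the concentration relaxes toward the weighted balance of the outdoor and building levels plus the plants' contribution; the time constant V_S / [(1 − U)·D_S + D_RTG] is roughly 18 minutes, so a day-long run reaches steady state.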
4 The Simulink Implementation

A Simulink implementation of the model (1)–(6) is presented in Fig. 2. The index CIAS (from the Romanian Clădire Inteligentă cu Acoperiș Seră) is equivalent to RTG.
Fig. 2. The Simulink implementation
The previous model serves as a simulation platform, configurable for multiple objectives. The input parameters (orange colored) may be constants, look-up tables, functions, sub-systems, etc.
The setting of the next simulations may be observed in Fig. 2. Note the time delays τS = 30 s, τC = 80 s and τCIAS = 60 s, which are expected to cause difficulties for the automated control: oscillations, overshoot, chattering, etc. For instance, a controller for the temperature inside the building, TIC, is presented in the next figure. The imposed temperature Timp is 24 °C and a function computing the consumed energy E was introduced. The temperature feedback is Ties (Fig. 3).
Fig. 3. The building temperature controller
Two controllers will be tested: a linear PID and a nonlinear fuzzy-interpolative one [12, 13] (Fig. 4).
Fig. 4. The linear PID controller
The PID controller (P set by Ziegler–Nichols tuning) has the following values:

P = 1,  I = 0.005,  D = 0.005    (7)
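As a hedged illustration (not the authors' Simulink blocks), a discrete parallel-form PID with the Eq. (7) gains can be sketched as follows; the 1 s sample time and the first-order thermal plant are invented for the demonstration.

```python
# Discrete parallel-form PID with the gains of Eq. (7), driving an
# invented first-order thermal plant toward the imposed 24 °C.
# Sample time and plant coefficients are illustrative assumptions.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        """One control update: u = P*e + I*integral(e) + D*de/dt."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error + self.ki * self.integral
                + self.kd * derivative)

pid = PID(kp=1.0, ki=0.005, kd=0.005, dt=1.0)

T, setpoint = 10.0, 24.0          # start at the external temperature
for _ in range(5000):             # 5000 s of simulated time
    u = pid.step(setpoint - T)
    # crude thermal model: heating power warms the building,
    # losses pull T back toward the 10 °C ambient
    T += 0.01 * u - 0.001 * (T - 10.0)
```

With these toy coefficients the integral term supplies the steady heating demand, so the error converges to zero; the point of the sketch is only the controller structure, not the plant.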
The fuzzy-interpolative PID controller is realized by a 3D look-up table (LUT):

P: [-2  -1  0  1  2]
I: [-0.2  -0.1  0  0.1  0.2]
D: [-100  0  100]    (8)

PID: cat(3, [-2 -2 -2 -1 0; -2 -2 -1 0 1; -2 -1 -1 1 2; -1 0 1 2 2; 0 1 2 2 2],
            [-2 -2 -2 -1 0; -2 -2 -1 0 1; -2 -1 0 1 2; -1 0 1 2 2; 0 1 2 2 2],
            [-2 -2 -2 -1 0; -2 -2 -1 0 1; -2 -1 1 1 2; -1 0 1 2 2; 0 1 2 2 2])

The LUT design is driven, in a manner described by Jerry M. Mendel as Sculpting the State Space [14], towards the following objectives:
– overshoot rejection, by decreasing the control action around error = 0;
– accelerating the transitions, by strong control actions when the error is large;
– increasing accuracy in steady regimes, by strong control action for error = 0 (Fig. 5);
Fig. 5. The nonlinear fuzzy-interpolative PID controller
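At run time a fuzzy-interpolative controller of this kind reduces to multilinear interpolation in the 3-D look-up table. The sketch below (Python with NumPy, not the paper's MATLAB code; the signs of the table entries are our reconstruction of the garbled listing) illustrates the mechanism.

```python
import numpy as np

P = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # error breakpoints
I = np.array([-0.2, -0.1, 0.0, 0.1, 0.2])   # integral breakpoints
D = np.array([-100.0, 0.0, 100.0])          # derivative breakpoints

# One 5x5 slice per derivative breakpoint (error rows, integral columns);
# minus signs are our reconstruction, an assumption.
table = np.stack([
    np.array([[-2, -2, -2, -1, 0], [-2, -2, -1, 0, 1], [-2, -1, -1, 1, 2],
              [-1, 0, 1, 2, 2], [0, 1, 2, 2, 2]], dtype=float),
    np.array([[-2, -2, -2, -1, 0], [-2, -2, -1, 0, 1], [-2, -1, 0, 1, 2],
              [-1, 0, 1, 2, 2], [0, 1, 2, 2, 2]], dtype=float),
    np.array([[-2, -2, -2, -1, 0], [-2, -2, -1, 0, 1], [-2, -1, 1, 1, 2],
              [-1, 0, 1, 2, 2], [0, 1, 2, 2, 2]], dtype=float),
], axis=2)

def interp3(p, i, d):
    """Multilinear interpolation of the control action in the 3-D LUT."""
    def locate(axis, x):
        x = float(np.clip(x, axis[0], axis[-1]))
        k = int(np.clip(np.searchsorted(axis, x) - 1, 0, len(axis) - 2))
        return k, (x - axis[k]) / (axis[k + 1] - axis[k])
    (kp, tp), (ki, ti), (kd, td) = locate(P, p), locate(I, i), locate(D, d)
    sub = table[kp:kp + 2, ki:ki + 2, kd:kd + 2]
    w = (np.array([1 - tp, tp])[:, None, None]
         * np.array([1 - ti, ti])[None, :, None]
         * np.array([1 - td, td])[None, None, :])
    return float((sub * w).sum())
```

Linear interpolation between the grid points is what makes the rule base continuous, so the controller behaves like a smooth nonlinear PID rather than a bank of discrete rules.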
The following simulation scenario is set with a simple repeating sequence block, which defines a daily variation of the external temperature TE:

Time [s]: [0  21600  43200  64800  86400]
TE [°C]: [10  15  18  15  10]    (9)
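In code, the repeating-sequence block of Eq. (9) amounts to periodic piecewise-linear interpolation; a minimal sketch:

```python
# The daily external-temperature profile of Eq. (9), mirroring a
# Simulink "repeating sequence" block as a periodic piecewise-linear map.
import numpy as np

t_knots = np.array([0, 21600, 43200, 64800, 86400], dtype=float)  # [s]
T_knots = np.array([10, 15, 18, 15, 10], dtype=float)             # [°C]

def T_E(t):
    """External temperature [°C] at time t [s], repeating every 24 h."""
    return float(np.interp(t % 86400.0, t_knots, T_knots))
```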
5 Simulation Results

The simulations in Fig. 6 and Fig. 7 cover a 24 h period of the iRTG system, comparing the linear PID and the fuzzy-interpolative nonlinear PID control responses for the building temperature TIC (no control for the greenhouse temperature TIS).
Fig. 6. The performance of the linear PID controller
The linear PID performance is good and smooth in the steady regime but very poor in transitions, where a lasting 1.8 °C overshoot shows up, as a result of the time delays and of the inherent limitations of linear control. The energy consumed during 24 h is EPID = 178.6 kWh. The fuzzy-interpolative PID controller produces a complementary performance (Fig. 7). The steady regime shows chatter, which is not unusual in fuzzy control, because the fired control rules tend to switch, especially when the controlled plant presents important time delays, as in our case. In this case the consumed energy was lower, EFI = 149.2 kWh, due to the sharp transient regime with no overshoot.
Fig. 7. The performance of the nonlinear fuzzy-interpolative PID controller
For nonlinear MIMO (multiple-input multiple-output) systems such as the iRTG one can expect undesirable interferences. However, the self-adaptive fuzzy-interpolative control we used provided robust behavior, as shown in Fig. 8.
Fig. 8. Tic (20 °C) and Tis (18 °C) controlled by fuzzy-interpolative controllers
The CO2 and O2 concentrations may also be investigated in the same way.
Fig. 9. Gas concentration simulation in iRTG
Note that the whole iRTG system, in its minimum configuration, has the following I/O structure (Fig. 9):
– Inputs: PC, PS, DC, DS, DRTG, UC and US;
– Outputs: TIC, TIS, CCO2C, CCO2S, CO2C and CO2S.
Although the study on RTGs is only at an early stage, our preliminary observations indicate that, given the high nonlinearity and the strong time variations of the parameters, this MIMO plant demands self-adaptive control at all levels, and intelligent expert supervision in a second stage. That is why we propose a more precise terminology for RTGs:
– IRTG for Integrated Roof-Top Greenhouses;
– iRTG for Intelligent Roof-Top Greenhouses.
6 Conclusions

The paper introduces a promising concept for our sustainable future: the Intelligent Roof-Top Greenhouse Building (iRTG), which results when one applies intelligent control to Integrated Roof-Top Greenhouses (IRTG). iRTGs realize an integrated management of renewable energies, water resources and atmospheric gas composition. If applied at the scale of a whole city, iRTGs create the Green Skyline City, with a low carbon footprint, high carbon offset and a tight human-plant symbiosis.
The iRTG automated control must cope simultaneously with several actuators (heating/cooling devices and ventilation fans) and adapt itself to a large variety of parameter configurations. A nonlinear temperature controller able to cope with this complicated, highly nonlinear and time-varying plant is the self-adaptive fuzzy-interpolative one.

Acknowledgement. The research was supported by the research contract no. 9542/Dec. 22, 2017.
References

1. NASA: Climate Change and Global Warming. https://climate.nasa.gov/
2. United Nations: Framework Convention on Climate Change. Adoption of the Paris Agreement. https://unfccc.int/resource/docs/2015/cop21/eng/l09r01.pdf
3. Balas, M.M., Musca, C., Musca, S.: The passive greenhouses. In: Nathwani, J., Ng, A. (eds.) Paths to Sustainable Energy, Chapter 5, pp. 75–92. InTech Open (2010). http://www.intechopen.com/books/paths-to-sustainable-energy/the-passive-greenhouses
4. Balas, M.M.: Seven passive greenhouse synergies. Acta Polytechnica Hungarica 11(4), 199–210 (2014)
5. Watergy International Group. http://www.watergyinternational.com/. Accessed 7 July 2018
6. Rudel, T.K., et al.: Forest transitions: towards a global understanding of land use change. Glob. Environ. Change 15, 23–31 (2005)
7. Taylor Lovell, S.: Designing a Sustainable Urban Agriculture. University of Illinois (2014). http://www.multifunctionallandscape.com/uploads/2014_ESA_DesigningUrbanAgriculture.pdf
8. Eigenbrod, C., Gruda, N.: Urban vegetable for food security in cities. A review. Agron. Sustain. Dev. 35(2), 483–498 (2015)
9. Goldstein, B., Hauschild, M., Fernández, J., Birkved, M.: Urban versus conventional agriculture, taxonomy of resource profiles: a review. Agron. Sustain. Dev. 36(1), 1–19 (2016)
10. Pons, O., et al.: Roofs of the future: rooftop greenhouses to improve buildings metabolism. Procedia Eng. 123, 441–448 (2015)
11. Montero, J.I., Baeza, E., Muñoz, P., Sanyé-Mengual, E., Stanghellini, C.: Technology for rooftop greenhouses. In: Orsini, F., Dubbeling, M., de Zeeuw, H., Gianquinto, G. (eds.) Rooftop Urban Agriculture, pp. 83–101. Springer, Cham (2017)
12. Balas, M.M.: The fuzzy interpolative methodology. In: Balas, V.E., Varkonyi-Koczy, A.M., Fodor, J. (eds.) Soft Computing Based Modeling in Intelligent Systems, Studies in Computational Intelligence, pp. 145–167. Springer, Heidelberg (2009)
13.
Dale, S., Dragomir, T.L.: Interpolative-type control solutions. In: Balas, V.E., Fodor, J., Várkonyi-Kóczy, A.R. (eds.) Studies in Computational Intelligence - Soft Computing Based Modeling in Intelligent Systems, vol. 196, pp. 169–203. Springer, Heidelberg (2009) 14. Mendel, J.M.: Sculpting the state space. Key Note Speech at the 7th World Conference on Soft Computing, Baku, 30 May 2018
Human-Plant Symbiosis by Integrated Roof-Top Greenhouses

Marius M. Balas1, Ramona Lile1, Lucian Copolovici1, Anca Dicu1, and Kristijan Cincar2

1 “Aurel Vlaicu” University of Arad, Arad, Romania
[email protected], {ramona.lile, lucian.copolovici}@uav.ro, [email protected]
2 West University of Timisoara, Timișoara, Romania
[email protected]
Abstract. The Integrated Roof-Top Greenhouses (IRTG) are able to create a tight human-plant symbiosis, if provided with a two-flow ventilation system that conveys O2-enriched air from the RTG to the building and CO2-enriched air from the building to the RTG. Besides improving the building's metabolism in this way, if applied at a large scale IRTGs offer us an effective weapon against global warming, thanks to their carbon offset capability. IRTGs are also able to harvest local renewable energy resources (geothermal, solar, wind, etc.), to store them and to manage them in an integrated way together with the water resources. A generic model of this system, taking into account the air exchanges between greenhouse, building and environment, is discussed, as well as a fuzzy rule base for the air composition.

Keywords: Roof-top greenhouse · Ventilation · Fuzzy rule base · Carbon offset · Building metabolism
1 Introduction

Humans and plants have complementary metabolisms. Humans inhale O2 and exhale CO2; plants take in CO2 and release O2. The vast majority of free oxygen in the atmosphere is the result of the plants' cumulated photosynthesis over more than a billion years. The human-plant symbiosis is obviously of mutual benefit. Besides oxygen, vegetarian beings live by consuming plants: grains, fruits, bulbs, algae, etc. At the global scale the natural balance between plants and oxygen-consuming animals, humans included, has always been in favour of plants. Obviously the total vegetal mass has to be largely superior to the animals' mass. Deserts and other regions with no plants simply cannot sustain animal life. The last three centuries witnessed an exponential growth of the human population to the detriment of plants, mainly through deforestation, increasing our carbon footprint to the point of influencing the whole planet's climate [1]. We aim to find a solution to reduce our carbon footprint and to increase the carbon offset by augmenting the photosynthesis capability of our very homes.
© Springer Nature Switzerland AG 2021 V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 76–83, 2021. https://doi.org/10.1007/978-3-030-52190-5_6
2 The Integrated Roof-Top Greenhouse

We reckon that a solution able to offer us a sustainable future, solving at the same time some of our most disturbing environmental and climate problems, should be based on the well-known Roof-Top Greenhouse (RTG), through its integration with the rest of the building [2, 3]. One thus obtains Integrated RTGs (IRTG), which have a great potential to exploit the local resources of renewable energy and water through an integrated management (Watergy) [4]. Furthermore, one can improve the building's metabolism [2] by controlling, besides the temperatures, the CO2 concentration in the building's air. IRTGs are associated with the urban agriculture [5–8] that intends to expand the green spaces in our cities over the buildings' roofs. Besides any other economic considerations, growing the vegetal mass will strengthen the carbon offset, acting in the sense of the 2015 UN Framework Convention on Climate Change of Paris [9]. Our version of the IRTG is presented in Fig. 1.
Fig. 1. An IRTG system
The management of solar energy (thermal, generated by the greenhouse effect or by solar thermal panels, or photovoltaic) may be handled with the broad offer of devices and technologies available on today's market, which can be chosen according to the climate and the budget. Air-to-air heat pumps are devices able to heat or cool RTGs using the energy of the surrounding air. Water-to-water heat pumps are used to exploit the most reliable renewable energy, the geothermal one. An innovative IRTG prototype that includes energy, water and CO2 flows in the metabolism of the building has been realized and tested by the Fertilecity Project in Barcelona [2]. The Fertilecity Project's objective is to establish a new agricultural production system for Mediterranean urban areas. Our approach aims to generalize the concept to all kinds of climates, and to find methods to control and optimize the balance of energy, water, CO2 and O2. The key item enabling us to reach our objective is the Intelligent RTG (iRTG), an IRTG provided with intelligent ventilation and water systems. This paper discusses a generic iRTG ventilation system from the automated control point of view.
3 The Simulink Model of an Integrated Roof-Top Greenhouse

In a parallel paper we communicated an IRTG mathematical model [10]. A control method for the greenhouse and building temperatures (fuzzy-interpolative, self-adaptive) was illustrated by simulations. The part of the Simulink implementation referring to the CO2 and O2 concentrations is presented in the next figures (Figs. 2, 3 and 4).
Fig. 2. The main window of the IRTG Simulink model
Fig. 3. The gas concentrations sub-model
Fig. 4. The building O2 concentration block
The gas concentrations model comprises four blocks: Greenhouse Concentration CO2, Greenhouse Concentration O2, Building Concentration CO2 and Building Concentration O2. Their structures are very much alike, as shown in Fig. 4. The inputs of the Building Concentration O2 block in Fig. 4 are: VC, the building's volume; DC, the ventilated air flow in the building; UC, the recirculated air ratio in the building; DCIAS, the ventilated air flow between RTG and building; tauCIAS, the time delay of the IRTG ventilation pipes; CCO2C, the CO2 concentration in the building air; CO2C, the O2 concentration in the building air; CCO2S, the CO2 concentration in the RTG; CO2S, the O2 concentration in the RTG; QCO2C, the CO2 emission flow of the building air; CO2E, the O2 concentration of the outside atmosphere; and CCO2E, the CO2 concentration of the outside atmosphere. The output is CO2C. The entire system has four outputs: CO2C, CCO2C, CO2S and CCO2S, measured in kg/m3. The following simulation exemplifies the IRTG behaviour under several perturbations (no automated control) (Fig. 5):
– t = 5000 s: UC switches from 0.8 to 0.75;
– t = 40000 s: DCIAS switches from 0.5 to 1.5 m3/s;
– t = 60000 s: US switches from 0.9 to 0.8.
Fig. 5. A day long simulation of the CO2 concentrations in RTG (Cco2s) and in the building (Cco2c)
One can observe the effect of the IRTG ventilation: the higher DCIAS is, the closer the CO2 concentrations in the RTG and in the building become. The same obviously happens for the O2 concentration and for the temperatures. High recirculation rates, that is, less fresh air, increase the CO2 concentrations.
The CO2 concentration is lower in the RTG (Cco2s) than in the building (Cco2c), because the CO2 generating sources, the people's expiration and the gas-burning kitchen equipment, are located in the building.
4 An iRTG Control Rule Base

The previous model is generic; its parameters are not yet properly validated. The plants' metabolism is relatively complicated, presenting strong variations with the plant variety, the growing stage and density of the plants, the moment of the day and the season, the lighting and watering conditions, the air composition, etc. Although the first simulations regarding the IRTG air composition are not very precise, they clearly indicate that such a multiple-input multiple-output nonlinear system can be handled, at this early stage, only by a comprehensive expert system. Our previous experience with such systems was acquired with fuzzy-interpolative expert systems [11–13], which have self-adaptive capability, can be designed by sculpting the state space of the variables [14] and can be implemented by look-up tables with linear interpolation. The iRTG expert system will be developed gradually, embedding knowledge acquired by different means: world knowledge on greenhouses and buildings, general physics knowledge, knowledge acquired by computer simulation and practical experience on existing/future prototypes. Some of the control rules that support the first design stage of the iRTG rule base, dealing with the extreme case of cold weather, are shown below:

IF Temp ext is Low AND Cco2c is not High THEN UC is High AND US is High
% cold weather, CO2 concentration not too high, fresh air is limited

IF Temp ext is Low AND Cco2c is High AND Co2s is High THEN DCIAS is High
% cold weather, building CO2 concentration too high, the RTG can send a lot of O2 to the building and the building sends a lot of CO2 to the RTG

IF Temp ext is Low AND Cco2c is High AND Co2s is Low THEN DCIAS is Medium AND US is Medium AND UC is High
% cold weather, building CO2 concentration too high, the RTG does not have much O2; one accepts refreshing the air in the building, at a low rate, not directly but through the RTG.
In the case of a sunny or partially sunny day, the fresh air is heated by the greenhouse effect. The above rules were written in such a way as to give priority to the inhabitants' comfort: the cold fresh air is introduced through the RTG. If we prefer to protect the plants first, the previous rule can be rewritten:

IF Temp ext is Low AND Cco2c is High AND Co2s is Low THEN DCIAS is Medium AND US is High AND UC is Medium

For hot weather the rules are not very different; the main protection of the system relies on high recirculation rates and minimum fresh air admission. When the weather is not excessive, US and UC may be reduced, for good ventilation.
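The cold-weather rules above can be sketched as a crisp (non-fuzzy) decision function; the temperature threshold defining "Low" and the boolean inputs are illustrative assumptions, and the outputs are the labels from the rules as stated.

```python
# Crisp sketch of the cold-weather rules; a real fuzzy-interpolative
# implementation would replace the hard threshold with membership
# functions and interpolate between rule outputs. The 5 °C threshold
# for "Temp ext is Low" is an invented assumption.

def cold_weather_rules(temp_ext, cco2c_is_high, co2s_is_high):
    """Return set-point labels for the fired cold-weather rule, or None."""
    if temp_ext >= 5.0:                 # "Temp ext is Low" does not hold
        return None
    if not cco2c_is_high:               # rule 1: limit fresh air
        return {"UC": "High", "US": "High"}
    if co2s_is_high:                    # rule 2: exchange air via the RTG
        return {"DCIAS": "High"}
    # rule 3: refresh the building air slowly, through the RTG
    return {"DCIAS": "Medium", "US": "Medium", "UC": "High"}
```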
5 Discussion

The paper deals with Roof-Top Greenhouse integrated buildings, which create a tight synergy between the building inhabitants and the roof plants. This synergy is manifestly beneficial for both parts and also has an indirect yet strategic role: the upsurge of the global carbon offset due to the increasing vegetal mass located on our buildings. The large-scale application of IRTGs has the potential to create future Green Skyline Cities. Future work has to develop the IRTG model by detailing the way humans and plants consume and exhale carbon dioxide, oxygen, water vapour and heat. An expert system for the intelligent management of the IRTG system, aiming to transform it into an iRTG, is under development. Our target, a valid model, will help iRTG design (structure, sizing, equipment, etc.) and, hopefully, will contribute to the concept's wide acceptance.

Acknowledgement. The research was supported by the research contract no. 9542/Dec. 22, 2017.
References

1. NASA: Climate Change and Global Warming. https://climate.nasa.gov/
2. Pons, O., et al.: Roofs of the future: rooftop greenhouses to improve buildings metabolism. Procedia Eng. 123, 441–448 (2015)
3. Montero, J.I., Baeza, E., Muñoz, P., Sanyé-Mengual, E., Stanghellini, C.: Technology for rooftop greenhouses. In: Orsini, F., Dubbeling, M., de Zeeuw, H., Gianquinto, G. (eds.) Rooftop Urban Agriculture, pp. 83–101. Springer, Cham (2017)
4. Rudel, T.K., et al.: Forest transitions: towards a global understanding of land use change. Glob. Environ. Change 15(1), 23–31 (2005)
5. Watergy International Group. http://www.watergyinternational.com/. Accessed 7 July 2018
6. Taylor Lovell, S.: Designing a sustainable urban agriculture. University of Illinois (2014). http://www.multifunctionallandscape.com/uploads/2014_ESA_DesigningUrbanAgriculture.pdf
7. Eigenbrod, C., Gruda, N.: Urban vegetable for food security in cities. A review. Agron. Sustain. Dev. 35(2), 483–498 (2015)
8. Goldstein, B.P., Hauschild, M.Z., Fernandez, J., Birkved, M.: Urban versus conventional agriculture, taxonomy of resource profiles: a review. Agron. Sustain. Dev. 36(1), 9 (2016). https://doi.org/10.1007/s13593-015-0348-4
9. United Nations: Framework Convention on Climate Change. Adoption of the Paris Agreement. https://unfccc.int/resource/docs/2015/cop21/eng/l09r01.pdf
10. Balas, M.M., Popa, M., Muller, E.V., Alexuta, D., Muresan, L.: Intelligent roof-top greenhouse buildings. In: Proceedings of SOFA'18, the 8th International Workshop on Soft Computing Applications, Arad, 13–15 September 2018 (in press)
11. Balas, M.M.: The fuzzy interpolative methodology. In: Balas, V.E., Varkonyi-Koczy, A.M., Fodor, J. (eds.) Soft Computing Based Modeling in Intelligent Systems, Studies in Computational Intelligence, pp. 145–167. Springer (2009)
Human-Plant Symbiosis by Integrated Roof-Top Greenhouses
83
Fuzzy Scoring Theory Applied to Team-Peer Assessment: Additive vs. Multiplicative Scoring Models on the Signed or Unsigned Unit Interval

Paul Hubert Vossen1 and Suraj Ajit2

1 SQUIRE Research Institute, Kiefernweg 1A, 97996 Niederstetten, Fed. Rep. of Germany
2 Computing, Faculty of Arts, Science and Technology, University of Northampton, Waterside Campus, University Drive, NN1 5PH Northampton, UK
Abstract. Teamwork in educational settings for learning and assessment has a long tradition. The reasons, goals and methods for introducing teamwork in courses may vary substantially. However, in the end, teamwork must be assessed at the group level as well as at the student level. The lecturer must be able to give students credit points or formal grades for their joint output (product) as well as for their cooperation in the team (process). Schemes for such multi-criteria quantitative assessments appear difficult to define in a plausible way. Over the last five decades, numerous proposals for assessing teamwork processes and products at team and student level have been given using diverse scoring schemes. There is a broad field of empirical research and practical advice about how team-based educational assessment might be set up, implemented, improved, and accepted by staff and students. However, the underlying methodological problems with respect to the merging of several independent measurements have been severely underestimated. Here, we offer an entirely new paradigm and taxonomy of teamwork-based assessment following a rigorous fuzzy-algebraic approach based on two core notions: quasi-arithmetic means and split-join-invariance. We will show how our novel approach solves the problem of team-peer-assessment by means of appropriate software tools.

Keywords: Performance assessment scoring systems · Team-peer-assessment · Collaborative learning · Learning groups · Scoring algebra · Additive scoring · Multiplicative scoring · Quasi-arithmetic means · Split-join-invariance · Scoring function · Scoring equation · Peer rating · Student scoring · Zooming factor
1 Introduction Team-Peer-Assessment (TPA for short) has a long tradition (cf. Dochy et al. 1999; Falchikov 1986; Falchikov 1993; Falchikov and Goldfinch 2000; Gibbs 2009; Strijbos and Sluijsmans 2010; Topping 1998; Van Rensburg 2012). One of the most cited approaches has been described in a paper by Sharp (2006) (see Sect. 2). Recently, © Springer Nature Switzerland AG 2021 V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 84–111, 2021. https://doi.org/10.1007/978-3-030-52190-5_7
Fuzzy Scoring Theory Applied to Team-Peer Assessment
85
research practice has moved from assessment model development to empirical studies focussing on didactic, cognitive and social aspects of the assessment of teamwork (cf. Dijkstra et al. 2016). However, it would be false to conclude that the problem of deriving individual student marks, scores or grades from corresponding measures on team level has already been solved adequately and completely. On the contrary, most TPA practices still rely uncritically on scoring proposals following statistical and similar approaches which do not consider that educational assessment practices almost invariably work with two-sided bounded scales quite different from the real numbers used in statistics (cf. Sharp 2006). Other proposals follow standard psychometric approaches like Item Response Theory (e.g., Ueno 2010; Uto and Ueno 2016), which are well-suited for educational research, but not for everyday practice in the classroom. Therefore, in 2007, we started a research project aimed at developing a measurement approach for educational assessment which is geared to the needs of practising lecturers, without sacrificing formal requirements such as correctness, completeness, flexibility, and simplicity. Our approach is based on concepts, methods, techniques and tools borrowed from diverse fields, notably fuzzy mathematics (e.g. Fodor 2009; Dombi 1982), measurement theory (e.g. Batchelder et al. 2016), functional equations (e.g. Aczél 1966; Ng 2016), and the theory of quasi-arithmetic means, a powerful generalisation of the common arithmetic mean (Kolmogorov 1930; Jäger 2005). Previous results have been presented at numerous conferences and published in the conference proceedings, e.g. (Vossen and Kennedy 2017a, b; Vossen 2018). Here, we will present our most recent results regarding the shift to such a rigorous educational measurement paradigm for TPA (see also Bukowski et al. 2017).
In a nutshell, the fundamental TPA problem can be formulated as follows: How to assign correct and fair individual scores to the members of a learning team, if the only judgmental data we have are an overall team output score assigned by the lecturer, and the mutual peerwise ratings of team dynamics by the students (excl. self-assessments)?
Although the same approach may also be used outside the TPA paradigm, e.g. when the lecturer himself provides all judgmental data or when assessment is fully integrated in an e-learning context, we will not elaborate further on this here. Likewise, empirical studies about the acceptance, dissemination and actual use of the new TPA ideas presented here are not the topic of the current paper, but we encourage educational practitioners to reflect seriously on their own practice of TPA and compare it with the approaches and models we propose here. We will show that a full theory and method of TPA can be developed bottom-up from a few core concepts, constructs and principles, notably the concept of the Quasi-Arithmetic Mean (see glossary) and the principle of Split-Join-Invariance. In a nutshell, the latter principle states:
86
P. H. Vossen and S. Ajit
If the overall team score, as judged by the lecturer based on a course-dependent list of assessment criteria, is split into separate, possibly different student scores using a predefined scoring function, then the corresponding quasi-arithmetic mean of the individual student scores shall be equal to the lecturer's initial score.
Here is a graphical illustration of this principle:
Fig. 1. Illustration of the Split-Join-Invariance principle (see text and glossary for explication)
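The SJI principle can be made concrete with a small sketch. For simplicity, ordinary weighted arithmetic stands in for the quasi-operations defined later in the paper; the function names and numbers below are ours, chosen for illustration only.

```python
# Sketch of the Split-Join-Invariance (SJI) principle. Ordinary weighted
# arithmetic stands in for the quasi-operations defined later in the paper;
# function names and numbers are illustrative, not the authors' tool.

def split(team_score, ratings, weights, zoom=1.0):
    """Split a lecturer's team score into individual student scores."""
    mean_rating = sum(w * r for w, r in zip(weights, ratings))
    return [team_score + zoom * (r - mean_rating) for r in ratings]

def join(scores, weights):
    """Join the individual scores back into a team score (weighted mean)."""
    return sum(w * s for w, s in zip(weights, scores))

ratings = [0.52, 0.66, 0.70, 0.86, 0.76]   # five students, equal weights
weights = [0.2] * 5

for zoom in (0.0, 1.0, 2.0):
    scores = split(0.70, ratings, weights, zoom)
    # joining the split scores always recovers the lecturer's team score
    assert abs(join(scores, weights) - 0.70) < 1e-12
```

In the paper's actual models, the plus and times of this sketch are replaced by quasi-operations on a bounded scale, but the invariance argument is the same: the zoom-scaled deviations from the mean rating cancel when the weighted mean is taken.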
It turns out that there are four distinct scoring functions or rules, making up a two-by-two taxonomy of assessment schemes or models: signed additive (A±), unsigned additive (A+), signed multiplicative (M±), and unsigned multiplicative (M+). The following figure gives a high-level description of these four scoring rules. Full details and derivations will be given in Sect. 4. A fully worked out, representative assessment for a team of five students using one of the four scoring models will appear in Sect. 3. For a quick look-up and short explanations of the main concepts of TPA, see also the alphabetic glossary at the end of this paper. However, before we delve into this novel theory of Team-Peer-Assessment, let us give a reconstruction of two classical approaches in which scores (used by the lecturer to assess the team's product) and ratings (used by the students to assess the team's process) are incorrectly treated as real numbers endowed with ordinary arithmetic addition and multiplication, instead of fuzzy-mathematics-based quasi-arithmetic addition and multiplication of scores or ratings on bounded ranges.
Fig. 2. The two by two taxonomy of TPA scoring models
2 Two Reconstructions

2.1 The Linear Statistical Approach of Sharp
The linear statistical approach of Sharp (2006, p. 335), which uses well-known statistical techniques of analysis of variance (ANOVA), is based on a simple assumption about the relationship between marks (percentages) and peer ratings (unspecified) (p. 341):

$$M_j = \bar{M} + u\,(S_j - \bar{S}) \qquad (1)$$

Here, Mj is the final mark of student j, M̄ is the so-called location factor, set equal to the tutor mark (as a global average for the entire team, see below), Sj is the contribution of student j to the joint group work, and S̄ is the (arithmetic) mean of all peer ratings. The constant of proportionality u (Sharp's terminology) is an arbitrary rescaling factor "to be determined in the light of empirical evidence" (p. 333). What kind of empirical evidence is meant here, and how a choice can be justified in the light of it, is not fully clear from the paper. Note, however, that in the case of (too) large deviations from M̄ and/or S̄, it may readily happen that Mj > 100, the given upper limit of marks (p. 335: "The scales of linear statistical models… are not constrained within particular intervals").

A substantial part of the appendix to Sharp's paper (pp. 341–343) is devoted to the calculation of a statistical measure of detectability (A) which shall be used to decide whether to moderate the tutor's overall score (M̄) at all or not. If A is such that the tutor may conclude that the differences in rating between the students are not significant (in statistical terms), then the recommendation is not to moderate ("… 'switch the system off' for that group.", p. 338). This aspect of the proposal is highly technical, as Sharp acknowledges himself (p. 337): "The method described here is rigorous but not easy to explain to students unfamiliar with analysis of variance." We hasten to add that (this part of) the method may also appear prohibitive for tutors or lecturers from disciplines in which empirical research based on (advanced) statistical techniques is not part of their usual skill set.

At the end of his paper, Sharp warns that, because of the statistical approach, his method only works reliably for teams of size four or more: small teams of 2 or 3 students are not covered at all. It will be clear from our summary that we have strong doubts about the viability of Sharp's proposal. On the one hand, it uses a linear scoring model and formula which are not appropriate for the type of measurement scales used for scores and ratings, and it relies too heavily on the creativity and intuition of the assessor. On the other hand, it introduces research techniques that may well be adequate for empirical researchers, but that may overwhelm the intended population of practising lecturers and students in practice-oriented disciplines (think e.g. of language courses, design curricula, and practical medical, legal or social studies).
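The boundedness problem of the linear rule can be seen in a minimal sketch. The tutor mark, peer ratings and proportionality constant below are invented for illustration; they are not Sharp's data.

```python
# Minimal sketch of Sharp's linear rule M_j = Mbar + u * (S_j - Sbar).
# All numbers are invented for illustration.

def sharp_mark(tutor_mark, rating, mean_rating, u):
    """Moderate the tutor mark by a student's deviation from the mean rating."""
    return tutor_mark + u * (rating - mean_rating)

# A strong positive deviation pushes the moderated mark past the upper
# limit of 100, exactly the out-of-range problem discussed in the text.
marks = [sharp_mark(85, s, 50, u=2.0) for s in (20, 50, 80)]
assert marks == [25.0, 85.0, 145.0]
assert max(marks) > 100
```

Because the rule is linear on an unbounded scale, no choice of u can guarantee that every moderated mark stays within [0, 100] for all rating patterns.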
2.2 The Quadratic Numerical Approach of Nepal
The paper (Nepal 2012) is relevant because of its unusual approach to the issue of how much the final student score may deviate from the score given by the lecturer. Nepal proposes a peer rating model which substantially differs from the prototypical approaches found elsewhere because it breaks with the dominant linear statistical tradition which culminated in the proposal of Sharp (2006). Unlike Sharp's, his basic equation is non-linear in the peer assessment influence on the lecturer's score: it is a quadratic polynomial with two parameters. We will start with a reconstruction, pointing out its good ideas, and then conclude why we believe that this approach nonetheless does not enable a real breakthrough, among other reasons because of its very specificity. Nepal uses an individual weighting factor IWF which is also implicit in the default solution for u proposed by Sharp (2006, p. 342), except for how the average contribution is calculated:

$$\mathrm{IWF} = \frac{\text{Individual contribution (\%)}}{\text{Average contribution (\%)}} = \frac{\text{Individual contribution (\%)}}{100/n} \qquad (2)$$
Nepal considers the determination of the individual weighting factor IWF as non-problematic. The determination rests upon an estimate of the individual contribution to the project using co-assessments (p. 555), and a fixed average contribution of 100/n, where n is the team size (p. 566). However, Nepal suggests an entirely different way of adjusting the team mark TM (on a scale from 0 to 100) by IWF (a non-negative real number). After shortly reviewing four well-known formulae for calculating an individual mark IM on the basis of TM and IWF, Nepal introduces his so-called parabolic formula (p. 557):

$$IM = \begin{cases} TM \cdot IWF & \text{if } IWF \le 1 \\[6pt] TM \left[ IWF - \dfrac{(IWF - 1)^2}{2a\left(1 - \frac{TM}{100}\right)} \right] & \text{if } 1 < IWF < 1 + a\left(1 - \frac{TM}{100}\right) \\[6pt] TM \left[ 1 + \dfrac{a}{2}\left(1 - \frac{TM}{100}\right) \right] & \text{if } IWF \ge 1 + a\left(1 - \frac{TM}{100}\right) \end{cases} \qquad (3)$$
Here, IM is the mark awarded to an individual team member; IWF is the individual weighting factor; TM is the team mark, and a is a scaling parameter. This parameter a ≥ 0 is introduced to enable adjustment of the impact of IWF to local "mark-grade translations" (his phrase, p. 557), but it also plays a role in forcing the resulting individual mark to stay within the allowed range from 0 to 100 (p. 558). As can be seen from the definition of IM (individual mark) in formula (3), Nepal distinguishes three regions of different behaviour of this function, where t = TM/100: up to 1, between 1 and 1 + a(1 − t), and above 1 + a(1 − t). For w = IWF up to 1, the adjustment is independent of t and a. From 1 + a(1 − t) upward, the adjustment factor will be equal to 1 + ½a(1 − t), the maximum value of the parabolic function defined in (3) for the region between 1 and 1 + a(1 − t). The regions and the behaviour of the function are chosen such that there are no gaps (discontinuities) at the transition points 1 and 1 + a(1 − t). Here is an example of the calculation of the student score IM for TM = 80, w = 2, and a = 5. As w = 2 = 1 + 5(1 − 0.8), the third case of formula (3) applies: the adjustment factor is 1 + ½ · 5 · (1 − 0.8) = 1½. Since TM = 80, this yields an individual mark of 80 · 1½ = 120. Obviously, this student has done quite well. Moreover, due to the high values of both w and a, the calculated individual mark of 120 is beyond the maximum allowable mark of 100. The "creative solution" is to set the final individual mark to min(100, 120) = 100. Nepal's proposal is in certain respects an improvement upon previous peer rating or co-assessment proposals. First, Nepal explicitly defined a standard average contribution to a project team which only depends on the team size (n), if the total workload is evenly distributed and taken up by all members of a cooperative and well-coordinated team.
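The three cases of formula (3), including the final clamping to 100 used in the worked example, can be checked with a short sketch. This is our reconstruction of Nepal's rule, written for illustration.

```python
def nepal_individual_mark(tm, iwf, a):
    """Nepal's (2012) parabolic adjustment of a team mark tm (0..100) by an
    individual weighting factor iwf >= 0, with scaling parameter a >= 0.
    Reconstructed from the three regions described in the text; the result
    is clamped to 100, as in the worked example."""
    t = tm / 100.0
    bound = 1 + a * (1 - t)
    if iwf <= 1:
        factor = iwf
    elif iwf < bound:
        # parabolic region; reachable only when a * (1 - t) > 0
        factor = iwf - (iwf - 1) ** 2 / (2 * a * (1 - t))
    else:
        factor = 1 + 0.5 * a * (1 - t)   # maximum of the parabola
    return min(100.0, tm * factor)

# The worked example: TM = 80, w = 2, a = 5 gives 80 * 1.5 = 120, capped.
assert nepal_individual_mark(80, 2, 5) == 100.0
assert nepal_individual_mark(80, 1, 5) == 80.0     # iwf = 1 leaves TM as is
assert nepal_individual_mark(80, 0.5, 5) == 40.0   # linear region below 1
```

Note that the cap is still needed: as the text argues, the parabolic shape alone does not keep the result within range for all parameter combinations.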
In a sense, it is like criterion-based testing instead of norm-based testing, where the criterion is "equal contribution by all team members in percentages of the required workload" and the norm would have been "average contribution of all team members as indicated by the peer assessment reports". It remains unclear what should be changed in the calculation if the team does not deliver according to the total required workload (team underachievement) or if the team delivers more than required (team overachievement). Second, Nepal courageously departs from the usual linear-statistical thinking style of mark or score adjustment, which uses something of the form at + b, where t stands for a team score, and a and b are some formally or empirically determined modifiers or parameters, e.g. peer ratings as discussed here. Instead, he suggests that this way of adjusting the team mark is only acceptable for individual weightings up to one. However, for individual weightings above this standard, or default, value of 1, the effect of the weighting upon the team score should be deflated, to prevent overambitious team members from "eating up" all the work instead of cooperating with their peers (the reverse of free-riding, so to speak). The scaling parameter is used to adjust the individual weightings to be in line with local marking policies. However, Nepal points out that only a scale factor smaller than two guarantees that the adjusted mark will fall within the allowable range from 0 to 100. In fact, the condition is that the product of the scaling factor and the team score is less than two, but that does not change the line of argument. Thus, the problem of out-of-range scores is not eliminated at all. The underlying cause is that polynomial
functions are not appropriate for simultaneously enforcing an upper and lower bound on the values of a function unless one arbitrarily restricts the range of the function variable.
3 A Gentle End-User's Introduction to the TPA Approach

Let's start with a realistic case study of peer assessment for a team of five students using a standard spreadsheet tool (here, we have used Excel, but any other modern spreadsheet software would do as well, e.g. Google Sheets). For the moment, we don't care about which of the four standard models will be used: for the end-user (lecturer or student), all would appear exactly the same, except of course for the scores or ratings that are allowed, as that depends upon whether the lecturer uses a signed [−1, +1] or unsigned [0, 1] scale. Let's assume the lecturer starts with entering his judgments: the team product score tlecturer = 0,70 (+70%), the default zooming factor zlecturer = 1,0 (decreasing or increasing the impact of student ratings on the team score) as well as the individual student weights w1, …, w5 (Fig. 3):
Fig. 3. Data input of the lecturer: zooming z, weights w and team score t.
As can be seen, there were some issues with students 2, 3 and 5; the others take their usual share of 1/5 = 20%. Students 2 and 3 may have been absent or sick for a short period of time, or there may be other reasons why they contributed less to the project than formally required or expected. On the other hand, student 5 apparently took over the work which was not done by students 2 and 3. The weights as such don't tell us anything about the quality of the work; they are just a quantitative indication of the participation or involvement of a student in the teamwork due to external factors. The zoom factor or zoom parameter z has been set by this lecturer at its default value of 1, which means that the lecturer initially has no reason to manually shrink or stretch the range of final student scores. That is, he or she is confident that the students will do their best to come up with differential peer ratings, so that systematic differences in the cooperative behaviour of peers will be captured and have corresponding impacts on their final scores. The zooming factor may later be adjusted if this initial hypothesis about correct and fair peer rating turns out to be somehow wrong or not plausible.
Now it’s time for the students to deliver their judgments of each other’s cooperative behaviour during teamwork, which is a pragmatic way to capture team dynamics. Each team member will rate each other team member on a list of process quality criteria using the same scale which will be used for all other judgments (this simplifies the scheme but is not really necessary). The ratings for one and the same peer (5 – 1 = 4, i.e. excluding self-assessment) will be averaged using the quasi-mean belonging to the scale. This rating will be recorded in the so-called peer assessment matrix (see Fig. 4):
Fig. 4. Mutual peer ratings of all members of the learning team
Given that we are working here with a signed additive scoring scheme, the students appear to be very satisfied with each other’s cooperation in the team. Still, there are (relatively small) differences, but no systematic outliers. How strong will this impact the final student scores which can now be calculated? Here are the numbers (Fig. 5):
Fig. 5. Calculated student ratings (ri), mean rating (s), student scores (si) and mean score (t)
We happily observe that the individual peer ratings are indeed spread out clearly, from 0,52 (r1) to 0,86 (r4), so there seems to be no reason to adjust the zooming factor. Their impact on the team score of 0,70 set by the lecturer is clear. We see this immediately in the following diagram (Fig. 6):
Fig. 6. Peer adjusted student scores around a team score of 0,70 set by the lecturer for a default zooming (impact) factor of one
Nevertheless, for didactic purposes, let's increase the zooming factor to two so that we can explore its effect, or impact, on the dispersion (i.e., differentiation) of the final student scores (Fig. 7). Now the scores range from 0,25 to 0,93, which will very probably have a noticeable impact on the grades that will be calculated from these scores (grades are no issue in this paper, as the types of grading systems worldwide are too diverse to be considered in our system). Note, however, that all scores are still above the pass-fail threshold of 0 on the signed additive scale [−1, +1] that we are using here. Finally, it is very important to point out that the principle of Split-Join-Invariance (SJI) has been satisfied (cf. Fig. 1, glossary). This can be clearly seen in Fig. 4, where tlecturer and tmodel are exactly equal, as required by SJI. If the team average calculated by the model (tmodel) were not equal to the initial global team score given by the lecturer (tlecturer), then students would have reason to doubt the correctness of the assessment procedure: either the model didn't take the lecturer's global score into due account, or the lecturer didn't take all the available evidence about teamwork into account. This is the very reason why we formulated and introduced SJI.
Fig. 7. Peer adjusted student scores around a team score of 0,70 set by the lecturer for an increased zooming factor of two
4 Basics of the Scoring Models

The TPA tool introduced in the preceding section rests on the following conceptual model (Fig. 8):
Fig. 8. Input-output diagram of our TPA scoring models
The core concept is the scoring formula or function f^z_{s,t}(r_i), which transforms peer ratings r_i into student scores s_i. Evidently, this function needs three parameters or modifiers to work:
• The team product score tlecturer. This team score is initially set by the lecturer based on the team's products (outputs, deliverables) of any type relevant for the original task assignment, e.g. a paper, a design prototype, software or hardware, the solution of a mathematical problem, a translation, presentation, demonstration, etc. Usually, this product score will be calculated as the mean of several quality factors or criteria using the adopted scoring scale (signed or unsigned, additive or multiplicative). The default values are scale-dependent: 0 or ½.
• An initial team zoom factor zlecturer. This zooming parameter determines the spread of the final student scores around the team product score. It is a non-negative real number, with a default setting of 1, which gives the simplest scoring function. To get more differentiation, as perhaps required by faculty administration, the scores may be spread out (stretched) by setting z > 1; setting z < 1 reduces the spread (shrinking), e.g. if there seems to be adverse coalition forming in the team. With z = 0, peer assessment will effectively be disabled, e.g. in the extreme case that the students apparently didn't understand or follow the rules of the (assessment) game, or if their team dynamics was ostensibly completely out of order.
• The team rating s, i.e. the mean of the individual student ratings. This quasi-mean can be calculated straightforwardly once the student ratings have been calculated from the peer matrix. It needs the student weights, which are non-negative real numbers adding up to one.

Once the final student scores have been calculated by means of the scoring function, it is an easy matter to calculate the mean score for this team, again using the student weights set by the lecturer (as before for the team rating). Due to the SJI principle, this calculated team score shall be equal to the initial team score given by the lecturer. If not, there is something wrong either with the data entered (invalid data input, e.g. out of range), with the parameter settings (e.g., invalid parameter values) or with the spreadsheet (implementation errors, e.g. lost or false functions).

The only component which is still undefined at this point in the discussion is the scoring function f^z_{s,t}(r_i). How shall we proceed? In hindsight, it is quite simple. To find adequate scoring functions, we have constructed four scoring algebras. Here are the basics: (1) analogously to the real line, we have endowed the additive scoring scales with suitable operations of addition, subtraction, and scalar multiplication with a positive number (division is missing; we don't need it); (2) analogously to the positive real line, we have endowed the multiplicative scoring scales with suitable operations of addition, scalar multiplication with a positive number and scalar division yielding a positive number (subtraction is missing; it is not available).

The important thing about these operations is the following. When one applies addition or subtraction to two scores or ratings, one again gets a valid score or rating
within the original range. Scalar multiplication and scalar division are somewhat different. With scalar multiplication, a non-negative number is applied to a score or rating, yielding again a valid score or rating. With scalar division, one score or rating is divided by another score or rating to yield a non-negative number, i.e., a scalar, which, when applied to the latter score or rating, would again yield the former score or rating. [It would be possible to define genuine multiplication and division, though at the cost of more complicated formulae. We hope to find another way to handle scalar division, but for the moment it works fine.] Obviously, the operations we have just introduced will not be the usual arithmetic operations of addition, subtraction, multiplication and division on the real line. To remind us of this fact, and to be able to distinguish between ordinary arithmetic operations and the new quasi-arithmetic operations, we have chosen proper symbols for them which remind us of their corresponding arithmetic ones:

• x ⊕ y for quasi-addition, defined in all models
• x ⊖ y for quasi-subtraction, defined in both additive models
• r ⊙ x for quasi-multiplication, defined in all models
• x ⊘ y for quasi-division, defined in both multiplicative models
With these operations, we can define quasi-arithmetic means (quasi-means for short) in all models, where t is the team score and s is the team rating:

$$t = \bigoplus_{i=1}^{n} w_i \odot s_i \qquad (4)$$

$$s = \bigoplus_{i=1}^{n} w_i \odot r_i \qquad (5)$$

Finally, we are ready to give a concise definition of the additive and multiplicative scoring functions in terms of scores, mean score, ratings, mean rating, zoom factor and the quasi-operations we have introduced before:

(6)
Expressed in the usual arithmetic notation, the scoring equations differ not only according to their type, i.e. additive or multiplicative, but also depending on their range, i.e. whether we use the signed range [−1, +1] or the unsigned range [0, 1]:
Fig. 9. The scoring equations for all four scoring models in arithmetic notation (cf. Fig. 2).
We end this section with some of the obvious but important properties of the scoring function which hold for all four scoring models: z z ðrÞ\ fs;t ðr0 Þ fs;t
if and only if
z fs;t ðlowest ratingÞ ¼ lowest score
r\r0
ðeither 1 or 0Þ
z fs;t ð 1Þ ¼ 1
ð7Þ ð8Þ ð9Þ
z ðsÞ ¼ t the mean rating corresponds to the mean score fs;t
ð10Þ
0 ðrÞ ¼ t if z ¼ 0 then all scores equal the team score fs;t
ð11Þ
1 fs;t ðrÞ ¼ lowest score
if r \ s
1 ðrÞ ¼ 1 if s \ r fs;t
ð12Þ ð13Þ
5 The Four Standard TPA Scoring Models

In the preceding section, we have boldly postulated four basic quasi-arithmetic operations for the scoring models presented in Sect. 1, without saying exactly how these operations are defined. Instead, we just used those to-be-defined operations to formulate three required constructs: (1) the common concept of the quasi-arithmetic mean for all scales, (2) the additive and multiplicative scoring formulae, and (3) the resulting scoring equations for the four scoring models.
In this section, we will show, for each of the four scoring models, how to define the required quasi-operations. It turns out that all we have to do is to specify the correct rescaling function for each of the four models. A rescaling function, say u, maps the scores or ratings either to the real numbers (in the additive case) or to the non-negative real numbers (in the multiplicative case). The quasi-operations will then be defined in such a way that they correspond in a nice and simple way to the usual arithmetic operations (so-called Cauchy equations, cf. Ng 2016). Here is how, where x and y are scores or ratings, and r is an arbitrary non-negative number (technically called a scalar):

$$u(x \oplus y) = u(x) + u(y) \qquad (14)$$

$$u(x \ominus y) = u(x) - u(y) \qquad (15)$$

$$u(r \odot x) = r \cdot u(x) \qquad (16)$$

$$u(x \oslash y) = u(x) / u(y) \qquad (17)$$

Now, each scoring model can be characterized by a unique rescaling mapping u. Using u, each of the formulae given in the preceding section can be "translated" into ordinary arithmetic, as we have already done for the scoring equations (Fig. 9). As an intermediary step, we may as well get rid of the special operator symbols we have introduced in the previous section. For instance, applying the rescaling function u to the definition of the team score in Eq. (4), we get the usual definition of a quasi-arithmetic mean (cf. Aczél 1966):

$$u(t) = u\!\left( \bigoplus_{i=1}^{n} w_i \odot s_i \right) = \sum_{i=1}^{n} u(w_i \odot s_i) = \sum_{i=1}^{n} w_i\, u(s_i) \qquad (18)$$

which is most often rendered in the following form (applying u⁻¹ to both sides):

$$t = u^{-1}\!\left( \sum_{i=1}^{n} w_i\, u(s_i) \right) \qquad (19)$$
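The quasi-arithmetic mean translates directly into code. The sketch below is ours, for illustration; as the following paragraph notes, the geometric and harmonic means arise as special cases for elementary choices of u.

```python
import math

# Sketch of the quasi-arithmetic mean t = u^{-1}(sum_i w_i * u(s_i)).
# u must be one-to-one with inverse u_inv; the names are ours.

def quasi_mean(values, weights, u, u_inv):
    return u_inv(sum(w * u(x) for w, x in zip(weights, values)))

vals, w = [1.0, 4.0, 16.0], [1/3, 1/3, 1/3]

# u = log yields the geometric mean: (1 * 4 * 16)^(1/3) = 4
assert abs(quasi_mean(vals, w, math.log, math.exp) - 4.0) < 1e-9

# u(x) = 1/x yields the harmonic mean: 3 / (1 + 1/4 + 1/16) = 16/7
assert abs(quasi_mean(vals, w, lambda x: 1 / x, lambda y: 1 / y) - 16 / 7) < 1e-9

# u = identity recovers the ordinary weighted arithmetic mean
assert abs(quasi_mean(vals, w, lambda x: x, lambda y: y) - 7.0) < 1e-9
```

The scoring models differ only in which rescaling u is plugged into this one construction.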
In words: the quasi-mean of the scores si is just like the usual arithmetic mean of those si, except that before weighting the scores are rescaled by u, and after summing the weighted rescaled scores the result is mapped back to the original (signed or unsigned) scale by applying the inverse of u (the rescaling function will always be one-to-one, so that an inverse exists). It is simple to prove that, for instance, the well-known geometric mean, harmonic mean and power means are just quasi-arithmetic means for some elementary u (Jäger 2005). Now, this is all one needs to know to understand how the four scoring models have been constructed. Of course, the crucial and difficult step is to find suitable rescaling functions u, a different one for each of the four models. Luckily, in terms of u, there are close relationships between the two additive models and the two multiplicative
models, respectively. On the other hand, there are clear differences between the additive and the multiplicative scales which deserve fuller attention in future publications.

5.1 The Signed Additive Scoring Model A±
The signed additive scoring function will be defined on the standard scale [−1, +1]. In practice, such scales may run from −n to +n, for any integer n from 1 upward, and scores or ratings on such a (2n + 1)-point scale will be standardized before any further calculations are made. To define the three operations of addition ⊕, subtraction ⊖ and scalar multiplication ⊙, we need the following rescaling function u: [−1, +1] → ℝ and its inverse:

(20)

Applying the Cauchy Eqs. (14–16), we get the following definitions and properties (Fig. 10):
Fig. 10. The three operations on the scale [−1, +1] for the additive model.
It is not difficult to check that these operations are well-defined, i.e. addition, subtraction and scalar multiplication (with a non-negative number) always yield valid scores or ratings, with 0 and 1 playing special roles. The formula for the quasi-arithmetic mean score t can now be explicitly given (the quasi-mean rating s is defined analogously):
Fuzzy Scoring Theory Applied to Team-Peer Assessment
(21)
Taking up the scoring function for the additive model on [−1, +1] from formula (6) and applying the foregoing definitions we get:
(22)
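The quasi-arithmetic mean construction itself is easy to exercise in code. A minimal sketch in Python — using illustrative elementary rescaling functions, not the model-specific u of Eqs. (20)–(22) — verifies the claim above that the geometric, harmonic and power means arise from simple choices of u (Jäger 2005):

```python
import math

def quasi_arithmetic_mean(scores, weights, u, u_inv):
    """Weighted quasi-arithmetic mean: rescale by u, average, map back by u's inverse."""
    assert abs(sum(weights) - 1.0) < 1e-12, "weights must sum to one"
    return u_inv(sum(w * u(s) for s, w in zip(scores, weights)))

scores = [2.0, 8.0]
weights = [0.5, 0.5]

# u(x) = log(x) recovers the geometric mean: exp((log 2 + log 8) / 2) = 4
gm = quasi_arithmetic_mean(scores, weights, math.log, math.exp)
# u(x) = 1/x recovers the harmonic mean: 1 / ((1/2 + 1/8) / 2) = 3.2
hm = quasi_arithmetic_mean(scores, weights, lambda x: 1 / x, lambda x: 1 / x)
# u(x) = x^2 recovers a power mean: sqrt((4 + 64) / 2)
pm = quasi_arithmetic_mean(scores, weights, lambda x: x ** 2, math.sqrt)
```

Each mean stays on the original scale because the inverse rescaling is applied after averaging.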
5.2 The Unsigned Additive Scoring Model A+
The unsigned additive scoring function will be defined on the standard scale [0, 1]. In practice, such scales may run from 1 to n, for any integer n from 2 upward, and scores or ratings on such an n-point scale will be standardized before any further calculations are made. To define the three operations of addition, subtraction and scalar multiplication we need the following rescaling function and its inverse:

(23)

Applying the Cauchy Eqs. (14–16) we have the following definitions (Fig. 11):
Fig. 11. The three operations on the scale [0, 1] for the additive model.
It is not difficult to check that these operations are well-defined, i.e. addition, subtraction and scalar multiplication (with a non-negative number) always yield valid scores or ratings; ½ and 1 are the neutral elements. The formula for the quasi-arithmetic mean scoring t can now be explicitly given (the quasi-mean rating s is defined analogously):
(24)
Taking up the scoring function for the additive model on [0, 1] from formula (10) and applying the foregoing definitions we get:
(25)
5.3 The Signed Multiplicative Scoring Model M−
The signed multiplicative scoring function will be defined on the standard scale [−1, +1]. In practice, such scales may run from −n to +n, for any integer n from 1 upward, and scores or ratings on such a (2n + 1)-point scale will be standardized before any further calculations are made. To define the operations of addition ⊕, scalar multiplication ⊗ and scalar division ⊘ we need another rescaling function u₁: [−1, +1] → ℝ and its inverse:

(26)

Applying the Cauchy Eqs. (14, 16–17) we get the following definitions (Fig. 12):
Fig. 12. The three operations on the scale [−1, +1] for the multiplicative model.
It is not difficult to check that these operations are well-defined, i.e. addition and scalar multiplication (with a non-negative number) always yield valid scores or ratings, and scalar division always yields a scalar; −1 and 1 are the neutral elements. The formula for the quasi-arithmetic mean scoring t can now be explicitly given (the quasi-mean rating s is defined analogously):
(27)
Taking up the scoring function for the signed multiplicative model on [−1, +1] from formula (6) and applying the foregoing definitions we get:
(28)
5.4 The Unsigned Multiplicative Scoring Model M+
The unsigned multiplicative scoring function will be defined on the standard scale [0, 1]. In practice, such scales may run from 1 to n, for any integer n from 2 upward, and scores or ratings on such an n-point scale will be standardized before any further calculations are made. To define the operations we need the following rescaling function and its inverse:

(29)

Applying the Cauchy Eqs. (14, 16–17) we get the following definitions (Fig. 13). It is not difficult to check that these operations are well-defined, i.e. addition and scalar multiplication (with a non-negative number) always yield valid scores or ratings, and scalar division always yields a scalar; ½ and 1 are the neutral elements. The formula
Fig. 13. The three operations on the scale [0, 1] for the multiplicative model.
for the quasi-arithmetic mean scoring t can now be explicitly given (the quasi-mean rating s is defined analogously):
(30)
Taking up the scoring function for the unsigned multiplicative model on [0, 1] from formula (10) and applying the foregoing definitions we get:
(31)
6 From Excel to Professional Software Tools

Currently, our TPA models are mainly implemented and applied using Excel, with or without the help of VBA (Visual Basic for Applications). Here is an example. The example consists of two so-called dashboards. The first dashboard (Fig. 14) is mainly used for recording the names of the students ("peers") in the team and their mutual peer ratings. It consists of the following parts:

• A: Lecturer's name or ID
• B: Course name or ID
• C: Team name or ID
• D: Names or IDs of the peers + student weights (in percentages)
• E: Peer to be assessed (drop-down menu)
• F: Peer ratings from the other students (slider)
• G: Peer rating matrix (lower part) + student ratings (upper part)
Fig. 14. Implementation of a TPA model in Excel-VBA: dashboard I.
Fig. 15. Implementation of a TPA model in Excel-VBA: dashboard II.
The second dashboard (Fig. 15) is used by the lecturer to enter the team score, the team weighting (a parameter not used any more) and the zoom factor, and for the calculation and graphical presentation of the final student scores. It consists of the following parts:

• A: Replication of the student ratings (taken from Dashboard I)
• B: A slider for entering the team score (here dubbed grade)
• C: A slider for entering a team weighting factor (not used any more)
• D: A slider for entering the zooming factor
• E: Presentation of the final student scores
Our long-term goal is to develop and distribute a fully stand-alone software package written in one of the mainstream programming languages, e.g. Java, with an excellent user-friendly interface. As a first step in this direction we have already run two student projects, one a rather small end-of-course assignment, the other a more challenging bachelor thesis project, in which selected modules of the Team-Peer-Assessment system have been programmed (Cresta 2018). In general, the latter prototype follows the logic of the Excel-VBA implementation shown above, but it permits on-the-fly selection of one of the scoring models and it has a
more modern look-and-feel. The results are very encouraging, but we need more such pilot projects to convince commercial software developers of the potential merits of our TPA approach.
7 Conclusion

In this paper, we have presented a detailed account of our innovative Team-Peer-Assessment approach, based on an advanced theory of scoring and rating grounded in fuzzy algebra. For the first time, we have shown that it is possible to consistently define so-called additive and multiplicative scoring models for peer assessment. These scoring models are based on a few sound principles and behave correctly and fairly, primarily thanks to the Split-Join-Invariance principle. It is now up to the community of educational practitioners and researchers to investigate the suitability and practicality of the approach in the classroom. Many details and advantages of our approach will only become clear when practitioners and researchers compare their own approach with our TPA proposal. We will be glad to offer support to initiate such pilot projects in the context of teamwork-based courses.
8 Glossary

The definitions pertain specifically to this paper, if not stated otherwise.

Arithmetic mean (AM): A well-known quantity used to represent a set of measures of one and the same phenomenon. Often used in the context of statistical data analysis, but that is not a requirement. Because the measures are added together after weighting them by weighting factors (all summing to one) to arrive at the quantity, it assumes that the measures can take on any real value. If measures are restricted to bounded ranges (intervals), arithmetic means may give distorted results.

Differential scoring: In the context of teamwork as an assessment method, a type of scoring that may produce different scores for different students, in contrast to the default approach by which all students get the same (team) score.

Geometric mean (GM): Another well-known quantity used to represent a set of measures of one and the same phenomenon. Often used in the context of statistical data analysis, but that is not a requirement. Because the measures are multiplied with each other after raising them to an exponential weight (all weights summing to one) to arrive at the quantity, it assumes that the measures can take on only non-negative real values. If measures are restricted to other bounded ranges (intervals), geometric means may give distorted results.

Mean: Generally, a quantity assumed to faithfully represent a set of given measures of one and the same phenomenon. Often used in the context of statistical data analysis. Here, we use the algebraic notion of mean, which is precisely defined by a small number of axioms. See e.g. Kolmogorov (1930).
Mean score (t): In the context of teamwork, the mean score is the score which results from calculating the mean of the individual scores of all members of that team.

Mean team rating (s): In the context of team assessment, the mean team rating is the rating which results from calculating the mean of the individual ratings of the students in a team. If there can be no confusion, it will also be called mean rating or team rating.

Mean peer rating (ri): In the context of TPA, the mean peer rating is the rating which results from calculating a mean (QAM or QGM) of all peer ratings for one student (with index i) in that team.

Peer assessment: Part of TPA's procedure in which the students of a team assess each other. Another part of TPA is the criteria-based assessment of a team's product, which is conducted by the lecturer to set the team score. Finally, the set of formulae which glue all the different measurements together (mainly the quasi-arithmetic mean and the scoring formula) may be thought of as a third part of TPA.

Peer matrix: A square matrix of order n (the size of the team) which systematically captures all peer ratings of all members of a team. Self-ratings (a student rating him- or herself) are not foreseen in TPA.

Peer rating (rij): In the context of TPA, a peer rating is the rating which results from calculating the mean (QAM or QGM) of all criteria-based judgments from one student about another peer in that team.

Quasi-addition (⊕): A quasi-addition is like common arithmetical addition on the reals, with the difference that the range of admissible values is restricted to a subset of the reals. Here, we only consider the two standard ranges, the signed unit interval and the unsigned unit interval, as usual in fuzzy mathematics. Furthermore, quasi-addition requires a so-called rescaling function (here always denoted by u) which uniquely defines the rescaled quasi-sum of x and y as the sum of the rescaled values of x and y.
Quasi-subtraction (⊖): A quasi-subtraction is like common arithmetical subtraction on the reals, with the difference that the range of admissible values is restricted to a subset of the reals. Here, we only consider the two standard ranges, the signed unit interval and the unsigned unit interval, as usual in fuzzy mathematics. Furthermore, quasi-subtraction requires a so-called rescaling function (here always denoted by u) which uniquely defines the rescaled quasi-difference of x and y as the difference between the rescaled values of x and y, in that order.

Quasi-multiplication (⊗): A quasi-multiplication is like common scalar multiplication of x with a non-negative real number r (called a scalar), with the only difference that the range of admissible x is restricted to a subset of the reals. Here, we only consider the two standard ranges, the signed unit interval and the unsigned unit interval, as usual in fuzzy mathematics. Furthermore, quasi-multiplication requires a so-called rescaling function (here always denoted by u) which uniquely defines the rescaled quasi-product of x with the scalar r as the product of r and the rescaled value of x.

Quasi-division (⊘): A quasi-division is like common arithmetical division on the reals, with the difference that the range of admissible values is restricted to a subset of the reals. Here, we only consider the two standard ranges, the signed unit interval
and the unsigned unit interval, as usual in fuzzy mathematics. Furthermore, quasi-division requires a so-called rescaling function (here always denoted by u) which uniquely defines the rescaled quasi-quotient of x and y (in that order) as the quotient of the rescaled values of x and y, which is a scalar, not a score or rating.

Quasi-arithmetic mean (QAM): A quasi-arithmetic mean is like the usual arithmetic mean, with one important difference which makes it much more flexible: the mean is not calculated on the basis of the raw measurements; instead, a rescaling of those raw measurements is performed before calculating the mean, and afterwards the resulting mean is sent back to the original scale by applying the inverse of the rescaling function. It can be proven that many well-known means, e.g. the geometric mean, are just quasi-arithmetic means for a suitable rescaling function.

Quasi-geometric mean (QGM): A quasi-geometric mean is like the usual geometric mean, with one important difference: the mean is not calculated on the basis of the given measurements; instead, a rescaling of those raw measurements is performed before calculating the geometric mean, and afterwards the resulting mean is scaled back to the original scale by applying the inverse of the rescaling function.

Rating (r): Rating is a form of human measurement, in a double sense. Firstly, it means the judgement of human performance, attitudes or characteristics. Secondly, it means measurement conducted by human beings, in contrast to some physical measurement procedure or equipment. Here, in the context of peer assessment, we have adopted the term rating for the mutual judgment of students (peers) in a team on diverse process quality criteria.

Rating scale: A range (interval) of values on the real line chosen for rating a well-defined attitude, characteristic or performance of a person or group of persons.
As ratings are (usually) judged by human beings (judges), these measurements (judgments) may be subject to well-known forms of bias and should therefore be handled with care. A way to prevent some forms of bias is to provide adequate training of the judges and/or to collect multiple ratings of the same factor by a team of raters. This is exactly the goal of peer assessment for summative purposes.

Scale: A range (interval) of values on the real line chosen for measuring a given characteristic or performance of a person or group of persons. Values which don't belong to the chosen range aren't admissible as valid measurements of the characteristic or performance being measured. Sometimes, a scale is defined as such a range together with some well-defined relations and operations on the numbers in that range (e.g. equality, addition, multiplication, etc.).

Scoring (s): Scoring is a form of human measurement, in a double sense. Firstly, it means the judgement of human performance, attitudes or characteristics. Secondly, it means measurement conducted by human beings, in contrast to some physical measurement procedure or equipment. Here, in the context of TPA, we have adopted the term scoring for the judgment of a team's products (output, deliverables) by the lecturer and the subsequent calculation of individual student scores based on the adopted scoring rule (scoring function).

Scoring function (fs,t^z): A mapping from the rating scale to the scoring scale which takes a given student rating and maps it on the corresponding student score. Used to differentiate (moderate) between students who have been working differently in
the same team. The scoring function requires three parameters: the team score t as set by the lecturer, the team rating s calculated from all individual student ratings (peer assessments), and the zooming parameter z.

Scoring scale: The measurement scale used for scoring the work of students or teams of students in educational assessment. Here, two standard scoring scales are used (cf. signed unit interval and unsigned unit interval).

Signed unit interval ([−1, +1]): All real numbers between (inclusive) −1 and +1.

Student rating (r or ri): In the context of TPA, the student rating is the rating which results from calculating a mean (QAM or QGM) of all peer ratings for a single student in the team.

Student score (si): A student score is the final score for work done on an educational assignment, in teamwork or not. Here we only consider teamwork, so that a student score is a student's individual score resulting from the teamwork. This score usually encompasses the assessment of both product and process quality criteria. In the context of TPA, product criteria are assessed by the lecturer, and process criteria are assessed by the students (peer assessment).

Team rating: see Mean team rating.

Team score (t): The mean (QAM or QGM) of all individual student scores.

Team-Peer-Assessment (TPA): A well-founded framework for educational assessment, which enables lecturers to assign differential scores to students who have been working on a team assignment. It assumes that lecturers are best equipped to judge the quality of the team product (what they deliver), while students are best equipped to judge the quality of the team process (how they deliver). It is based on a few principles, the most important of which are: using quasi-arithmetic means on all levels of aggregation, and the split-join-invariance principle.

Split-Join-Invariance (SJI): Basic principle of TPA.
To calculate the individual score of a student, a scoring formula must be applied, which has the team score as one of its three parameters. This team score will be calculated, based on the team's products, as the mean performance on a set of product quality criteria judged by the lecturer (his unique competence, see TPA). Once the individual student scores have been calculated, one can again calculate the team score using the adopted QAM or QGM. The SJI now forces and guarantees that this calculated team score is exactly equal to the initial team score provided by the lecturer.

Unsigned unit interval ([0, 1]): All real numbers between (inclusive) 0 and 1.

Weights (w or wi): In the context of calculating means, weights are real numbers between 0 and 1 (inclusive) which are used as multipliers (AM or QAM) or exponents (GM, QGM) in the formula for the mean. The sum of all weights should be 1. Note that weights have nothing to do with probability!

Zoom(ing) factor (z): To adjust (moderate) the impact of student ratings on the lecturer's score (equal to the team score) in the scoring formula, a zooming factor or parameter can be used. A zooming factor is a non-negative real number. Its default value in all scoring formulae is 1. Choosing a zooming factor smaller than 1 decreases the impact of students' ratings on the team score; in the extreme case, z = 0, there will be no adjusting or moderating at all. Choosing a zooming factor larger than 1 increases the impact of students' ratings on the team score; in the
extreme, the student score will become as low as possible on the scale (−1 or 0) or as high as possible (always 1).
References

Aczél, J. (ed.): Lectures on Functional Equations and Their Applications, vol. 19. Academic Press, Cambridge (1966)
Batchelder, W.H., Colonius, H., Dzhafarov, E.N., Myung, J. (eds.): New Handbook of Mathematical Psychology: Volume 1, Foundations and Methodology. Cambridge University Press, Cambridge (2016)
Bukowski, W.M., Castellanos, M., Persram, R.J.: The current status of peer assessment techniques and sociometric methods. In: Marks, P.E.L., Cillessen, A.H.N. (eds.) New Directions in Peer Nomination Methodology. New Directions for Child and Adolescent Development, vol. 157, pp. 75–82 (2017)
Cresta, R.-A.: Peer Performance Scoring System (PPASS). Technical report, BSc Computing (Software Engineering) dissertation, University of Northampton (2018). 148 p.
Dijkstra, J., Latijnhouwers, M., Norbart, A., Tio, R.A.: Assessing the "I" in group work assessment: state of the art and recommendations for practice. Med. Teach. 38(7), 675–682 (2016)
Dochy, F.J.R.C., Segers, M., Sluijsmans, D.: The use of self-, peer and co-assessment in higher education: a review. Stud. High. Educ. 24(3), 331–350 (1999)
Dombi, J.: A general class of fuzzy operators, the DeMorgan class of fuzzy operators and fuzziness measures induced by fuzzy operators. Fuzzy Sets Syst. 8(2), 149–163 (1982)
Falchikov, N.: Product comparisons and process benefits of collaborative peer group and self-assessments. Assess. Eval. High. Educ. 11(2), 146–165 (1986)
Falchikov, N.: Group process analysis: self and peer assessment of working together in a group. Educ. Train. Technol. 30(3), 275–284 (1993)
Falchikov, N., Goldfinch, J.: Student peer assessment in higher education: a meta-analysis comparing peer and teacher marks. Rev. Educ. Res. 70(3), 287–322 (2000)
Fodor, J.: Aggregation functions in fuzzy systems. In: Fodor, J., Kacprzyk, J. (eds.) Aspects of Soft Computing, Intelligent Robotics & Control, SCI 241, pp. 25–50 (2009)
Gibbs, G.: The assessment of group work: lessons from the literature. Assessment Standards Knowledge Exchange (2009)
Jäger, J.: Verknüpfungsmittelwerte. Math. Semesterberichte 52, 63–80 (2005)
Kennedy, I.G., Vossen, P.H.: Teamwork assessment and peerwise scoring: combining process and product assessment. DeLFI, Leipzig: Bildungsräume, Lecture Notes in Informatics, Gesellschaft für Informatik, Bonn (2017a)
Kennedy, I.G., Vossen, P.H.: Software engineering teamwork assessment rubrics: combining process and product scoring. Assessment in Higher Education, Manchester (2017b)
Kolmogorov, A.: On the notion of mean (1930). In: Mathematics and Mechanics, pp. 144–146. Kluwer (1991)
Nepal, K.P.: An approach to assign individual marks from a team mark: the case of the Australian grading system at universities. Assess. Eval. High. Educ. 37(5), 555–562 (2012)
Ng, C.T.: Functional equations. In: Batchelder, W.H., Colonius, H., Dzhafarov, E.N., Myung, J. (eds.) New Handbook of Mathematical Psychology: Volume 1, Foundations and Methodology, pp. 151–193. Cambridge University Press, Cambridge (2016)
Sharp, S.: Deriving individual student marks from a tutor's assessment of group work. Assess. Eval. High. Educ. 31(3), 329–343 (2006)
Strijbos, J.W., Sluijsmans, D.: Unravelling peer assessment: methodological, functional, and conceptual developments. Learn. Instr. 20(4), 265–269 (2010)
Topping, K.: Peer assessment between students in colleges and universities. Rev. Educ. Res. 68(3), 249–276 (1998)
Ueno, M.: An item response theory for peer assessment. In: Rosson, M.B. (ed.) Advances in Learning Processes. InTech (2010)
Uto, M., Ueno, M.: Item response theory for peer assessment. IEEE Trans. Learn. Technol. 9(2), 157–170 (2016)
Van Rensburg, J.: Assessment of group work: summary of a literature review (2012)
Vossen, P.H.: Distributive fairness in educational assessment: psychometric theory meets fuzzy logic. In: Balas, V.E., et al. (eds.) Soft Computing Applications. Advances in Intelligent Systems and Computing, vol. 634, pp. 381–394. Springer (2018)
On Image Compression for Mobile Robots Using Feed-Forward Neural Networks

Viorel Nicolau and Mihaela Andrei

Department of Electronics and Telecommunications, "Dunarea de Jos" University of Galati, 47 Domneasca St., 800008 Galati, Romania
{viorel.nicolau,mihaela.andrei}@ugal.ro
Abstract. In mobile robot control, the vision system is placed in the feedback loop, and it requires the ability to extract the necessary information for multiple real-time image and video processing tasks. Image compression techniques are useful in vision systems for large data streaming, like image transmission, archival and retrieval purposes. Neural networks (NN) are widely used in image processing for solving different issues, with different NN topologies, training and testing set selections, and learning algorithms. Aspects of grayscale image compression for the vision system in mobile robots using artificial NNs are discussed in this paper. Several feed-forward neural networks (FFNN) are analyzed, using different structures, input dimensions, neuron numbers, and performance criteria. The goal of the paper is to study the behavior of low-complexity FFNN models for grayscale image compression, with a good compression rate and small enough errors for the vision system purposes in mobile robots.

Keywords: Neural networks · Image compression · Mobile robots
© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 112–121, 2021. https://doi.org/10.1007/978-3-030-52190-5_8

1 Introduction

In navigation and guidance of autonomous mobile robots, image and video processing techniques play a major role for image reduction, feature extraction, and segmentation in the domain of dynamic vision and analyzing image sequences [1]. The vision system is placed in the feedback loop of the motion control law, and it requires the ability to extract information for multiple real-time image and video processing tasks, as in [2–4]. The required processing power increases with image size and bit depth. The control algorithm must be simple but also robust. It must work at a rate compatible with the imposed bandwidth, and it must deal with many uncertainties from the process [5].

Neural networks (NN) are widely used in image processing for solving different issues, with different NN topologies, training and testing set selections, and learning algorithms [6]. Reviews of NN image compression techniques are presented in [7] and [8]. In image compression using NN methods, feed-forward neural networks (FFNN) can be used, with different approaches [9–11]. The use of FFNN with two layers is extended to multilayer NNs in [12], and a NN modular structure for lossy image
compression is presented in [13]. Over time, different feed-forward NN techniques have been tested for image compression, like the complexity level approach [14], classified blocks from the image [15], and the cumulative distribution function [16].

The goal of the paper is to study the behavior of low-complexity FFNN models for grayscale image compression. An NN model with an acceptable compression rate and acceptable errors for the vision system purposes in mobile robots is sought. Several feed-forward neural networks (FFNN) are analyzed, using different structures, input dimensions, neuron numbers, and performance criteria.

The paper is organized as follows. The structure of the vision system for image compression in mobile robots is presented in Sect. 2. Different aspects of the compression problem using FFNN are studied in Sect. 3. In Sect. 4, the FFNN structures used for image compression are presented. Simulation results are described in Sect. 5, based on different neural architectures and performance criteria. Conclusions are presented in Sect. 6.
2 Vision System Structure in Mobile Robots

In general, a camera can be connected with an image processing system directly or through an interface controller. Hence, two different structures can be used. In the first case, if the processing system has a dedicated camera interface or it is fast enough, then it can be connected to the camera directly, as shown in Fig. 1. In the second case, when the host system does not have a dedicated camera interface and is based on a low-speed processor, additional hardware is necessary, as shown in Fig. 2. An entire frame from the camera module can be stored in RAM by the buffer controller before being read by the processor.

The most common vision systems use VGA cameras. A VGA camera generates frames with 640 × 480 pixel resolution. In addition, different features can be set up. The processor can adjust image parameters, like white balance, gamma, hue and saturation. For the purposes of this paper, the camera module generates image resolutions up to 256 × 256, at rates up to 25 fps.
Fig. 1. Camera module connected directly with processing system
Fig. 2. Camera connected through interface controller with processing system
A camera module with an 8-bit parallel output interface is used. Usually, the camera module transmits only 2 color bytes for every pixel, representing simplified RGB or YCbCr formats. The RGB444 format, with 12 bits in the 2 bytes, is used in color image transmissions. This means that 4 bits are transmitted for every basic color. The YCbCr format is used for grayscale images, where the Y luminance component is the second byte of the 2 color bytes. In this paper, the YCbCr format is used with grayscale images, for low complexity of the image processing system. This is quite enough for the vision system in common mobile robots.
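As a small illustration of the byte layout just described, extracting a grayscale frame from the 2-bytes-per-pixel YCbCr stream amounts to keeping every second byte (the exact byte order of a given camera module is an assumption to be checked against its datasheet):

```python
def grayscale_from_ycbcr_stream(frame_bytes, width, height):
    """Keep the Y (luminance) byte of every 2-byte pixel of a YCbCr stream.

    Assumes Y is the second byte of each pixel pair, as described above;
    the chroma byte is simply discarded for grayscale processing.
    """
    assert len(frame_bytes) == 2 * width * height
    y = frame_bytes[1::2]  # every second byte is the luminance component
    return [list(y[r * width:(r + 1) * width]) for r in range(height)]

# Tiny 2x2 example frame: pairs of (chroma, Y) bytes.
stream = bytes([0x80, 10, 0x80, 20, 0x80, 30, 0x80, 40])
img = grayscale_from_ycbcr_stream(stream, 2, 2)
# img == [[10, 20], [30, 40]]
```

Discarding the chroma byte halves the data volume before any compression is applied.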
3 Image Compression Problem with Neural Networks

In image compression, lossless and lossy compression techniques are used. In lossless compression no information is lost; the purpose of these methods is to identify and eliminate statistically redundant information. Lossy methods reduce bits by identifying marginally important information and removing it. In general, neural image compression is a lossy compression method.

It is a complex problem due to the dimension of the input space, which can be reduced by resizing the original image. For example, starting from a grayscale image with L × M pixel resolution, the input space of dimension L·M can be reduced by using only L inputs for the NN, representing a column of the image. In this case, the input dimension is smaller, but the input data set is larger: it has M vectors. Furthermore, the input space can be reduced by dividing the image into smaller blocks. If each block has N × N pixel resolution, then the network has N² inputs. Input vectors contain the luminance values of the consecutive pixel samples of a block, denoted xn, n = 0 … N²−1, and the outputs are amplitudes of the compressed image.

FFNNs are characterized by neuron equations without memory. Hence, their outputs depend only on the current inputs, and the time complexity of learning is related to the complexity of the problem.
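The two reduction strategies just described — feeding the network one image column at a time, or one n × n block at a time — can be sketched as follows (NumPy, with illustrative image and block sizes):

```python
import numpy as np

L, M = 256, 256  # original grayscale image resolution
img = np.random.randint(0, 256, size=(L, M), dtype=np.uint8)

# Column-wise reduction: L network inputs, M training vectors.
columns = [img[:, j] for j in range(M)]
assert len(columns) == M and columns[0].shape == (L,)

# Block-wise reduction: n x n blocks give n*n network inputs each.
n = 8
blocks = (img.reshape(L // n, n, M // n, n)
             .swapaxes(1, 2)          # group blocks by (block row, block col)
             .reshape(-1, n * n))     # one flattened block per row
assert blocks.shape == ((L // n) * (M // n), n * n)  # 1024 blocks of 64 inputs
```

Block division trades a smaller input dimension against a larger training set, exactly as noted above for the column-wise case.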
The performance goal is to assure a high compression rate between the original image (I_O) and the reconstructed one (I_R), with small-enough errors. The image error, denoted err, can be defined as the matrix:

$$\mathrm{err}(m,n) = I_O(m,n) - I_R(m,n), \quad \forall m = 1,\dots,M,\ \forall n = 1,\dots,N \qquad (1)$$
where M and N are the dimensions of the input images. The mean squared error mse is:

$$mse = E\left[\mathrm{err}^2(m,n)\right] = \frac{\sum_{m=1}^{M}\sum_{n=1}^{N} \mathrm{err}^2(m,n)}{M N} \qquad (2)$$
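Eqs. (1) and (2) translate directly into a few lines of array code; a sketch in NumPy (image contents are illustrative):

```python
import numpy as np

def image_mse(i_o, i_r):
    """Mean squared error between original and reconstructed image, Eqs. (1)-(2)."""
    err = i_o.astype(float) - i_r.astype(float)  # Eq. (1), element-wise
    return np.sum(err ** 2) / err.size           # Eq. (2): sum over all M*N pixels

i_o = np.array([[10, 20], [30, 40]], dtype=np.uint8)
i_r = np.array([[12, 20], [30, 36]], dtype=np.uint8)
assert image_mse(i_o, i_r) == (2 ** 2 + 4 ** 2) / 4  # 5.0
```

The cast to float avoids unsigned-integer wrap-around when the reconstruction overshoots a pixel value.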
The compression ratio is denoted CR, and it is defined based on the original image size, S_O, and the size of the compressed image, S_C:

$$CR = \frac{S_O}{S_C} \qquad (3)$$
Using FFNN, the compressed image is obtained at the hidden layer. The condition is that the number of neurons in the hidden layer, N_H, be less than the number of neurons in the input layer, N_I (N_H < N_I). As a result, the compression ratio is:

$$CR = \frac{N_I}{N_H} \qquad (4)$$
The problem complexity can be reduced by dividing the input image into smaller blocks of dimension n × n (e.g. n = 4, 8, 16, 32 or 64 pixels), which are fed to the FFNN input: n² = N_I.
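For instance, combining block division with Eq. (4) (the block size and hidden-layer size below are illustrative choices, not values from the paper):

```python
def compression_ratio(n, n_hidden):
    """Eq. (4): CR = NI / NH, with NI = n*n inputs from an n x n image block."""
    n_inputs = n * n
    assert n_hidden < n_inputs, "compression requires NH < NI"
    return n_inputs / n_hidden

assert compression_ratio(8, 16) == 4.0    # 64 inputs -> 16 hidden neurons
assert compression_ratio(16, 32) == 8.0   # 256 inputs -> 32 hidden neurons
```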
4 FFNN Structures Used for Image Compression

Multi-layer neural networks with the back-propagation algorithm can be directly applied to image compression. In this paper, three different neural network structures based on the FFNN topology were studied. The simplest network structure, denoted structure 1 (S1FFNN), has one hidden layer and one output layer, as illustrated in Fig. 3. Because the output of the neural network has to reconstruct the image at its initial size, the input and output layers have the same number of neurons, N_I, which is smaller than the full image dimension. The output of the hidden layer generates the compressed image.
Fig. 3. FFNN structure 1
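A minimal sketch of structure 1 as a forward pass (NumPy, untrained random weights, illustrative layer sizes; the tansig/purelin transfer functions match those used in the simulations reported later):

```python
import numpy as np

rng = np.random.default_rng(0)
NI, NH = 64, 16  # 8x8 block -> 16 hidden neurons, i.e. CR = 4 by Eq. (4)

# Structure 1: input -> hidden layer (compression) -> output layer (reconstruction).
W1 = rng.standard_normal((NH, NI)) * 0.1  # hidden-layer weights
b1 = np.zeros(NH)
W2 = rng.standard_normal((NI, NH)) * 0.1  # output-layer weights
b2 = np.zeros(NI)

def forward(x):
    """tansig hidden layer, linear (purelin) output layer."""
    hidden = np.tanh(W1 @ x + b1)  # compressed representation of the block
    output = W2 @ hidden + b2      # reconstructed block, same size as the input
    return hidden, output

x = rng.uniform(0, 1, NI)          # one normalized 8x8 image block
hidden, output = forward(x)
assert hidden.shape == (NH,) and output.shape == (NI,)
```

Training with back-propagation drives the output toward the input, so the NH-dimensional hidden activations become the compressed image.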
The second FFNN structure, denoted structure 2 (S2FFNN), has two parallel hidden layers, as shown in Fig. 4.
Fig. 4. FFNN structure 2
The input is fully connected to both hidden layers, and each hidden layer has its own connection to the output. Both hidden layers have the same number of neurons. The compression is represented by two images, one at the output of each hidden layer. The compression ratio therefore drops to half that of the previous structure, and the network needs an enriched training set, but it outputs a good reconstructed image. Also, the two compressed images are negatives of each other. The third FFNN structure, denoted structure 3 (S3FFNN), has two serial hidden layers, as illustrated in Fig. 5. In this case, the first hidden layer and the output layer have the same number of neurons, and the second hidden layer has a smaller number of neurons. The compressed image can be obtained either from the output of the first hidden layer, with a lower compression ratio, or from the output of the second hidden layer, with a higher compression ratio.
Fig. 5. FFNN structure 3
Each of the above structures has its own advantages and disadvantages regarding the quality of the reconstructed image, the compression ratio, the learning duration, and the required processing power. In general, the higher the compression ratio, the lower the quality of the reconstructed image. In addition, the processing power increases with the number of layers, the number of neurons in each layer, and the interconnection types between layers.
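For illustration, a forward pass of structure 1 can be sketched with NumPy, using a tanh (tansig) hidden layer and a linear (purelin) output. The weights below are random placeholders, whereas in the paper they result from back-propagation training:

```python
import numpy as np

rng = np.random.default_rng(0)
NI, NH = 64, 16                      # 8x8 blocks, 16 hidden neurons -> CR = 4
W1 = rng.normal(scale=0.1, size=(NH, NI)); b1 = np.zeros((NH, 1))
W2 = rng.normal(scale=0.1, size=(NI, NH)); b2 = np.zeros((NI, 1))

def s1ffnn(X):
    """Structure 1: the compressed image appears at the tansig hidden layer,
    the reconstruction at the linear output layer."""
    H = np.tanh(W1 @ X + b1)         # compressed representation (NH rows)
    Y = W2 @ H + b2                  # reconstructed blocks (NI rows)
    return H, Y

X = rng.uniform(size=(NI, 100))      # 100 image blocks as column vectors
H, Y = s1ffnn(X)
mse = np.mean((X - Y) ** 2)          # Eq. (2) applied to the block matrix
CR = NI / NH                         # Eq. (4): compression ratio of 4
```

The hidden activations H are what would be stored or transmitted; decompression is the single linear layer applied to H.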
5 Simulation Results

In this paper, different FFNN structures for grayscale image compression are studied. The tansig and purelin transfer functions were chosen, as described in the previous section. The mean squared error (mse) is the performance criterion, both for training and testing, denoted msel and mset, respectively. The image data set contains different grayscale images (landscape, building, and portraits) with 256×256 pixel resolution, as shown in Fig. 6. The first image, with a cathedral, was chosen for the FFNN training process; the rest of the images are used for network testing.
Fig. 6. Image data sets
Each image is rearranged in 3 steps. First, the image is divided into smaller blocks of n×n pixels. In this paper, two different block dimensions were chosen: n = 4 and n = 8.
Then, the image is resized as a line of L square blocks:

L = 256 × 256 / n²    (5)

Next, each block is rearranged as a column vector [n², 1], with consecutive pixels in a row. The result is a matrix of type [n², L], where L is the number of columns; hence, the input space is much smaller. For the simulations, all FFNN structures presented in the previous section are used. For S1FFNN, 4 different networks were used. The neural network parameters and performances are illustrated in Table 1.

Table 1. S1FFNN parameters and training performances

FFNN    | NI = NO | NH | CR | msel
S1FFNN1 | 64      | 16 | 4  | 0.0596
S1FFNN2 | 64      | 4  | 16 | 0.1041
S1FFNN3 | 16      | 4  | 4  | 0.0521
S1FFNN4 | 16      | 1  | 16 | 0.1047
The compressed images corresponding to CR = 16, generated by S1FFNN2 and S1FFNN4, are illustrated in Fig. 7. For a good representation, they are drawn doubled in size.
Fig. 7. Results for CR = 16, doubled in size
The compressed images corresponding to CR = 4, are shown in Fig. 8.
Fig. 8. Results for CR = 4
For S2FFNN, with two parallel hidden layers having the same number of neurons, 2 different networks were used. The training performances are shown in Table 2.

Table 2. S2FFNN parameters and performances

FFNN    | NI = NO | NH1 | NH2 | CR | msel
S2FFNN1 | 64      | 16  | 16  | 2  | 0.0507
S2FFNN2 | 16      | 4   | 4   | 2  | 0.0331
For S3FFNN, with two serial hidden layers, 2 different networks were also used. In this case, the second hidden layer has a smaller number of neurons. Their parameters and performances are shown in Table 3.

Table 3. S3FFNN parameters and training performances

FFNN    | NI = NO | NH1 | NH2 | CR | msel
S3FFNN1 | 64      | 16  | 4   | 16 | 0.1024
S3FFNN2 | 16      | 4   | 1   | 16 | 0.1104
Each FFNN was tested with the remaining 3 images from the data set: building, Lena, and a second portrait. The testing performances are denoted mset1, mset2, and mset3. The results are presented in Table 4.
Table 4. FFNN parameters and testing performances

FFNN    | CR | mset1  | mset2  | mset3
S1FFNN1 | 4  | 0.1095 | 0.1012 | 0.1285
S1FFNN2 | 16 | 0.1808 | 0.1456 | 0.2723
S1FFNN3 | 4  | 0.0994 | 0.0869 | 0.1055
S1FFNN4 | 16 | 0.1839 | 0.1435 | 0.2465
S2FFNN1 | 2  | 0.1015 | 0.0910 | 0.1242
S2FFNN2 | 2  | 0.0621 | 0.0541 | 0.0749
S3FFNN1 | 16 | 0.1972 | 0.1552 | 0.2577
S3FFNN2 | 16 | 0.1817 | 0.1355 | 0.1996
From the training and testing tables, it can be observed that for the same compression ratio, similar training performances are obtained. In addition, if a smaller compression ratio is used, the performances are better. The network S1FFNN3 is selected for testing. For example, the testing building image and the reconstructed one are shown in Fig. 9. The output image is good enough for vision-system purposes in mobile robots.
Fig. 9. Original building image and the reconstructed one with S1FFNN3
6 Conclusions

A neural network approach for grayscale image compression in the vision system of mobile robots was presented. Several feed-forward neural networks were tested, using different structures, input dimensions, neuron numbers, and performance criteria. For the same compression ratio, the network performances are comparable; hence, the simplest FFNN structure can be used for lower computational complexity. Even at a high compression ratio, the reconstructed image is accurate enough for the purposes of a robot vision system.
Image, Text and Signal Processing
LeapGestureDB: A Public Leap Motion Database Applied for Dynamic Hand Gesture Recognition in Surgical Procedures

Safa Ameur(1,2), Anouar Ben Khalifa(2,3), and Med Salim Bouhlel(1,4)

1 SETIT: Research Unit Sciences of Electronic, Technologies of Information and Telecommunication, University of Sfax, Sfax, Tunisia. [email protected], [email protected]
2 ENISo: National Engineering School of Sousse, University of Sousse, Sousse, Tunisia. [email protected]
3 LATIS: Laboratory of Advanced Technology and Intelligent Systems, University of Sousse, Sousse, Tunisia
4 ISBS: Biotechnology High Institute of Sfax, University of Sfax, Sfax, Tunisia
Abstract. Research in the field of dynamic hand gesture recognition has been promoted by the recent success of the emerging Leap Motion controller (LMC) sensor, especially in human-computer interaction. During surgery, surgeons need to make more effective use of computers through touchless human-computer interaction. In this paper, we present a set of gestures that correspond to commands for manipulating medical images. The public availability of datasets for dynamic hand gestures, in particular with the LMC as the acquisition device, is currently quite poor. Therefore, we collected a more challenging dataset, called "LeapGestureDB". The dataset is presented as text files containing 6,600 samples of 11 different dynamic hand gestures performed by 120 participants. It is hosted online for the public interest, and especially to enhance research results. We suggest an approach to three-dimensional dynamic hand gesture recognition: we extract spatial, frequential and spatio-frequential features from the three-dimensional positions of the fingertips and the palm center delivered by the input sensor, and feed these descriptors into a support vector machine classifier to determine the performed gesture. We first evaluate our work on an existing dataset, namely the "RIT dataset", and then on our collected dataset, on which we achieve a better accuracy rate of 93% compared to the results obtained on the "RIT dataset". Keywords: Hand gesture · Dataset · Operating room · Leap Motion controller · SVM · Wavelet transform
© Springer Nature Switzerland AG 2021. V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 125–138, 2021. https://doi.org/10.1007/978-3-030-52190-5_9
1 Introduction
There have been several human-machine interfaces, such as mice, keyboards, microphones, etc. These types of interfaces have shown their limits in degrees of freedom and the difficulties of applying them in a natural environment [1]. Users dream of interacting with computer systems in increasingly natural and intuitive ways. As highlighted in [2], with the introduction of low-cost depth cameras over the past few years, such as the Microsoft Kinect and the Intel RealSense camera, the acquisition of 3D data has become readily available to the mass market. This has made it possible for natural interfaces based on the obtained 3D data to be employed in commercial applications. Nevertheless, vocal control has many challenges, the Kinect is relatively expensive, intraoperative 3D imaging is hard to manipulate, and the 1.2 m working distance requires an appropriate screen size. Therefore, to overcome these problems, research has been directed toward more intuitive interfaces. This is the case for gestural human-machine interfaces [3], which propose to recognize the user's gestures and translate them into commands. As mentioned in [4], hand gesture recognition has attracted growing interest in recent years thanks to its applications in many different fields, such as HMI, automatic sign language interpretation [5], computer gaming, robotics, human action recognition [6], violence detection [7] and so on. In this context, we find the hand-gesture device "Leap Motion Controller" (LMC), which offers contactless interaction between the user and the computer [8]. This technology is of particular interest in operating rooms, where contact between the surgeon and the computer is disadvantageous: because the operating-room computer is not sterile, manipulating the CT scans or MRIs it displays requires either a non-sterile nurse or the surgeon, who then needs another 10 min to scrub before resuming surgery.
The system aims to remove any physical contact when interacting with the machine, so hand gestures can solve a major frustration for surgeons. During technical visits to the university hospital Sahloul in Tunisia, we discussed with some members of the surgical staff the effectiveness of touchless interaction systems for handling medical images in the sterile field, and identified the indispensable commands on the basis of intuitiveness and memorability. Each gesture is named after the action it enables: "rotate" (left or right), "click" (for selection), "zooming" (zoom in or zoom out), "alter contrast" (increase or decrease), "next and previous" (navigate through a sequence of images) and "browse" (make a horizontal move to the left or right) [8]. Considering this, we created a larger database than the existing ones, containing dynamic hand gestures performed by 120 distinct people, each performing 11 different gestures repeated 5 times each. We should note that our dataset is the first released database recorded with the LM sensor in the medical field. We will use this dataset in our future work, essentially to train and assess our system for controlling DICOM images, and it is freely available to the research community. It is named "LeapGestureDB": a Leap Motion database for dynamic hand gesture recognition, available online as a set of 6,600 text files containing the parameters provided by the Leap SDK (v2, version 3.2.0). The present paper aims to validate our classification approach for dynamic hand gesture recognition, based on spatial and frequential features; we compare their performances separately and in combination. We test our algorithm on two different databases, "LeapGestureDB" and the "RIT dataset", to compare the classification results and prove the effectiveness of our approach. In the remainder of this paper, a survey of the public LMC datasets is first described and compared in Sect. 2. Then, in Sect. 3, we present the LMC device. Next, Sect. 4 contains a detailed description of our dataset: the first subsection presents the acquisition environment, the second describes the dataset structure, and the third shows the content of the dataset files. Afterwards, Sect. 5 introduces the general pipeline of the proposed approach. Subsequently, the experimental results are shown and discussed in Sect. 6. Finally, some conclusions are presented in Sect. 7.
2 Public Leap Motion Datasets
After its first release in 2013, the research community started to exploit the LMC's capabilities. In [9], the authors presented a first study of the precision and reliability of this sensor. Thanks to the LMC, human-computer interaction can be improved in various domains by recognizing the user's gestures to enter commands. Only six public LMC hand-gesture datasets were available online at the time of writing. We give an overview of these datasets. The Microsoft Kinect and LMC dataset [10] contains 10 different poses repeated 10 times each by 14 different people, for 1,400 gestures. Moreover, Marin et al. [11] recognize the American manual alphabet with the LMC and Kinect devices. A static gesture dataset is available online¹, formed by fingertip orientations and positions. They used a multi-class Support Vector Machine (SVM) classifier to identify their poses (static gestures), and combined depth features from the Microsoft Kinect with features taken from the LMC to improve recognition performance. The authors in [12] collected the LeapTV dataset, composed of 378 hand gesture samples of 21 distinct television control tasks performed by 18 participants using an LMC device, in the context of controlling multiple functions of an interactive TV. It was released online² as a set of 18 XML files. Using fingertip positions, direction, and velocity as classification features, they reached a recall rate of 72.8% and a false-positive recall of 15.8%. In [13], students and staff on the Rochester Institute of Technology (RIT) campus used a graphical user interface to record repetitions of each of 12 motions, gathered from over 100 subjects and then used to train a three-dimensional recognition model based on convolutional neural networks. The data is hosted online for public download³. They worked with the mean and standard deviation of a fixed-size image representation of just five gestures to train a CNN classifier, and achieved an average correct rate of 92.4%. The "LMDHG dataset", recently published⁴, contains hand motions performed with one or both hands. The dataset includes unsegmented sequences of 13 gestures realized by 21 participants in 50 sequences, resulting in a total of 608 gesture samples. The 3D coordinates of 23 joints of each hand are described in each frame; the position of the hand is set to zero when it is not tracked. Thanks to the skeleton data in this dataset, both pre-segmented and unsegmented dynamic hand gestures can be recognized. The authors in [14] used a baseline approach for unsegmented hand gesture recognition with an SVM classifier to provide the most likely class. Eventually, Pradeep Kumar, a Ph.D. student in India, collected many datasets using the LMC and made them available online⁵ for the research community. He works on Indian Sign Language (ISL) [15], multiple-person activity recognition and 3D handwriting recognition systems [16] using the Microsoft Kinect and Leap Motion sensors. Likewise, in [17] the authors put forward a new dataset⁶ gathered from the Kinect and LMC sensors. The "Jackknife dataset" is composed of 8 different gestures performed by 20 participants. They used a dynamic time warping based approach, referred to as the Jackknife gesture recognizer, to determine the performed action. Furthermore, other studies with the LMC exploit their own datasets that are not provided on the internet [18–23]. Table 1 presents a comprehensive comparison of the publicly available datasets, specifying the type of sensors and gestures used, the number of subjects, gestures and repetitions performed to collect each dataset, the context of work and the year of release. We finish the table by describing our dataset, the "LeapGestureDB".

1 http://lttm.dei.unipd.it/downloads/gesture
2 http://www.eed.usv.ro/~vatavu/index.php?menuItem=leaptv2014
3 http://spiegel.cs.rit.edu/~hpb/LeapMotion
3 Leap Motion Sensor
There are several motion detection sensors available on the market. We chose the LMC for this project because of its reliability and low cost. The device has a small size: 80 mm in height, 30 mm in width and 12.7 mm in depth. Under the brushed aluminum body and the black glass on its surface, there are three infrared LEDs, used for scene illumination, and two CMOS cameras spaced four centimeters apart, which capture images at a frame rate of 50 up to 200 fps [9], as shown in Fig. 1. The LMC essentially tracks hands, using similar infrared camera technology. It tracks the position of objects in its field of view, which is roughly the size of an inverted triangle whose lower area length measures about 65 mm, through the reflection of infrared light from the LEDs. We can implement gesture recognition algorithms with the raw data obtained from the LMC Software Development Kit (SDK). Among the specifications of the SDK, we can mention language support for Java, JavaScript, Python, Objective-C, C# and C. SDK v2 introduces a skeletal model of the human hand with real-time tracking. It supports queries in a 3D space with 0.01-mm accuracy. The X, Y and Z coordinates are delivered in a coordinate system relative to the center of the controller, taking millimeters as the unit and using a right-handed convention. Here we use this device to collect a dataset recorded through this API, in order to develop and implement our dynamic hand gesture recognition algorithm.

4 https://www-intuidoc.irisa.fr/en/english-leap-motion-dynamic-handgesture-lmdhg-database
5 https://sites.google.com/site/iitrcsepradeep7/resume
6 http://www.eecs.ucf.edu/isuelab/research/jackknife

Table 1. A summary table showing the characteristics of the publicly available Leap Motion datasets (S: number of participants, G: number of gestures, R: number of repetitions)

Ref   | Dataset                      | S   | G  | R       | Context            | Year
[10]  | Kinect and LMC dataset       | 14  | 10 | 10      | HMI                | 2015
[12]  | LeapTV dataset               | 18  | 21 | 1       | TV control         | 2015
[13]  | RIT dataset                  | 103 | 12 | 5 to 10 | HMI                | 2015
[15]  | Indian sign language dataset | 10  | 25 | 8       | Sign language      | 2016
[14]  | LMDHG dataset                | 21  | 13 | 1 to 3  | Action recognition | 2017
[17]  | Jackknife dataset            | 20  | 8  | 2       | Action recognition | 2017
(our) | LeapGestureDB                | 120 | 11 | 5       | Medical            | 2018
Fig. 1. Leap motion bar seen from inside [24]
4 Leap Gesture Dataset
We introduce, in this paper, a new dynamic hand gesture dataset gathered from the LMC, called "LeapGestureDB". It is a valuable addition to the few existing LMC datasets. Such datasets not only increase the amount of information available for analysis but also enable novel methods for the quantitative evaluation of motion tracking and pose estimation.
4.1 Acquisition Environment
We recorded all repetitions in a controlled indoor environment under natural light at daytime, and we set up moderate lighting systems that deliver cool, bright white light for the dataset recording, as shown in Fig. 2. We placed the LMC on a flat desk in order to detect hand movements above the device. The user connects the LMC to the computer via USB. In addition, a human operator sits in front of the laptop (next to the participant) in order to monitor and control the acquisition process.
Fig. 2. Acquisition process
4.2 Dataset Structure
Exactly 6,600 gesture instances were collected from 120 volunteers from the National Engineering School of Sousse (ENISo), with the full compressed dataset totaling around 450 MB. The data are hosted online for public download⁷. Our dataset contains 11 actions performed by 120 different participants, each repeating the same action five times. Each subject took about 20 min to complete the recording process and respond to the questionnaire prepared to collect demographic information. All repetitions are recorded in text files registered under a name that indicates the number of the ongoing participant, gesture and repetition. Students and researchers at ENISo participated in the collection process: 81 females and 39 males, born between 1960 and 1997; only 14 participants are left-handed. The demographic distribution of the new "LeapGestureDB" can be seen in Fig. 3. We carefully selected 11 dynamic hand motions, which correspond to commands to manipulate medical images. The different gestures were described in detail in our previous article [8]; Fig. 4 summarizes and illustrates all the gestures that form our dataset. All images were taken under controlled conditions. Stroke lengths, angles, sizes, and positions are individual characteristics of each gesture that vary widely within the controller's field of view. We notice that some users were comfortable performing gestures quickly after starting because they had used the LMC before, while others struggled with the basic coordination required to execute the hand movements; for them it was a new virtual interaction experience.

7 https://sites.google.com/view/leapmotiondatabase/accueil
Fig. 3. Demographic partition of proposed LeapGestureDB by gender, handedness and age.
4.3 Dataset File Content
The motion-sensing device provides two levels of output: high and low. The high-level output is the interpreted version of the raw frame data, in which only four predefined gestures are recognized (circle, screen tap, key tap, and swipe). A frame contains the data delivered from the sensor, such as the number of fingers, fingertip positions, palm position and direction, etc. The series of frames forms the low-level output. The LMC visualizer presents the contents of the text files. Each line of a file is a separately converted frame obtained from the LMC; at any moment, only one frame is displayed, and the next frame is identified by another ID. The frame rate varies according to the computer's power and settings; in our case, we reach up to 115 fps, with the API running on an ASUS laptop with an Intel 7th-generation Core i7-7500U processor and Windows 10 (64 bits) as the operating system. The software proposed for the LMC was developed in Eclipse using the Java language. The data extracted from the LMC in a captured frame at a given moment, as received and processed by the controller, are:

Frame id: a unique number given to each frame.
Timestamp: the moment of frame capture, in microseconds.
Hand number: 1 or 2.
Hand id and type: the LMC software assigns a unique ID to the hand object. This value remains the same across consecutive frames while the tracked hand remains visible. If tracking is lost, the software may assign a new ID when it detects the hand in a future frame, and it identifies whether it is a left or right hand.
Hand fingers number: i ∈ [0, 5], the number of recognized fingers.
Palm position: P(x, y, z), the three-dimensional position of the palm calculated from the origin of the LM coordinate system.
Hand direction: a unit vector in 3D space pointing in the same direction as the directed line from the palm position to the fingers.
Palm normal: the vector normal to the palm, pointing downward.
Fingers type: the anatomical type of the finger (thumb, index, middle, ring and pinky).
Tips position: F(x, y, z), a vector containing the three-dimensional position of each detected fingertip, calculated in millimeters from the LM origin.

Fig. 4. Illustration of different gestures enabled for interacting with DICOM images.
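The exact serialization of these fields in the text files is defined by the recording tool and is not reproduced here; a minimal parser for one frame line might look as follows, assuming a hypothetical comma-separated layout with the field order above:

```python
# Hypothetical parser for one frame line of a LeapGestureDB-style text file.
# The field order (frame id, timestamp, hand id, finger count, palm x/y/z)
# is an assumption for illustration, not the dataset's documented format.
def parse_frame(line):
    parts = line.strip().split(",")
    return {
        "frame_id": int(parts[0]),
        "timestamp_us": int(parts[1]),       # capture time in microseconds
        "hand_id": int(parts[2]),
        "finger_count": int(parts[3]),
        "palm_position": tuple(float(v) for v in parts[4:7]),  # mm from LM origin
    }

frame = parse_frame("101,1550042,12,5,-12.5,180.3,44.0")
print(frame["palm_position"])  # (-12.5, 180.3, 44.0)
```

A sequence of such parsed frames yields the palm and fingertip trajectories used as input to the recognition pipeline.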
5 Hand Gesture Recognition Approach
In this section, our proposed method for hand gesture recognition is discussed in detail. The suggested system is illustrated in Fig. 5.

Fig. 5. Flowchart of the suggested approach

The data gathered from the LMC contain the palm and fingers' position, direction, velocity, etc. They are pre-processed and then fed into the classification process. We segment out 25% of the originally acquired signal to eliminate excess data and improve the recognition accuracy. First, we choose a model of the gesture, considering both spatial and temporal characteristics. Note that two realizations of the same gesture do not give the same vector of parameters; that is why different repetitions by the same participants are recommended. For gesture recognition, we use the temporal windowing technique, which associates several temporally close samples in order to construct a unique sample carrying the characteristics of each one. Each repetition is divided into five temporal windows (Wt = 5), and then the discrete Fourier transform, the standard deviation, the arithmetic mean and the approximation coefficients of the discrete wavelet transform of the coordinates in each window are calculated. This set of features is normalized to the range [−1, 1] and fed into an SVM classifier for recognizing the gesture. At this level, we should identify the most informative and discriminative features from a given pattern to complete the feature extraction step; otherwise, patterns will not be recognized efficiently and the classification rate will be lower. In our work, various descriptors are extracted from the LMC data, and the SVM is used in the hand motion classification process. To validate our approach, we introduce the following mathematical notions:

Arithmetic mean: the mean of a random vector V made up of m observations is defined as:

M = (1/m) · Σ_{k=1}^{m} Vk    (1)
Standard deviation: the standard deviation of a random vector V made up of m observations is defined as:

SD = sqrt( (1/(m−1)) · Σ_{k=1}^{m} (Vk − M)² )    (2)
Discrete Fourier transform: for a vector V of length m, the Fourier transform is defined as:

fft(V) = Y(k) = Σ_{j=1}^{m} V(j) · Wm^{(j−1)(k−1)},  with Wm = exp(−2iπ/m)    (3)

Discrete wavelet transform (dwt): starting from a signal V of length m, two sets of coefficients are computed: approximation coefficients CA1 and detail coefficients CD1. These vectors are obtained by convolving V with the low-pass filter Lo_D for approximation and with the high-pass filter Hi_D for detail, followed by dyadic decimation. The length of each filter is equal to 2L; for a signal of length m, the coefficients CA1 and CD1 are of length ⌊(m − 1)/2⌋ + L.
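The per-window statistics can be sketched in Python as follows. This is an illustrative implementation only: the function name is ours, and a single-level Haar approximation stands in for the paper's level-4 db4 wavelet to keep the sketch dependency-free:

```python
import numpy as np

def window_features(signal, n_windows=5):
    """Per-window mean, standard deviation, FFT magnitude and a one-level
    Haar DWT approximation, summarized into one feature vector."""
    feats = []
    for w in np.array_split(np.asarray(signal, dtype=float), n_windows):
        mean, sd = w.mean(), w.std(ddof=1)
        fft_mag = np.abs(np.fft.fft(w))              # Eq. (3), magnitude
        # one-level Haar approximation (the paper uses db4 at level 4)
        pairs = w[: len(w) // 2 * 2].reshape(-1, 2)
        ca1 = pairs.sum(axis=1) / np.sqrt(2.0)
        feats.extend([mean, sd, fft_mag.mean(), ca1.mean()])
    return np.array(feats)

x = np.sin(np.linspace(0, 4 * np.pi, 100))  # e.g. one palm-coordinate track
f = window_features(x)                       # 5 windows x 4 summaries -> (20,)
```

In the actual pipeline, one such vector is built per coordinate track (palm and fingertips), and the concatenated result is normalized to [−1, 1] before classification.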
In order to discriminate the different hand gestures of the considered dataset into 11 classes, we use a multi-class one-against-one SVM classifier, trained on the dataset presented above. We use the implementation from the LIBSVM package. The employed kernel is the radial basis function, whose parameters are selected using a grid search approach and cross-validation on the training set.
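A minimal sketch of this classification stage, using scikit-learn's SVC (which wraps LIBSVM and applies a one-against-one scheme for multi-class problems); the toy feature matrix below is a stand-in for the real windowed LMC features:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Stand-in data: 11 gesture classes, 20 samples each, 20-dim feature vectors
X = rng.normal(size=(220, 20))
y = np.repeat(np.arange(11), 20)
X += y[:, None] * 0.5        # shift class means so the classes are separable

# RBF kernel; C and gamma picked by grid search with cross-validation,
# mirroring the parameter-selection procedure described in the paper
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": ["scale", 0.01]},
                    cv=3)
grid.fit(X, y)
pred = grid.predict(X[:5])   # predicted gesture classes for 5 samples
```

The one-against-one scheme trains one binary SVM per pair of classes (55 machines for 11 classes) and takes a majority vote at prediction time.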
6 Recognition Results and Discussion
We first evaluate our approach on the largest dataset among those listed in Sect. 2, namely the "RIT dataset" [13], and then report accuracy rates on the collected "LeapGestureDB". The quantitative classification accuracies obtained with our method are summarized in Table 2. The approach based on spatio-frequential characteristics gives the best performance, surpassing the performance recorded with separate spatial and frequential characteristics on both datasets. After various tests, we registered the highest classification accuracy with the feature set consisting of the Discrete Wavelet Transform (DWT) decomposition coefficients using the Daubechies wavelet filter of order 4 (db4). The maximum classification accuracy is reached by employing the normalized variances and the means of the DWT approximation coefficients at the fourth decomposition level, and is even higher than that obtained by the separate use of normalized means and variances. The detailed results obtained on the "RIT dataset" are given in Table 3. We consider the whole "RIT dataset" with its 12 gesture classes to evaluate our approach, not only the five gestures used by the authors of [13] in their classification approach. We reached a good accuracy rate of about 85%, with fewer computing resources than required to train a CNN model.
Table 2. Performance of LM features

Features                           | RIT dataset | LeapGestureDB
Mean (M) + standard deviation (SD) | 70.85%      | 80.95%
Fast Fourier transform (fft)       | 74.95%      | 84.95%
Discrete wavelet transform (dwt)   | 84.85%      | 93.36%
Table 3. Confusion matrix for performance evaluation on RIT dataset with spatio-frequential features

      G1     G2     G3    G4    G5    G6     G7     G8     G9     G10   G11   G12
G1   93.14   1.71   1.1    0     0     0.5    0      3.5    0      0     0     0
G2    1.71  61.15  18.3    0     0.6   0      0     16      0      0     1.5   1.5
G3   20.6    3.4   68.6    0     0     0      0      6.3    0      0     1.5   0
G4    0      0      0    100     0     0      0      0      0      0     0     0
G5    0      0.5    7.5    1.8  76.6   6.8    0      6.8    0      0     0     0
G6    0      0      0     12.6   0    87.43   0      0      0      0     0     0
G7    0      0      0      0     0     0     94.85   0      0      5.15  0     0
G8    1.7    0      0.6    0     0.6   0      0     94.29   2.85   0     0     0
G9    0      0      0.6    0     2.3   3.42   0      0     94.85   0     0     0.5
G10   0      0      0      0.5   0     2.3    3.4    0      0     93.7   0     0
G11   0      4.5   17.7    0     0     0      0      2.3    0      0    74.3   1.15
G12   3.4    2.28   6.85   0     0     0      0      7.4    0.5    0     0    79.5
Table 4. Confusion matrix for performance evaluation on our dataset with spatio-frequential features

      G1    G2    G3    G4    G5   G6    G7    G8    G9    G10   G11
G1    97    0     2.5   0     0    0     0.5   0     0     0     0
G2    0     89.5  9.5   0     0    0     0     0     0.5   0.5   0
G3    0     9.5   88.5  0     0    0     0     0     0     1.5   0.5
G4    0     0     0     97.5  0    0.5   0.5   0     0     1.5   0
G5    0     0     0     0     96   4     0     0     0     0     0
G6    0     0     0     1     5    91.5  2.5   0     0     0     0
G7    0.5   0     0     0.5   0    2.5   96.5  0     0     0     0
G8    0     0.5   0     0     0    0     0.5   98.5  0     0     0.5
G9    0     0     0     0     0    0     0     0     98.5  1.5   0
G10   0     0     0     2     0    0     0     0.5   0.5   86    11
G11   0.5   0.5   0     1     0    0.5   1.5   0.5   0     8     87.5
136
S. Ameur et al.
The results in Table 4, which describe the performance of the spatio-frequential features on our dataset, show that the accuracy is close to or above 90% for most of the gestures. The gestures G1, G4, G5, G6, G7, G8 and G9 are recognized by the LMC sensor with very high accuracy, while the gestures G2, G3, G10 and G11 frequently fail recognition. For instance, G2 and G3, two reciprocal gestures, both have a single raised finger (the index) and are most of the time confused with each other. The same holds for the grip-in and grip-out gestures, and for the hand swipes left and right.
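The per-gesture rates discussed above sit on the diagonal of a row-normalized confusion matrix, and the off-diagonal entries show which gestures are confused. A minimal sketch; the 3×3 matrix below is an illustrative excerpt echoing the G2/G3 confusion pattern, not the full data:

```python
import numpy as np

# Row-normalized confusion matrix in percent; rows are the true gestures.
# Illustrative values only, in the spirit of Tables 3 and 4.
conf = np.array([
    [93.0,  2.0,  5.0],   # G1
    [ 2.0, 61.0, 37.0],   # G2
    [21.0,  4.0, 75.0],   # G3
])

per_class = np.diag(conf)                    # per-gesture recognition rate
overall = per_class.mean()                   # unweighted mean accuracy
off_diag = conf - np.diag(per_class)         # zero the diagonal
worst_pair = np.unravel_index(np.argmax(off_diag), conf.shape)
```

Here `worst_pair` picks out the largest off-diagonal cell, i.e. the (true, predicted) pair that is most often confused.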
7 Conclusion
In this paper, we have presented "LeapGestureDB", a new dataset for dynamic hand gestures based on the LMC sensor. This dataset is freely available online. We have carefully described the dataset structure and the acquisition environment. Our proposed dataset will help bridge the evaluation gap between diverse hand gesture recognition systems that use their own datasets, and support the emerging LMC technology. Touchless interaction may enhance the surgeon's experience in operating rooms: being able to control computer displays with intuitive gestures can effectively reduce the surgery time and the risk of infection. In this work, we have shown that accurate hand gesture recognition may be achieved by machine learning, and we have proposed a simple and accurate approach to dynamic hand gesture recognition.

Acknowledgment. This work was supported and financed by the Ministry of Higher Education and Scientific Research of Tunisia. We would like to thank all the volunteers from the National Engineering School of Sousse (ENISo) who dedicated their valuable time and effort to the data collection process. We also gratefully acknowledge the surgery service at the university hospital Sahloul in Tunisia for their valuable suggestions and discussions.
References

1. Smari, K., Bouhlel, M.S.: Gesture recognition system and finger tracking with kinect: steps. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), Tunisia, pp. 544–548. IEEE (2016). https://doi.org/10.1109/SETIT.2016.7939929
2. Yun, E.: Analysis of machine learning classifier performance in adding custom gestures to the Leap Motion. Master thesis, Faculty of California Polytechnic State University, San Luis Obispo (2016). https://doi.org/10.15368/theses.2016.132
3. Triki, N., Kallel, M., Bouhlel, M.S.: Imaging and HMI: fondations and complementarities. In: 6th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), Tunisia, pp. 25–29. IEEE (2012). https://doi.org/10.1109/SETIT.2012.6481884
4. Ben Abdallah, M., Kallel, M., Bouhlel, M.S.: An overview of gesture recognition. In: 6th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), Tunisia, pp. 20–24. IEEE (2012). https://doi.org/10.1109/SETIT.2012.6481883
5. Hore, S., Chatterjee, S., Santhi, V., Dey, N., Ashour, A.S., Balas, V.E., Shi, F.: Indian sign language recognition using optimized neural networks. In: Information Technology and Intelligent Transportation Systems, vol. 455, pp. 553–563. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-38771-0_54
6. Mimouna, A., Ben Khalifa, A., Essoukri Ben Amara, N.: Human action recognition using triaxial accelerometer data: selective approach. In: 15th International Multi-Conference on Systems, Signals and Devices (SSD), pp. 467–472 (2018)
7. Lejmi, W., Ben Khalifa, A., Mahjoub, M.A.: Fusion strategies for recognition of violence actions. In: IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA), pp. 178–183 (2017). https://doi.org/10.1109/AICCSA.2017.193
8. Ameur, S., Ben Khalifa, A., Bouhlel, M.S.: A comprehensive leap motion database for hand gesture recognition. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), Tunisia, pp. 514–519. IEEE (2016). https://doi.org/10.1109/SETIT.2016.7939924
9. Weichert, F., Bachmann, D., Rudak, B., Fisseler, D.: Analysis of the accuracy and robustness of the leap motion controller. Sensors 13, 6380–6393 (2013). https://doi.org/10.3390/s130506380
10. Marin, G., Dominio, F., Zanuttigh, P.: Hand gesture recognition with leap motion and kinect devices. In: IEEE International Conference on Image Processing (ICIP), France, pp. 1565–1569 (2014). https://doi.org/10.1109/ICIP.2014.7025313
11. Marin, G., Dominio, F., Zanuttigh, P.: Hand gesture recognition with jointly calibrated leap motion and depth sensor. Multimed. Tools Appl. 75, 14991–15015 (2016). https://doi.org/10.1007/s11042-015-2451-6
12. Zaiti, I.A., Pentiuc, S.G., Vatavu, R.D.: On free-hand TV control: experimental results on user-elicited gestures with leap motion. Pers. Ubiquit. Comput. 19, 821–838 (2015). https://doi.org/10.1007/s00779-015-0863-y
13. McCartney, R., Yuan, J., Bischof, H.P.: Gesture recognition with the leap motion controller. In: International Conference on Image Processing, Computer Vision, and Pattern Recognition (2015). http://scholarworks.rit.edu/other/857
14. Boulahia, S.Y., Anquetil, E., Multon, F., Kulpa, R.: Dynamic hand gesture recognition based on 3D pattern assembled trajectories. In: 7th IEEE International Conference on Image Processing Theory, Tools and Applications (IPTA), Canada, pp. 1–6 (2017). https://doi.org/10.1109/IPTA.2017.8310146
15. Kumar, P., Gauba, H., Roy, P.P., Dogra, D.P.: Coupled HMM-based multi-sensor data fusion for sign language recognition. Pattern Recogn. Lett. 86, 1–8 (2017). https://doi.org/10.1016/j.patrec.2016.12.004
16. Kumar, P., Saini, R., Roy, P.P., Pal, U.: A lexicon-free approach for 3D handwriting recognition using classifier combination. Pattern Recogn. Lett. 103, 1–7 (2018). https://doi.org/10.1016/j.patrec.2017.12.014
17. Taranta II, E.M., Samie, A., Maghoumi, M., Khaloo, P., Pittman, C.R., LaViola Jr., J.J.: Jackknife: a reliable recognizer with few samples and many modalities. In: CHI Conference on Human Factors in Computing Systems (CHI 2017), Denver, Colorado, USA, pp. 5850–5861 (2017). https://doi.org/10.1145/3025453.3026002
18. Vikram, S., Li, L., Russell, S.: Handwriting and gestures in the air, recognizing on the fly. In: Proceedings of the CHI 2013 Extended Abstracts, Paris, France (2013). http://www.cs.cmu.edu/~leili/pubs/vikram-chi2013-handwriting.pdf
19. Mapari, R.B., Kharat, G.: Real time human pose recognition using leap motion sensor. In: International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), India, pp. 323–328 (2015). https://doi.org/10.1109/ICRCICN.2015.7434258
20. Khelil, B., Amiri, H.: Hand gesture recognition using leap motion controller for recognition of Arabic sign language. In: 23rd International Symposium on Industrial Electronics (ISIE), Turkey, pp. 233–238 (2014). https://doi.org/10.1109/ISIE.2014.6864742
21. Lu, W., Tong, Z., Chu, J.: Dynamic hand gesture recognition with leap motion controller. IEEE Signal Process. Lett. 23, 1188–1192 (2016). https://doi.org/10.1109/LSP.2016.2590470
22. Li, W.J., Hsieh, C.Y., Lin, L.F., Chu, W.C.: Hand gesture recognition for post-stroke rehabilitation using leap motion. In: International Conference on Applied System Innovation (ICASI), Japan, pp. 386–388 (2017). https://doi.org/10.1109/ICASI.2017.7988433
23. Opromolla, A., Volpi, V., Ingrosso, A., Fabri, S., Rapuano, C., Passalacqua, D., Medaglia, C.M.: A usability study of a gesture recognition system applied during the surgical procedures. In: Marcus, A. (ed.) Design, User Experience, and Usability: Interactive Experience Design. DUXU 2015. Lecture Notes in Computer Science, vol. 9188, pp. 682–692. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20889-3_63
24. Leap Motion Controller. https://www.leapmotion.com. Accessed 10 Nov 2014
Experiments on Phonetic Transcription of Audio Vowels

Ioan Păvăloi¹ and Anca Ignat²

¹ Institute of Computer Science, Romanian Academy Iaşi Branch, Iaşi, Romania
² Faculty of Computer Science, University "Alexandru Ioan Cuza" of Iași, Iași, Romania
[email protected]
Abstract. In 2007, a volume from a series of linguistic atlases (NALR. Moldova şi Bucovina, the 3rd volume) was entirely edited using the computer, a national premiere at that moment. A second step consists in creating an instrument that helps the human operator with the phonetic transcription of the audio recordings. This instrument is also meant to be used for analyzing the phonetic characteristics of the sounds of the Romanian language and for comparisons between Romanian and other Romance (or neighboring peoples') languages. In this paper we present our research on automatic phonetic transcription of the audio signal. We give some details about creating acoustic and phonetic resources, the phonetic alphabet adopted for the representation of dialectal utterances, as well as a statistical analysis of the occurrences of vowels and of the associated phenomena in the electronic dictionary. We introduce some preliminary results obtained in the automatic recognition of the phonetic transcription of the Romanian language's vowels. These results lead us to some conclusions and to directions for future research.

Keywords: Phonetic alphabet for Romanian language · Linguistic atlas · Automatic phonetic transcription · Dialectal audio recordings · K-NN · SVM
© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 139–152, 2021. https://doi.org/10.1007/978-3-030-52190-5_10

1 Introduction

In 2009, the Romanian linguistic cartography celebrated a century of existence, which emphasizes the vast activity in this field. Over the years, three sets of national linguistic atlases have been published, as well as cartographic works devoted to limited, north- or south-Danubian ethno-linguistic areas. One of the first notable results was obtained in 2007 at the Iași Branch of the Romanian Academy, where the first volume of the NALR/ALRR series (NALR, Moldova and Bucovina, 3rd volume), entirely computer edited, was a national premiere. An important objective, taken into consideration from the beginning, was that the development of linguistic atlases in electronic format would later enable the large amount of data stored in digital format to be used in future interdisciplinary projects. The correlation of the phonetic transcription for each phoneme with the corresponding segment of the acquired audio signal will allow the analysis of the phonetic and phonological characteristics of the sounds of the Romanian language, as well as the comparison between the Romanian language and other Romance languages or languages of the neighboring countries.

Later, a new project was launched, which aimed at creating a multimedia linguistic atlas. It takes into account both the Romanian tradition in this field and the most recent European experiences in the field (Sprachatlas des Dolomitenladinischen und angrenzender Dialekte - ALD, Atlas linguistique audiovisuel du Valais romand - ALAVAL, Atlas Linguístico-Etnográfico dos Açores - ALEAç, etc.). If in the first stage the development of the phonetic transcription was a process performed exclusively by a human operator, using a dedicated editor for this purpose [1, 2], it is now considered an interactive process, in which the computer proposes, on the basis of the audio signal, one or more phonetic transcription variants for a word. Thus, if initially the purpose of the research was to make a useful tool for editing and publishing volumes of the Romanian Regional Linguistic Atlas [3–5], at this stage the emphasis is on creating a tool to help the human operator with the phonetic transcription of the audio material. Such a tool would be very useful because the phonetic transcription process is relatively time-consuming; moreover, it is a challenging process for the people involved. In this paper, we aim to show the first steps we have taken to design and implement a semiautomatic phonetic transcription software for dialectal audio recordings. As far as we know, this type of approach was never addressed before, at least not for the Romanian language.

In Sect. 2 we present details about the automatic recognition of speech and the phonetic transcription we worked on. Section 3 introduces the resources we have created. Section 4 mainly focuses on primary vowels and phenomena, and Sect. 5 offers details about data collection and processing. Experimental results are presented in Sect. 6, and Sect. 7 summarizes this study and emphasizes some possible further developments of this work.
2 Automatic Recognition of Speech and Phonetic Transcription

Obviously, the ideal solution for obtaining an automatic phonetic transcription of audio recordings should address two different issues:

• automatic speech recognition for the audio recording;
• the automatic phonetic transcription corresponding to the specific recording.

In Romania, there are certain approaches to solving this problem; for example, at the RACAI Institute in Bucharest, tools were designed that allow both automated speech recognition and speech synthesis [6]. While automatic speech recognition has a wide range of applicability, the automatic recognition of phonetic transcription has a limited applicability: it can help a linguist in making dialectal dictionaries. We have strictly addressed the issue of automatically recognizing the phonetic transcription corresponding to a recording, assuming that speech recognition was already achieved for that recording. In other words, the text corresponding to the recording is previously known and the audio signal is segmented at the phoneme level.
Both the automatic recognition of phonetic transcription and automatic speech recognition are influenced by a series of factors; the most important are presented below. The first important factor influencing the phonetic transcription process is the style of speech: how fluent, natural or conversational the speech to be recognized is. Obviously, the phonetic transcription of isolated words (separated by pauses) is much simpler than that of continuous speech, where the words are pronounced in an apparently linked manner. Another extremely important factor for automatic speech recognition is the acoustic environment in which the recording was made. For recordings made outside a recording studio, there are usually one or more additional acoustic sources, including other speakers, background noise, etc. In most cases, separating the different acoustic signals is a very difficult issue. The microphone used for recording also has a significant impact on the accuracy of speech recognition. In general, the results of an automated phonetic transcription system depend on the speaker's characteristics; among these, we can list the speaker's accent, the language or dialect used, whether or not the speaker uses their native language, the speed of pronunciation, physiological characteristics (age, possible health problems), and the emotional state of the speaker at the time of recording. Inter-speaker variability can be handled by designing speaker-dependent recognition systems (specific to each speaker), but this type of approach has two major disadvantages: training data must exist for each speaker, and a new acoustic model must be trained for each new speaker. On the other hand, it is obvious that a speaker-independent system is less efficient (for a particular speaker) than a system created and trained for that speaker.
3 Creating Acoustic and Phonetic Resources

For a considerable number of languages (including Romanian) there are not enough appropriate acoustic and phonetic resources (audio recording databases, phonetic transcriptions of recordings), so designing and creating a system for the automatic recognition of phonetic transcription must start with the design and collection of such resources. In order to obtain these minimal resources, we started from a dialectal collection which was the basis for the creation of the Bucovina-ALAB Audio-Visual Language Atlas. This atlas/collection has approximately 3,500 answers (plus 500 answers in the "huțule" dialect, additionally obtained in Brodina) and a set of specific records (ethnotexts) obtained from the 28 persons who were questioned during the entire project.

3.1 Creating the Dialectal Archive
A group of researchers conducted a number of field surveys, based on a questionnaire specifically created for this purpose (published in a distinct brochure), called the questionnaire of the New Romanian Linguistic Atlas. This questionnaire was compiled under the guidance of Acad. Emil Petrovici and M.C. Prof. B. Cazacu, from the
Dialectology Department of the Centre of Phonetic and Dialectal Research of the Romanian Academy, and the Dialectology Group of the Institute of Linguistics of the Cluj Branch of the Romanian Academy. From the questionnaire, a subset of 126 questions was selected, corresponding to the section Curtea, mijloace de transport, animale domestice, păsări de curte (Court, means of transportation, domestic animals, poultry). A site was developed using these survey results (image and sound files); the results were thus made available to all interested parties. The recordings were made in seven locations: Ilişeşti, Doroteia, Mănăstirea Humor, Solca, Deluţ, Fundu Moldovei and Brodina. It was established that four subjects from each location would be interviewed: two adult subjects (one woman and one man), both over 60 years of age, and two young subjects (a man and a woman), both up to 35 years old.

3.2 Selection of a Subset of Data and Its Processing
Of the 4233 audio recordings (obtained from the video recordings posted on the ALAB site), a subset of more than 700 records was selected: we chose 26 questions and the corresponding answers. There are 28 responses to each question (4 interviewed persons for each of the 7 locations), thus obtaining a subset of 728 recordings. This subset of recordings was the initial basis from which we intend to develop an acoustic resource to be used in further research. We emphasize that the answers to the questions from the questionnaire are not always the expected ones (those indicated in the questionnaire): there are situations in which the interviewed person responds "I don't know" and situations where the answer is different from the expected one.
4 Phonetic Transcription, Primary Vowels, Phenomena

Studying a foreign language raises a number of issues. In many languages there is a discrepancy between the way a word is written and its pronunciation; therefore, a first step in getting the correct pronunciation is the study of a phonetic transcription. For example, the English alphabet has 26 letters, while in the spoken language 44 sounds/phonemes can be identified. In order to help English learners pronounce each sound correctly, a graphical representation was required, that is, a phonetic transcription. Thus, in any English-Romanian dictionary, readers will find the phonetic transcription of each word in the dictionary. Analogously, each dialectal speech record is accompanied by the corresponding phonetic transcription. There exists an international phonetic alphabet [7], but for our purpose it was considered too limited, failing to grasp the variety encountered in dialectal speech, so a new phonetic alphabet was developed for the representation of dialectal sounds. The primary vowels used in phonetic transcription are shown in Table 1 [3].
Table 1. Primary vowels [3]

Simple   Diacritics
a        ä ă â ȧ å
e        ë ĕ
i        î ȋ
o        ö ŏ
u        ü û
The complete set of vowel sounds is obtained by using these "primary vowels" and the three accented variants of each of them (a – á, à, a). There are a number of phonetic phenomena that can be applied to the primary vowels, which can be classified into five main groups, see Table 2 [3]. A maximum of 5 phonetic phenomena can be applied to a primary vowel, but not more than one from each group. The notion of phonetic transformation comprises all vowel realizations obtained by applying at least one phonetic phenomenon to a primary vowel. There are, however, some linguistic exceptions:

• the phonetic phenomenon "closed" cannot be applied to the vowels i ȋ î u ü û, these being by their nature closed vowels (with the lowest degree of aperture);
• analogously, the aperture phenomena "half-opened", "opened" and "extra-opened" cannot be applied to the vowels a ä å, because they are by their nature open vowels (with the highest degree of aperture).

Table 2. Phonetic phenomena for vowels [3]

Group               Phenomena
Duration            Short, Half-long, Long
Nasality            Half-nasalized, Nasalized
Glottal occlusion   Coup de glotte
Aperture            Closed, Half-opened, Opened, Extra-opened
Aphonization        Half-silenced, Silenced
5 Data Collection and Processing

In a preliminary step, the conversion from the Flash Video (.flv) format to the Waveform Audio (.wav) format was performed on the previously selected subset. A record usually contains one or two words, the singular and possibly the plural forms of a term. The conversion was done using the free AVC converter, which can be downloaded from [8]. In the first phase, the processing focused on two aspects: firstly, the phonetic transcription of the chosen subset was performed, and secondly, the audio recordings were processed.

5.1 Phonetic Transcription of the Selected Subset
Phonetic transcription was performed using the ALT_IIT editor, which was also used in editing two volumes of the New Romanian Atlas of Linguistics, by regions: Moldova and Bucovina. The phonetic transcription of the selected audio recordings is based on the audition of these recordings. In addition to the initial version, a number of modules have been designed and implemented which allow not only quick examination of the dictionaries of the linguistic atlas, but also export of this information into a text file, which is then correlated with the features extracted from the audio recordings. All this data is necessary for testing the software modules that perform phonetic transcription recognition. Modules were also developed to allow handling the audio files corresponding to a phonetic transcript.

5.2 Processing Achieved Using Audio Files
The annotation of the audio files from the selected subset was made manually, at the phoneme level, using the Praat utility [9]. After the manual annotation, a text file with the extension .TextGrid is obtained for each record. In the next step, using a script, two text files are created from each audio record and the corresponding .TextGrid file, the first containing the values of the F0 formant and the second the F1–F4 formant values. Using the programs HCopy and HList [10], two further text files (with extensions .mfcc and .plp) were created for each record. These files contain the values of the MFCC (Mel-frequency cepstral coefficients), DMFCC (delta-MFCC) and DDMFCC (delta-delta-MFCC) coefficients, and of the PLP (Perceptual Linear Prediction), DPLP and DDPLP coefficients, respectively. For each audio record, multiple feature sets can be generated based on the values of the F0–F3 formants and the MFCC, DMFCC, DDMFCC, PLP, DPLP, and DDPLP coefficients [11]. Finally, six sets of feature vectors were generated, which were later used in the automatic recognition of the phonetic transcription of vowels in the audio signal:

• SET1 - the first set of feature vectors is obtained from the values of the 12 MFCC coefficients, from which the following statistical values are computed: mean, median, standard deviation, first and third quartile, finally obtaining 60 features (5 × 12);
Experiments on Phonetic Transcription of Audio Vowels
145
• SET2 - the second set of feature vectors is obtained by analogy with the first set, based on the values of the MFCC, DMFCC and DDMFCC coefficients, computing the same statistical values and thus obtaining 180 features (3 × 60). Optionally, the C0, DC0 and DDC0 values can also be used, yielding 15 additional features;
• SET3 - the third set of feature vectors is obtained as the first set, but this time using the 12 PLP coefficients, finally obtaining 60 features;
• SET4 - the fourth set of feature vectors is obtained similarly to the second set, based on the values of the PLP, DPLP and DDPLP coefficients, obtaining vectors with 180 features. Analogously, the values C0, DC0 and DDC0 can also be used;
• SET5 - is generated based on the F0–F3 formant values, with 45 statistical features for each formant, for a total of 4 × 45 = 180 statistical features. Optionally, the duration can be added;
• SET6 - consists of all the features generated in the first five feature sets.

The generative models GMM (Gaussian Mixture Models) [12] and HMM [13, 14] can be used when dealing with long feature vectors [12], which is not the case in our situation. This is the reason why we used two discriminative classifiers, k-NN (k-Nearest Neighbors) [15] and SVM (Support Vector Machine) [16, 17]. In the tests performed using k-NN, different values of k were used. The best results are obtained for low values of k, namely k = 1, k = 3 and k = 5; higher values of k generally yielded poor results, which can be explained by the small number of feature vectors. This situation can change as the volume of the data set increases. In the tests performed with the k-NN algorithm, four different distances were used: Euclidean, Manhattan, Canberra and Minkowski.
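The SET1 feature construction and a Canberra-distance k-NN can be sketched as follows, with NumPy only. The MFCC matrix is assumed to be precomputed (e.g. by HCopy); the function names are ours, and this minimal k-NN is a stand-in for the classifiers actually used:

```python
import numpy as np

def set1_features(mfcc):
    """mfcc: array of shape (12, n_frames) for one phoneme segment.
    Returns the 60-dim SET1 vector: mean, median, standard deviation,
    first and third quartile of each of the 12 coefficients."""
    return np.concatenate([
        np.mean(mfcc, axis=1),
        np.median(mfcc, axis=1),
        np.std(mfcc, axis=1),
        np.percentile(mfcc, 25, axis=1),
        np.percentile(mfcc, 75, axis=1),
    ])

def canberra(u, v):
    """Canberra distance; 0/0 terms are taken as 0."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    denom = np.abs(u) + np.abs(v)
    denom[denom == 0] = 1.0        # those terms contribute |u - v| / 1 = 0
    return np.sum(np.abs(u - v) / denom)

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training vectors."""
    dists = np.array([canberra(t, x) for t in train_X])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

The Euclidean, Manhattan and Minkowski variants differ only in the distance function passed to the same voting scheme.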
6 Results

After establishing all the requirements, we designed an architectural model with the following steps:

1. creation of the data collection;
2. for each entry in the dictionary that contains phonetic transcripts:
   2.1. scanning the manual annotation for that entry;
   2.2. reading for each entry the corresponding values of the formants and of the MFCC and PLP coefficients;
   2.3. based on user-selected options, generating the feature vectors, one vector for each phoneme;
3. establishing the training and test datasets based on a chosen cross-validation technique (we used leave-one-out cross-validation, LOO-CV);
4. use of the two discriminative classifiers, k-NN and SVM, for the automatic recognition of the phonetic transcription of the Romanian language vowels;
5. estimating the classification error and evaluating the model based on the employed cross-validation technique.
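Step 3, leave-one-out cross-validation, can be sketched as below. This is a minimal NumPy version with a Euclidean k-NN standing in for the paper's k-NN/SVM classifiers; the names are illustrative:

```python
import numpy as np

def loo_cv_accuracy(X, y, k=3):
    """Leave-one-out cross-validation: each sample is held out in turn,
    classified by a simple Euclidean k-NN trained on the rest, and the
    fraction of correct predictions is returned."""
    X, y = np.asarray(X, float), np.asarray(y)
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i            # hold out sample i
        dists = np.linalg.norm(X[mask] - X[i], axis=1)
        nearest = np.argsort(dists)[:k]
        labels, counts = np.unique(y[mask][nearest], return_counts=True)
        correct += labels[np.argmax(counts)] == y[i]
    return correct / len(X)
```

LOO-CV is appropriate here because the number of feature vectors per vowel class is small: every available sample is used for training in all but one fold.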
6.1 Statistical Analysis
In a first stage, a statistical analysis was performed on the occurrences of the vowels and related phenomena in the electronic dictionary "Noul Atlas lingvistic român, pe regiuni. Moldova şi Bucovina, vol. II" ("New Romanian linguistic atlas, by regions. Moldova and Bucovina, vol. II"), in order to determine the classes of interest for each vowel (primary vowels that have a significant number of appearances). In this way one can avoid situations in which a class with a very small number of entries appears for a vowel. The goal was to determine a threshold (minimum percentage of appearances) that would allow us to obtain superior results in the analysis. This dictionary contains the answers to 359 questions obtained in 210 survey locations [8]; five dialectologists participated in the phonetic transcriptions. In total, the number of phonemes appearing in the electronic dictionary is 459185. We note that the comments (as well as the spaces and punctuation marks) were ignored during processing. Thus, a total of 361470 phonemes were taken into account. The total number of phonetic transformations applied to phonemes is 24631. We intended to highlight all the phenomena (including groups of phenomena) applied to a phoneme that appear in the dictionary, as well as the corresponding number of appearances. In total, 50 classes of phonetic transformations were highlighted, each class comprising one, two, or three phonetic transformations. The main aspects to be pointed out are:

• of the 50 classes, 16 contain only one phonetic transformation, 30 contain two and only 4 contain three phonetic transformations. The four classes that contain three phonetic transformations are applied to only 6 phonemes (0.024%). The 30 classes containing two phonetic phenomena are applied to 977 phonemes (3.97%). In contrast, the 16 classes containing one phonetic transformation apply to 23648 phonemes (over 96%).
Thus, we can conclude that the recognition of classes that contain two or three phonetic transformations cannot be addressed, for the moment. • studying the frequencies of occurrence for the phonetic transformations, we note that there is a series of groups of phonetic transformations, that cannot be the subject of automatic recognition either now or in the future, due to the small number of occurrences. We refer to duration, syllabic character, glottal occlusion. Also, in the case of other phonetic transformations, we notice the small number of occurrences, for example the phenomena of semi-nasalization (six occurrences) and large palatalization (three occurrences). Also, if we consider the number of phonemic characters for vowels, 77 appearances for a phonetic phenomenon like “large opening”, this number is too small to try the automatic recognition for this type of transformation; • from the group of 16 classes containing only one phonetic transformation, half of them (eight) apply to at least 1% of the total number of phonemes from the dictionary (361470 phonemes). This refers to the phenomena: aphonization (10711 appearances), aperture (3966), palatalization (3282), half- silenced (1354), nasality (1114), half-palatalized (1029), closed (955) and half-opened (936). If we consider a threshold of 5%, the first four of the above-mentioned phenomena apply to less
Experiments on Phonetic Transcription of Audio Vowels
147
than 5% of the total number of phonemes in the dictionary (aphonization, aperture, palatalization, half-silenced).
We note that phonetic transcription is an extremely subjective process, marked by the particular "fingerprint" of the person who performs it. Thus, each of the five dialectologists used a different number of classes of phonetic transformations. Table 3 gives, for each dialectologist (coded 1 to 5), the total number of phonemes transcribed (denoted TNF), the number of classes of phonetic transformations used, the number of phonemes to which phonetic transformations apply (denoted NFTF) and the percentage of phonemes to which these transformations apply.

Table 3. Statistics on dialectologist work

Dialectologist      1      2      3      4      5
TNF             73799  80341  72392  51118  83820
Classes            50     14      8     15     16
NFTF             9014   4595   3243   2508   5271
Percent        12.214  5.719  4.480  4.906  6.288
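The Percent column of Table 3 follows directly from TNF and NFTF; a quick sketch verifying the arithmetic (values copied from the table):

```python
# Recompute the "Percent" column of Table 3 from the TNF and NFTF values
# given in the text (a consistency check of the table).
tnf  = {1: 73799, 2: 80341, 3: 72392, 4: 51118, 5: 83820}
nftf = {1: 9014,  2: 4595,  3: 3243,  4: 2508,  5: 5271}

percent = {c: round(100.0 * nftf[c] / tnf[c], 3) for c in tnf}
print(percent)  # matches the Percent row of Table 3
```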
Thus, we notice that there is a very large difference between the dialectologists with codes 1 and 3. In the case of the first dialectologist, the 50 classes of phonetic transformations apply to 9014 phonemes out of 73799 (i.e. 12.214%), while for the one with code 3 only 8 classes of phonetic transformations apply to 3243 phonemes out of a total of 72392.
Additionally, a cluster analysis of the data collection for the vowels of the Romanian language was performed. In this analysis, the same six sets of feature vectors were used as in the primary vowel recognition. Different distances were used in the tests: Euclidean, Manhattan, Canberra, Minkowski, Chebyshev and Hausdorff–Pompeiu. The cluster centroid computation was performed in two variants, using the arithmetic mean or the median value. The Davies–Bouldin, Calinski–Harabasz and Dunn indices were computed. The cluster analysis highlighted the same thing as the statistical analysis: it is necessary to validate the existing data collection. In other words, there are vowels in the audio recordings whose acoustic features are extremely "similar", but for which different primary vowels were used in the phonetic transcription. The fact that the phonetic transcription has not been validated certainly makes the recognition rates lower than those that would have been obtained on a data collection for which a phonetic transcription validation had been performed.

6.2 Processing Achieved Using Audio Files
Some preliminary results obtained in experiments made on phonetic transcription of audio vowels are presented below. We performed k-NN (k = 1,3,5) computations, but
148
I. Păvăloi and A. Ignat
we shall present in the following tables only the 3-NN results for vowel recognition. The 3-NN provided the overall best average results; in some cases, 5-NN yielded better results, as we shall mention in the following. The best result obtained for vowel 'a', 69.46% (see Table 4), is obtained using SVM for SET2 and SET4. Using 3-NN, the best results are obtained using the Canberra distance and the SET4 feature vectors.

Table 4. 3-NN and SVM results obtained for vowel 'a'

                  SET1   SET2   SET3   SET4   SET5   SET6
3-NN - Euclidean  57.62  60.30  58.79  59.97  58.29  60.47
3-NN - Manhattan  58.96  60.64  58.46  63.15  63.15  63.15
3-NN - Canberra   60.47  61.81  59.80  63.65  61.81  64.32
SVM               65.44  69.46  65.77  69.46  63.42  68.46
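The 3-NN classification with the Canberra distance used above can be sketched as follows; `train_X` and `train_y` are placeholders for one of the feature vector sets (e.g. SET4) and its vowel labels, not the actual data:

```python
import numpy as np

def canberra(u, v):
    # Canberra distance: sum_i |u_i - v_i| / (|u_i| + |v_i|),
    # skipping terms whose denominator is zero (the usual convention).
    num = np.abs(u - v)
    den = np.abs(u) + np.abs(v)
    mask = den > 0
    return float(np.sum(num[mask] / den[mask]))

def knn_predict(train_X, train_y, x, k=3):
    # Classify x by majority vote among its k nearest training vectors.
    dists = [canberra(t, x) for t in train_X]
    nearest = np.argsort(dists)[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)
```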
The best result obtained in the experiments made for vowel 'e' is 63.48% (see Table 5); it is obtained using 1-NN and 3-NN with the Euclidean distance and the SET6 feature vectors.

Table 5. 3-NN and SVM results obtained for vowel 'e'

                  SET1   SET2   SET3   SET4   SET5   SET6
3-NN - Euclidean  46.96  47.83  47.83  47.83  48.70  63.48
3-NN - Manhattan  46.96  42.61  46.09  46.96  47.83  62.61
3-NN - Canberra   46.96  58.26  46.96  46.96  56.52  57.39
SVM               54.39  59.65  52.63  57.90  57.90  59.65
As for vowel 'a', the best results in these experiments are obtained using 3-NN and the Canberra distance. The best result obtained in the phonetic transcription experiments for vowel 'i', 82.96%, was obtained using the SET6 dataset and 5-NN with the Manhattan distance. Using the SVM classifier, a result of 73.63% is obtained with the same feature vector dataset (Table 6).

Table 6. 3-NN and SVM results for vowel 'i'

                  SET1   SET2   SET3   SET4   SET5   SET6
3-NN - Euclidean  71.94  74.83  74.49  77.04  78.40  79.42
3-NN - Manhattan  73.64  75.51  75.51  79.93  76.87  82.10
3-NN - Canberra   71.77  77.55  72.11  77.55  77.04  79.59
SVM               71.92  73.63  71.58  73.29  72.60  73.63
The best result obtained for vowel 'o' is 74.70%, obtained using 5-NN and the Canberra distance for the SET6 feature dataset. The 3-NN and SVM results obtained in our experiments are presented in Table 7.

Table 7. 3-NN and SVM results for vowel 'o'

                  SET1   SET2   SET3   SET4   SET5   SET6
3-NN - Euclidean  58.91  64.17  62.75  67.81  64.37  72.67
3-NN - Manhattan  55.67  66.60  60.93  71.26  62.55  73.68
3-NN - Canberra   59.11  71.26  61.94  73.08  60.32  72.67
SVM               –      71.54  62.20  62.20  65.44  68.70
For vowel 'u', the best result obtained in phonetic transcription is 81.52%, obtained using the SET6 dataset and 3-NN with the Canberra distance (Table 8).

Table 8. 3-NN and SVM results for vowel 'u'

                  SET1   SET2   SET3   SET4   SET5   SET6
3-NN - Euclidean  67.66  71.47  68.21  71.74  73.64  79.08
3-NN - Manhattan  66.30  72.83  66.30  72.28  70.65  78.53
3-NN - Canberra   67.93  76.63  56.52  68.48  77.17  81.52
SVM               69.95  70.49  69.95  70.49  72.68  72.13
In the experiments made in phonetic transcription we obtained 72.62% for vowel 'ă', using SVM for the SET2, SET4 and SET6 datasets. The best result using the k-NN classifier is 71.60%, obtained for the SET6 dataset with k = 1 and the Manhattan distance, and with k = 5 and the Euclidean distance. Table 9 presents the results computed with 3-NN and SVM.

Table 9. 3-NN and SVM results for vowel 'ă'

                  SET1   SET2   SET3   SET4   SET5   SET6
3-NN - Euclidean  65.09  60.36  63.91  65.68  65.68  68.64
3-NN - Manhattan  66.27  60.36  66.86  61.54  64.50  68.05
3-NN - Canberra   63.91  65.68  55.62  55.62  63.31  65.68
SVM               70.24  72.62  70.24  72.62  71.43  72.62
The results obtained for vowel 'î', like those for vowel 'e', are not so good. A score of 56.50% was obtained for the SET2, SET4, SET5 and SET6 datasets using the SVM classifier. The value 55.65% is obtained using 5-NN with the Canberra distance for the SET6 dataset. The SVM and 3-NN classification results are presented in Table 10.

Table 10. 3-NN and SVM results for vowel 'î'

                  SET1   SET2   SET3   SET4   SET5   SET6
3-NN - Euclidean  44.92  51.41  47.46  51.13  44.92  47.74
3-NN - Manhattan  46.89  48.59  50.56  51.13  48.31  53.39
3-NN - Canberra   40.68  52.54  46.61  48.87  50.56  54.80
SVM               55.37  56.50  53.11  56.50  56.50  56.50
7 Comments, Conclusions, Future Work

Taking into account the results of the experiments, we can draw some conclusions:
• Regarding the recognition of the type of phenomenon applied to the primary vowels, their small number in the current database makes it impossible, at this point, to perform automatic processing aimed at recognizing the type of phenomenon applied to the primary vowels. As the data collection develops, however, these tests will become possible. It is possible that even if the data collection becomes three times larger than it is now, the number of applied phenomena will not become significantly higher than at present, or will increase very little. Unlike other situations in which the appropriate data for each class can be chosen (as in the study of emotions [18, 19]), in this case we cannot choose the data beforehand so that, after the phonetic transcription is performed, we obtain the phenomena we want to recognize (we do not know in advance which phenomena will appear in the respective recordings). Obviously, text analysis offers the possibility to signal the semi-vowels. Analogously, the duration phenomenon can probably be highlighted by comparing the duration of the analyzed vowel with the average duration for the same primary vowel; the data currently available in the collection did not allow us to perform such estimations;
• Cluster analysis has emphasized the subjectivity of phonetic transcription, which adds additional difficulties to the recognition process, making it obvious that the recognition rates are inferior to those that would have been obtained on a validated data collection (from the point of view of phonetic transcription). In other words, it is perfectly possible for one person to find a certain phonetic transcription correct, while another person may find it inappropriate.
In order to avoid this situation as much as possible, the phonetic transcription for the current database was made by a single person;
• The fact that some recognition rates are higher than the statistical percentage of occurrence of the corresponding primary vowel suggests, however, that primary vowel recognition can be made automatically, with an acceptable error, for a validated data collection;
• The calculation and interpretation of some measures based on the confusion matrix indicates which primary vowels are difficult to recognize due to the lack of a certain number of specific recordings in the data collection; while for the phonemes we cannot anticipate in advance their occurrence in a set of recordings, the problem is
somewhat simplified, because it is possible to delimit a set of data in which the probability of occurrence is higher and thus, by phonetically transcribing that set of recordings, to increase the existing data collection appropriately;
• Among the employed distances, the Canberra distance provided the best results. The overall best results were obtained with SVM. Regarding the feature vectors, one can choose SET2 or SET4, i.e. the sets based on MFCC, DMFCC and DDMFCC, and on PLP, DPLP and DDPLP, respectively. Although the SET6 set allows obtaining better results, they are not significantly better than those obtained using the SET2 or SET4 feature vectors, so we think it is sufficient to use the previously mentioned sets, which have a smaller number of features.
Since the analysis of the results obtained in the recognition of primary vowels allows us to express some optimistic conclusions, we propose to continue the research in several stages:
• first of all, increasing the volume of the data collection (at least three to four times) compared to the current volume; there is already a fairly large number of manually annotated recordings at the phoneme level (over 20% of the total volume of audio recordings on the ALAB site), and in the next stage we want an increase of at least 33%. We want a significant increase in the set of phonetically transcribed recordings;
• a possible manual validation of the data collection (or at least of a subset that will be used as training data); we are even considering automatic validation of the data collection (performing clustering, using, for example, the k-means algorithm, and then validating a subset of phonetic transcriptions);
• a two-stage recognition approach will be considered: in the first stage a more general class (which contains several classes) will be recognized, followed by a second stage for primary vowel recognition.
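The automatic validation idea in the second bullet (cluster the feature vectors with k-means, then flag transcriptions that disagree with their cluster) can be sketched as below. This is only an illustration of the idea under our own assumptions: plain k-means with a deterministic initialization and a majority-vote flagging rule, not the authors' implementation.

```python
import numpy as np

def kmeans(X, k, iters=100):
    # Plain k-means; the first k samples are used as initial centroids
    # for determinism in this sketch.
    centroids = X[:k].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def suspect_transcriptions(X, transcribed, k):
    # Flag samples whose cluster majority label disagrees with their
    # transcription -- candidates for manual re-checking.
    labels, _ = kmeans(X, k)
    flags = []
    for j in range(k):
        idx = np.where(labels == j)[0]
        if len(idx) == 0:
            continue
        members = [transcribed[i] for i in idx]
        majority = max(set(members), key=members.count)
        flags += [i for i in idx if transcribed[i] != majority]
    return flags
```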
References

1. Bejinariu, S., Apopei, V., Dumistrăcel, S., Teodorescu, H.-N.: Overview of the integrated system for dialectal text editing and Romanian Linguistic Atlas publishing – 2009. In: The 13th International Conference INVENTICA 2009, pp. 564–572. Editura Performantica, Iaşi (2009)
2. Bejinariu, S., Apopei, V., Luca, R., Botosineanu, L., Olariu, F.: Instrumente pentru consultarea Atlasului Lingvistic si editarea textelor dialectale. In: Lucrările atelierului Resurse lingvistice şi instrumente pentru prelucrarea limbii române, Iaşi, November 2006, pp. 107–112. Editura Universităţii Alexandru Ioan Cuza (2006)
3. Apopei, V., Rotaru, F., Bejinariu, S., Olariu, F.: Electronic linguistic atlases. In: IKE 2003, The 2003 International Conference on Information and Knowledge Engineering, Las Vegas, pp. 628–633, June 2003
4. Botosineanu, L., Olariu, F., Bejinariu, S.: Un projet d'informatisation dans la cartographie linguistique roumaine: Noul Atlas lingvistic român, pe regiuni. Moldova şi Bucovina en format électronique (e-NALR) – réalisations et perspectives. In: XXVIe Congrès International de Linguistique et de Philologie Romanes (26 CILFR), València, 6–11 September 2010, pp. 456–457 (2010)
5. Olariu, F.-T., Olariu, V., Bejinariu, S., Apopei, V.: Los atlas lingüísticos rumanos: entre manuscrito y formato electrónico. Revista española de lingüística 37, 215–246 (2007)
6. Research Institute for Artificial Intelligence "Mihai Drăgănescu". www.racai.ro/speech-tools
7. International Phonetic Alphabet. https://en.wikipedia.org/wiki/International_Phonetic_Alphabet
8. Any Video Converter (AVC). http://www.any-video-converter.com/download-avc-free.php
9. Boersma, P., Van Heuven, V.: Speak and unSpeak with PRAAT. Glot Int. 5(9–10), 341–347 (2001)
10. Hidden Markov Model Toolkit. https://www.htk.eng.cam.ac.uk/
11. Teodorescu, H.-N., Păvăloi, I., Feraru, M.: Methodological aspects of data organization and statistical analysis of the emotional voice. In: Iftene, A., Teodorescu, H.-N., Cristea, D., Tufiş, D. (eds.) Proceedings of ConsILR 2010, Bucharest, Romania, May 2010, pp. 35–44. Editura Universităţii Alexandru Ioan Cuza, Iaşi (2010)
12. Petr, D.: Experiments with speaker recognition using GMM. Internal report, SpeechLab, Department of Electronics & Signal Processing, Technical University of Liberec (2002)
13. Ingale, A.B., Chaudhari, D.S.: Speech emotion recognition using hidden Markov model and support vector machine. Int. J. Adv. Eng. Res. Stud. 1(3), 316–318 (2012)
14. New, T.L., Wei, F.S., De Silva, L.C.: Speech emotion recognition using hidden Markov models. Speech Commun. 41(4), 603–623 (2003)
15. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, New Jersey (2012)
16. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge (2000)
17. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines, version 2.3 (2001)
18. Păvăloi, I., Ciobanu, A., Luca, M., Muscă, E., Barbu, T., Ignat, A.: A study on automatic recognition of positive and negative emotions in speech. In: 18th International Conference on System Theory, Control and Computing, Sinaia, Romania, 17–19 October 2014
19. Păvăloi, I., Muscă, E., Rotaru, F., Bolea, S.C.: Acoustic analysis methodology on Romanian language vowels for different emotional states. In: ISSCS, 11–12 July 2013, Iaşi, Romania (2013)
Experiments on Iris Recognition Using Partially Occluded Images

Ioan Păvăloi1 and Anca Ignat2

1 Institute of Computer Science, Romanian Academy Iaşi Branch, Iaşi, Romania
2 Faculty of Computer Science, University "Alexandru Ioan Cuza" of Iași, Iaşi, Romania
[email protected]
Abstract. In this paper we present some experiments on iris recognition and retrieval on partially occluded images using color features. Our experiments were done on two well-known iris databases, UBIRIS and UPOL, in the HSV color space, using the k-NN (k-Nearest Neighbors) classifier. For the k-NN method, three classic measures, Canberra, Euclidean and Manhattan, and two new measures were used in our work. In the image retrieval experiments we were interested to see whether the color indexing methods we tested can be used as a simple, fast filter, whose output can then be processed by more computationally expensive methods, with new color or texture indexing methods that can improve the robustness of finding iris images similar to an occluded one.

Keywords: Iris identification · Color indexing · Color features · Occluded iris images · k-NN
© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 153–173, 2021.
https://doi.org/10.1007/978-3-030-52190-5_11

1 Introduction

Nowadays, a common way of authenticating persons in biometrics is iris identification, considered one of the most stable and suitable biometric technologies. Moreover, there is large interest in using iris recognition on smart phones and tablets to authenticate a person. In [1–4] we explored image recognition and retrieval using color features and statistics on pixel positions. The experiments we made in image classification and retrieval using statistics on pixel positions on the Corel 1000 database [4, 5] have shown that this approach gave better results than other recent methods (Local Tetra Patterns [6]). Using the average precision obtained by retrieving 20 images from the database as the means of comparison, our proposed method yielded a very good result of 0.69, compared with the results obtained with image-based SIFT–LBP features (0.66) [7], patch-based SIFT–LBP features (0.56) [7], the histogram-based method proposed in [8] (0.36), and the color, shape and texture based features from [9] (0.55), and being very close to the Directional Binary Patterns and Oriented Gradients Histogram method (0.71) [10]. Taking into account these results, we decided to experiment with image recognition and retrieval on partially occluded iris images using color features and statistics on pixel positions. We did not address in this work the problem of iris
segmentation. The problem of occluded iris images was also approached in [11]. The authors propose two segmentation methods and a wavelet-based feature extraction procedure suitable for infra-red iris images, and test their methods on the CASIA dataset. In this paper we present the results of some of our experiments using two public iris databases (UBIRIS and UPOL), and as features the above-mentioned combination of color characteristics and statistics on pixel positions. The first problem was deciding whether these features are also suitable for iris images with missing information. We tested whether the recognition rate is directly proportional to the size of the occluded part. Another issue we addressed was whether the location of the region or regions with missing information influences the classification/retrieval results (assuming occluded parts of the same size). How important is the continuity of the occluded area (which is better: one connected part or several connected fragments)? We also studied the effect of the shape of the occluded area on the recognition rate.
2 Databases

We have done experiments on two databases, UBIRIS and UPOL. The UBIRIS database [12–14] we are using in our experiments is part of the UBIRIS v1 database and contains images with the same dimension of 360 × 100 pixels (see Fig. 1) for 241 persons, 5 images for each one, the images being automatically segmented [15]. The UPOL database [16–18] contains high-quality images with a better resolution (576 × 768) than the previously used dataset. It contains images for 64 persons, three images for the left eye and three for the right eye. For UPOL we have considered the following datasets:
– the original unsegmented images (see Fig. 2);
– the manually segmented images;
– the uniformly segmented images: using the bright marking from the middle of the pupil, we segmented all the images uniformly in order to obtain an annulus of the same size containing the iris information, for all images in the database (see Fig. 3); these images have size 404 × 404 and the radius of the annulus is 120.
In this paper we present the results obtained in experiments made on the third UPOL dataset. The results obtained on the second dataset are very similar to those obtained on the third dataset, while the results obtained on the first UPOL dataset are lower. In all our experiments the classes are defined by iris images belonging to the same person.
Fig. 1. Examples of images in the UBIRIS database
Fig. 2. Examples of images in the unsegmented UPOL database.
Fig. 3. Examples of images in the segmented UPOL database.
3 Feature Extraction

In the Swain and Ballard color indexing scheme [19], considering the HSV color space as a cube with a side of 256 units, we divide each channel (H, S and V) into n equally spaced intervals and obtain m = n × n × n bins. A bin is thus a region of colors, a cube with a side of 256/n units. Counting, for each bin, the number of pixels from the image that fall into it, we get a vector with m features. Comparing two images really means comparing the two feature vectors, so they should be built with the same number of partitions and for images of the same size. Normalization of the feature values is required if the images are not of the same size. For a division m = n × n × n, a dataset of feature vectors is obtained by generating a feature vector for each image in the database. In [9, 19–21] we extended the Swain and Ballard color indexing scheme by also taking into consideration features related to the position of the color pixels in each bin or in the image, thus obtaining better results in image retrieval and recognition. Beside the features generated by the classical color indexing scheme mentioned above, we have considered one of the following features related to the position of the pixels, as we did in [9, 19–21]:
– the position of the pixels in each bin, computing the average over all pixels in each bin, for each channel;
– the position of the pixels in the image, computing the average over all pixels in each bin, for each channel;
– the position of the pixels in each bin, computing the standard deviation over all pixels in each bin, for each channel;
– the position of the pixels in the image, computing the standard deviation (for each channel value) over the pixels in each bin.
So, for example, considering the position of pixels in each bin and the standard deviation for each channel, we shall get three vectors in an m-dimensional space, m = n × n × n:
• (P1, P2, P3, …, Pm),
• (Q1, Q2, Q3, …, Qm),
• (R1, R2, R3, …, Rm),
where (Pi, Qi, Ri) are the standard deviation values, for each color component, of the pixels in the i-th bin. Thus, Pi is the standard deviation of the H channel values for the pixels in the i-th bin, Qi is the standard deviation of the S channel values and Ri is the standard deviation of the V channel values for the pixels in the i-th bin. As a result, a feature vector for an image is obtained by concatenating the features generated using the Swain and Ballard color indexing scheme with the three m-dimensional vectors described above:

X = (X1, X2, …, Xm, P1, P2, …, Pm, Q1, Q2, …, Qm, R1, R2, …, Rm)

In this paper we use the SB_CIS notation for the set of feature vectors generated using the Swain and Ballard color criteria. Using the SB_CIS feature vector set and one of the four sets of features described above, we get four other sets of feature vectors, denoted AVG_BIN, AVG_IMG, STD_BIN and STD_IMG.
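A minimal sketch of the binning scheme plus the per-bin standard deviation extension described above (the STD_BIN-style features), assuming 8-bit HSV channels stored as a NumPy array; this is our reading of the scheme, not the authors' code:

```python
import numpy as np

def color_features(hsv, n=7):
    # hsv: (height, width, 3) array with 8-bit H, S, V channels (0..255).
    # Each channel is split into n equal intervals -> m = n**3 cubic bins.
    side = 256.0 / n
    idx = np.minimum((hsv / side).astype(int), n - 1)    # per-channel bin index
    bin_id = ((idx[..., 0] * n + idx[..., 1]) * n + idx[..., 2]).ravel()
    pix = hsv.reshape(-1, 3).astype(float)

    m = n ** 3
    counts = np.bincount(bin_id, minlength=m).astype(float)
    counts /= len(bin_id)                                # normalize by image size

    # Per-bin standard deviation of each channel (the P, Q, R blocks).
    std = np.zeros((m, 3))
    for b in np.unique(bin_id):
        std[b] = pix[bin_id == b].std(axis=0)
    # Concatenate: histogram, then the three m-dimensional blocks.
    return np.concatenate([counts, std[:, 0], std[:, 1], std[:, 2]])
```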
4 Classifiers, Measures

We consider two images as being similar if they belong to the same class. In our experiments we have chosen to use the well-known k-NN method. We made computations only with k = 1 and k = 3 because of the small number of images in each class. The results obtained with k = 3 were lower than those obtained with k = 1; in this work we present the results obtained for k = 1. To estimate the error rate of k-NN we used leave-one-out cross-validation (LOOCV): given the set of feature vectors, a single vector is used for testing while all the other vectors form the training set, and the procedure is repeated so that each vector of the dataset is used for testing exactly once. Averaging the errors over all runs provides the classification error rate. This type of validation gives a more accurate estimate of a classifier's error rate. Three measures were selected to compute differences between feature vectors for the k-NN method: Manhattan, Euclidean and Canberra. We also defined [21] two other measures, a "Manhattan-like" measure $d_{ML}$ (1) and a "Euclidean-like" one $d_{EL}$ (2):

$$d_{ML}(X, Y) = \sum_{i=1}^{m} \left( |x_i - y_i| + |P_i - X_i| + |Q_i - Y_i| + |R_i - Z_i| \right) \quad (1)$$

$$d_{EL}(X, Y) = \sqrt{\sum_{i=1}^{m} \left( |x_i - y_i|^2 + |P_i - X_i|^2 + |Q_i - Y_i|^2 + |R_i - Z_i|^2 \right)} \quad (2)$$
where the vectors X = (x1, x2, …, xm, P1, P2, …, Pm, Q1, Q2, …, Qm, R1, R2, …, Rm) and Y = (y1, y2, …, ym, X1, X2, …, Xm, Y1, Y2, …, Ym, Z1, Z2, …, Zm) are two image feature vectors from one dataset.
The testing process used in our experiments is the following: for each iris image we generate an image with occluded parts using one of the methods described below. We then consider two sets of images: the occluded ones and the originals they come from. We take, one by one, each image from the occluded set as the test image. For each test image, the training set consists of all the original (full) iris images except the one from which the occluded image was derived, and we classify the test image. Doing this for all occluded images gives a classification score. It is thus very important to notice that the full-information image from which an occluded image was derived is never used in the training set when that occluded image is used as the test image. In the experiments made on image retrieval we were interested to see, considering the set of occluded images and the set of full iris images, how many occluded iris images have a similar one among a given number of retrieved irises. In this way, these color methods can be used as a simple, fast filter, whose output can then be processed by more computationally expensive methods. In this context, new color or texture indexing methods can be added in a second step of image retrieval, thus improving the robustness of the retrieval.
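As reconstructed, Eqs. (1) and (2) amount to L1 and L2 distances applied blockwise to the concatenated 4m-dimensional feature vectors. A sketch under that reading:

```python
import numpy as np

def d_ml(x, y):
    # "Manhattan-like" measure, Eq. (1).  x, y: concatenated feature
    # vectors of length 4*m laid out as (histogram, P, Q, R blocks).
    x, y = np.asarray(x, float), np.asarray(y, float)
    m = len(x) // 4
    total = 0.0
    for i in range(m):
        total += (abs(x[i] - y[i])
                  + abs(x[m + i] - y[m + i])        # |P_i - X_i|
                  + abs(x[2 * m + i] - y[2 * m + i])  # |Q_i - Y_i|
                  + abs(x[3 * m + i] - y[3 * m + i]))  # |R_i - Z_i|
    return total

def d_el(x, y):
    # "Euclidean-like" measure, Eq. (2), over the whole 4*m vector.
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sqrt(np.sum((x - y) ** 2)))
```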
5 Results

In this paper we made computations to answer the following questions related to the use of color features for iris images:
– does the recognition rate vary directly proportionally with the size of the occluded part of the image?
– in the case of having one region or several regions with information from an iris image, how important is the location in the original image from which the parts come? If two regions have the same size but come from different locations in the iris image, are the recognition results the same?
– does the shape of the region matter? For two parts of the same size but with different shapes, are the recognition results the same?
– what is more important, for a given size: to have a single compact piece or several small pieces?
– are the spatial features we used with success in image retrieval and recognition also useful for answering the above questions and for partially occluded iris image recognition and retrieval?
We have done experiments in three color spaces: HSV, LAB and RGB. The best results were obtained in the HSV color space; in RGB we get results close to those for HSV, and in LAB we obtained the lowest results. Regarding the measures, the best results were obtained using Manhattan and the corresponding "Manhattan-like" measure. The results obtained using the Euclidean distances are slightly lower than those obtained with the Manhattan measure [1–4]. The Canberra measure gave the smallest
results for almost all the experiments we made. In this paper we present some results obtained in the HSV color space using a 7 × 7 × 7 division and the Manhattan and "Manhattan-like" measures (Manhattan for the SB_CIS set of feature vectors and "Manhattan-like" for the other datasets). As already mentioned, the experiments were done on the five feature datasets described above, SB_CIS, AVG_BIN, AVG_IMG, STD_BIN and STD_IMG, using the k-NN classifier with five measures, except for SB_CIS where we used three measures. The results obtained using full images for UPOL and UBIRIS, in recognition and retrieval, are presented in Table 1:

Table 1. UPOL and UBIRIS results on recognition and retrieval

                    SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
UPOL recognition     89.32    96.35    97.40    89.32    96.61
UPOL retrieval       98.70    99.74    98.96    99.22    99.22
UBIRIS recognition   94.11    95.44    96.18    95.52    97.43
UBIRIS retrieval     97.93    97.51    98.09    98.09    98.42

5.1 Experiments Made on Parts of the Iris Image with Different Size
We first considered cutting out the annular images from the UPOL database in a circular way, with angle θ = 36° starting from the north point, thus obtaining missing parts ranging from 10% to 90% of the total region with iris information (see Fig. 4). The results for this type of iris images with missing information are in Table 2.
Fig. 4. UPOL iris image circular cutouts examples
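Circular cutouts of this kind can be generated with an angular mask. The sketch below uses our own geometric assumptions (angles measured clockwise from the north point, occluded pixels set to zero); the exact cutout procedure is not specified in the text:

```python
import numpy as np

def angular_cutout(img, keep_percent, center, theta0=90.0):
    # Zero out a circular sector of the annular iris image, keeping
    # keep_percent of the angular extent.  theta0 is the starting angle
    # in degrees (90 = north); the sector is removed clockwise from it.
    h, w = img.shape[:2]
    cy, cx = center
    yy, xx = np.mgrid[0:h, 0:w]
    # Angle of each pixel, counter-clockwise from east, in [0, 360).
    ang = np.degrees(np.arctan2(cy - yy, xx - cx)) % 360.0
    removed = 360.0 * (100 - keep_percent) / 100.0
    rel = (theta0 - ang) % 360.0          # clockwise offset from north
    mask = rel >= removed                 # True where the iris is kept
    out = img.copy()
    out[~mask] = 0
    return out
```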
Table 2. Recognition on the UPOL dataset using 10% to 90% of the image

         SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
10%       21.35    32.03    22.40    25.26    23.18
20%       35.16    41.15    33.33    32.55    33.85
30%       45.05    56.77    44.79    46.61    44.27
40%       53.65    70.57    54.43    58.07    52.86
50%       66.15    81.25    72.92    74.74    70.83
60%       72.40    83.59    79.95    77.34    77.60
70%       76.82    87.76    82.81    80.99    85.42
80%       82.29    91.15    92.19    86.72    93.75
90%       86.46    94.01    95.31    89.32    95.57
Average   59.93    70.92    64.24    63.51    64.15
The best average result is obtained for AVG_BIN. As can be easily seen, there are cases (10%, for example) in which using spatial features (AVG_BIN) increases the recognition rate obtained using the SB_CIS feature vectors by 50%. If we consider the image retrieval problem and count how many images similar to the occluded one can be found among the first 10 retrieved images (using the "Manhattan-like" distance and the same five sets of feature vectors), we get the results in Table 3. Taking into account the above computations, as the size of the occluded part decreases, we expect the number of elements that have a similar one among the first 10 retrieved images to increase. Table 3 shows the results up to 50% of the image size; in the last row, for comparison, are the results obtained in retrieval for the full-size original images.
Table 3. Retrieval results for top 10 images on the UPOL database, using 10% to 100% of the image

       SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
10%     58.59    63.02    58.07    59.64    56.51
20%     66.93    75.26    66.93    75.00    67.45
30%     79.43    82.29    80.21    82.81    78.91
40%     91.93    95.05    94.01    93.49    93.23
50%     96.35    97.92    96.09    97.40    97.40
100%    98.70    99.74    98.96    99.22    99.22
From Table 3 we observe that an image with 40% of the iris information has a similar one in the top 10 retrieved images in more than 90% of the cases, and for 50% the results are very close to those obtained for full images. We performed similar experiments on the UBIRIS database, making vertical cutouts in order to have 10% to 90% iris information in the occluded images (see Fig. 5). The recognition results are in Table 4.
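The top 10 retrieval score used in these tables can be sketched as below; `dist` is any of the measures above, and we assume the candidate set `full` already excludes the originating image, as in the classification protocol:

```python
import numpy as np

def top10_hit_rate(occluded, full, labels_occ, labels_full, dist):
    # For each occluded feature vector, retrieve the 10 nearest full-image
    # vectors and count a hit if any of them shares the query's class.
    hits = 0
    for q, lab in zip(occluded, labels_occ):
        d = np.array([dist(q, f) for f in full])
        top10 = np.argsort(d)[:10]
        if any(labels_full[i] == lab for i in top10):
            hits += 1
    return 100.0 * hits / len(occluded)
```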
Fig. 5. UBIRIS iris image vertical cutouts examples
Table 4. Recognition on the UBIRIS database using 10% to 90% of the image

         SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
10%       20.91    24.56    24.81    25.73    17.26
20%       25.15    33.94    28.71    32.03    23.65
30%       46.22    59.42    50.04    58.01    43.15
40%       62.57    68.38    63.65    66.80    63.32
50%       69.05    69.38    67.22    70.54    73.53
60%       70.21    70.71    68.38    72.20    78.84
70%       75.60    74.77    74.94    76.35    85.98
80%       86.14    90.46    88.80    90.37    92.28
90%       93.36    95.10    95.60    94.85    95.44
Average   61.02    65.19    62.46    65.21    63.72
Considering the top 10 retrieval problem, we obtained the results in Table 5. As can be seen, parts of 60% of the images gave a very good result of 93.69%, compared with the 98.42% obtained for full images.

Table 5. Results for the first ten retrieved images on the UBIRIS database

         SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
10%       40.58    47.47    45.98    52.37    35.02
20%       49.54    58.42    55.68    63.98    44.81
30%       74.19    79.59    76.76    82.57    72.61
40%       85.31    82.99    84.23    86.31    86.81
50%       88.71    83.73    86.81    86.97    90.79
60%       88.38    85.56    88.55    88.22    93.69
70%       93.20    88.88    93.11    92.03    96.51
80%       97.26    96.43    97.93    97.93    97.93
90%       97.84    97.59    98.17    97.93    98.09
Average   79.45    80.07    80.80    83.15    79.58
As can be seen, using parts of 30% of the original images, more than 70% of them have a similar image among the first 10 retrieved (even 80% for the STD_BIN feature dataset), and for parts of 70% of the original images the results are very close to those obtained for full images. Studying the results obtained so far, one might expect that larger regions of iris information always provide better results, but this is not true, as the following computations show.

5.2 Variations in Recognition Rates Depending on the Position of the Iris Image Parts
In this section we study how the recognition rates vary with the position of the iris image parts. To test the influence of the location, we use several types of cutouts. For UPOL we successively eliminated iris parts from the exterior or the interior of the annular iris image (see Fig. 6). We considered cutouts of 50% to 90% of the iris information.
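The interior/exterior cutouts on the annular UPOL images can be pictured as radial masks. The sketch below is an illustrative reconstruction (image size, centre and radii are hypothetical, not the actual UPOL geometry):

```python
# Exterior/interior cutouts on annular iris images sketched as a radial mask:
# keep only pixels whose distance from the pupil centre falls inside a given
# ring. All sizes and radii below are hypothetical stand-ins.
import math

def ring_mask(size, center, r_inner, r_outer):
    """1 = keep, 0 = blank, for pixels with r_inner <= radius < r_outer."""
    cy, cx = center
    return [[1 if r_inner <= math.hypot(y - cy, x - cx) < r_outer else 0
             for x in range(size)] for y in range(size)]

full = ring_mask(9, (4, 4), 1.0, 4.5)   # whole annulus (iris minus pupil)
ext  = ring_mask(9, (4, 4), 3.0, 4.5)   # exterior ring only
kept = sum(map(sum, ext)) / sum(map(sum, full))
print(0.0 < kept < 1.0)                 # exterior cutout keeps a fraction
```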
Fig. 6. Cutouts from the interior or exterior part of UPOL images
The recognition rates for this type of occluded images are in Table 6.
Table 6. Results for interior/exterior cutouts on UPOL

         SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
50 ext    57.03    72.92    77.08    62.76    65.63
50 int    42.45    55.47    58.85    50.26    46.09
60 ext    63.28    80.73    84.64    74.48    80.73
60 int    57.81    69.53    75.78    67.97    64.84
70 ext    73.70    87.24    91.15    79.43    89.84
70 int    75.00    80.99    84.11    75.78    76.30
80 ext    82.03    93.23    94.27    85.42    94.53
80 int    78.65    89.32    91.15    85.16    85.94
90 ext    88.54    95.57    96.88    89.06    95.57
90 int    84.90    94.79    94.53    90.36    93.75
100%      89.32    96.35    97.40    89.32    96.61
As for the previously occluded images, good results are obtained using the AVG_BIN feature dataset, but the best average result of 84.844 is obtained with the AVG_IMG feature dataset. Table 7 shows the image retrieval results for this set of occluded images (as usual, we computed the top 10 precision).

Table 7. Retrieval top 10 results for exterior/interior cutouts in UPOL

         SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
50 ext    91.41    95.31    96.09    94.27    95.31
50 int    81.51    84.90    88.54    84.38    81.77
60 ext    95.05    96.88    97.92    96.61    97.66
60 int    90.37    93.23    95.57    92.45    91.15
100%      98.70    99.74    98.96    99.22    99.22
Note that 60% of the image taken from the exterior gave results near 98% for finding a similar image among the first 10 retrieved, which is a very good result. Next, we considered image recognition and retrieval for UPOL half-part images: up/down and left/right (Fig. 7).
Fig. 7. Right, left, up and down half parts of UPOL images
The recognition results for half parts of the UPOL database are in Table 8.

Table 8. Recognition rates for UPOL half-part images

         SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
Left      66.41    81.77    72.40    74.22    71.35
Right     62.76    79.95    67.45    69.53    69.27
Up        45.31    64.84    46.35    53.39    53.65
Down      48.18    63.28    55.99    60.16    55.73
The differences between the Left/Right and the Up/Down results can be observed, the former being better on average by more than 30%. Table 9 shows the retrieval results (top 10 precision) for these types of missing-information images.

Table 9. Precision values for UPOL half-part images

         SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
Left      96.35    97.92    96.09    97.66    97.40
Right     94.01    97.66    96.35    97.14    95.83
Up        80.73    86.98    80.73    87.50    84.90
Down      85.42    90.36    90.10    92.19    91.93
We next consider successively extracting 10% to 100% from the upper half and from the lower half separately (this means between 5% and 50% of the full image, Fig. 8).
Fig. 8. Cutouts from UBIRIS images – upper and lower parts separately
The recognition results for the upper part are in Table 10 and for the lower part are in Table 11.
Table 10. Recognition rates for the 50% interior iris part (upper regions) – UBIRIS

         SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
10%       12.86    10.79    12.86    11.37    11.87
20%       16.10    13.44    15.68    13.03    14.36
30%       14.11    12.53    12.37    13.44    13.03
40%       12.20    13.44    11.95    13.94    11.95
50%       12.70    14.61    13.11    14.44    13.28
60%       13.03    14.77    13.36    14.52    13.61
70%       13.03    14.27    14.11    14.94    13.94
80%       13.44    14.69    14.85    15.60    13.78
90%       13.86    15.27    16.43    16.02    14.11
100%      14.52    15.44    18.76    16.27    17.01
Table 11. Recognition rates for the exterior iris part (lower regions) – UBIRIS

         SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
10%       03.98    03.32    04.48    04.40    03.57
20%       05.81    04.65    06.31    04.90    04.40
30%       07.55    12.45    07.97    11.37    06.06
40%       08.63    19.09    11.62    18.59    07.97
50%       10.79    23.90    15.60    22.82    08.80
60%       12.28    26.80    17.10    26.22    09.46
70%       13.44    29.05    20.66    30.04    12.37
80%       17.68    40.91    30.95    42.49    18.92
90%       20.00    51.37    39.50    52.78    27.05
100%      19.25    50.04    39.92    51.20    33.11
As can easily be observed, the lower part of the iris images gave better results than the upper part. Using 80% of the lower region (that is, 40% of the full image) gave a recognition rate of only 42.49%, but this result is much better than the corresponding 15.60% obtained for the same size in the upper region of the iris images. These results show that regions located only in the interior or only in the exterior part of an iris image do not give good recognition results with this technique. It is better if the areas contain information from both the interior and the exterior regions of an iris image, as we shall show in our next experiments. Table 12 gives the top 10 retrieval rates for images containing the whole interior part and the whole exterior part, respectively. For the exterior region the best result is 77.01%, while for the interior the best precision is only 30.29%.
Table 12. Top 10 retrieval results for 100% interior and exterior iris parts (upper and lower regions)

               SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
100% interior   30.29    26.31    29.29    29.38    26.89
100% exterior   40.50    76.85    68.63    77.01    61.91
Let us now consider, for UBIRIS images, nonoverlapping vertical regions with 10% iris information, as shown in Fig. 9. Table 13 gives the recognition rates for occluded images with 10% iris information, where the position of the 10% region changes as explained above. As can easily be seen from Table 13, even though the parts have the same size, there is a great difference in recognition rates, from 15.10% to 37.84%. The best average result was obtained with the AVG_IMG features.
Fig. 9. Vertical cutouts for UBIRIS images - 10%
Table 13. Recognition rates for occluded images with 10% information – UBIRIS

         SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
1         20.91    24.56    24.81    25.73    17.26
2         29.13    25.89    30.62    28.96    28.05
3         38.67    44.07    42.41    40.50    37.51
4         32.37    36.60    37.84    33.20    30.71
5         30.54    21.66    29.88    24.98    30.12
6         24.15    19.00    25.31    22.57    24.40
7         27.05    15.44    27.55    22.24    27.55
8         18.42    14.69    21.41    16.93    17.68
9         12.20    17.26    15.10    17.34    10.62
10        21.58    23.90    23.90    24.48    19.09
Average   25.50    24.31    27.88    25.69    24.30
Similar computations on the UPOL database led to the same conclusion, although the differences between the recognition rates are not as significant. Compared with our previous computations, in which the iris region belonged only to the interior or the exterior area of the iris, we remark that the average results from Table 13 for the 10% size are better than the results for regions of the same size located only in the exterior or interior part of the iris.
We next consider cutouts with 5% information, with one half of the region taken from the upper part and the other half from the lower part, as shown in Fig. 10. We performed two types of selections: one with disjoint regions from the upper and lower parts, and one with a continuous region (see Fig. 10).
Fig. 10. UBIRIS 5% cutouts with lower and upper part information
Table 14. Combined up-down and middle regions iris information results (5%)

                        SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
10% recognition          20.91    24.56    24.81    25.73    17.26
5% up-down recognition   19.17    21.99    19.42    21.83    10.87
5% middle recognition    09.13    11.04    09.29    09.29    06.06
10% retrieval            40.58    47.47    45.98    52.37    35.02
5% up-down retrieval     36.85    44.32    39.34    47.55    28.80
5% middle retrieval      23.15    26.39    26.72    28.30    20.66
With this type of cutouts, we studied which part of the iris provides better results: the interior, the exterior, or the middle part. The results in Table 14 show that, in both the recognition and the retrieval situations, combining interior and exterior information yields better results than the middle part of the iris. The recognition rates are twice as high for the combined interior-exterior regions as for the middle zone. We performed tests with this type of cutouts considering vertical region extractions from left to right, and the results are the same.

5.3 Variations in Recognition Rates Depending on the Shape of the Iris Information Region
For the UBIRIS database, we extracted feature vectors from regions with different shapes. We considered rhomb and clepsydra shapes, as in Fig. 11, taken from the same area of the original images (the left part), and obtained the results in Table 15. The extracted rhomb and clepsydra shapes have an area of 5000 units, compared with the 36000 units (100 × 360) of the UBIRIS iris images, which means these shapes represent about 13.88% of the entire image.
Fig. 11. UBIRIS – cutouts with different shapes

Table 15. Recognition rates and top 10 retrieval for rhomb and clepsydra shapes

                        SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG  Average
RHOMB recognition        19.92    20.41    20.66    21.83    18.84    20.33
CLEPSYDRA recognition    35.19    45.31    37.84    42.41    31.04    38.36
RHOMB retrieval          40.17    44.32    45.06    48.96    38.92    43.49
CLEPSYDRA retrieval      59.09    73.44    67.97    74.69    55.10    66.06
On average, for all five types of features, the clepsydra shape gave better recognition and retrieval results.

5.4 The Influence of the Connectivity of the Iris Information Regions on Recognition
For UBIRIS we consider a chess-table-like mask: every image is divided into rectangles (each side of the image is divided into 10 parts) and, from left to right, one rectangle with information is followed by one without, for each row and column. Figure 12 gives examples of this type of cutout and Table 16 the recognition results. The images have 50% iris information.
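A chess-table mask of this kind can be generated as a block checkerboard; the sketch below uses a small hypothetical image and grid rather than the 10 × 10 division applied to UBIRIS:

```python
# "Chess table" occlusion: the image is split into a grid of rectangles and
# every other rectangle is kept, so exactly 50% of the iris information
# survives. Image and grid sizes below are small hypothetical stand-ins.

def chess_mask(rows, cols, grid, phase=0):
    """1 = keep, 0 = blank; phase 0/1 selects the two complementary masks."""
    rh, cw = rows // grid, cols // grid
    return [[1 if (y // rh + x // cw + phase) % 2 == 0 else 0
             for x in range(cols)] for y in range(rows)]

m1 = chess_mask(8, 8, 4, phase=0)   # mask 1YS1-1
m2 = chess_mask(8, 8, 4, phase=1)   # complementary mask 1YS1-2
print(sum(map(sum, m1)), sum(map(sum, m2)))   # each keeps half of 64 pixels
```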
Fig. 12. UBIRIS chess table masks

Table 16. Recognition rates for UBIRIS – chess table information

           SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
1YS1-1      93.86    92.70    95.85    93.94    97.10
1YS1-2      93.94    94.61    95.93    95.52    97.34
AVG_1YS1    93.90    93.66    95.89    94.73    97.22
Analogously, we consider the situation depicted in Fig. 13: the modified chess table has one small rectangle with iris information followed by two without. There are three possible cutouts, depending on the starting position of the rectangle with information. We obtain the results in Table 17. The occluded images have 33% iris information.
Fig. 13. UBIRIS – one rectangle with information followed by two without iris information
Table 17. Recognition rates for UBIRIS images – one yes, step two

           SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
1YS2-1      93.20    91.78    96.10    93.53    96.76
1YS2-2      93.11    83.15    94.02    89.96    95.60
1YS2-3      93.28    93.11    95.60    94.27    96.10
AVG_1YS2    93.20    89.35    95.24    92.59    96.15
We also considered the situation with 25% iris information (one small rectangle with information followed by three without). Examples of occluded images of this type are shown in Fig. 14 and the recognition results are in Table 18.
Fig. 14. UBIRIS – one rectangle with information followed by three without iris information
Table 18. Recognition rates for UBIRIS – one yes, step three

           SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
1YS3-1      90.46    81.99    93.61    89.54    95.19
1YS3-2      91.12    83.73    93.03    89.88    94.85
1YS3-3      89.21    81.33    91.37    84.98    89.88
1YS3-4      92.37    89.46    94.94    93.03    95.10
AVG_1YS3    90.79    84.13    93.24    89.36    93.76
For all these chess-table-type masks, the recognition results are very good, with the STD_IMG features usually providing the best results. We also consider occluded images containing only the diagonal of the chess table (Fig. 15). The recognition results are in Table 19.
Fig. 15. UBIRIS with diagonal cutouts
Table 19. Recognition rates for UBIRIS – diagonal cutouts

           SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
Diagonal    55.19    51.87    62.32    56.18    49.29
Similar to the chess table masks, we consider single pixels instead of small rectangles. We build occluded images from the original one by keeping one pixel with information followed by s pixels without information (s = 1, 2, …, 10). We applied this procedure to UBIRIS images; Table 20 reports the recognition results for s = 1, 3, 4 and 9.

Table 20. UBIRIS – recognition rates for separated pixels

        SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
1Y1N     94.27    94.19    96.02    95.44    97.18
1Y3N     93.11    90.95    95.44    94.44    96.93
1Y4N     92.95    90.54    95.52    92.37    96.60
1Y9N     87.47    69.63    93.03    81.49    90.46
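The 1YsN patterns of Table 20 can be sketched as a stride over the image treated as a flat pixel sequence (sizes below are hypothetical):

```python
# The "1 yes, s no" patterns (1Y1N, 1Y3N, ...) keep one pixel and blank the
# next s pixels, scanning the image as a flat sequence. 1Y9N therefore keeps
# 1 pixel in 10, i.e. 10% of the iris information.

def one_yes_s_no(pixels, s):
    """Blank (set to None) all but every (s+1)-th pixel of a flat sequence."""
    return [p if i % (s + 1) == 0 else None for i, p in enumerate(pixels)]

flat = list(range(20))                   # hypothetical flat 20-pixel image
kept = one_yes_s_no(flat, 9)             # 1Y9N: 10% information
print(sum(p is not None for p in kept))  # 2 of 20 pixels survive
```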
The results are very good. Analyzing the 1Y9N situation, one pixel kept out of every ten, that is 10% of the iris information, we obtain an excellent best recognition rate of 93.03%.

5.5 Are the Spatial Features Useful in Recognition and Retrieval with Partially Occluded Iris Images?
In all previous tests, the occluded image had the same size as the original images with complete information. In this section we test what happens if the extracted part has a different size than the other iris images. We performed tests in which we extract a part of an image and resize it. For example, we extract the middle 50% part from images of the UBIRIS database and we resize it to be
the same size as all images from the UBIRIS database. We consider missing-information images that contain a quarter of the information of the original ones, cropped from the center of the images. We tested two types of interpolation methods that are usually employed when an image is resized: nearest neighbor and bicubic. The recognition rates are in Table 21, the best result (30.54%) being obtained using the AVG_IMG feature vectors. Table 22 gives the obtained retrieval scores.

Table 21. Recognition rates – resize case

                        SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
25% middle part          23.73    23.32    27.80    23.98    24.23
Bicubic interpolation    23.57    23.82    30.54    24.32    23.73
Nearest neighbor         23.73    23.32    30.46    23.98    23.65
Table 22. Retrieval rates – resize case

                        SB_CIS  AVG_BIN  AVG_IMG  STD_BIN  STD_IMG
25% middle part          50.79    45.39    55.02    52.20    51.37
Bicubic interpolation    50.12    46.81    58.92    52.20    47.63
Nearest neighbor         50.79    45.39    58.34    52.20    48.46
As can be seen from Tables 21 and 22, the results are very similar, and even identical in many cases. We can therefore conclude that the usage of these color features is not influenced by the size of the image from which the part comes.
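The near-identical rows of Tables 21 and 22 are consistent with the global nature of the color features: a nearest-neighbour resize replicates pixels and therefore leaves the normalised colour histogram unchanged. A toy illustration (hypothetical 2 × 2 "image" with symbolic colours):

```python
# Nearest-neighbour upscaling replicates every pixel, so a global colour
# histogram normalised by the pixel count is invariant to the resize -- the
# behaviour observed in Tables 21 and 22. Toy image below is hypothetical.
from collections import Counter

def nn_upscale(image, factor):
    """Nearest-neighbour upscaling by integer pixel replication."""
    wide = [[p for p in row for _ in range(factor)] for row in image]
    return [row for row in wide for _ in range(factor)]

def color_histogram(image):
    counts = Counter(p for row in image for p in row)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

img = [["red", "blue"], ["blue", "red"]]
big = nn_upscale(img, 2)
print(color_histogram(img) == color_histogram(big))   # proportions preserved
```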
6 Conclusions, Future Works

Regarding the goals we have pursued in these experiments, using color features and the methods described above, we can make some remarks. The size of the region with iris information is certainly important but, surprisingly, in our experiments we found many cases in which small parts gave better results than parts 3-4 times larger. For example, for UBIRIS images, comparing the 45.31% recognition rate (Table 15) obtained for the clepsydra shape, which contains 13.88% of the iris information, with the 12.53% (Table 10) obtained for a 30% region from the upper part (that is, 15% of the original image), the first result is more than 3.6 times better than the second. Moreover, even for the whole upper (interior) part, the recognition result obtained with the AVG_BIN dataset, 15.44% (Table 10), is about three times smaller: a part three times smaller gave results three times greater. Also remarkable is the result obtained for the images where the information was one pixel out of every ten, which means 10% of the image: a 93.03% recognition rate.
If we consider the location in the iris images from which the information comes: for 5% of the image taken from the upper part of UBIRIS images and for the 5% lower part, two parts of the same size, we obtain recognition rates of 12.86% (Table 10) and 3.98% (Table 11), respectively. Regarding the location of the regions with iris information, as a rule we can conclude that it is more important to have iris information from the interior area than a region of the same dimension situated in the exterior part. Ideal is a region that covers both the interior and the exterior parts of the iris images. Not only the location is important; as shown for the rhomb- and clepsydra-shaped regions, the shape of the region matters as well. All the computations suggest that it is better to have several smaller regions than a compact one. Surprising were the results obtained for the images that contain only separated pixels, one out of every ten, which gave a recognition result of 93.03% for the AVG_IMG feature dataset. The recognition rate does not vary proportionally with the size of the occluded image. The statistics on spatial coordinates usually gave better results, in some situations even 2.5 times better, as for the lower part (Table 11): 50.04% obtained with the AVG_BIN dataset compared with 19.25% when using feature vectors without statistical features. As can easily be seen from the results obtained both for UPOL and UBIRIS, the recognition rates for the exterior area of the irises are better, but it is preferable to have regions from both the upper and the lower part of the iris. A final remark: for UPOL images, considering for example regions containing 10% information, we get better results than those obtained for the UBIRIS database, because UPOL has a better resolution. Regarding the retrieval results, we consider them very good.
The results are better using spatial coordinates and the "Manhattan like" measure. In some cases AVG_BIN seems to be the better choice, in other cases AVG_IMG, and in others STD_IMG. There is no impediment to using feature vectors with all color features. We have to distinguish between two cases: a partially occluded iris image, and one or more pieces (parts) of an iris image for which we do not know from what part of the iris image they come (we cannot locate them). In the first case AVG_IMG and STD_IMG seem more useful, in the second case AVG_BIN and STD_BIN. We are continuing with new experiments. We have generated SIFT features [20] and obtained very good iris recognition results for full images using color features and SIFT features. We are now trying to use SIFT features for the extracted parts, to locate the position in the iris image from which they come (and also the rotation angle). The first results are positive, and our next intention is to combine color features with SIFT in the recognition process for occluded images as well. In the next steps we will use this extension criterion on optimal bin boundaries for the HSV color space [21]. Concerning a CBIR system, one advantage of this simple global color extended criterion is that it can be used as a preliminary step to reduce the search space, before applying a finer method for image retrieval. For future experiments we shall consider not only iris image databases, but also skin disease image databases.
For practical reasons it is better to also take the texture into consideration, and that will be one of our future works.
References

1. Păvăloi, I., Ignat, A.: A simple global color criterion for image retrieval. In: 13th International Conference on Development and Application Systems, Suceava, Romania, 19–21 May 2016 (2016)
2. Păvăloi, I.: Iris recognition using spatial color indexing. Buletinul Institutului Politehnic, Iaşi, Tomul LX (LXIV), Fasc. 1, 35–49 (2016)
3. Păvăloi, I., Ignat, A.: Iris recognition using statistics on pixel position. In: 2017 E-Health and Bioengineering Conference (EHB), Sinaia, pp. 422–425 (2017)
4. Păvăloi, I., Nita, C.D.: Experiments on image classification and retrieval using statistics on pixels position. In: 2017 International Symposium on Signals, Circuits and Systems (ISSCS), Iaşi, pp. 1–4 (2017)
5. Corel 1000 Database. http://wang.ist.psu.edu/docs/related.shtml
6. Narayankar, N., Dhaygude, S.: Texture extraction for image retrieval using local tetra pattern. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 3(7) (2014)
7. Yuan, X., Yu, J., Qin, Z., Wan, T.: A SIFT-LBP image retrieval model based on bag-of-features. In: 18th IEEE International Conference on Image Processing (ICIP 2011), pp. 1061–1164 (2011)
8. Rubner, Y., Guibas, L.J., Tomasi, C.: The earth mover's distance, multi-dimensional scaling and color-based image retrieval. In: Proceedings of DARPA Image Understanding Workshop, pp. 661–668 (1997)
9. Hiremath, P.S., Pujari, J.: Content based image retrieval using color, texture and shape features. In: International Conference on Advanced Computing and Communications (ADCOM 2007), pp. 780–784 (2007)
10. Nagaraja, S., Prabhakar, C.J.: Low-level features for image retrieval based on extraction of directional binary patterns and its oriented gradients histogram. Comput. Appl. Int. J. (CAIJ) 2(1), 13–28 (2015)
11. Poursaberi, A., Araabi, B.N.: Iris recognition for partially occluded images: methodology and sensitivity analysis. EURASIP J. Appl. Sig. Process. 2007, 12 (2007)
12. Proença, H., Alexandre, L.A.: UBIRIS - Noisy Visible Wavelength Iris Image Databases (2004). http://iris.di.ubi.pt/ubiris1.html
13. Proença, H.: Iris recognition in the visible wavelength. In: Burge, M.J., Bowyer, K.W. (eds.) Handbook of Iris Recognition, chap. 8, pp. 151–171. Springer (2013)
14. Proença, H., Alexandre, L.A.: UBIRIS: a noisy iris image database. In: Roli, F., Vitulano, S. (eds.) Image Analysis and Processing – ICIAP 2005, vol. 3617, pp. 970–977. Springer, Heidelberg (2005)
15. Radu, P., Sirlantzis, K., Howells, W.G.J., Hoque, S., Deravi, F.: A versatile iris segmentation algorithm. In: BIOSIG 2011, Darmstadt, Germany (2011)
16. Dobeš, M., Martinek, J., Skoupil, D., Dobešová, Z., Pospíšil, J.: Human eye localization using the modified Hough transform. Optik 117(10), 468–473 (2006)
17. Dobeš, M., Machala, L., Tichavský, P., Pospíšil, J.: Human eye iris recognition using the mutual information. Optik 115(9), 399–405 (2004)
18. Dobeš, M., Machala, L.: Iris Database. http://www.inf.upol.cz/iris/
19. Swain, M.J., Ballard, D.H.: Color indexing. Int. J. Comput. Vision 7(1), 11–32 (1991)
20. Bejinariu, S., Costin, M., Ciobanu, A., Cojocaru, S.: Similarities identification in logo images. In: International Workshop on Intelligent Information Systems, IIS 2013, Chişinău, Republica Moldova, 20–23 August 2013, pp. 53–59 (2013)
21. Ciobanu, A., Luca, M., Păvăloi, I., Barbu, T.: Iris identification based on optimized Lab histograms applied to iris partitions. Buletinul Institutului Politehnic, Iaşi, Tomul LX (LXIV), Fasc. 1 (2014)
Feature Extraction Techniques for Hyperspectral Images Classification

Asma Fejjari (1), Karim Saheb Ettabaa (2), and Ouajdi Korbaa (1)

(1) MARS (Modeling of Automated Reasoning Systems) Research Laboratory, ISITCom, University of Sousse, 4011 Hammam Sousse, Tunisia
[email protected], [email protected]
(2) IMT Atlantique, ITI Department, Telecom Bretagne, 655 Street of Technopôle, 29200 Plouzané, France
[email protected]
Abstract. Recently, several feature extraction techniques have been exploited to address the hyperspectral dimension reduction issue. Feature extraction methods can be broadly categorized as either linear or nonlinear. In this paper, we present and assess the performance of the two families in hyperspectral classification tasks. We empirically compare the most popular feature extraction approaches in terms of classification accuracy and speed. The tests are performed on two real hyperspectral images (HSIs). Experimental results show that nonlinear techniques provide better classification results than linear methods; however, linear approaches require less computing load.

Keywords: Feature extraction techniques · Dimensionality reduction · Classification · Hyperspectral images
1 Introduction

Hyperspectral imagery [1, 2] measures, analyzes and interprets spectra acquired from a given scene or a specific object by various platforms (an airborne or satellite sensor, or a UAV). The same scene is recorded in multiple (more than 2000 wavelengths), close (a nominal spectral resolution of 10 nm) and contiguous spectral bands for each image pixel. Hence, a significant amount of data is produced, which poses challenges during processing and storage tasks. Dimension reduction [7, 8] is one of the most efficient solutions to overcome this issue; it can be defined as finding a transformation from a data set X ∈ R^D to a new data set Y ∈ R^d, where d ≪ D. The new transformation seeks to preserve the data geometry as much as possible. Various dimensionality reduction approaches have been suggested in the last few years. These techniques can be categorized into two main classes [9–12]: feature extraction techniques and spectral band selection methods. The first class aims to find the most significant representation of the data set on subspaces, while the second consists of extracting an optimal band set from the original one, under performance criteria. The essential difference between these two classes is whether

© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 174–188, 2021.
https://doi.org/10.1007/978-3-030-52190-5_12
a dimensionality reduction method transforms or keeps the meaning of the initial data set in the reduction process. In this paper, we focus only on feature extraction techniques, also called projection or transformation methods. Feature extraction techniques can be defined as feature selection in the transformed space [1, 13]; they comprise two types of projection: linear and nonlinear. This paper presents and discusses the most known feature extraction techniques (linear and nonlinear), and then evaluates their effectiveness in hyperspectral classification tasks. The rest of the paper is organized as follows: the second and third sections introduce the most popular linear and nonlinear feature extraction techniques. In Sect. 4, we recap and discuss all methods presented in the two previous sections. The experimental analyses are provided in Sect. 5 and the paper is summarized in Sect. 6.
2 Linear Techniques

Linear techniques [8, 9] process data which lie on or near a vector subspace of the high dimensional space. In mathematical terms, a linear data projection is a linear combination of the original data according to (1):

y_bi = a_b1 x_1i + a_b2 x_2i + ... + a_bD x_Di    (1)

for all b ∈ {1, 2, ..., d} and i ∈ {1, 2, ..., n}. This linear combination can be written in matrix form as:

Y = A X    (2)
where X = {x_ji; 1 ≤ j ≤ D, 1 ≤ i ≤ n}, Y = {y_bi; 1 ≤ b ≤ d, 1 ≤ i ≤ n}, and A = {a_bj; 1 ≤ b ≤ d, 1 ≤ j ≤ D} is the linear projection matrix. D, n and d represent the dimension of the original space, the number of elements of X, and the reduced space dimension, respectively. In the last decades, many linear feature extraction approaches have been suggested to reduce data redundancy or to extract specific information. This section describes the most powerful linear techniques for HSI dimensionality reduction.

2.1 Principal Component Analysis (PCA)
PCA [1, 8] is the most noted unsupervised linear technique in the remote sensing field. In a hyperspectral image, adjacent bands are highly correlated and frequently convey the same information. PCA is deployed to transform the original variable set into a new lower dimensional variable set, by removing the correlation between the bands. The principal components (PCs) are uncorrelated and ordered in such a way that the first PCs retain most of the variation present in the original variables. Mathematically, PCA finds a linear mapping M which maximizes the cost function trace(M^T cov(X) M), where cov(X) is the covariance matrix of the data X. This linear mapping M is composed of the eigenvectors (PCs) of the covariance matrix of the zero-mean data. PCA performs well in remote sensing applications; furthermore, it
provides a straightforward interpretation of the extracted features. On the other hand, this technique remains a global operation that needs great computational resources and a large memory space; PCA has a complexity of O(D^3), where D is the input dimensionality. In order to mitigate these problems, several variants have been suggested, such as Segmented PCA (S-PCA) [14] and Folded-PCA (F-PCA) [15].

2.2 Locality Preserving Projections (LPP)
LPP [16, 17] is another linear unsupervised dimension reduction method; it is based on constructing a neighborhood graph using the graph Laplacian concept, then computing a weight matrix to map data points to the reduced dimension subspace. Unlike the PCA technique, which tends to preserve the global data structure, LPP seeks to preserve the intrinsic geometry and the local data structure. The LPP algorithm shows good performance and robustness in HSI classification; it is relatively insensitive to outliers and consumes fewer resources. On the other side, LPP suffers from three main problems. Firstly, the adjacency graph is created in advance without taking into account HSI properties; consequently, it is difficult to determine an appropriate neighborhood size for constructing the adjacency graph. Secondly, LPP retains only local features, which affects the classification accuracy. Finally, it is sensitive to noise. To solve these problems, modified versions of the conventional LPP have been suggested, among them the local and global geometric structure preserving projection (LGGSP) technique [16], the modified LPP (MLPP) approach [17] and the modified Schroedinger Eigenmap Projections (MSEP) method [39]. LPP has a complexity of O(D^2).

2.3 Multidimensional Scaling (MDS)
The MDS technique [19, 20] is an unsupervised dimensionality reduction approach which uses an embedding to look for a transformation from the high dimensional space to the low dimensional space, while preserving the similarities between pairs of data points as faithfully as possible. The inputs are mapped into the subspace which best preserves their squared distances. The outputs of MDS are computed from the first eigenvectors of the Gram matrix (the equivalent of the PCA covariance matrix). MDS shows good performance and is widely used for multidimensional data processing. Because of its high complexity, O(n^3), where n is the number of input data points, most MDS algorithms are only applicable to small images. In order to overcome these limits, the multi-resolution MDS algorithm [21] and the learning vector quantization (LVQ) approach [22] have been proposed.
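The PCA mapping of Sect. 2.1 can be sketched in a few lines: centre the data, eigendecompose the covariance matrix, and project onto the top d eigenvectors. The data below are random stand-ins (5 hypothetical bands, 100 pixels), not a real hyperspectral scene:

```python
# Minimal PCA sketch: the linear mapping M stacks the leading eigenvectors of
# the covariance matrix of the zero-mean data, and Y = M^T X is the reduced
# representation. Dimensions below are hypothetical stand-ins for HSI bands.
import numpy as np

def pca(X, d):
    """X: (D, n) data matrix, columns are samples. Returns (d, n) projection."""
    Xc = X - X.mean(axis=1, keepdims=True)   # zero-mean data
    C = np.cov(Xc)                           # (D, D) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    M = eigvecs[:, ::-1][:, :d]              # top-d principal axes
    return M.T @ Xc

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))   # 5 hypothetical "bands", 100 pixels
Y = pca(X, 2)
print(Y.shape)                  # (2, 100): decorrelated, variance-ordered
```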
3 Nonlinear Techniques

Unlike linear techniques [8], nonlinear approaches do not rely on the linearity hypothesis, i.e., the initial data are assumed to belong to nonlinear subspaces, which helps to exploit and process the complex nonlinear properties of hyperspectral data. Nonlinear approaches can better retain the data structure and maintain sufficient local or global information.
3.1 Global Techniques
A global technique can be defined [16, 23] as a method that attempts to preserve the original data geometry at all scales, i.e., the global geometric properties of the data: nearby data points in the high dimensional space are projected onto nearby points in the low dimensional space, and distant points onto distant points. Global techniques offer a more faithful data representation. They include:

Isometric Feature Mapping (Isomap). Isomap [24, 25] is a nonlinear unsupervised dimension reduction method that seeks to preserve the intrinsic data geometry by maintaining the geodesic distances between data points as much as possible. Isomap uses the MDS [19] technique with the geodesic distance instead of the Euclidean one. The Isomap global coordinates provide a simple way to examine and process large-scale observations. Isomap can find significant global coordinates as well as nonlinear structures that PCA or MDS do not detect. Nevertheless, the Isomap approach suffers from a high computational burden, which increases with the number of pixels; it is O(n^3) complex, making it relatively slow. This technique suffers from another problem, the short circuit, which occurs when the neighborhood distance is greater than the distance between the manifolds. To deal with these limits, landmark Isomap (L-Isomap) and upgraded landmark Isomap (UL-Isomap) [26] have been suggested.

Diffusion Maps (DM). DM [27, 28] is a semi-supervised nonlinear dimensionality reduction technique that is based on the eigenvalues of a Markov matrix, used as a coordinate system, to obtain an efficient data representation. In diffusion maps, the kernel is chosen from our prior definition of the geometry, in contrast to principal component analysis (PCA), in which the kernel is chosen before the procedure. In PCA, all correlations between the values are considered, while only the high correlation values are taken into account in the DM method.
Diffusion maps process complex hyperspectral data and can discover the underlying data manifold, but they suffer from a computational problem, with O(n³) complexity.

Kernel PCA (KPCA). KPCA [29, 30] is considered the most popular kernel-based technique; it is the nonlinear counterpart of the PCA approach. This technique is more appropriate for describing higher order, complex nonlinear distributions. Unlike PCA, which is based on a function that calculates the distance between points, KPCA deploys the distance between the spectral bands. KPCA offers much better classification accuracy than traditional PCA. Moreover, it can eliminate most of the noise affecting an image processed by the original PCA. In practice, KPCA needs a high computational effort (O(n³) complexity) to extract features from hyperspectral scenes. To overcome this burden, a fast iterative version of KPCA has been suggested in [30].

AutoEncoder (AE). AE [32, 33] is an unsupervised nonlinear reduction technique based on neural networks [31]; it is composed of one or more hidden layers that project the input features onto themselves. An AutoEncoder consists of two parts: an encoder and a decoder. The encoder takes an input and projects it to a hidden representation. The latter often has a lower dimensionality, which implies that the encoder compresses the information. The decoder is matched with an output layer that has the same size as the input layer, producing the so-called "reconstruction". The hidden representation can be considered a compression of the features containing the most important information.
A. Fejjari et al.
AE gives good classification precision; its complexity is approximately O(inw), where w is the number of weights and i is the number of iterations. Several studies [32, 33] have been suggested to improve the AE method.

3.2 Local Techniques
Local techniques [16, 23] tend to preserve the local data geometry: they seek to project the closest points of the high dimensional representation onto the nearest points in the low dimensional representation. Local methods have been shown to be computationally practical, since they involve sparse matrix computations. Among the most well-known local methods, we can mention:

Locally Linear Embedding (LLE). LLE [34, 35] is an unsupervised dimension reduction method that seeks to preserve local topological data structures after the projection. Each data point can be reconstructed from its nearest neighbors through linear coefficients. Thus, a weight matrix containing the local topological information of the input data is produced from the neighborhood graph. This matrix also serves to reconstruct the low dimensional embedding. LLE presents a good yield during classification and detection tasks, but it suffers from several problems. Firstly, several free parameters need to be set, which requires prior knowledge of the scene. Secondly, LLE suffers from high computing complexity and memory consumption; it has a practical time complexity of O(pn²), where p is the ratio of nonzero elements in the weight matrix. Recently, new variants of LLE, such as robust locally linear embedding (RLLE) [34] and improved locally linear embedding (ILLE) [36], have been proposed to overcome these problems.

Laplacian Eigenmaps (LE). LE [18, 40, 41] is an unsupervised nonlinear reduction approach in which a graph represents the original data. The low dimensional space is constructed by minimizing the distances between each data point and its nearest neighbors, in a weighted way, using the generalized eigenvectors of the graph Laplacian matrix. The new reduced space retains the local properties of the original data. Laplacian Eigenmaps is very effective in keeping the manifold structures in which the original data reside. Moreover, LE is robust to outliers.
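The LLE weight-matrix construction described above can be sketched as follows. This is our own minimal dense NumPy toy (function name, regularization constant and spiral data are illustrative assumptions); a real implementation would exploit the sparsity of W that gives LLE its O(pn²) practical complexity:

```python
import numpy as np

def lle(X, n_neighbors=6, n_components=2, reg=1e-3):
    """Minimal locally linear embedding: reconstruction weights -> bottom eigenvectors."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d[i])[1:n_neighbors + 1]
        Z = X[nbrs] - X[i]                          # neighbors centered on x_i
        C = Z @ Z.T                                 # local Gram matrix
        C += reg * np.trace(C) * np.eye(len(nbrs))  # regularize (C may be singular)
        w = np.linalg.solve(C, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()                    # reconstruction weights sum to 1
    M = (np.eye(n) - W).T @ (np.eye(n) - W)         # sparse in practice
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:n_components + 1]              # skip the constant eigenvector

# Toy usage: a noisy 2-D spiral.
rng = np.random.default_rng(2)
t = np.linspace(0, 4, 60)
X = np.c_[t * np.cos(t), t * np.sin(t)] + 0.01 * rng.standard_normal((60, 2))
Y = lle(X)
print(Y.shape)  # (60, 2)
```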
On the other hand, the graph construction and the eigenvector calculation for LE can be prohibitive, especially when the number of pixels increases. This problem has been addressed via simple linear iterative clustering (SLIC) [18]. LE also has O(pn²) complexity.

Hessian LLE (HLLE). HLLE [10, 38] is an unsupervised nonlinear reduction method. It can be considered an improvement of the LLE technique, in which a Hessian quadratic form replaces the Laplacian quadratic one. This method recovers the inherent structure of scattered data by relying on the manifold embedded in the large Euclidean space. The HLLE method is derived from a conceptual framework of local isometry, in which the manifold, considered as a Riemannian submanifold of the ambient Euclidean space, is locally isometric to an open connected subset of Euclidean space. The coordinates of the lower dimensional representation can be obtained by an eigenanalysis of a matrix H of the Hessian mapping; the matrix H describes the curvature of the mapping around each data point. The HLLE algorithm is relatively insensitive to neighborhood sizes, but it is rarely used when processing HSIs because it destroys their forms. The HLLE algorithm has O(pn²) complexity too.
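The graph pipeline that LE follows — a weighted kNN graph, the graph Laplacian, then the smallest nontrivial eigenvectors — can be sketched as below. This is our own minimal dense NumPy illustration (`sigma` and the toy data are assumptions); real implementations use sparse matrices and sparse eigensolvers, which is exactly where the prohibitive cost mentioned above arises:

```python
import numpy as np

def laplacian_eigenmaps(X, n_neighbors=6, n_components=2, sigma=1.0):
    """Minimal LE: heat-kernel kNN graph -> generalized eigenvectors of the Laplacian."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d[i])[1:n_neighbors + 1]
        w = np.exp(-d[i, nbrs] ** 2 / (2 * sigma ** 2))  # heat-kernel weights
        W[i, nbrs] = w
        W[nbrs, i] = w                                   # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                            # graph Laplacian
    # Solve L f = lambda D f via the symmetric normalized form.
    Dinv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
    Lsym = Dinv_sqrt @ L @ Dinv_sqrt
    vals, vecs = np.linalg.eigh(Lsym)
    return Dinv_sqrt @ vecs[:, 1:n_components + 1]       # drop the trivial eigenvector

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 5))
Y = laplacian_eigenmaps(X)
print(Y.shape)  # (50, 2)
```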
Local Tangent Space Analysis (LTSA). LTSA [5, 6] is another nonlinear unsupervised reduction technique that preserves the local properties of the original data in the low dimensional representation. The local data geometry is described using the local tangent space of each data point, computed by performing the PCA algorithm on the data point's neighborhood. The choice of neighborhoods for LTSA is less sensitive than for the LLE technique, but it empirically suffers from a computational complexity problem (O(pn²)). Moreover, the LTSA method is unable to generalize to new data. To overcome these issues, W. Sun et al. have proposed a new technique called multi-strategy local tangent space alignment [5].
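The core LTSA step — fitting a local tangent basis by PCA on each neighborhood — can be sketched in isolation. This is our own minimal NumPy illustration with assumed names; full LTSA then aligns these per-point local coordinates into one global embedding:

```python
import numpy as np

def local_tangent_space(X, i, n_neighbors=8, dim=2):
    """Estimate the local tangent space at point i by PCA on its neighborhood."""
    d = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(d)[:n_neighbors + 1]         # neighborhood includes point i itself
    Z = X[nbrs] - X[nbrs].mean(axis=0)             # center the neighborhood
    # Right singular vectors = principal directions = orthonormal tangent basis.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    basis = Vt[:dim]                               # (dim, D) tangent basis
    coords = Z @ basis.T                           # local tangent coordinates
    return basis, coords

rng = np.random.default_rng(4)
X = rng.standard_normal((40, 6))
basis, coords = local_tangent_space(X, i=0)
print(basis.shape, coords.shape)  # (2, 6) (9, 2)
```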
4 Synthesis

In this survey, we provided state-of-the-art feature extraction techniques. Table 1 summarizes the properties of the approaches described above. Feature extraction techniques can be divided into two main classes: linear and nonlinear techniques. Linear feature extraction methods (PCA, MDS, and LPP), which process data lying on or near a linear subspace, are considered the fastest and simplest techniques for HSI dimension reduction tasks [1, 11]; they are also seen as reversible transformations, which makes data interpretation easier. Linear techniques suffer essentially from three main problems. Firstly, they have a high computational complexity and require large memory spaces. Secondly, they fail to keep the local (PCA) or global (LPP) data geometry, which leads to losing some important information. Finally, they are limited by their linear nature, so they do not exploit the nonlinear data properties and consequently lose some significant information after the dimension reduction process. Several studies have tried to overcome these challenges by proposing new variants [14–17, 21, 22]. Unlike linear techniques, nonlinear approaches do not depend on the linearity assumption, so they can capture the relevant input structure and process complex data [26, 27]. Moreover, these approaches can find the significant properties and nonlinear structures that linear techniques (PCA or MDS) do not detect. Nonlinear techniques mostly offer better classification accuracy than linear techniques [26, 29, 33], and they can detect outliers. On the other side, nonlinear feature extraction approaches have complex formulations (Isomap, diffusion maps, KPCA, LLE), a high computing complexity, and need a huge memory space [26, 27, 29, 36]. To solve these issues, several nonlinear variants have been introduced recently [6, 18, 30, 36].

Table 1. Feature extraction techniques properties.

| Method | Nature^a | Complexity^b | Contribution(s) | Limit(s) |
|---|---|---|---|---|
| Principal Component Analysis (PCA) [1] | Linear, unsupervised | O(D³) | Simple to apply; good performance yield; reversible transformation | Needs a lot of memory space; neglects local structures |
| Locality Preserving Projections (LPP) [16] | Linear, unsupervised | O(D²) | Good effects; strong robustness | Difficult to determine an appropriate neighborhood size; sensitive to noise |
| Multidimensional Scaling (MDS) [19] | Linear, unsupervised | O(n³) | Insensible to outliers; good performance | Processes only small images; heavy storage |
| Isometric Feature Mapping (Isomap) [24] | Nonlinear, global, unsupervised | O(n³) | Finds the meaningful global structures; great classification accuracy | High computational cost; short circuit problem |
| Diffusion Maps (DM) [27] | Nonlinear, global, semi-supervised | O(n³) | Processes complex data | High computational cost |
| Kernel PCA (KPCA) [29] | Nonlinear, global, unsupervised | O(n³) | Classification accuracy much better than PCA; robustness to noise | Resource storage issue; high computing cost |
| AutoEncoder (AE) [32] | Nonlinear, global, unsupervised | O(inw) | Good classification accuracy | – |
| Locally Linear Embedding (LLE) [34] | Nonlinear, local, unsupervised | O(pn²) | – | Several free parameters must be set; intense memory consumption |
| Laplacian Eigenmaps (LE) [18] | Nonlinear, local, unsupervised | O(pn²) | Keeps the manifold structures; robust to aberrant values | Graph construction and eigenvector calculation can be prohibitive |
| Hessian LLE (HLLE) [10] | Nonlinear, local, unsupervised | O(pn²) | Insensitive to neighborhood sizes | Destroys the HSI form |
| Local Tangent Space Alignment (LTSA) [5] | Nonlinear, local, unsupervised | O(pn²) | Neighborhood choice less sensitive than for LLE | Lack of new data generalization; high computing complexity |

a: Supervised techniques [11] are used when we have an idea about the class information, whereas for unsupervised methods no object information is available to classify the tested data set.
b: n: the size of the input data set, D: the data dimensionality, w: the number of weights, i: the number of iterations, and p: the ratio of nonzero elements in the weight matrix.
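As a concrete illustration of the linear baseline discussed above, reducing an n × D pixel-by-band matrix with PCA takes only a few lines. This is our own minimal NumPy sketch (names and toy sizes are assumptions), not the setup of [1]:

```python
import numpy as np

def pca_reduce(X, n_components=20):
    """Project n pixels x D bands onto the top n_components principal axes."""
    Xc = X - X.mean(axis=0)                        # center each spectral band
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # reduced, reversible up to truncation

rng = np.random.default_rng(5)
cube = rng.standard_normal((100, 200))             # e.g. 100 pixels x 200 spectral bands
Z = pca_reduce(cube, n_components=20)
print(Z.shape)  # (100, 20)
```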
5 Experiments

5.1 Data Sets and Experimental Process
In this section, we evaluate the 11 feature extraction techniques mentioned above — PCA [1], LPP [16], MDS [19], Isomap [24], DM [27], KPCA [29], AE [32], LLE [34], LE [18], HLLE [10] and LTSA [5] — on two real hyperspectral data sets: Indian Pines and Salinas-A. The two HSIs were downloaded from [3]. The Indian Pines data set was recorded by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor in 1992 in the north of Indiana (USA). It covers 145 × 145 pixels, with 200 spectral bands in the 0.4–2.5 µm wavelength range, and 16 classes. The Salinas-A image was collected by the AVIRIS sensor over the Salinas Valley in California (USA); it is composed of 204 bands of size 86 × 83 pixels with a spatial resolution of 3.7 m/pixel. The Salinas-A data set contains 6 classes. Figure 1 shows the pseudo-color images of the two chosen HSIs and their ground truth maps. Our goal in this paper is to compare the effects of the most popular feature extraction techniques on hyperspectral classification tasks. Once the dimension reduction methods are applied, a classifier is adopted to generate classification accuracies and maps. Since it can deal with high dimensional data sets such as hyperspectral images and works with a limited number of training samples, the Support Vector Machine (SVM) [4] classifier was chosen to classify the test set. The performance was evaluated according to overall accuracy (OA), average accuracy (AA), the Kappa coefficient [37] and processing speed. All algorithms were implemented in Matlab on a laptop with a 2-GHz processor and 4-GB memory. We used only 10% and 1% of the pixels of each class, for the Indian Pines and Salinas-A data sets respectively, chosen randomly as training samples. We repeated each classification script 10 times, and the mean of the classification rates was used to judge classification performance. The feature extraction techniques were implemented with the parameters given in Table 2.

5.2 Results
Tables 3 and 4 give the classification results and computing times of all examined approaches for the two tested images; Figs. 2 and 3 show the corresponding diagrams. For both scenes, we observed that the linear techniques (PCA, LPP and MDS) are, for the most part, the fastest compared to the nonlinear techniques: PCA appears as the speediest, followed by MDS and then LPP. Nonlinear techniques perform better than linear techniques, but with a much higher computing time. Global techniques (Isomap, DM, KPCA and AE) offer better classification results than local approaches (LE, LLE, HLLE and LTSA), while the latter are computationally more efficient since they involve sparse matrix computations, which offer a polynomial speed-up. Isomap provides the best classification accuracy for the two tested scenes; however, it is the most expensive in terms of computing time, needing almost 10 min for the Indian Pines data set and more than 3 min for the Salinas-A image. HLLE gives the worst classification results: its overall accuracy (OA) is near 40% for the first scene and 28% for the second. We clearly see, as mentioned above, that HLLE damages the HSI form.

Fig. 1. Color composites of hyperspectral images and their corresponding ground truth maps: (a) Indian Pines and (b) Salinas-A.

Preserving distances during the projection into the low dimensional space (Isomap and DM) is more effective than deploying the kernel notion (KPCA) or neural networks (AE): the former give a better classification yield, but are the most expensive in terms of computing time. Among the local approaches, reconstruction weights (LLE) are more effective than the Laplacian graph (LE) and local tangent space approaches (HLLE and LTSA). The nonlinear counterpart of PCA, i.e. KPCA, provides much better accuracy rates than PCA; it gains almost 6% of overall accuracy for the Indian Pines scene and about 5% for the Salinas-A image.
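The evaluation metrics reported in the tables — OA, AA and the Kappa coefficient — can all be derived from the confusion matrix of the classifier's predictions. The sketch below is our own minimal NumPy illustration (the function name and toy labels are hypothetical, not from the paper):

```python
import numpy as np

def classification_scores(y_true, y_pred, n_classes):
    """Overall accuracy (OA), average accuracy (AA) and Kappa from predictions."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                 # rows: true class, cols: predicted
    total = cm.sum()
    oa = np.trace(cm) / total                         # fraction of correctly labeled pixels
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))        # mean of the per-class accuracies
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)                      # agreement corrected for chance
    return oa, aa, kappa

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 2, 0])
oa, aa, kappa = classification_scores(y_true, y_pred, 3)
print(round(oa, 2), round(aa, 2))  # 0.75 0.75
```

OA weights every pixel equally, while AA weights every class equally, which is why the two can diverge on the strongly imbalanced Indian Pines classes.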
Table 2. Details of the implemented parameters for the two hyperspectral scenes used.

| Parameter | PCA | LPP | MDS | Isomap | DM | KPCA | AE | LLE | LE | HLLE | LTSA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| D | 20 | 20 | 20 | 20 | 20 | 20 | – | 20 | 20 | 20 | 20 |
| K^a | – | 20 | – | 12 | – | – | – | 12 | 20 | 12 | 12 |
| Kernel function | – | – | – | – | – | G^b | – | – | – | – | – |
| Layers | – | – | – | – | – | – | 10 | – | – | – | – |
| No. of iterations | – | – | – | – | – | – | 50 | – | – | – | – |

a: K is the number of nearest neighbors. b: G is the Gaussian kernel.
Table 3. Classification results for the Indian Pines data set.

| Classes | No. of samples | PCA | LPP | MDS | Isomap | DM | KPCA | AE | LLE | LE | HLLE | LTSA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alfalfa | 46 | 99.98 | 99.97 | 99.97 | 99.98 | 99.97 | 99.88 | 99.93 | 99.99 | 99.98 | 99.80 | 99.50 |
| Corn-N | 1428 | 90.26 | 89.69 | 87.86 | 92.07 | 92.89 | 90.42 | 92.47 | 90.68 | 89.50 | 83.81 | 89.00 |
| Corn-M | 830 | 93.90 | 93.83 | 92.80 | 95.22 | 95.31 | 94.68 | 94.91 | 93.83 | 94.03 | 88.77 | 92.02 |
| Corn | 237 | 98.45 | 98.47 | 98.18 | 98.99 | 99.27 | 99.01 | 98.98 | 99.19 | 98.44 | 97.96 | 97.56 |
| Grass-P | 483 | 98.69 | 98.94 | 97.73 | 99.21 | 99.43 | 99.17 | 99.14 | 99.06 | 98.83 | 95.74 | 98.43 |
| Grass-T | 730 | 97.24 | 97.58 | 96.12 | 98.58 | 98.87 | 98.56 | 98.44 | 97.84 | 96.48 | 92.09 | 95.96 |
| Grass-P M | 28 | 99.90 | 99.93 | 99.77 | 99.85 | 99.90 | 99.84 | 99.93 | 99.87 | 99.92 | 98.48 | 99.75 |
| Hay | 478 | 99.37 | 99.66 | 99.10 | 99.41 | 100 | 99.52 | 99.61 | 99.77 | 99.21 | 96.43 | 98.90 |
| Oats | 20 | 99.83 | 99.82 | 99.82 | 99.86 | 99.82 | 99.86 | 99.90 | 99.82 | 99.48 | 99.63 | 99.77 |
| Soybeans-N | 972 | 95.57 | 96.33 | 94.87 | 97.39 | 96.67 | 97.35 | 96.26 | 96.21 | 95.27 | 91.22 | 93.62 |
| Soybeans-M | 2455 | 87.56 | 89.38 | 85.99 | 92.97 | 91.95 | 90.54 | 90.91 | 88.27 | 84.09 | 63.84 | 85.04 |
| Soybeans-C | 593 | 95.48 | 95.13 | 94.75 | 96.04 | 96.29 | 95.37 | 96.10 | 95.08 | 94.82 | 91.89 | 94.41 |
| Wheat | 205 | 99.39 | 99.15 | 99.60 | 99.82 | 99.71 | 99.58 | 99.78 | 99.72 | 99.38 | 97.93 | 99.05 |
| Woods | 1265 | 95.13 | 97.81 | 95.68 | 98.50 | 99.57 | 97.71 | 96.71 | 98.34 | 96.28 | 89.64 | 98.22 |
| Buildings | 386 | 96.54 | 97.17 | 96.73 | 97.93 | 97.34 | 97.82 | 97.04 | 96.94 | 96.39 | 94.32 | 96.28 |
| Stone | 93 | 99.77 | 99.80 | 99.77 | 99.78 | 99.90 | 99.74 | 99.80 | 99.83 | 99.89 | 98.31 | 99.78 |
| OA (%) | | 73.51 | 76.29 | 69.35 | 82.78 | 82.30 | 79.45 | 79.90 | 77.19 | 68.92 | 39.93 | 68.67 |
| AA (%) | | 96.69 | 97.04 | 96.17 | 97.85 | 97.79 | 97.44 | 97.49 | 97.15 | 96.37 | 92.49 | 96.08 |
| Kappa (%) | | 69.39 | 72.62 | 64.55 | 80.24 | 79.72 | 76.16 | 76.89 | 73.64 | 65.17 | 29.12 | 63.70 |
| Computing time (s) | | 10.90 | 23.62 | 16.54 | 590.07 | 253.70 | 222.30 | 89.19 | 70.33 | 28.35 | 41.91 | 67.87 |
Table 4. Classification results for the Salinas-A data set.

| Classes | No. of samples | PCA | LPP | MDS | Isomap | DM | KPCA | AE | LLE | LE | HLLE | LTSA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Brocoli | 391 | 98.01 | 99.94 | 98.58 | 96.32 | 96.92 | 99.95 | 99.98 | 98.22 | 98.61 | 94.89 | 97.86 |
| Corn | 1343 | 75.01 | 74.47 | 75.12 | 76.29 | 77.02 | 75.25 | 75.20 | 75.32 | 74.43 | 63.11 | 75.65 |
| Lettuce_romaine 4wk | 616 | 86.60 | 89.50 | 86.18 | 92.65 | 89.97 | 88.35 | 89.38 | 88.18 | 87.84 | 90.63 | 86.74 |
| Lettuce_romaine 5wk | 1525 | 66.51 | 71.00 | 63.92 | 72.04 | 69.72 | 71.57 | 71.48 | 70.26 | 67.63 | 58.34 | 65.05 |
| Lettuce_romaine 6wk | 674 | 85.11 | 87.32 | 85.28 | 90.65 | 87.28 | 87.55 | 86.86 | 87.12 | 86.36 | 93.70 | 88.30 |
| Lettuce_romaine 7wk | 799 | 87.02 | 85.03 | 86.54 | 91.11 | 84.75 | 84.93 | 86.28 | 85.20 | 83.70 | 91.54 | 90.12 |
| OA (%) | | 67.7 | 72.1 | 65.1 | 76.8 | 71.6 | 72.8 | 73.8 | 70.2 | 59.3 | 27.6 | 66.6 |
| AA (%) | | 83.04 | 84.54 | 82.60 | 86.51 | 84.27 | 84.60 | 84.86 | 84.05 | 83.09 | 82.03 | 83.95 |
| Kappa (%) | | 63.4 | 68.8 | 61.03 | 74.2 | 67.1 | 69.26 | 69.8 | 64.4 | 55.20 | 25.2 | 61.4 |
| Computing time (s) | | 1.87 | 3.14 | 5.47 | 212.97 | 115.16 | 34.97 | 82.13 | 12.78 | 10.55 | 7.54 | 8.67 |
Fig. 2. (a) Classification accuracies and (b) computing time of Indian Pines data set, obtained by different feature extraction techniques.
Fig. 3. (a) Classification accuracies and (b) computing time results of Salinas-A data set, obtained by different feature extraction techniques.
Fig. 4. Classification maps of the Indian Pines data set, obtained by the various feature extraction methods (panels: PCA, LPP, MDS, Isomap, DM, KPCA, AE, LLE, LE, HLLE, LTSA).
Fig. 5. Classification maps of the Salinas-A data set, obtained by the various feature extraction methods (panels: PCA, LPP, MDS, Isomap, DM, KPCA, AE, LLE, LE, HLLE, LTSA).
On the other hand, the LPP technique, the linear variant of the LE method, gains about 7% for the first image and about 13% for the second one compared to LE; it is also more effective in terms of processing time. Figures 4 and 5 show the classification maps obtained for the Indian Pines and Salinas-A images respectively, for all the tested approaches.
6 Conclusions and Future Works

Recently, feature extraction based HSI dimensionality reduction has drawn considerable attention in the remote sensing field and has shown significant performance. In this paper, we briefly presented the most well-known feature extraction techniques that are often used to reduce HSI dimensionality. These techniques can be divided into linear and nonlinear methods. In contrast to linear techniques, nonlinear approaches can process the complex features of HSIs. We have also compared and analyzed the performance of the studied methods using two real hyperspectral data sets. The classification accuracies obtained by the different feature extraction techniques demonstrate that linear methods outperform nonlinear ones in terms of computing time. However, nonlinear feature extraction techniques achieve the best classification performance. The development of nonlinear approaches with a low computing cost is a promising field for future research.

Acknowledgment. This work was supported and financed by the Ministry of Higher Education and Scientific Research of Tunisia.
References

1. Laparra, V., Malo, J., Camps-Valls, G.: Dimensionality reduction via regression in hyperspectral imagery. IEEE J. Sel. Top. Sign. Process. 9(6), 1026–1036 (2015)
2. Kurz, T.H., Buckley, S.J.: A review of hyperspectral imaging in close range applications. In: The International Society for Photogrammetry and Remote Sensing Congress, pp. 865–870. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Prague, Czech Republic (2016)
3. Computational Intelligence search group site. http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes. Accessed 05 Dec 2017
4. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 1–27 (2011)
5. Ma, L., Crawford, M.M., Tian, J.: Anomaly detection for hyperspectral images using local tangent space alignment. In: 2010 IEEE International Geoscience and Remote Sensing Symposium, Honolulu, HI, USA, pp. 824–827. IEEE (2010)
6. Ma, L., Crawford, M.M., Tian, J.W.: Generalised supervised local tangent space alignment for hyperspectral image classification. Electron. Lett. 46(7), 497–498 (2010)
7. Huang, H., Yang, M.: Dimensionality reduction of hyperspectral images with sparse discriminant embedding. IEEE Trans. Geosci. Remote Sens. 53(9), 5160–5169 (2015)
8. Khodr, J., Younes, R.: Dimensionality reduction on hyperspectral images: a comparative review based on artificial datas. In: 2011 4th International Congress on Image and Signal Processing, Shanghai, China, pp. 1875–1883. IEEE (2011)
9. Khoder, J.: Nouvel Algorithme pour la Réduction de la Dimensionnalité en Imagerie Hyperspectrale. Ph.D. thesis, University of Versailles St Quentin-en-Yvelines and the Lebanese University (2014)
10. Donoho, D.L., Grimes, C.: Hessian eigenmaps: new locally linear embedding techniques for high-dimensional data. Proc. Natl. Acad. Sci. 100(10), 5591–5596 (2003)
11. Lodha, S.P., Kamlapur, S.M.: Dimensionality reduction techniques for hyperspectral images. Int. J. Appl. Innov. Eng. Manage. (IJAIEM) 3(10), 92–99 (2014)
12. Lu, G., Fei, B.: Medical hyperspectral imaging: a review. J. Biomed. Opt. 19(1), 1–23 (2014)
13. Zhang, L., Zhang, L., Tao, D., Huang, X.: Tensor discriminative locality alignment for hyperspectral image spectral-spatial feature extraction. IEEE Trans. Geosci. Remote Sens. 51(1), 242–256 (2013)
14. Ren, J., Zabalza, J., Marshall, S., Zheng, J.: Effective feature extraction and data reduction in remote sensing using hyperspectral imaging [applications corner]. IEEE Signal Process. Mag. 31(4), 149–154 (2014)
15. Deepa, P., Thilagavathi, K.: Feature extraction of hyperspectral image using principal component analysis and folded-principal component analysis. In: 2015 2nd International Conference on Electronics and Communication System (ICECS), Coimbatore, India, pp. 656–660. IEEE (2015)
16. Luo, H., Tang, Y.Y., Li, C., Yang, L.: Local and global geometric structure preserving and application to hyperspectral image classification. Math. Probl. Eng. 2015, 13 (2015)
17. Zhai, Y., Zhang, L., Wang, N., Guo, Y., Cen, Y., Wu, T., Tong, Q.: A modified locality-preserving projection approach for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 13(8), 1059–1063 (2016)
18. Zhang, X., Chew, S.E., Xu, Z., Cahill, N.D.: SLIC superpixels for efficient graph-based dimensionality reduction of hyperspectral imagery. In: SPIE Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XXI, p. 947209 (2015)
19. France, S.L., Carroll, J.D.: Two-way multidimensional scaling: a review. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 41(5), 644–661 (2011)
20. Long, Y., Li, H.-C., Celik, T., Longbotham, N., Emery, W.J.: Pairwise-distance-analysis-driven dimensionality reduction model with double mappings for hyperspectral image visualization. Remote Sens. 7(6), 7785–7808 (2015)
21. Fang, J., Qian, Y.: Local detail enhanced hyperspectral image visualization. In: 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, pp. 1092–1095. IEEE (2015)
22. Naud, A., Duch, W.: Visualization of large data sets using MDS combined with LVQ. In: The 6th International Conference on Neural Networks and Soft Computing, Zakopane, Poland, pp. 632–637 (2002)
23. De Silva, V., Tenenbaum, J.B.: Global versus local methods in nonlinear dimensionality reduction. In: The 15th International Conference on Neural Information Processing Systems, MA, USA, pp. 721–728 (2003)
24. Velasco-Forero, S., Angulo, J., Chanussot, J.: Morphological image distances for hyperspectral dimensionality exploration using Kernel-PCA and ISOMAP. In: 2009 IEEE International Geoscience and Remote Sensing Symposium, The Cape, South Africa, pp. III-109–III-112. IEEE (2009)
25. Jin, C., Bachmann, C.M.: Parallel acceleration of ISOMAP algorithm applied in hyperspectral imagery using OpenCL on heterogeneous systems. In: The 35th Canadian Symposium on Remote Sensing (2014)
26. Sun, W., Halevy, A., Benedetto, J.J., Czaja, W., Liu, C., Wu, H., Shi, B., Li, W.: UL-ISOMAP based nonlinear dimensionality reduction for hyperspectral imagery classification. ISPRS J. Photogramm. Remote Sens. 89, 25–36 (2014)
27. Du Plessis, L., Xu, R., Damelin, S., Sears, M., Wunsch, D.C.: Reducing dimensionality of hyperspectral data with diffusion maps and clustering with k-means and Fuzzy ART. Int. J. Syst. Control Commun. 3(3), 232–251 (2011)
28. Xu, R., Du Plessis, L., Damelin, S., Sears, M., Wunsch, D.C.: Analysis of hyperspectral data with diffusion maps and fuzzy ART. In: 2009 International Joint Conference on Neural Networks, Atlanta, GA, USA, pp. 3390–3397. IEEE (2009)
29. Licciardi, G.A., Chanussot, J., Vasile, G., Piscini, A.: Enhancing hyperspectral image quality using nonlinear PCA. In: 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, pp. 5087–5091. IEEE (2014)
30. Liao, W., Pizurica, A., Philips, W., Pi, Y.: A fast iterative kernel PCA feature extraction for hyperspectral images. In: 2010 17th IEEE International Conference on Image Processing (ICIP), Hong Kong, China, pp. 1317–1320. IEEE (2010)
31. Ghedira, H.: Utilisation de réseaux de Neurones pour la cartographie des milieux humides à partir d'une Séries temporelle d'image RADARSAT-1. Ph.D. thesis, University of Quebec INRS-Eau, August 2002
32. Xing, C., Ma, L., Yang, X.: Stacked denoise autoencoder based feature extraction and classification for hyperspectral images. J. Sens. 2016, 10 (2016)
33. Wadstromer, N., Gustafsson, D.: Non-linear hyperspectral subspace mapping using stacked autoencoder. In: The 29th Annual Workshop of the Swedish Artificial Intelligence Society (SAIS), Malmö, Sweden (2016)
34. Ma, L., Crawford, M.M., Tian, J.: Anomaly detection for hyperspectral images based on robust locally linear embedding. J. Infrared Millim. Terahertz Waves 31(6), 753–762 (2010)
35. Zhang, L., Zhao, C.: Sparsity divergence index based on locally linear embedding for hyperspectral anomaly detection. J. Appl. Remote Sens. 10(2), 025026 (2016)
36. Chen, G., Qian, S.-E.: Dimensionality reduction of hyperspectral imagery using improved locally linear embedding. J. Appl. Remote Sens. 1(1), 1–10 (2007)
37. Kang, X., Li, S., Fang, L., Benediktsson, J.A.: Intrinsic image decomposition for feature extraction of hyperspectral images. IEEE Trans. Geosci. Remote Sens. 53(4), 2241–2253 (2015)
38. Wang, J.: Hessian locally linear embedding. In: Geometric Structure of High-Dimensional Data and Dimensionality Reduction, pp. 249–265. Springer, Heidelberg (2011)
39. Fejjari, A., Saheb Ettabaa, K., Korbaa, O.: Modified Schroedinger eigenmap projections algorithm for hyperspectral imagery classification. In: IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA), Hammamet, Tunisia, pp. 809–814. IEEE (2017)
40. Fejjari, A., Saheb Ettabaa, K., Korbaa, O.: Modified graph-based algorithm for efficient hyperspectral feature extraction. In: The 32nd International Symposium on Computer and Information Sciences (ISCIS), Poznan, Poland, pp. 87–95. Springer (2018)
41. Fejjari, A., Saheb Ettabaa, K., Korbaa, O.: Fast spatial spectral Schroedinger eigenmaps algorithm for hyperspectral feature extraction. In: The 22nd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES), Belgrade, Serbia, pp. 656–664 (2018)
Multi-neural Networks Object Identification

Nicolas Park1, Daniela López De Luise1, Daniel Rivera1, Leonardo M. Bustamante2, Jude Hemanth3, Teodora Olariu1,4, and Iustin Olariu1

1 CI2S Labs, Buenos Aires, Argentina
[email protected]
2 CAETI, Buenos Aires, Argentina
3 Department of ECE, Karunya University, Coimbatore, India
4 Faculty of Medicine, Vasile Goldis Western University of Arad, Arad, Romania
Abstract. This paper presents an Android prototype called HOLOTECH, a system to help blind people understand obstacles in their environment. The goal of this paper is the analysis of different techniques and procedures to build a fast and lightweight model able to detect obstacles in this context. The predictions and analyses are statistically evaluated, and the results are used to improve the inference results. The model works with low-precision images derived from an Android cell phone supported by ultrasonic sensors. Images are pre-processed on the fly, using multiple neural networks in conjunction with other heuristics, to infer obstacles and their displacement. Results indicate that it is possible to improve the prediction rate and at the same time reduce the extra processing load.

Keywords: Neural networks · Trajectory prediction · Image processing · Obstacle identification
1 Introduction

The work started with making a lightweight system to improve the overall quality of life of individuals with sight problems. Currently, individuals with this problem need to use a stick, or the help of another person or an animal, to move from one place to another; other tools that address this problem are expensive or awkward to use frequently. Our approach is to use a common tool, a cell phone, with some attachments, to perform complex detection and prediction and give the individual information about the environment. Because of the scarce resources, we opted for an approach similar to a Modular Neural Network [33], but using image processing, neural networks and an expert system, since the problem of making predictions is complex and needs different approaches, such as fuzzy systems [34], to obtain a clear analysis.

© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 189–196, 2021. https://doi.org/10.1007/978-3-030-52190-5_13

Regarding the processing of the images, it is known that a continuous collection of images representing a time flow is called an image sequence, and there are several well-proven algorithms for good trajectory prediction. Werner Bailer's [1] approach of tracking and clustering algorithms is the most similar example, and adapting it could prove to give better results; our adaptation uses CUDA to accelerate the tracking and a 4-parameter motion model for the clustering algorithms, and for the treatment of the images a Multistage Hypothesis Testing (MHT) of every pixel value was included, designed in such a way that it reduces the total error [2]. Stochastic filtering using hyper-parameters reduces the effect of outliers, where the disadvantage of this filter is the effect of approximating the distribution by particles [3]. Prediction of the visual path is addressed by a deep learning framework; this improves the scene understanding capability [4]. Small moving objects in an image sequence with Gaussian noise are detected using the MHT algorithm, and computational efficiency is achieved using hierarchically organized trajectories; in most cases this is not suited to complicated backgrounds [5]. In some cases there is noise in the sequence, and one approach is a nonlinear temporal filtering algorithm to reduce it; this increases the image quality and the efficiency of image coding operations [6]. Much of the processing and recognition aims to predict accidents, and the calculation is done using the probability of the vehicle identified by a three dimensional model-based vehicle tracking method; sample trajectories are then given to a self-organizing neural network, and matching and locating methods are used to predict the vehicle [7]. For object tracking in a spatiotemporal domain, a one dimensional trajectory filter is used; without knowledge of the target, this filter will track a moving object with a linear or nonlinear trajectory [8] to make the prediction. For tracking, the similarity of the frames is measured with long term affine and translational methods for the bottom-up analysis of the image sequence; for the complete sequence, Camilo introduced a trajectory tree as a single representation [9].
In cases where the motion is smooth, a linear estimation algorithm is used; the position and motion of the object can be used to simulate future behavior, and the recovery of short missing subsequences is also possible. Overdetermination and a least-squares criterion smooth the noisy signals [10]. Another approach is feature-based analysis; optic flow techniques are briefly discussed for sequences of monocular and stereoscopic images, together with estimation techniques for structure and motion. In a plane image, the calculated two-dimensional field of brightness values underlies the optic flow, and feature-based analysis and optic flow techniques are compared [11]. Autoregressive models and optical flow are used to extract the moving objects and improve the prediction model; this approach is used for traffic control [12]. A systematic model of visual perception is used for visually impaired and able-bodied persons to predict their paths; although this approach has less accuracy, it is suitable for selecting the best interface among a pool of interfaces [13]. Another approach is a region-based tracking method used to obtain trajectory maps; it has a geometric filter and a motion filter that predict and then update the region position while estimating the region's motion parameters [14]. In this case, linear minimum mean square error point filtering is more relevant: a motion estimation algorithm is used to compute the motion trajectories, and noise is removed by a motion-compensated method [15]. For a moving vehicle on a traffic road, where the detection comes with its shadow, one approach uses a motion model to estimate the motion of the vehicle; in this case, image segmentation is used to generate the model hypotheses [16]. Samples of a low-resolution (noisy) image are restored as a high-resolution image with a higher sampling rate of the same scene, achieved by increasing the L factor.
Multi-neural Networks Object Identification
191
Frequency-domain methods and projection-onto-convex-sets methods reduce the noise and blurring of an image [17]. For tracking, the Bayesian multiple hypothesis tracking (MHT) method is used for contour segmentation and temporal tracking of the selected object; nevertheless, any obstacle will affect the tracking [18]. The sequences of images are analyzed using “optical flow” [19], “axial motion stereo” [20] and other methods. The idea is to demonstrate object recognition efficiency using screens of images or scenes [21]. Many investigations are driven towards the analysis of dynamic scenes, but due to the rules for the analysis of static images, a self-restriction of using only 2 or 3 frames per scene is recommended [22]. The method of quasi-dynamic analysis is therefore assumed: we have a 2D image in which a static body motion must be interpreted. Recent research shows that human vision needs more frames of a sequence to recognize a movement pattern [23, 24] and that the sensitivity to noise improves when the number of observed frames increases [25]. This method will improve and refine the data obtained [22].
2 Test The implementation adapts and improves different approaches, where the main problem is tracking the object and, depending on the sequence of said tracking, making a prediction. To reduce the margin of error of the detection, approaches such as the Grey Wolf Optimizer [35] and the optimization of MNNs [36] were taken into account, to granulate the image for a better detection of the object and to apply several methods with parallel processing and machine learning. 2.1
Tools and Methods
The prototype classifiers used are a group of Haar cascades from OpenCV [32]; these classifiers were selected for their fast response and high success rate in recognizing the objects they were trained for. Because in real time the recognized objects are not static and behave randomly, different approaches and methods were used to trace them and maintain the recognition success rate. The main idea of the methods is to classify the object in separate pieces, decomposing it and focusing on a specific part of the object on different occasions. This approach can improve the reaction, precision and tracing of the object by removing irrelevant data noise. 2.2
Dataset
In order to test the system's precision, the database from the University of Michigan called “Collective Activity Dataset” was used [31]. All of its scenes are real situations of people walking in an outdoor environment; this database is a close example of the original situation captured in real-time video. The real-time videos were obtained using a hand-held digital camera (5x). Each image was taken as every 10th frame of each video sequence. The human faces were identified as objects in a region of interest of 50 × 50 pixels. The dimension of each image was 720 × 480 pixels, the average size was 50 KB and the image format was JPG.
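The frame sampling and region-of-interest cropping described above can be sketched as follows; `sample_frames` and `crop_roi` are hypothetical helper names of ours, and the frame size matches the dataset's 720 × 480 images:

```python
import numpy as np

def sample_frames(frames, step=10):
    """Keep every 10th frame of a video sequence, as described for the dataset."""
    return [f for i, f in enumerate(frames) if i % step == 0]

def crop_roi(frame, cx, cy, size=50):
    """Extract a size x size region of interest (e.g. a face) centred on (cx, cy)."""
    half = size // 2
    return frame[cy - half:cy + half, cx - half:cx + half]

# A stand-in sequence of 25 blank 720 x 480 frames
video = [np.zeros((480, 720, 3), dtype=np.uint8) for _ in range(25)]
sampled = sample_frames(video)          # keeps frames 0, 10 and 20
face = crop_roi(sampled[0], 360, 240)   # a 50 x 50 region of interest
```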
192
N. Park et al.
Because classifiers do not support multiple classifications of different objects at the same time, we used many classifiers to classify all possible objects and then built an expert system to integrate all the specific classifiers. 2.3
Tests Evolution
With the aim of obtaining a fast, precise expert system, different methods were proposed. The first method was a classification in cascade, where the first detection takes the object as a whole and later a part of the object. This first version of the expert system was made with the following three main steps:
• Detect the body.
• Detect a frontal face in the body area.
• Detect eyes in the face area.
For the second method, changing the approach of decomposition and focus of the detection, a new expert system was implemented with the following main steps:
• Detect a frontal face.
• Detect a profile face.
• Detect the face within an area of 50 × 50 pixels.
2.4
Test Results
With the first version of the approach, the method was not able to achieve a good hit rate.
Since the first method had a low hit rate, and the reason for it was that only frontal faces were detected, another approach was implemented. This new version of the expert system was able to improve its hit rate, because it tries to recognize frontal faces and profile faces, i.e. transitions of the object, in a set of 858 people's images, cars and other objects. The method displays an improvement over the first method; the main differences between the two methods are that in the second method the focus for the faces was an area of 50 × 50 pixels very close to the camera, and the incorporation of the profile face classifier.
As can be seen in the test table, the hit rate was very good: the method was able to detect 95.7% of the faces in an area of 50 × 50 pixels with a very low false alarm rate of 6.28%.
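The two reported figures can be computed from raw detection counts; the sketch below uses generic definitions and purely illustrative counts (hypothetical, not the paper's actual tallies):

```python
def hit_rate(true_positives, false_negatives):
    """Fraction of actual faces that were detected."""
    return true_positives / (true_positives + false_negatives)

def false_alarm_rate(false_positives, true_negatives):
    """Fraction of non-face regions wrongly reported as faces."""
    return false_positives / (false_positives + true_negatives)

# Illustrative counts only (hypothetical):
hr = hit_rate(957, 43)          # 0.957, i.e. 95.7%
fa = false_alarm_rate(63, 940)  # about 0.0628, i.e. 6.28%
```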
3 Results Analysis The different methods, with their changed approaches, focus and classifiers, displayed a significant change in the outcome. The first approach did not provide a good hit rate (64.41%), while the second approach reached 95.70%, meaning the object in the image was found almost every time an image was processed. The reason for this outcome is that the first expert system's classifiers did not include a profile classifier for the object, making the detection of the required objects less precise. Meanwhile, although the second approach improved the precision of the detection significantly, its drawback is that the object to detect needs to be in close proximity. However, both approaches, each with its own strengths, can be used together, because the expert system we designed is a composition of different classifiers. A mix of both methods could improve the usability in multiple scenarios; for example, in an open space, the first approach could be used to reduce the information that needs to be processed before applying the second approach for a better detection and prediction.
The results presented were obtained using the same set of images, without any previous filtering such as reducing the information in the image, in order to measure how each method behaves separately in a real-time situation. Using multiple classifiers, where the result of each classification is taken as a filter for the other classifiers, the efficiency and precision of the system will be far better for a more precise prediction. The use of both methods at the same time or in succession is possible because in both cases, even for the first approach, whose hit rate is not that high, the false alarm rate is lower than 8%. This means that in both methods the probability of wrongly detecting an object is small, which helps when later adding other filters or classifications, for example behavior tracking, to improve the overall prediction and discard non-dangerous situations.
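If the classifiers' errors were statistically independent (an assumption of ours, not something measured in the paper), chaining two of them so that one filters the other would bound the combined false-alarm probability by the product of the individual rates:

```python
def combined_false_alarm(rates):
    """Probability that every classifier in a chain fires wrongly,
    assuming independent errors (a simplifying assumption)."""
    p = 1.0
    for r in rates:
        p *= r
    return p

# Both methods report false-alarm rates below 8%:
worst_case = combined_false_alarm([0.08, 0.08])  # 0.0064, i.e. 0.64%
```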
4 Conclusion and Future Works The conclusion reached is that including a profile classifier can improve the detection of the object in real time, and that mixing different classifiers as filters of each other can improve the detection and tracking of an object that constantly changes its behavior; a single trained classifier is not enough to handle those changes. With a constant detection of the object, without losing it, it is possible to analyze and trace its behavior for use in prediction. Having different classifiers to manage the different situations over time, we conclude that this is much faster and more precise than a single multi-situational classifier. With these conclusions, the immediate step is to improve the first method, or to add an acceptable threshold so that it can be used as a filter, and to unite both methods, adding more classifiers if necessary, for a complete tracing of the object from far to close and a prediction of its behavior. This covers the case of tracing a single object. Later, the work needs to trace multiple objects at the same time, so we need to establish a threshold on the number of detected objects so as not to reduce the performance and precision in real time, record each behavior, make predictions based on the situation and notify the user of any danger. Finally, the work should be exposed to extreme situations, such as sudden movement and external stimulation, applying an auto-correction method in those cases if needed.
References
1. Bailer, W., Fassold, H., Lee, F., Rosner, J.: Tracking and clustering salient features in image sequences. In: Conference on Visual Media Production (CVMP), pp. 17–24. IEEE (2010)
2. Blostein, S.D., Huang, T.S.: Detection of small moving objects in image sequences using multistage hypothesis testing. In: 1988 International Conference on Acoustics, Speech, and Signal Processing, ICASSP-1988, pp. 1068–1071. IEEE (1988)
3. Ichimura, N.: Stochastic filtering for motion trajectory in image sequences using a Monte Carlo filter with estimation of hyper-parameters. In: 16th International Conference on Pattern Recognition, Proceedings, vol. 4, pp. 68–73. IEEE (2002)
4. Huang, S., Li, X., Zhang, Z., He, Z., Fei, W., Liu, W., Tang, J., Zhuang, Y.: Deep learning driven visual path prediction from a single image. IEEE Trans. Image Process. 25(12), 5892–5904 (2016)
5. Blostein, S.D., Huang, T.S.: Detecting small, moving objects in image sequences using sequential hypothesis testing. IEEE Trans. Signal Process. 39(7), 1611–1629 (1991)
6. Dubois, E., Sabri, S.: Noise reduction in image sequences using motion-compensated temporal filtering. IEEE Trans. Commun. 32(7), 826–831 (1984)
7. Hu, W., Xiao, X., Xie, D., Tan, T., Maybank, S.: Traffic accident prediction using 3-D model-based vehicle tracking. IEEE Trans. Veh. Technol. 53(3), 677–694 (2004)
8. Pei, S.-C., Kuo, W.-Y., Huang, W.-T.: Tracking moving objects in image sequences using 1D trajectory filter. IEEE Signal Process. Lett. 13(1), 13–16 (2006)
9. Dorea, C.C., Pardais, M., Marqués, F.: Hierarchical partition-based representations for image sequences using trajectory merging criteria. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2007, vol. 1, pp. I–1077. IEEE (2007)
10. Weng, J., Huang, T.S., Ahuja, N.: 3-D motion estimation, understanding, and prediction from noisy image sequences. IEEE Trans. Pattern Anal. Mach. Intell. 3, 370–389 (1987)
11. Aggarwal, J.K., Nandhakumar, N.: On the computation of motion from sequences of images - a review. Proc. IEEE 76(8), 917–935 (1988)
12. Crespo, J.L., Zorrilla, M., Bernardos, P., Mora, E.: Moving objects forecast in image sequences using autoregressive algorithms. Vis. Comput. 25(4), 309–323 (2009)
13. Biswas, P., Robinson, P.: Modelling perception using image processing algorithms. In: Proceedings of the 23rd British HCI Group Annual Conference on People and Computers: Celebrating People and Technology, pp. 494–503. British Computer Society (2009)
14. Meyer, F., Bouthemy, P.: Region-based tracking in an image sequence. In: European Conference on Computer Vision, pp. 476–484. Springer, Berlin, Heidelberg (1992)
15. Sezan, M.I., Ozkan, M.K., Fogel, S.V.: Temporally adaptive filtering of noisy image sequences using a robust motion estimation algorithm.
In: 1991 International Conference on Acoustics, Speech, and Signal Processing, ICASSP-1991, pp. 2429–2432. IEEE (1991)
16. Koller, D., Daniilidis, K., Nagel, H.-H.: Model-based object tracking in monocular image sequences of road traffic scenes. Int. J. Comput. Vis. 10(3), 257–281 (1993)
17. Tekalp, A.M., Ozkan, M.K., Sezan, M.I.: High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration. In: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-1992, vol. 3, pp. 169–172. IEEE (1992)
18. Tissainayagam, P., Suter, D.: Object tracking in image sequences using point features. Pattern Recogn. 38(1), 105–113 (2005)
19. Gibson, J.J.: The Ecological Approach to Visual Perception. Houghton Mifflin, Boston (1979)
20. O'Brien, N.G., Jain, R.: Axial motion stereo. In: Proceedings of Workshop on Computer Vision, Annapolis, MD (1984)
21. Johansson, G.: Spatio-temporal differentiation and integration in visual motion perception. Psych. Res. 38, 379–383 (1976)
22. Sethi, I.K., Jain, R.: Finding trajectories of feature points in a monocular image sequence. IEEE Trans. Pattern Anal. Mach. Intell. 1, 56–73 (1987)
23. Todd, J.T.: Visual information about rigid and nonrigid motion: a geometric analysis. J. Exper. Psychol. Human Percept. Perform. 8, 238–252 (1982)
24. Ramachandran, V.S., Anstis, S.M.: Extrapolation of motion path in human visual perception. Vision Res. 23, 83–85 (1984)
25. Donner, J., Lappin, J.S., Perfetto, G.: Detection of three-dimensional structure in moving optical patterns. J. Exper. Psychol. Hum. Percept. Perform. 10(1), 1 (1984)
26. Koperski, K., Han, J.: Discovery of spatial association rules in geographic databases. In: SSD 1995, Portland, Maine, 6–9 August 1995, pp. 47–66. Springer, Heidelberg (1995)
27. Ester, M., Frommelt, A., Kriegel, H.-P., Sander, J.: Spatial data mining: database primitives, algorithms and efficient DBMS support. Data Min. Knowl. Disc. 4(2/3), 193–216 (2000)
28. Ester, M., Kriegel, H.-P., Sander, J.: Knowledge discovery in spatial databases. In: Proceedings of the 23rd German Conference on Artificial Intelligence, KI 1999, pp. 61–74 (1999)
29. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: SIGMOD 2000: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 1–12. ACM Press, New York (2000). https://doi.org/10.1145/342009.335372
30. Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.: PrefixSpan: mining sequential patterns by prefix-projected growth. In: ICDE 2001, Heidelberg, Germany, 2–6 April 2001, pp. 215–224. IEEE Computer Society, Los Alamitos (2001)
31. Choi, W., Shahid, K., Savarese, S.: Collective Activity Dataset (2009). http://vhosts.eecs.umich.edu/vision//activity-dataset.html
32. OpenCV (2017). https://opencv.org/, https://github.com/opencv/opencv
33. Sotirov, S., Sotirova, E., Atanassova, V., Atanassov, K., Castillo, O., Melin, P., Petkov, T., Surchev, S.: A hybrid approach for modular neural network design using intercriteria analysis and intuitionistic fuzzy logic. Complexity 2018, 3927951:1–3927951:11 (2018)
34. Melin, P., Miramontes, I., Prado-Arechiga, G.: A hybrid model based on modular neural networks and fuzzy systems for classification of blood pressure and hypertension risk diagnosis. Expert Syst. Appl. 107, 146–164 (2018)
35. Sánchez, D., Melin, P., Castillo, O.: A grey wolf optimizer for modular granular neural networks for human recognition. Comput. Intell. Neurosci. 2017, 4180510:1–4180510:26 (2017)
36. Sánchez, D., Melin, P., Castillo, O.: Optimization of modular granular neural networks using a firefly algorithm for human recognition. Eng. Appl. Artif. Intell. 64, 172–186 (2017)
Development of a Testbed for Automatic Target Recognition and Object Change Tracking
Gangavarapu Vigneswara Ihita and Vijay Rao Duddu
Institute for Systems Studies and Analyses, Metcalfe House, Delhi, India
[email protected], [email protected]
Abstract. Unmanned Aerial Vehicles have developed great potential in the military and civil domains as an important platform for several mission objectives. Mission-specific payloads on these unmanned platforms have facilitated a wide range of applications such as mapping, surveillance, transportation, target determination and supply of weapons. Automatic Target Recognition is the process of localizing and distinguishing high-value targets from low-value targets in noisy and complex backgrounds, employing Content-Based Information Retrieval techniques for object detection, identification and classification of ground systems such as aircraft. In this paper, we propose a methodology for the identification and classification of ground objects such as aircraft on the ground, military tanks and vehicles. The frames captured by the onboard sensors are processed for target recognition using MobileNet-SSD, a Deep Neural Network based algorithm. The changes of object locations in the frames are recorded over time and modelled using a Finite State Machine. These state inputs form the basis of developing a Common Operational Picture over the area of surveillance. Operations recorded and monitored by multiple distributed sensors are used to build the geospatial intelligence over the geographical area of interest. The entire process is integrated using Apache Kafka, a distributed stream processing platform that facilitates asynchronous processing and communication. The proposed methodology is illustrated using several case studies in the paper.
Keywords: Automatic Target Recognition · Unmanned Aerial Vehicles · Finite State Machines · Object recognition · Fuzzy image retrieval · Geo-spatial intelligence
© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 197–210, 2021. https://doi.org/10.1007/978-3-030-52190-5_14
1 Introduction
Geo-spatial intelligence is the intelligence obtained by exploiting, analysing and processing imagery and geospatial information, serving geographical information needs, scientific investigations, resource management, Environmental Impact Assessment, urban planning, cartography, mapping and route planning. Unmanned Aerial Vehicles provide a platform to collect data, perform geophysical surveys, transport cargo, monitor livestock and help in disaster relief for search and rescue operations. UAVs have multiple onboard sensors such as Infrared sensors, Electro-Optical sensors, Light Detection and
Ranging sensors, inertial sensors, sensors to indicate moving targets, and sensors for the detection of Chemical, Biological, Radiological, Nuclear and Explosive (CBRNE) threats. One main application of UAVs is in the field of military and defence for surveillance and target monitoring, as the use of these unmanned vehicles makes this job more efficient and less risky. These aircraft require image and optical sensors to gather aerial images, which are then analysed and processed for object localization, detection, recognition and tracking. For long-endurance UAV missions, Automatic Target Recognition (ATR) technology demands on-board intelligence and automation for image processing applications. ATR automates the process of localizing and recognizing high-value targets among low-value targets in noisy and complex backgrounds. The autonomy of this system helps in reducing processing time, unreliability and vulnerability, and in increasing accuracy. In this paper, the Histogram of Oriented Gradients (HOG) method and a Deep Neural Network (DNN) module called MobileNet-SSD are explored and implemented for the military geo-intelligence application of identifying and classifying ground objects such as aircraft on the ground, military tanks and vehicles. This methodology comprises three main phases: target recognition, the use of finite state diagrams for object tracking, and the implementation of Apache Kafka, a distributed stream processing platform, for asynchronous processing and communication. A Raspberry Pi interfaced with a Raspberry Pi Camera Module V2 is the emulated device standing in for the UAV and its sensors. MobileNet-SSD, a DNN-based lightweight object recognition algorithm, is implemented on the Raspberry Pi.
The changes in the number of detections in the frames captured and processed over time are modelled using a Finite State Machine (FSM), where the initial number of objects of interest represents the stable state. The change in the states over the observed time is represented by the transitions in the FSM. The FSM modelling helps in monitoring the movement of targeted objects. This proposed methodology is applied to design an object recognition and target tracking simulation test-bed for evaluating the effectiveness of the UAV and the performance of its on-board sensors for ATR operations.
2 Literature Survey Active research has been done in the area of object detection and recognition over the past decade. Some challenges faced while detecting and localizing objects in images are:
• Unconstrained illumination
• Complex backgrounds
• Variation in appearance
Dalal and Triggs [1] propose the HOG algorithm for object detection. HOG is a feature descriptor and is often paired with a linear SVM (Support Vector Machine) [2] for high-accuracy object detection. The algorithm uses normalization and calculates the gradients within the detection window. The image is split into cells; the magnitude and orientation of the pixel gradients from each cell are accumulated to create a histogram of orientations for that cell. Here the structure, shape and appearance of an object are characterized by the positioning of the intensity gradients. The challenge of
intensity variation in images is addressed by combining several cells into a block; these overlapping blocks help in contrast normalization. HOG features from multiple detection windows are collected, arranged one after another in one large feature vector, and then the SVM learning algorithm is used. From the results obtained after testing the detector on the INRIA dataset [3], it was concluded that the HOG descriptor reduces false positives. Fine-scale gradients, fine orientation binning, relatively coarse spatial binning and high-quality local contrast normalization in overlapping descriptor blocks are all important parameters for good performance. It is essential to maximize and efficiently analyse the data captured by the cameras. Zhang et al. [4] describe two key characteristics that occur during frame or video processing: a resource-quality trade-off with multi-dimensional configurations, and variety in quality and lag goals. The first characteristic deals with the trade-off between the resources demanded and the quality of the output when certain parameters, termed knobs, such as video resolution and frame rate, are varied. The second characteristic focuses on the variety in quality and lag goals: scheduling a large number of streaming video queries with diverse quality and lag goals, each with many configurations, is a challenge. The paper proposes VideoStorm, a video analytics system that scales to process thousands of video analytics queries on live video streams over large clusters; it defines a scheduler that allocates resources optimally based on the quality and lag requirements of the queries. Automatic Target Recognition is the capability of systems to automatically detect and recognize objects or targets, as explored in a survey paper [5] that analyses various ATR implementation techniques and algorithms. Howard et al.
[6] present a class of lightweight deep neural network models called MobileNets. These are based on a streamlined architecture that uses depth-wise separable convolutions to build lightweight deep neural networks. The MobileNet architecture, unlike the standard convolutional layer, which has batch normalization and ReLU (Rectified Linear Unit), has a depth-wise separable convolution with depth-wise and pointwise layers, which are then followed by batch normalization and ReLU. The work also uses a set of two hyper-parameters, a width multiplier and a resolution multiplier, in order to build a small, low-latency model. The role of the width multiplier α is to thin the network uniformly at each layer, whereas the resolution multiplier ρ reduces the computational cost of the neural network. Liu et al. [7] present the first deep-network-based object detector, SSD, that does not resample pixels or features for bounding box hypotheses and is shown to be as accurate as approaches that do. Faster R-CNN [8] uses a region proposal network to create boundary boxes and utilizes those boxes to classify objects; it has a low frames-per-second (fps) rate, which is below real-time processing requirements. SSD outperforms this comparable state-of-the-art Faster R-CNN model. Compared to other single-stage methods, SSD has much better accuracy even with a smaller input image size or low-resolution images. The improvements in the accuracy of SSD are due to the use of multiscale features and default boxes. SSD has eliminated the region proposal network, thereby improving the efficiency of object detection. In the following sections, we propose a methodology that utilises HOG and MobileNet-SSD for object detection and recognition algorithms that are lightweight
and have a low latency. Further, once the objects are detected and recognised, we construct a finite state machine to represent the state of the objects at a given time and, on the event of changes in the type and location of objects, change the state of the FSM.
3 Object Detection and Tracking In this section, we explore and implement HOG and MobileNet-SSD as object detection and recognition algorithms that have high accuracy and low latency and are lightweight. 3.1
Histogram of Oriented Gradients to Detect Objects
Dalal and Triggs [1] apply the HOG for the purpose of person detection where structure, shape and appearance of an object can be characterised by the positioning of the intensity gradients. It can be used for extracting features of other objects for applications such as coin matching but the performance of the algorithm may vary. Figure 1 shows the flow for extracting relevant features from input images to detect objects.
Fig. 1. Feature extraction and object detection chain [1]
Equations (1) and (2) are used to calculate the gradient magnitude and direction respectively:

G(y, x) = √(G_x(y, x)² + G_y(y, x)²)   (1)

θ(y, x) = arctan(G_x(y, x) / G_y(y, x))   (2)
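Equations (1) and (2) can be evaluated with central differences; this numpy sketch is our own illustration (the orientation uses arctan(G_x/G_y) exactly as Eq. (2) is printed, computed via arctan2 so that G_y = 0 is handled):

```python
import numpy as np

def gradient_mag_dir(img):
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    # Central differences for the horizontal and vertical gradients
    gx[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0
    gy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0
    mag = np.sqrt(gx ** 2 + gy ** 2)  # Eq. (1)
    ang = np.arctan2(gx, gy)          # Eq. (2)
    return mag, ang

# A horizontal intensity ramp has unit gradient magnitude in its interior
ramp = np.tile(np.arange(5.0), (5, 1))
mag, ang = gradient_mag_dir(ramp)
```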
Development of a Testbed for Automatic Target Recognition
201
If v is the non-normalized vector containing all histograms in a particular block, ‖v‖_k is its k-norm for k = 1, 2 and ε is some small constant, then Eqs. (3), (4) and (5) determine the normalisation factor:

L2-norm:   v → v / √(‖v‖₂² + ε²)   (3)

L1-norm:   v → v / (‖v‖₁ + ε)   (4)

L1-sqrt:   v → √(v / (‖v‖₁ + ε))   (5)
This algorithm can be implemented on platforms such as OpenCV, TensorFlow and Matlab. The algorithm was run using pre-trained models of OpenCV with Python. The following parameters were varied to improve the accuracy and increase the FPS of the algorithm:
• Win_Stride: the window stride. The higher the stride value, the lower the accuracy; there is a trade-off here, since a higher stride value also means faster processing but with lower accuracy.
• HitThreshold: threshold for the distance between the features and the SVM classifying plane.
• Resize: the HOGDescriptor detectMultiScale function performs object detection with a multi-scale window; Resize is a parameter of this function, and varying it varies the resolution of the image.
• Scale: a factor for resizing the image. The higher the scale value, the more levels have to be processed and the more processing time is needed. The value of Scale should be in the range [1.01, 1.15].
• Padding: a mock parameter kept for the CPU interface's compatibility, usually set to (0, 0).
Optimal parameter values to achieve high accuracy are shown in Table 1.

Table 1. HOG detection parameters for accuracy
Win_Stride  Scale  Padding  HitThreshold
(6, 6)      1.15   (8, 8)   −20
On varying the Resize parameter while keeping the other parameters constant, a high FPS was achieved, as shown in Table 2. However, in certain frames false positives continued to appear. Figure 2 is a frame from the video captured by the Pi camera.
Table 2. HOG descriptor parameters for low processing time
Resize  Time (s)  Frames per second
640     324.990   0.038
400     110.417   0.09
200     39.720    0.25
200     11.493    0.87
150     5.286     1.89
Fig. 2. Captured frame from Pi camera
Fig. 3. Output of HOG algorithm
Figure 2 shows the input image for the HOG algorithm, and Fig. 3 shows the result obtained by implementing the HOG algorithm on the input image. The following observations were made when implementing the HOG algorithm on the dataset:
• There was a lack of accuracy in the detection.
• The number of frames processed per second (FPS) was low, i.e. there was latency in the processing.
In view of the limitations of the above approach to object detection, we explore MobileNet-SSD, which uses a DNN for object detection and recognition. 3.2
MobileNet-SSD
MobileNet was designed for mobile and embedded vision applications using lightweight deep neural network models called MobileNets. Figure 4 shows the architecture of a conventional convolution layer and of the MobileNet depth-wise separable convolution layers. Unlike conventional convolutional layers, MobileNet has depth-wise separable convolutions with depth-wise and pointwise layers, followed by batch normalisation and ReLU (Rectified Linear Units) after each convolutional layer [6].
Fig. 4. MobileNet Architecture [6]
Single Shot Detector (SSD) [7], as the name suggests, performs the tasks of object localization and classification in a single forward pass of the network. The SSD architecture is built on separable layers instead of fully connected ones. It is built on the VGG-16 architecture, as VGG-16 can classify images well, and uses the concept of transfer learning to improve the results. At every layer of the architecture, the input size is reduced and features at multiple scales are extracted. Together, MobileNet with SSD is an accurate, efficient and fast object detection module. MobileNet-SSD was implemented on two different datasets using OpenCV and Python, using pre-trained MobileNet-SSD models containing multiple classes of objects such as bottles, airplanes, TV monitors, chairs, people and horses. The dataset used to implement MobileNet-SSD was created in a lab environment using the Raspberry Pi camera under normal lighting. Figure 5 is the input image to the module, and Fig. 6 shows the result of implementing the module. The algorithm accurately detects a chair (the target object) with a confidence value of 98%; the output image contains the bounding box along with the label and the confidence value. When implementing recognition-based tracking, the object pattern is
G. V. Ihita and V. R. Duddu
Fig. 5. Before implementing MobileNet-SSD
Fig. 6. After implementing MobileNet-SSD
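For reference, an SSD network loaded through OpenCV's dnn module returns a 1×1×N×7 array whose rows are (batch, class_id, confidence, x1, y1, x2, y2) with normalised box coordinates. A minimal post-processing sketch is below; the class list and the synthetic detection array are our own assumptions, not output from the paper's experiments:

```python
import numpy as np

def parse_detections(detections, width, height, class_names, conf_threshold=0.5):
    """Filter SSD detections and scale the normalised boxes to pixel coords."""
    results = []
    for det in detections[0, 0]:  # each row: batch, class, conf, x1, y1, x2, y2
        confidence = float(det[2])
        if confidence < conf_threshold:
            continue
        label = class_names[int(det[1])]
        scale = np.array([width, height, width, height])
        box = tuple(int(v) for v in det[3:7] * scale)
        results.append((label, confidence, box))
    return results

# Synthetic output mimicking one confident 'chair' detection
classes = ["background", "chair", "person"]
dets = np.array([[[[0, 1, 0.98, 0.1, 0.2, 0.5, 0.9],
                   [0, 2, 0.30, 0.0, 0.0, 0.1, 0.1]]]])
print(parse_detections(dets, 300, 300, classes))
```

Only the 0.98 "chair" row survives the 0.5 threshold; the low-confidence "person" row is discarded, which mirrors the bounding-box-plus-label output shown in Fig. 6.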
Development of a Testbed for Automatic Target Recognition
recognized by analysing successive frames of images. The concept of Finite State Machines is implemented to notice deviations from a stable state.

3.3 Finite State Machines
A Finite State Machine (FSM) is a computational model, an abstract machine defined by a list of states. An FSM can be in exactly one of a finite number of states at any given time. A change of state, called a transition, happens based on the conditions provided [9].
Fig. 7. FSM with binary stream as input
Figure 7 is the FSM for a system with a binary stream input, i.e., a stream of 0's and 1's. The FSM has four states: 00, 01, 11, and 10. The state 00 is set as the stable state of the machine. The state transition depends on the incoming binary digit. For the FSM diagram shown in Fig. 7, the state changes occur based on the following conditions (Table 3):

Table 3. State transition table for binary stream FSM

State   Input 0   Input 1
00      00        01
01      10        11
11      10        11
10      00        01
1. No deviation from the stable state, indicated by 0
2. Deviation from the stable state, indicated by 1
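The transition table above is a two-bit shift register over the input stream, so the FSM fits in a few lines. A minimal sketch (the helper names are ours):

```python
def step(state, bit):
    """One transition of the binary-stream FSM: the state holds the last
    two input bits, so '00' stays stable while zeros keep arriving."""
    return state[1] + bit

def run(stream, state="00"):
    """Feed a bit string through the FSM, flagging deviation (1) per step."""
    deviations = []
    for bit in stream:
        state = step(state, bit)
        deviations.append(0 if state == "00" else 1)
    return state, deviations

print(run("0011"))  # → ('11', [0, 0, 1, 1])
```

The first two zeros keep the machine in the stable state 00; the incoming ones drive it through 01 into 11, flagging a deviation.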
Figure 8 is a representation of the Unmanned Aerial Vehicles deployed at the borders. Four UAVs at four different locations, spaced almost equally to gather geospatial information, are shown. Each UAV has an on-board system running an FSM for locating and tracking objects.
Fig. 8. Representation of the satellite view of the ground with UAVs deployed.
Each state in the FSM represents the total count of the objects detected for each class. The stable-state values are the initial counts of objects detected per class at different instants and at fixed locations. The state transitions are determined by comparing the new total for each class with the stable-state values at the same location. The FSM shown in Fig. 9 is designed for two-class object detection, e.g., aircraft on the ground and tanks. S1 represents the stable, initial state of the system. The state change occurs based on the following transition conditions:

a) Both objects are at their stable values at a given instant; indicated by 00
b) Object 1 does not deviate from its stable value while Object 2 does; indicated by 01
c) Object 1 deviates from its stable value while Object 2 does not; indicated by 10
d) Both objects deviate from their stable values; indicated by 11

The states and transitions for the two-class FSM are shown in Table 4.
Fig. 9. FSM for change tracking

Table 4. State transition table for FSM

State   00   01   10   11
S1      S1   S2   S3   S4
S2      S1   S2   S3   S4
S3      S1   S2   S3   S4
S4      S1   S2   S3   S4
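Per Table 4, the next state depends only on which class counts deviate from their stable values, not on the current state. A sketch of that mapping (the class names and counts below are illustrative):

```python
STATE_FOR_CODE = {"00": "S1", "01": "S2", "10": "S3", "11": "S4"}

def next_state(stable_counts, new_counts):
    """Compare per-class detection counts against the stable-state counts
    and return the FSM state per Table 4 (deviation code -> state)."""
    code = "".join("0" if new == ref else "1"
                   for ref, new in zip(stable_counts, new_counts))
    return STATE_FOR_CODE[code]

stable = (3, 2)                    # e.g. 3 aircraft, 2 tanks at initialisation
print(next_state(stable, (3, 2)))  # → S1 (no deviation)
print(next_state(stable, (3, 1)))  # → S2 (only class 2 deviates)
print(next_state(stable, (2, 1)))  # → S4 (both deviate)
```

Because every row of Table 4 is identical, the machine is memoryless: each new frame's counts map straight to a state.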
In an image consisting of an aerial view of three aircraft, MobileNet-SSD detects four aircraft with different probabilities, as shown in Fig. 10 and Fig. 11. The actual class vs. predicted class of the detected objects is shown in Table 5, with N = 3, where N indicates the total number of objects actually present in the image. A false positive, or false alarm, is a prediction that should be false but was predicted as true. It can be calculated using:

False Positive Rate = False Positive / (False Positive + True Negative)    (6)
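Plugging the counts from Table 5 (TP = 3, FP = 1, TN = 0, FN = 0) into Eq. (6) confirms the rate directly:

```python
def false_positive_rate(fp, tn):
    """Eq. (6): FPR = FP / (FP + TN)."""
    return fp / (fp + tn)

# Counts from the error matrix in Table 5
tp, fp, tn, fn = 3, 1, 0, 0
print(false_positive_rate(fp, tn))  # → 1.0
```

With no true negatives in the matrix, the single extra detection drives the rate to its maximum.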
The False Positive Rate for the error matrix is 1. The outputs obtained from the MobileNet-SSD algorithm are fuzzy [10] because of the vagueness and
Fig. 10. Output of MobileNet-SSD with misclassification (Source: https://media.istockphoto.com/)

Fig. 11. Detected class and prediction percentage

Table 5. Predicted vs actual class

Predicted class   Actual: airplane yes   Actual: airplane no
Airplane yes      3                      1
Airplane no       0                      0
uncertainty in the detections and confidence values. The values that define the stable state are also not absolute, as the initial values obtained from the proposed algorithm are fuzzy. Categorising objects based on crisp boundaries is not always the best way to classify objects in the real world. Fuzzy logic is used to help systems/computers make decisions more like a human brain. So the FSM used should be a fuzzy state diagram, which can be represented using fuzzy graphs.
Multiple factors affect this uncertainty in object recognition, such as the algorithm used, complex backgrounds, and the distance between the UAV and the ground objects. The model needs to be trained with these uncertainty values to maximise accuracy. Misclassification errors can be disastrous in real-world deployments of such systems.

3.4 Testbed Simulation
Apache Kafka is a distributed stream processing platform that facilitates asynchronous processing and communication [11]. Each FSM runs on the on-board Raspberry Pi system. Whenever the FSM deviates from its stable state, the information about the total count of detections is sent to the ground station; this information helps in tracking the target objects on the ground. Apache Kafka facilitates this transfer of state information from each UAV. Figure 12 shows the proposed test bed design for object detection and tracking.
UAV 1 (Raspberry Pi 1) --> Apache Kafka Broker (Ground Station) <-- UAV 2 (Raspberry Pi 2)

Fig. 12. Proposed test bed design for object detection and change tracking
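A sketch of how each on-board FSM might publish a state-change event toward the ground-station broker. The topic name, message fields, and the use of the kafka-python client are our assumptions, not details from the paper; only the message construction is exercised here, since sending requires a live broker:

```python
import json

def state_change_message(uav_id, state, counts):
    """Serialise an FSM state change for transport over Kafka."""
    payload = {"uav": uav_id, "state": state, "counts": counts}
    return json.dumps(payload).encode("utf-8")

msg = state_change_message("uav-1", "S2", {"aircraft": 3, "tank": 1})

# Publishing with the kafka-python client would look roughly like:
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="ground-station:9092")
# producer.send("fsm-state-changes", msg)

print(json.loads(msg))
```

Keeping the payload as JSON lets the ground station consume events from any number of UAVs with one schema.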
4 Conclusions

In this paper, a simulation test-bed for object detection and tracking is proposed. For object detection, two detection and recognition approaches, namely Histogram of Oriented Gradients (HOG) and MobileNet-SSD, are implemented. The MobileNet-SSD technique is used to implement Automatic Target Recognition (ATR) on Unmanned Aerial Systems (UAS). The gathering and processing of data is done on board the UAVs, and the collected data is used for geospatial intelligence. To locate and track the movement of targeted bodies on the ground, such as tanks and aircraft, Finite State Machines (FSMs) are used. Real-world deployments of such systems lead to fuzzy values. We are currently in the process of implementing fuzzy states, thereby leading to a modelling methodology based on fuzzy graphs.
References

1. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
2. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995). https://doi.org/10.1023/A:1022627411411
3. INRIA Person Dataset (2005). http://pascal.inrialpes.fr/data/human/. Accessed 20 Feb 2018
4. Zhang, H., Ananthanarayanan, G., Bodik, P., Philipose, M., Bahl, P., Freedman, M.J.: Live video analytics at scale with approximation and delay-tolerance. In: USENIX Symposium on Networked Systems Design and Implementation, NSDI (2017)
5. Bhanu, B.: Automatic target recognition: state of the art survey. IEEE Transactions on Aerospace and Electronic Systems AES-22(4), 364–379 (1986)
6. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861 [cs.CV] (2017)
7. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: single shot multibox detector. arXiv:1512.02325v5 [cs.CV] (2016)
8. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv:1506.01497v3 (2016)
9. Van Gurp, J., Bosch, J.: On the implementation of finite state machines. In: 3rd Annual IASTED International Conference on Software Engineering and Applications, 6–8 October 1999
10. Zadeh, L.A.: A fuzzy-algorithmic approach to the definition of complex or imprecise concepts, pp. 202–282
11. Apache Kafka. http://kafka.apache.org. Accessed 20 May 2018
Comparative Analysis of Various Image Splicing Algorithms

Hafiz ur Rhhman^(1,3), Muhammad Arif^(1), Anwar Ullah^(2), Sadam Al-Azani^(3), Valentina Emilia Balas^(4), Oana Geman^(5), Muhammad Jalal Khan^(3), and Umar Islam^(6)

^(1) School of Computer Science and Educational Software, Guangzhou University, Guangzhou 510006, China ([email protected], [email protected])
^(2) Wollongong Joint Institute, Central China Normal University, Wuhan, Hubei, China ([email protected])
^(3) Department of Information and Computer Science, King Fahd University of Petroleum and Minerals, Dhahran, Kingdom of Saudi Arabia ({g201002580,g201408880}@kfupm.edu.sa)
^(4) Department of Automatics and Applied Informatics, Aurel Vlaicu University of Arad, Arad, Romania ([email protected])
^(5) Faculty of Electrical Engineering and Computer Science, University Suceava, Suceava, Romania ([email protected])
^(6) Department of Computer Science, COMSATS University Abbottabad, Abbottabad, Pakistan ([email protected])
Abstract. Daily, millions of images are uploaded and downloaded on the web; as a result, data is available in paperless form in organizations' computer systems. Nowadays, with the help of powerful computer software such as Photoshop and Corel Draw, it is very easy to alter the contents of an authentic image without leaving any clues. This has become a big problem due to the negative impact of image splicing, so it is highly desirable to develop image tampering detection techniques that distinguish authentic from tampered images. In this paper, we propose an enhanced technique for blind image splicing detection by combining Discrete Cosine Transform (DCT) domain features and Markov features in the spatial domain. Moreover, Principal Component Analysis (PCA) is used to select the most significant features. Finally, a Support Vector Machine (SVM) is applied to classify an image as tampered or genuine on a publicly available dataset using ten-fold cross-validation. By applying different statistical techniques, the results show that the proposed technique performs better than other detection techniques available in the literature.

Keywords: Image forensics · Image splicing detection · Tampered images · Image splicing algorithms
© Springer Nature Switzerland AG 2021 V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 211–228, 2021. https://doi.org/10.1007/978-3-030-52190-5_15
H. ur Rhhman et al.
1 Introduction and Background

We are living in a digital era, and daily millions of pictures are uploaded, downloaded, and shared on the web [1]. With the increased popularity of digital images, image tampering has also grown day by day. Image tampering is a digital art used to manipulate digital pictures with the aim of altering important regions of these images [2, 3]. It is becoming a problem for organizations as well as for individuals because it has a negative impact on society and can change the content of an image. With the growth of powerful computer software and technology such as Photoshop and Corel Draw, it is very hard to tell the difference between tampered and authentic (original) images [3, 5, 6]. This motivates the development of robust techniques for detecting forged pictures.

Image splicing, image retouching, and image copy-move forgery are popular methods of image tampering [1, 9, 18]. Image retouching is normally used in media such as movies and magazine photos to enhance some features of a photo without changing the image content; such manipulation is not considered image forging [4, 9, 19]. Copy-move forgery, also called cloning, takes place by copying some part of an image and pasting it into the same image to duplicate or hide information. Splicing detection in this case is challenging because there is no visible clue in the image qualities [8, 17]. In image splicing, different techniques such as skewing, stretching, and rotation are used to create a tampered image in which parts from more than one image are combined [9, 10]. For checking originality, a signature or watermark can be embedded in the image, but watermarks can weaken image quality. Blind methods, on the other hand, directly trace the information left by the forgery operation, so they do not need such advance information about the source images [7, 20]. By using statistical methods to analyze the whole image content, image splicing detection is possible.

For image splicing detection, active and passive (or blind) methods have been proposed in the literature [9–11], and a lot of work has been done among these. In the work presented in [10], Shi et al. proposed a framework based on moment and Markov features, i.e., two types of statistical features. The moment features were based on 1-D and 2-D characteristic functions, and the Markov features used a transition probability matrix in the Discrete Cosine Transformation (DCT). Moment-based features are computationally expensive, although the Markov features help achieve 77% detection accuracy on the CISDE dataset [14].

Table 1. Comparison between different image splicing approaches

Work                  Approach                                                   Dataset   Classifier   Dimension   Detection accuracy   TPR      TNR      AUC
Shi et al. [10]       BDCT Markov & 1-D and 2-D moments of char. functions       CISDE     SVM          266         Low                  Low      Low      Low
Zhongwei et al. [11]  BDCT Markov & DWT Markov                                   CISDE     SVM + RFE    100         Medium               Medium   Medium   Medium
El et al. [9]         BDCT Markov & spatial Markov                               CISDE     SVM + PCA    50          High                 High     High     High
Zhongwei et al. [11] also used Markov features, to find the inter-block correlation in the Discrete Cosine Transformation. As a result, they enhanced the accuracy to 89%. Zhao et al. [13] used PCA and a conditional co-occurrence probability matrix to detect image tampering, performing well with Markov features in the DCT domain. Guyon et al. [15] achieved the highest accuracy (93%) for the classification of tampered and original images; they introduced a radial basis function SVM with a ranking criterion. For a robust and reliable algorithm, Srivastava et al. [15] used Markov features in the wavelet domain, applying rotation, translation, and scaling techniques. In this paper, we focus on blind detection of image splicing through the Discrete Cosine Transform and the spatial domain, and we conduct a comparative analysis of various image splicing algorithms. To validate our work, we extract Markov features in the spatial domain as well as in the DCT domain, since this helps achieve the best result in terms of detection accuracy and sensitivity. Moreover, before building the detection model, we applied principal component analysis (PCA) to select the most significant features. The experimental results show that the recommended method achieves 98% splicing-detection accuracy compared with other techniques in the literature, using the same dataset and even fewer features. The remainder of the paper is organized as follows. Section 2 provides the evaluation criteria. Section 3 describes our research methodology. We analyze and discuss the numerical results in Sect. 4. Finally, Sect. 5 concludes the paper.
2 Evaluation Criteria

In this section, the evaluation criteria for the selected papers are described. To measure splicing detection performance, we chose the following metrics: dataset, True Positive Rate (TPR), True Negative Rate (TNR), feature selection, feature dimension, classifier, and detection accuracy (Acc). TPR is commonly used for sensitivity, while TNR is used for specificity [9]. These metrics are calculated as follows:

Acc = (TP + TN) / (TP + TN + FN + FP)    (1)

TPR = TP / (TP + FN)    (2)

TNR = TN / (TN + FP)    (3)

where TP = true positive (tampered predicted as tampered), TN = true negative (authentic predicted as authentic), FP = false positive (authentic predicted as tampered), and FN = false negative (tampered predicted as authentic).
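Equations (1)-(3) translate directly into code; a small sketch with illustrative confusion counts (the counts are not from the paper):

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fn + fp)   # Eq. (1)

def tpr(tp, fn):
    return tp / (tp + fn)                    # Eq. (2), sensitivity

def tnr(tn, fp):
    return tn / (tn + fp)                    # Eq. (3), specificity

# Illustrative confusion counts for one cross-validation fold
tp_, tn_, fp_, fn_ = 90, 85, 5, 10
print(accuracy(tp_, tn_, fp_, fn_), tpr(tp_, fn_), tnr(tn_, fp_))
```

Separating the three metrics matters here because a classifier can trade sensitivity against specificity while keeping accuracy constant.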
Based on the above evaluation criteria, we present three different approaches, as shown in Table 1. The first approach is Markov features combined with moments of characteristic functions [10] for detecting spliced images. This approach first selects features based on 1-D and 2-D characteristic functions, then uses transition probability matrices of Markov features in the Discrete Cosine Transformation for detection with an SVM classifier. The moment-based features are computationally expensive, although the Markov features help achieve 89% detection accuracy on the CISDE dataset [14]. The second approach [11] also used Markov features, but to find the inter-block correlation in the Discrete Cosine Transformation and, furthermore, the intra-block correlation, using the same dataset with a dimension of 100 and an SVM + RFE classifier for better detection accuracy. The third and final approach is Markov & spatial Markov [9]: it also used the same dataset but a different number of dimensions, with SVM and PCA. El et al. [9] combined the first two approaches by using DCT-based Markov features together with spatial-domain Markov features, aiming for better detection performance in terms of accuracy, TPR, TNR, and AUC. We can clearly see from Table 1 that the TPR, TNR, and AUC values of El et al. [9] are higher than those of the other two approaches. Therefore, we selected this approach to check statistically, applying different statistical techniques to validate the results. The overall experimental results are listed in Tables 2, 3, 4, and 5 in the appendix of this paper.
3 Research Methodology

In the following subsections, we discuss the details of our research methodology.

3.1 Approach

1. We used the MATLAB tool and the LIBSVM library for the SVM classifier.
2. We used the freely available Columbia Image Splicing Detection Evaluation Dataset (CISDED) for Markov features [12, 16]. It consists of 1845 grey-scaled images (128 × 128 pixels), of which 933 are authentic and 912 are spliced.
3. We used Markov features and support vector machine (SVM) classifiers for the blind detection of image splicing.
4. We extracted Markov features in the spatial domain as well as in the DCT domain, in order to achieve the best result in terms of accuracy and sensitivity.
5. We applied principal component analysis (PCA) to select the most significant features.

3.2 Calculating Statistics
In this step, we extracted different statistics: we calculated confidence intervals (C.I.), means, and standard deviations for the different techniques, together with ANOVA, as presented in Tables 2, 3, 4, and 5 in the appendix of this paper. We used different
thresholds T for selecting Markov features. The overall experimental results are listed in Tables 2, 3, 4, and 5 at the end of this paper. We also plot the receiver operating characteristic (ROC) curve, which tracks the change in TPR against FPR over the range 0 to 1, and compute the area under the curve (AUC). Finally, we compared the three techniques by applying ANOVA.

3.3 Hypotheses
We have three techniques, T1, T2, and T3, for Shi et al. [10], Zhongwei et al. [11], and El et al. [9], respectively. Furthermore, we propose four hypotheses regarding average detection accuracy, TPR, TNR, and AUC.

T1 = Shi et al. [10]
T2 = Zhongwei et al. [11]
T3 = El et al. [9]

Hypotheses:

H0: T1 average detection accuracy = T2 average detection accuracy = T3 average detection accuracy
H1: T3 average detection accuracy > average detection accuracy of T1 and T2

H0: T1 average TPR = T2 average TPR = T3 average TPR
H2: T3 average TPR > average TPR of T1 and T2

H0: T1 average TNR = T2 average TNR = T3 average TNR
H3: T3 average TNR > average TNR of T1 and T2

H0: T1 average AUC = T2 average AUC = T3 average AUC
H4: T3 average AUC > average AUC of T1 and T2

All the above hypotheses are tested and verified based on the data available in Tables 2, 3, 4, and 5 at the end of the paper.

3.4 Experimental Settings
The experimental procedure [9] is shown in Fig. 1. The steps run in the following order:

• The Markov features are calculated in both domains (i.e., DCT and spatial).
• The transition probability matrix is calculated using a threshold T, which is used to reduce the dimension of the transition probability matrix.
• PCA is used to reduce the feature dimensionality.
• To avoid bias in the classification, tenfold cross-validation is used.
• The dataset is broken up randomly into ten different blocks.
• The training phase runs on nine blocks while testing runs on the remaining block.
• The whole process is repeated nine more times, taking a different test block each time.
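The core of the feature-extraction step, a transition probability matrix over a thresholded difference array, can be sketched in numpy. The helper name and the single horizontal direction are simplifications of ours; the full method uses several directions in both the DCT and spatial domains:

```python
import numpy as np

def markov_tpm(arr, T=4):
    """Horizontal transition probability matrix of a thresholded difference
    array: (2T+1)^2 Markov features for one direction of one domain."""
    arr = np.asarray(arr, dtype=np.int64)
    diff = arr[:, :-1] - arr[:, 1:]      # horizontal difference array
    diff = np.clip(diff, -T, T)          # threshold keeps the state space small
    n_states = 2 * T + 1
    src = diff[:, :-1] + T               # shift states into 0..2T
    dst = diff[:, 1:] + T
    tpm = np.zeros((n_states, n_states))
    for m in range(n_states):
        mask = src == m
        total = mask.sum()
        if total:                        # rows with no occurrences stay zero
            counts = np.bincount(dst[mask], minlength=n_states)
            tpm[m] = counts / total
    return tpm.ravel()                   # flattened (2T+1)^2 feature vector
```

For T = 4 this yields 81 features per direction; a constant image concentrates all probability at the (0, 0)-difference transition, i.e., the centre of the matrix.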
Fig. 1. Experimental procedure block diagram [9]
4 Results and Discussion

We evaluated the classifier performance using three different methods: first the spatial method for Markov feature calculation, then the DCT method, and finally both methods combined, calculating the Markov features for different values of the threshold T and of the feature dimension. As shown in Figs. 2, 3, and 4, the results reveal that the spatial-domain Markov features perform worse than the DCT-domain features, while combining both domains yields a significant improvement. The best performance is attained when N = 50 and T = 4. We ran the experiments for dimensions N = 150, 100, 50, and 30 for each technique in terms of Acc, TPR, TNR, and AUC, for T = 3, 4, 10, and 15. In addition, we repeated all the experiments 20 times and averaged the results. Moreover, we calculated other statistics such as the standard deviation, variance, and confidence intervals. The overall results of these experiments are provided in Tables 2, 3, 4, and 5. Furthermore, with 50 features, the combined approach achieved 99% accuracy, 99.06% TPR, 99.59% TNR, and 100% AUC. The ROC curve also indicates the highest performance, very close to the upper left corner, as shown in Fig. 2.
Finally, we tested four different null hypotheses regarding detection accuracy, TPR, TNR, and AUC. We rejected all the null hypotheses and accepted the alternative hypotheses by applying one-factor ANOVA. The details and supporting information are given in Tables 2, 3, 4, and 5 in the appendix. We calculated and proved statistically that in each case the calculated F value is greater than the tabulated F value at the 95% confidence level, and the P-value is less than 0.05. In addition, none of the three techniques' confidence intervals intersect, as shown in Fig. 5, Fig. 6, Fig. 7, and Fig. 8. So, we can conclude that the El et al. [9] approach is better than Shi et al. [10] and Zhongwei et al. [11] in terms of detection accuracy, TPR, TNR, and AUC. The summary of the three techniques is given in Fig. 4.

Threats to Validity
In our experiment, we used the freely available dataset called CISDE [14]. This dataset contains only grayscale images, each with a dimension of 128 × 128, which is a limitation of our results. For further testing, we need another dataset with a mixture of images and dimensions, also containing black-and-white and colour images.
Fig. 2. Receiver operating characteristic curve for Markov features (T = 4 and N = 50). Legend: Spatial (dotted line), DCT (dashed line), Combined (solid line).

Fig. 3. Receiver operating characteristic curve for the proposed technique (T = 4 and N = 50). Legend: Shi (2007) [12], He (2012) [15], our method.

Fig. 4. Three techniques comparison summary. Bar values per detection technique:

Metric     Spatial   DCT      Spatial + DCT
Accuracy   0.7745    0.8929   0.9884
TPR        0.7880    0.9078   0.9906
TNR        0.7613    0.8783   0.9863
AUC        0.8607    0.9562   0.9988
Fig. 5. CI for Mean (95%) (Accuracy): Spatial, DCT, Spatial + DCT

Fig. 6. CI for Mean (95%) (TPR): Spatial, DCT, Spatial + DCT

Fig. 7. CI for Mean (95%) (TNR): Spatial, DCT, Spatial + DCT

Fig. 8. CI for Mean (95%) (AUC): Spatial, DCT, Spatial + DCT
5 Conclusion and Future Work

In this paper, we performed experiments on image tampering detection using SVM and Markov features. We ran the experiments 20 times for each technique and obtained results for detection accuracy, TPR, TNR, and AUC. Compared with the highest detection accuracy obtained so far on the same dataset with 50 features, our test results validate the best performance. Furthermore, we proved statistically that the proposed technique is significant regarding accuracy, TPR, TNR, and AUC. In future work, we will test our approach on a different dataset with different image dimensions for more reliable and robust results, because the current dataset contains only grayscale images with a dimension of 128 × 128.

Acknowledgement. This work was supported by the project GUSV, “Intelligent techniques for medical applications using sensor networks”, project no. 10BM/2018, financed by UEFISCDI, Romania under the PNIII framework.
Appendix

Table 2. Summary of results for Markov features with threshold T = 4, feature dimension = 50 (Accuracy)

Experiment               Spatial   DCT      Spatial + DCT
1                        0.7740    0.8949   0.9886
2                        0.7767    0.8943   0.9881
3                        0.7767    0.8911   0.9881
4                        0.7756    0.8916   0.9881
5                        0.7783    0.8921   0.9870
6                        0.7680    0.8927   0.9870
7                        0.7713    0.8943   0.9870
8                        0.7762    0.8911   0.9892
9                        0.7718    0.8965   0.9875
10                       0.7740    0.8965   0.9902
11                       0.7729    0.8943   0.9875
12                       0.7724    0.8943   0.9902
13                       0.7707    0.8911   0.9875
14                       0.7751    0.8938   0.9897
15                       0.7756    0.8932   0.9886
16                       0.7794    0.8883   0.9897
17                       0.7745    0.8916   0.9892
18                       0.7718    0.8911   0.9870
19                       0.7762    0.8943   0.9875
20                       0.7783    0.8905   0.9902
Average                  0.7745    0.8929   0.9884
Standard deviation       0.0029    0.0021   0.0012
Sample size              20        20       20
Confidence coefficient   1.96      1.96     1.96
Margin of error          0.0013    0.0009   0.0005
Upper bound              0.7757    0.8938   0.9889
Lower bound              0.7732    0.8920   0.9879
Max                      0.7794    0.8965   0.9902
Min                      0.7680    0.8883   0.9870
Range                    0.0114    0.0081   0.0033
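The confidence bounds in Table 2 follow the usual normal-approximation formula; e.g., for the Spatial column (s = 0.0029, n = 20). The helper name is ours, and small differences from the tabulated bounds come from rounding of the reported mean:

```python
import math

def margin_of_error(std, n, z=1.96):
    """Half-width of a 95% confidence interval for the mean: z * s / sqrt(n)."""
    return z * std / math.sqrt(n)

mean, std, n = 0.7745, 0.0029, 20
m = margin_of_error(std, n)
print(round(m, 4), round(mean - m, 4), round(mean + m, 4))
# → 0.0013, lower ≈ 0.7732, upper ≈ 0.7758
```

The same formula reproduces the bounds of every column in Tables 2-5.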
Combining spatial and DCT based Markov features for enhanced blind detection of image splicing

Anova: Single Factor (Accuracy)

SUMMARY
Groups          Count   Sum       Average   Variance
Spatial         20      15.4894   0.7745    8.408E-06
DCT             20      17.8575   0.8929    4.4305E-06
Spatial + DCT   20      19.7680   0.9884    1.373E-06

ANOVA
Source of Variation   SS       df   MS           F            P-value      F crit
Between Groups        0.4594   2    0.2297       48489.1318   8.2213E-93   3.15884272
Within Groups         0.0003   57   4.7372E-06
Total                 0.4597   59

Null hypothesis H0: μ1 = μ2 = μ3. Result: H0 rejected.
T1, μ1 = 0.77447 ± 0.0013; T2, μ2 = 0.892872 ± 0.0009; T3, μ3 = 0.9884 ± 0.0005
(T1 = Shi et al. [10], T2 = Zhongwei et al. [11], T3 = El et al. [9])

H0: T1 average detection accuracy = T2 average detection accuracy = T3 average detection accuracy
H1: T3 average detection accuracy > average detection accuracy of T1 and T2

According to the statistics above, the average detection accuracy between authentic and tampered images of technique T3 is greater than that of T1 and T2. Furthermore, the calculated F value is greater than the tabulated F value (48489.1318 > 3.15884272) and the P-value is less than 0.05, meaning there is a significant difference among the averages of T1, T2, and T3. Therefore, we reject the null hypothesis and accept the alternative hypothesis.
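The single-factor ANOVA above can be reproduced with a short numpy sketch (the groups below are illustrative, not the paper's 20 accuracy runs):

```python
import numpy as np

def one_way_anova(*groups):
    """F statistic for a single-factor ANOVA over several groups."""
    all_vals = np.concatenate(groups)
    grand = all_vals.mean()
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)    # df = k - 1
    ms_within = ss_within / (n - k)      # df = n - k
    return ms_between / ms_within

# Three tight groups with clearly separated means give a very large F,
# so H0 (equal means) is rejected against F_crit ≈ 3.16
g1 = [0.774, 0.775, 0.776, 0.774]
g2 = [0.893, 0.892, 0.894, 0.893]
g3 = [0.988, 0.989, 0.988, 0.989]
print(one_way_anova(g1, g2, g3) > 3.16)  # → True
```

The enormous F values in Tables 2-5 arise the same way: the between-group variance dwarfs the tiny within-group variance of the 20 repeated runs.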
Table 3. Summary of results for Markov features with threshold T = 4, feature dimension = 50 (TPR)

Experiment               Spatial   DCT      Spatial + DCT
1                        0.7884    0.9112   0.9901
2                        0.7873    0.9112   0.9912
3                        0.7873    0.9112   0.9901
4                        0.7840    0.9035   0.9901
5                        0.7939    0.9079   0.9890
6                        0.7862    0.9057   0.9890
7                        0.7884    0.9112   0.9868
8                        0.7862    0.9057   0.9912
9                        0.7862    0.9123   0.9901
10                       0.7840    0.9101   0.9934
11                       0.7884    0.9046   0.9901
12                       0.7851    0.9123   0.9923
13                       0.7851    0.9068   0.9890
14                       0.7884    0.9079   0.9923
15                       0.7895    0.9090   0.9912
16                       0.7928    0.9002   0.9923
17                       0.7895    0.9068   0.9923
18                       0.7906    0.9068   0.9868
19                       0.7862    0.9112   0.9901
20                       0.7928    0.9002   0.9934
Average                  0.7880    0.9078   0.9906
Standard deviation       0.0029    0.0037   0.0019
Sample size              20        20       20
Confidence coefficient   1.96      1.96     1.96
Margin of error          0.0012    0.0016   0.0008
Upper bound              0.7892    0.9094   0.9914
Lower bound              0.7867    0.9062   0.9898
Max                      0.7939    0.9123   0.9934
Min                      0.7840    0.9002   0.9868
Range                    0.0099    0.0121   0.0066
Anova: Single Factor (TPR)

SUMMARY
Groups          Count   Sum       Average   Variance
Spatial         20      15.7599   0.7880    8.13446E-06
DCT             20      18.1557   0.9078    1.36555E-05
Spatial + DCT   20      19.8114   0.9906    3.46767E-06

ANOVA
Source of Variation   SS       df   MS          F            P-value      F crit
Between Groups        0.4149   2    0.2075      24642.3431   1.9312E-84   3.1588
Within Groups         0.0005   57   8.419E-06
Total                 0.4154   59

Null hypothesis H0: μ1 = μ2 = μ3. Result: H0 rejected.
T1, μ1 = 0.78799 ± 0.0012; T2, μ2 = 0.90778 ± 0.0016; T3, μ3 = 0.99057 ± 0.0008
(T1 = Shi et al. [10], T2 = Zhongwei et al. [11], T3 = El et al. [9])

H0: T1 average True Positive Rate (TPR) = T2 average TPR = T3 average TPR
H2: T3 average TPR > average TPR of T1 and T2

According to the statistics above, the average TPR of T3 is greater than that of T1 and T2. Furthermore, the calculated F value is greater than the tabulated F value (24642.3431 > 3.1588), meaning there is a significant difference among the averages of T1, T2, and T3. Therefore, we reject the null hypothesis and accept the alternative hypothesis.
Table 4. Summary of results for Markov features with threshold T = 4, feature dimension = 50 (TNR)

Experiment               Spatial   DCT      Spatial + DCT
1                        0.7599    0.8789   0.9871
2                        0.7663    0.8778   0.9850
3                        0.7663    0.8714   0.9861
4                        0.7674    0.8800   0.9861
5                        0.7631    0.8767   0.9850
6                        0.7503    0.8800   0.9850
7                        0.7546    0.8778   0.9871
8                        0.7663    0.8767   0.9871
9                        0.7578    0.8810   0.9850
10                       0.7642    0.8832   0.9871
11                       0.7578    0.8842   0.9850
12                       0.7599    0.8767   0.9882
13                       0.7567    0.8757   0.9861
14                       0.7621    0.8800   0.9871
15                       0.7621    0.8778   0.9861
16                       0.7663    0.8767   0.9871
17                       0.7599    0.8767   0.9861
18                       0.7535    0.8757   0.9871
19                       0.7663    0.8778   0.9850
20                       0.7642    0.8810   0.9871
Average                  0.7613    0.8783   0.9863
Standard deviation       0.0050    0.0029   0.0010
Sample size              20        20       20
Confidence coefficient   1.96      1.96     1.96
Margin of error          0.0022    0.0013   0.0004
Upper bound              0.7634    0.8796   0.9867
Lower bound              0.7591    0.8770   0.9858
Max                      0.7674    0.8842   0.9882
Min                      0.7503    0.8714   0.9850
Range                    0.0171    0.0129   0.0032
Anova: Single Factor (TNR)

SUMMARY
Groups          Count   Sum       Average   Variance
Spatial         20      15.2251   0.7613    2.453E-05
DCT             20      17.5659   0.8783    8.28E-06
Spatial + DCT   20      19.7256   0.9863    1.04E-06

ANOVA
Source of Variation   SS       df   MS            F            P-value     F crit
Between Groups        0.5066   2    0.2533        22449.1855   2.743E-83   3.1588
Within Groups         0.0006   57   1.12842E-05
Total                 0.5073   59

Null hypothesis H0: μ1 = μ2 = μ3. Result: H0 rejected.
T1, μ1 = 0.7613 ± 0.0022; T2, μ2 = 0.8783 ± 0.0013; T3, μ3 = 0.9863 ± 0.0004
(T1 = Shi et al. [10], T2 = Zhongwei et al. [11], T3 = El et al. [9])

H0: T1 average True Negative Rate (TNR) = T2 average TNR = T3 average TNR
H3: T3 average TNR > average TNR of T1 and T2

According to the statistics above, the average TNR of T3 is greater than that of T1 and T2. Furthermore, the calculated F value is greater than the tabulated F value (22449.1855 > 3.1588) and the P-value is less than 0.05, meaning there is a significant difference among the averages of T1, T2, and T3. Therefore, we reject the null hypothesis and accept the alternative hypothesis.
H. ur Rahman et al.
Table 5. Summary of results for Markov features with threshold T = 4, feature dimension = 50 (AUC).

AUC over 20 experiments (dimension 50, T = 4):

Spatial: 0.8734, 0.9041, 0.8838, 0.8332, 0.8684, 0.8764, 0.8631, 0.8495, 0.8398, 0.8811, 0.8662, 0.8733, 0.8942, 0.8342, 0.8685, 0.8281, 0.8027, 0.8474, 0.8495, 0.8766
DCT: 0.9588, 0.9552, 0.9569, 0.9534, 0.9589, 0.9582, 0.9553, 0.9572, 0.9559, 0.9567, 0.9524, 0.9565, 0.9586, 0.9537, 0.9541, 0.9582, 0.9590, 0.9561, 0.9575, 0.9506
Spatial + DCT: 0.9994, 0.9984, 0.9987, 0.9991, 0.9984, 0.9986, 0.9985, 0.9967, 0.9995, 0.9989, 0.9972, 0.9986, 0.9988, 0.9983, 0.9994, 0.9999, 1.0000, 0.9985, 0.9998, 1.0000

Summary statistics (sample size 20, confidence coefficient 1.96):

Method          Average  Std. dev.  Margin of error  95% CI (lower-upper)  Min     Max     Range
Spatial         0.8607   0.0248     0.0109           0.8498-0.8715         0.8027  0.9041  0.1014
DCT             0.9562   0.0024     0.0010           0.9551-0.9572         0.9506  0.9590  0.0084
Spatial + DCT   0.9988   0.0009     0.0004           0.9984-0.9992         0.9967  1.0000  0.0033
ANOVA: Single Factor (AUC) — combining spatial and DCT based Markov features for enhanced blind detection of image splicing

SUMMARY
Groups          Count  Sum      Average  Variance
Spatial         20     17.2136  0.8607   0.0006
DCT             20     19.1232  0.9562   5.54463E-06
Spatial + DCT   20     19.9766  0.9988   7.55568E-07

ANOVA
Source of Variation  SS      df  MS      F         P-value    F crit
Between Groups       0.2001  2   0.1001  483.2033  1.803E-36  3.1588
Within Groups        0.0118  57  0.0002
Total                0.2120  59

Null hypothesis H0: μ1 = μ2 = μ3. Conclusion: H0 rejected.
T1: μ1 = 0.8607 ± 0.0109; T2: μ2 = 0.9562 ± 0.0010; T3: μ3 = 0.9988 ± 0.0004.
T1 = Shi et al. [10], T2 = Zhongwei et al. [11], T3 = El-Alfy et al. [9]. H0: T1 average area under the curve (AUC) = T2 average AUC = T3 average AUC. H4: T3 average AUC > average AUC of T1 and T2. According to the statistics above, the average AUC of T3 is greater than those of T1 and T2; furthermore, the calculated F value exceeds the tabulated F value (483.2033 > 3.1588) and the p-value is below 0.05. This means there is a significant difference among the averages of T1, T2, and T3; therefore, we reject the null hypothesis and accept the alternative hypothesis.
References

1. Abrahim, A.R., Rahim, M.S.M., Sulong, G.B.: Splicing image forgery identification based on artificial neural network approach and texture features. Clust. Comput., 1–14 (2018)
2. Li, C., et al.: Image splicing detection based on Markov features in QDCT domain. Neurocomputing 228, 29–36 (2017)
3. Zeng, H., et al.: Image splicing localization using PCA-based noise level estimation. Multimed. Tools Appl. 76(4), 4783–4799 (2017)
4. Javaid, Q., Arif, M., Awan, D., Shah, M.: Efficient facial expression detection by using the Adaptive-Neuro-Fuzzy-Inference-System and the Bezier curve. Sindh Univ. Res. J.-SURJ (Sci. Ser.) 48(3) (2016)
5. Javaid, Q., Arif, M., Shah, M.A., Nadeem, M.: A hybrid technique for de-noising multimodality medical images by employing cuckoo's search with curvelet transform. Mehran Univ. Res. J. Eng. Technol. 37(1), 29 (2018)
6. ur Rahman, H., Azzedin, F., Shawahna, A., Sajjad, F., Abdulrahman, A.S.: Performance evaluation of VDI environment. In: 2016 Sixth International Conference on Innovative Computing Technology (INTECH), Dublin, pp. 104–109 (2016)
7. Jingwei, H., Dake, Z., Xin, Y., Qingxian, W.: Image splicing detection based on local mean decomposition and moment features. Electron. Meas. Technol. 4, 033 (2017)
8. Arif, M., Abdullah, N.A., Phalianakote, S.K., Ramli, N., Elahi, M.: Maximizing information of multimodality brain image fusion using curvelet transform with genetic algorithm. In: 2014 International Conference on Computer Assisted System in Health (CASH), pp. 45–51. IEEE (2014)
9. El-Alfy, E.S.M., Qureshi, M.A.: Combining spatial and DCT based Markov features for enhanced blind detection of image splicing. Pattern Anal. Appl., 1–11 (2014)
10. Shi, Y.Q., Chen, C., Chen, W.: A natural image model approach to splicing detection. In: Proceedings of the 9th Workshop on Multimedia and Security, pp. 51–62 (2007)
11. Zhongwei, H., Lu, W., Sun, W.: Digital image splicing detection based on Markov features in DCT and DWT domain. Pattern Recognit. 45(12), 4292–4299 (2012)
12. Qureshi, M.A., Deriche, M.: A bibliography of pixel-based blind image forgery detection techniques. Signal Process. Image Commun. 39, 46–74 (2015)
13. Zhao, X., Wang, S., Li, S., Li, J.: A comprehensive study on third order statistical features for image splicing detection. In: Digital Forensics and Watermarking, pp. 243–256 (2012)
14. Ng, T.-T., Chang, S.-F., Sun, Q.: A data set of authentic and spliced image blocks. Columbia University, ADVENT Technical Report, pp. 203–204 (2004)
15. Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46(1–3), 389–422 (2002)
16. Srivastava, A., Lee, A.B., Simoncelli, E.P., Zhu, S.-C.: On advances in statistical modeling of natural images. J. Math. Imaging Vis. 18(1), 17 (2003)
17. Amali, G.B., Bhuyan, S., Aju: Design of image enhancement filters using a novel parallel particle swarm optimisation algorithm. Int. J. Adv. Intell. Parad. 9(5–6), 576–588 (2017)
18. Chizari, H., et al.: Computer forensic problem of sample size in file type analysis. Int. J. Adv. Intell. Parad. 11(1–2), 58–74 (2018)
19. Muhammad, A., Guojun, W.: Segmentation of calcification and brain hemorrhage with midline detection. In: 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC). IEEE (2017)
20. Ai, D., et al.: A multi-agent system architecture to classify colour images. Int. J. Adv. Intell. Parad. 5(4), 284–298 (2013)
Classification of Plants Leave Using Image Processing and Neural Network

Hashem Bagherinezhad1, Marjan Kuchaki Rafsanjani1 (corresponding author), Valentina Emilia Balas2, and Ioan E. Koles2

1 Department of Computer Science, Faculty of Mathematics and Computer, Shahid Bahonar University of Kerman, Kerman, Iran
[email protected], [email protected]
2 Department of Automatics and Applied Software, Faculty of Engineering, Aurel Vlaicu University of Arad, Arad, Romania
[email protected], [email protected]
Abstract. Plants are one of the most widely used resources for humans in different fields. Distinguishing between plant species is therefore important; a system that does this is referred to as a plant detection system. Until now, this task has been performed by expert botanists, which is overwhelming and time-consuming; moreover, human memory is limited and human error occurs, so researchers have endeavored to overcome these disadvantages with AI algorithms. To this end, this paper proposes a system comprising four phases: pre-processing, feature extraction, training, and test. The method combines useful features of leaf shape, leaf texture, and leaf color, and provides a way to classify a number of plant species. Feature vectors are created, and classification is then performed with a feed-forward back-propagation multi-layer perceptron artificial neural network. The results of this method are compared with those of other methods. The obtained results show the high accuracy of this method for a large number of species under different conditions (such as pests, season changes, and lighting).

Keywords: Accuracy · Feature extraction · Artificial Neural Network (ANN) · Leaf detection
© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 229–238, 2021. https://doi.org/10.1007/978-3-030-52190-5_16

1 Introduction

Plants are one of the most important and most used resources for humans on earth; they are used in different fields such as botany and therapeutics. The first need in this area is therefore the detection of, and distinction between, different plant species, many of which are very similar to one another. Such a system is referred to as a plant detection system [10]. The necessity of this system is that, if during detection a toxic plant species is identified as non-toxic, irreversible consequences follow. So far, this need has been met by botanists and experts, who examine the plant manually and make a determination according to their experience and specialty. This is not a decent
method, because one person cannot remember all the species in the world, and the time taken by examination and detection, along with human error, are further drawbacks. Researchers have therefore tried to simulate the performance and detection abilities of botanists with artificial intelligence algorithms, and to address the disadvantages mentioned above. Research has shown that, for determining the type of a plant, studying the leaf features is necessary and sufficient [9]. The framework for plant species detection systems is shown in Fig. 1. In this method, there are four main steps for detecting the plant species: 1) pre-processing; 2) feature extraction; 3) training; 4) test. Before extracting the features, in the pre-processing step we apply operations such as: i) image de-noising; ii) image segmentation. After the pre-processing step, the feature extraction operation is performed. The features extractable from the leaf fall into three types: leaf shape features, leaf texture features, and leaf color features. In the training and test steps, feature vectors are formed from all extracted features; these vectors are classified by the classifier and the detection result is obtained [1].
Fig. 1. The framework for the plant species detection systems.
Leaf recognition can be based on various features of the image. Here we briefly describe a few studies in this field. In [2], a new method comprising segmentation, combined feature extraction, and a classification method (using linear discriminant analysis) is presented to identify plant species from the leaf. It is evaluated on two datasets, Flavia [4] and Leafsnap [5], yielding accuracies of 93% and 71%, respectively. The method presented in [11] used four basic characteristics of the leaf (length, width, area, and diameter) together with a k-nearest-neighbor classifier; evaluated on 32 plant species, it achieved an accuracy of 93.17%. In [7], 9 features related to leaf shape are used with a move-median-centers classifier; evaluation on a dataset collected by the authors yielded an accuracy of 91%. The method presented in [8] is based on leaf texture features and uses a combined classifier based on radial basis functions and learning vector quantization; evaluation on 60 images gave an accuracy of 98.7%. The method presented in [17] uses leaf shape features (in addition to leaf texture features) to detect the plant species; it considers 5 features from these two categories and uses pulse-coupled neural networks. Evaluated on three datasets, Flavia, ICL [6], and MEW2012 [3], it yielded accuracies of 96.67%, 91.56%, and 91.2%, respectively.
The structure of this paper is as follows. In Sect. 2, the details of the proposed method will be provided. Section 3 presents the evaluation of the results, and in Sect. 4, we will conclude the article.
2 The Proposed Method

In this method, there are four main steps to detect the plant species: (a) pre-processing, (b) feature extraction, (c) training, (d) test. In the feature extraction step, 17 features are used, combined from three feature categories. For the training and test steps of the proposed method, we use an ANN.

2.1 Pre-processing
The pre-processing for the plant detection system depends on the type of the selected dataset and on the features extracted in the feature extraction phase. We apply the following pre-processing to the images:

Step 1: Convert the RGB image to a grayscale image. The color of leaves is usually green, but it sometimes changes for reasons such as water changes, seasonal changes, and so on; hence, leaf color features have low reliability and the leaf color information may be omitted. In this paper, RGB images are transformed into grayscale images using the following equation:

  gray = (red × 299 + green × 587 + blue × 114) / 1000    (1)
In Eq. (1), the RGB weights are chosen to remove the color of the original image while maintaining its luminance [15].

Step 2: Image de-noising. Since leaf images are obtained with cameras, scanners, or other devices, they may contain noise. We use the median filter, a nonlinear digital technique, to reduce noise. De-noising is a common pre-processing step used to improve the results obtained in subsequent steps, and the median filter is popular in digital image processing because it eliminates noise while preserving edges [13]. The basic idea of the median filter is to use an m × n neighborhood: all neighbors of a pixel are sorted in ascending order, and the central pixel is replaced by the median of these values. The neighborhood pattern is called the window. For one-dimensional signals, the most obvious window consists of some previous and next inputs, while for two-dimensional (or higher) signals, more complex window patterns are used. Note that if the number of inputs in the window is odd, the median element is uniquely defined, whereas for an even number of inputs there is more than one middle element [13].
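The windowed median just described can be sketched in a few lines (an illustrative sketch, not the authors' MATLAB implementation; the 3 × 3 window and the toy image are assumptions, and for even-sized border windows the upper middle element is taken):

```python
def median_filter(img, m=3, n=3):
    """Apply an m x n median filter to a 2-D list of gray values.

    Border pixels use only the neighbors that fall inside the image.
    """
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Collect the window around (i, j), clipped at the image borders.
            window = [img[r][c]
                      for r in range(max(0, i - m // 2), min(rows, i + m // 2 + 1))
                      for c in range(max(0, j - n // 2), min(cols, j + n // 2 + 1))]
            window.sort()
            out[i][j] = window[len(window) // 2]  # median of the sorted window
    return out

# A single bright "noise" pixel in a flat region is removed:
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(median_filter(noisy))
```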
Step 3: Segmentation. Image segmentation separates the leaf from the background so that the resulting image can be used to extract the leaf shape features. The output of segmentation is a binary image in which the leaf is marked with 1 and the background with 0. The choice of segmentation method depends on the evaluated dataset, because there are two types of background: complex (Fig. 2a) and simple (Fig. 2b) [14]. Since our evaluated dataset is Flavia, which has simple backgrounds, we use the iterative threshold selection method provided in [14].
Fig. 2. (a) Leaf with a complex background; (b) leaf with a simple background.
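Iterative threshold selection (the scheme cited from [14], in the Ridler–Calvard style) repeatedly splits the pixels at the current threshold and moves the threshold to the midpoint of the two class means until it stabilizes. A minimal sketch under that assumption (the toy pixel data is illustrative):

```python
def iterative_threshold(pixels, eps=0.5):
    """Iterative threshold selection on a flat list of gray values."""
    t = sum(pixels) / len(pixels)  # start from the global mean
    while True:
        fg = [p for p in pixels if p > t]   # tentative foreground (leaf)
        bg = [p for p in pixels if p <= t]  # tentative background
        if not fg or not bg:
            return t
        new_t = (sum(fg) / len(fg) + sum(bg) / len(bg)) / 2
        if abs(new_t - t) < eps:
            return new_t
        t = new_t

# Two well-separated intensity clusters (background around 20, leaf around 200):
pixels = [18, 20, 22, 19, 21] * 10 + [198, 200, 202, 199, 201] * 5
t = iterative_threshold(pixels)
print(t)
```

Thresholding at t then yields the binary leaf/background mask described above.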
Step 4: Edge detection. The last pre-processing operation is edge detection on the leaf. It is required to compute the leaf perimeter and some other quantities used by the features presented in the next section [16]. Figure 3 shows the result of the pre-processing step on the Flavia dataset.
Fig. 3. Pre-processing on the Flavia dataset: (a) the original image; (b) after conversion to grayscale; (c) after de-noising; (d) after segmentation; (e) after edge detection.
2.2 Feature Extraction

The features extractable from the leaf image are divided into three categories: leaf shape features, leaf texture features, and leaf color features. Each prior study has used some subset of these features. In this section, we introduce the features used here [16].
• Aspect ratio: the ratio of the width to the length of the minimum bounding rectangle of the leaf contour:

  R_{Ar} = W / L    (2)

• Circularity: defined as

  C = P^2 / (4\pi A)    (3)

where A is the area and P is the perimeter, i.e., the number of pixels in the target region and the number of pixels on its boundary, defined as

  A = \sum_{i,j} I(i,j)    (4)

  P = N_e + \sqrt{2}\, N_o    (5)

where I(i,j) is a pixel of the image, N_e is the number of even chain codes, and N_o the number of odd chain codes.

• Area convexity: the ratio between the area of the target and the area of its convex hull:

  R_{AConv} = A / A_C    (6)

where A_C is the area of the convex hull (the smallest convex region that includes all pixels of the object).

• Rectangularity: the similarity between the target region and its bounding rectangle:

  R = A / A_R    (7)

where A_R is the area of the smallest enclosing rectangle of the target region.

• Perimeter ratio of physiological length and width:

  R_{PR} = P / (L + W)    (8)

• Zernike moment:

  Z_{pq} = \frac{p+1}{\pi} \sum_x \sum_y f(x,y)\, V^{*}_{pq}(x,y)    (9)

where V^{*} is the complex conjugate of V and

  V_{pq}(x,y) = U_{pq}(r \cos\theta, r \sin\theta) = R_{pq}(r)\, e^{iq\theta}    (10)
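Features (2)–(8) need only the binary mask and its boundary. A minimal sketch for a few of them on a toy rectangular mask (an illustration, not the authors' code; the perimeter here is a simple border-pixel count rather than the chain-code form of Eq. (5)):

```python
import math

def shape_features(mask):
    """Area, border-pixel perimeter, aspect ratio, rectangularity and
    circularity of a binary mask given as a 2-D list of 0/1 values."""
    pts = [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]
    area = len(pts)  # Eq. (4): number of foreground pixels
    # Border pixels: foreground pixels with a background/outside 4-neighbor.
    fg = set(pts)
    perim = sum(1 for (i, j) in pts
                if any((i + di, j + dj) not in fg
                       for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))))
    rows = [p[0] for p in pts]
    cols = [p[1] for p in pts]
    h = max(rows) - min(rows) + 1   # bounding-box length L
    w = max(cols) - min(cols) + 1   # bounding-box width W
    aspect = w / h                  # Eq. (2)
    rect = area / (h * w)           # Eq. (7)
    circ = perim ** 2 / (4 * math.pi * area)  # Eq. (3), simplified perimeter
    return area, perim, aspect, rect, circ

# A 3 x 5 solid rectangle of foreground pixels:
mask = [[1] * 5 for _ in range(3)]
print(shape_features(mask))
```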
• Energy: the energy measures the uniformity of the grayscale image:

  En = \sum_{i=0}^{G-1} \sum_{j=0}^{G-1} P^2(i,j)    (11)

• Entropy: the entropy measures the complexity or non-uniformity of the image texture:

  Ent = -\sum_{i=0}^{G-1} \sum_{j=0}^{G-1} P(i,j) \log_2 P(i,j)    (12)

• Correlation: the correlation measures the similarity of the elements of the gray-level co-occurrence matrix (GLCM) along its rows or columns:

  Cor = \left( \sum_{i=0}^{G-1} \sum_{j=0}^{G-1} i\,j\,P(i,j) - u_1 u_2 \right) / (\sigma_1^2 \sigma_2^2)    (13)

where

  u_1 = \sum_{i=0}^{G-1} \sum_{j=0}^{G-1} i\,P(i,j)    (14)

  u_2 = \sum_{i=0}^{G-1} \sum_{j=0}^{G-1} j\,P(i,j)    (15)

  \sigma_1^2 = \sum_{i=0}^{G-1} \sum_{j=0}^{G-1} (i - u_1)^2\, P(i,j)    (16)

  \sigma_2^2 = \sum_{i=0}^{G-1} \sum_{j=0}^{G-1} (j - u_2)^2\, P(i,j)    (17)

• Average: a measure of the mean luminance of the texture:

  \mu = \sum_{i=0}^{G-1} z_i P(z_i)    (18)

• Inverse difference moment: this property, also known as homogeneity, is defined as

  IDM = \sum_{i=0}^{G-1} \sum_{j=0}^{G-1} \frac{P(i,j)}{1 + (i-j)^2}    (19)

• Maximum probability: the strongest response of the GLCM:

  P_{max} = \max_{i,j} P(i,j)    (20)

• Uniformity:

  U = \sum_{i=0}^{G-1} P^2(z_i)    (21)
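The GLCM-based texture features (11), (12), (19) and (20) can be computed directly from a normalized co-occurrence matrix. A minimal sketch (illustrative, not the authors' code; the 2-level toy matrix is an assumption):

```python
import math

def glcm_features(P):
    """Energy, entropy, inverse difference moment and maximum probability
    of a normalized GLCM P (entries sum to 1)."""
    G = len(P)
    energy = sum(P[i][j] ** 2 for i in range(G) for j in range(G))       # Eq. (11)
    entropy = -sum(P[i][j] * math.log2(P[i][j])                          # Eq. (12)
                   for i in range(G) for j in range(G) if P[i][j] > 0)
    idm = sum(P[i][j] / (1 + (i - j) ** 2)                               # Eq. (19)
              for i in range(G) for j in range(G))
    pmax = max(P[i][j] for i in range(G) for j in range(G))              # Eq. (20)
    return energy, entropy, idm, pmax

# Toy 2-level co-occurrence matrix:
P = [[0.5, 0.25],
     [0.25, 0.0]]
print(glcm_features(P))
```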
• Mean of color: the average is the most general statistic of the data; for color images, it describes the average color:

  \mu = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} P(i,j)    (22)

• Standard deviation of color:

  \sigma = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} [P(i,j) - \mu]^2}    (23)

• Skewness of color: carries shape information about the color distribution and is a measure of its asymmetry:

  \theta = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} [P(i,j) - \mu]^3}{MN\sigma^3}    (24)

where σ is the standard deviation of the color.

• Kurtosis of color: also describes the shape of the color distribution, measuring its sharpness or flatness relative to a normal distribution:

  \gamma = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} [P(i,j) - \mu]^4}{MN\sigma^4}    (25)
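The four color moments (22)–(25) are straightforward to compute per channel. A minimal sketch (illustrative; the toy channel values are assumptions):

```python
import math

def color_moments(channel):
    """Mean, standard deviation, skewness and kurtosis of one color channel,
    given as a flat list of pixel values (Eqs. (22)-(25))."""
    n = len(channel)
    mu = sum(channel) / n
    sigma = math.sqrt(sum((p - mu) ** 2 for p in channel) / n)
    skew = sum((p - mu) ** 3 for p in channel) / (n * sigma ** 3)
    kurt = sum((p - mu) ** 4 for p in channel) / (n * sigma ** 4)
    return mu, sigma, skew, kurt

mu, sigma, skew, kurt = color_moments([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mu, sigma, skew, kurt)
```

Applying this to each of the R, G, B channels yields the color part of the 17-element feature vector.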
2.3 Training
The ANN is used for the training step of the proposed method. The dataset consists of two main parts: 1) the training part; 2) the test part. First, the appropriate network
design should be specified. The type of neural network is a feed-forward MLP, and we use the error back-propagation algorithm, the best-known example of a simple training algorithm. One of its important parameters is the number of epochs. In each epoch, the training part is fed to the network, the actual outputs and targets are compared, and the error is computed; the error gradient is then used to adjust the weights, and the procedure is iterated. The initial weights of the network are drawn randomly from a predefined range. The training step ends when the number of epochs is reached (or the error drops to a certain level, or performance improvement stalls) [15]. To train the proposed method, an ANN with three layers of neurons (two hidden layers and one output layer) is used, in which the input-layer neurons are linear while the hidden- and output-layer neurons use sigmoid functions. We train the ANN with the Levenberg–Marquardt algorithm. The number of neurons in the input layer equals the number of features; the output layer has 32 neurons and each hidden layer has 20 neurons. The initial weights are selected randomly and the process is repeated to compute the final weights, which are then stored to classify new samples.

2.4 Test
In the previous section, the details of training the neural network were presented. As stated, the inputs of this method are the features and its output is the plant type. The training operation terminates when one of the conditions mentioned above is satisfied; at that point the weights have converged to their final values, which are used to classify new samples. Training uses the training dataset. In the final step, new samples are classified and the performance of the method is evaluated using the test dataset and the trained neural network model: new samples are given as input vectors to the model, their outputs are computed, and performance is evaluated by comparing each output with the real value of the input sample. We present the results of these evaluations in the next section.
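The train-then-test flow of Sects. 2.3–2.4 can be sketched with a tiny one-hidden-layer network (a pure-Python illustration on a toy XOR task, not the authors' MATLAB model: plain gradient descent stands in for Levenberg–Marquardt, and the layer sizes and data are toy-scale assumptions):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w1, w2, x):
    """One hidden layer of sigmoid units and one sigmoid output unit."""
    h = [sigmoid(sum(wi * xi for wi, xi in zip(row, x + [1.0]))) for row in w1]
    return h, sigmoid(sum(wi * hi for wi, hi in zip(w2, h + [1.0])))

def train(data, hidden=3, lr=1.0, epochs=4000, seed=1):
    random.seed(seed)
    # Random initial weights (2 inputs + bias per hidden unit; hidden + bias to output).
    w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    w2 = [random.uniform(-1, 1) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(w1, w2, x)
            # Back-propagate the squared error through the sigmoids.
            d_out = (y - t) * y * (1 - y)
            for k in range(hidden):
                d_h = d_out * w2[k] * h[k] * (1 - h[k])
                for i in range(3):
                    w1[k][i] -= lr * d_h * (x + [1.0])[i]
            for k in range(hidden + 1):
                w2[k] -= lr * d_out * (h + [1.0])[k]
    return w1, w2

xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
w1, w2 = train(xor)
for x, t in xor:
    print(x, round(forward(w1, w2, x)[1], 2))
```

In the paper's setting the input vector holds the 17 extracted features, the hidden layers have 20 sigmoid units each, and the 32 output units encode the species.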
3 Evaluation Results

As previously stated, the dataset used is Flavia. The images in this dataset are scanned images of plant leaves, so their background is white. The dataset contains 1907 images covering 32 plant species [4]. Figure 4 shows samples of the leaves in Flavia. We implemented the proposed method in MATLAB.
Fig. 4. Sample leaf images from the Flavia dataset.
Figure 5 shows the accuracy of the proposed method in comparison with the methods presented in [2] and [12]. These results were obtained from experiments with different percentages for the training and test parts; the proposed results are averaged over 20 runs on the Flavia dataset.

Accuracy (%):
Test portion   Method in [12]   Method in [2]   Proposed method
10% test       88.7             92.6            95.1
15% test       86.3             92.1            94.9
20% test       88.6             92.2            95.0

Fig. 5. Comparison of the accuracy of our method with other methods.
4 Conclusion

In this article, we presented a leaf detection system to determine the type of plant species. The proposed method consists of four phases: 1) pre-processing; 2) feature extraction; 3) training; 4) test. In the feature extraction phase, we combined different feature categories to improve detection accuracy; in the training phase, we used an ANN. By applying features from all three categories, we observed that detection accuracy improved. One of the reasons is that leaf texture features are effective in improving detection. Also, we
used neural networks for training, which perform well. The simulation results and experiments show that the proposed method is superior to the other methods. Although the proposed method performs well, there is still room for improvement. For example, a combined classifier could improve the training and test steps, and the choice of segmentation method has a direct effect on the feature extraction results and hence on the final performance. In addition, other methods can be used to select an optimal subset of the features, which can directly affect the accuracy of the method and especially its run time.
References

1. Arunpriya, C., Thanamani, A.S.: A novel leaf recognition technique for plant classification. Int. J. Comput. Eng. Appl. 4(2), 42–55 (2014)
2. Kalyoncu, C., Toygar, Ö.: Geometric leaf classification. Comput. Vis. Image Underst. 133, 102–109 (2015)
3. http://b2find.eudat.eu/dataset/. Available 18 July 2012
4. http://flavia.sourceforge.net. Available 24 Dec 2009
5. http://leafsnap.com/dataset. Available 11 July 2014
6. http://www.intelengine.cn/English/dataset/indexxx.html. Available 20 May 2011
7. Du, J.X., Wang, X.F., Zhang, G.J.: Leaf shape based plant species recognition. Appl. Math. Comput. 185, 883–893 (2007)
8. Ghaseb, M.A.J., Khamis, S., Mohammad, F., Fariman, H.J.: Feature decision-making ant colony optimization system for an automated recognition of plant species. Expert Syst. Appl. 42, 2361–2370 (2015)
9. Amlekar, M., Manza, R.R., Yannawar, P., Gaikwad, A.T.: Leaf features based plant classification using artificial neural network. IBMRD's J. Manag. Res. 3(1), 224–232 (2014)
10. Rashad, M.Z., El-Desouky, B.S., Khawasik, M.S.: Plants images classification based on textural features using combined classifier. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 3(4), 93–100 (2011)
11. Nidheesh, P., Rajeev, A., Nikesh, P.: Classification of leaf using geometric features. Int. J. Eng. Res. Gen. Sci. 3(2), 1185–1190 (2015)
12. Hu, R., Jia, W., Ling, H., Huang, D.: Multiscale distance matrix for fast plant leaf recognition. IEEE Trans. Image Process. 21(11), 4667–4672 (2012)
13. Huang, T., Yang, G., Tang, G.: A fast two-dimensional median filtering algorithm. IEEE Trans. Acoust. Speech Signal Process. 27(1), 13–18 (1979)
14. Wang, X.F., Du, J.X., Zhang, G.J.: Recognition of leaf images based on shape features using a hypersphere classifier. In: Advances in Intelligent Computing. Lecture Notes in Computer Science, vol. 3644, pp. 87–96 (2005)
15. Husin, Z., Shakaff, A.Y.M., Aziz, A.H.A., Farook, R.S.M., Jaafar, M.N., Hashim, U., Harun, A.: Embedded portable device for herb leaves recognition using image processing techniques and neural network algorithm. Comput. Electron. Agric. 89, 18–29 (2012)
16. Wang, Z., Li, H., Zhu, Y., Xu, T.F.: Review of plant identification based on image processing. Arch. Comput. Methods Eng. 24(3), 637–654 (2017)
17. Wang, Z., Sun, X., Zhang, Y., Ying, Z., Ma, Y.: Leaf recognition based on PCNN. Neural Comput. Appl. 27, 899–908 (2016)
Computational Intelligence Techniques, Machine Learning and Optimization Methods in Recent Applications
A Comparative Study of Audio Encryption Analysis Using Dynamic AES and Standard AES Algorithms

Amandeep Singh1,2, Praveen Agarwal3, and Mehar Chand4 (corresponding author)

1 Department of Computer Science, Baba Farid College, Bathinda 151001, India
[email protected]
2 Department of Computer Science, Singhania University, Pacheri Bari, Jhunjhunu, India
3 Department of Mathematics, Anand International College of Engineering, Jaipur 303012, India
[email protected]
4 Department of Mathematics, Baba Farid College, Bathinda 151001, India
[email protected]
Abstract. In this paper, we use a dynamic key-dependent AES algorithm and the standard AES (Advanced Encryption Standard) to encrypt and decrypt audio files. To analyze the quality of the algorithms, both are successfully tested using histogram analysis, peak signal-to-noise ratio (PSNR), correlation analysis, and entropy.
© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 241–249, 2021. https://doi.org/10.1007/978-3-030-52190-5_17

1 Introduction

Cryptography provides secure communication between two parties without the knowledge of any third party or attacker. Cryptography consists of two processes, encryption and decryption. The technique of converting plain text (original data) into ciphertext is known as encryption; the technique of converting ciphertext back into plain text is known as decryption. The strength of cryptography lies in providing data confidentiality, data integrity, authentication, and non-repudiation. Cryptography comprises symmetric-key and asymmetric-key cryptography. In symmetric-key cryptography, encryption and decryption are done with a single key [1]; this is also known as conventional encryption. The encryption algorithm encodes the plain text into ciphertext using a secret key, and the decryption algorithm decodes the ciphertext back into plain text using the same secret key. Some of the most popular symmetric ciphers are the Data Encryption Standard (DES), 3DES, the Advanced Encryption Standard (AES), Blowfish, Rivest Cipher 4 (RC4), RC6, Twofish, and Threefish. Asymmetric-key encryption uses the concept of a public and a private key to encrypt and decrypt the data [2]. The encryption algorithm encodes the plain text into ciphertext using one of the keys, and the decryption algorithm decodes the ciphertext back into plain text using
the other key. Asymmetric encryption ensures confidentiality, authentication, or both. Some of the most popular asymmetric ciphers are the Elliptic Curve Cryptosystem (ECC), Rivest-Shamir-Adleman (RSA), Diffie-Hellman key exchange, and the ElGamal cryptosystem.
2 Literature Review
Some research papers concerning audio encryption are reviewed here. Diaa S. et al. [3,4] compared the performance of algorithms such as DES, AES, RC2, RC6, 3DES, and Blowfish and found that AES gives the best performance. Akash K. M. et al. [5] compared AES and DES on the strict avalanche criterion (SAC), memory management, and data encryption time; AES has better memory management and faster encryption than DES. Verma O. P. et al. [6] compared AES, DES, and Blowfish on different software and hardware with varying data sizes; Blowfish came out as the best algorithm for securing data from unauthorized attacks, but it suffers from its weak-key problem. Radha A. N. et al. [7] presented audio encryption techniques classified into complete encryption, selective encryption, and combined compression-encryption approaches. Sruthi B. A. et al. [8] used a full-encryption approach for multimedia data; the algorithm succeeds but introduces some delay. The authors used the AES algorithm with a secret key extracted from iris features: audio signals were converted to binary, and a 128-bit key was selected from iris features with high randomness to enhance security. Ganesh Babu S. et al. [9] proposed a higher-dimensional chaotic system to encrypt audio data; the algorithm resists known/chosen plaintext attacks and is sensitive to key change: when a single bit of the key is changed, a different encrypted audio stream is generated. Pavithra S. et al. [10] compared AES, DES, and Blowfish on the basis of throughput and processing time for audio and video files and found that AES performed best. Bismita G. et al. and Raghunandhan K. R. et al. [11] applied selective AES encryption to audio files: the audio file is first compressed and then selectively encrypted; they concluded that selective encryption performs better. Raghunandhan K. R. et al.
[12] proposed an efficient encryption technique based on transposition and substitution ciphers. Majdi A. et al. [13] applied AES to quantized audio data; the encryption covered the entire audio data, showing that AES is highly resistant to cryptographic attacks. Saurabh S. et al. [14] applied a selective encryption technique and compared it with full encryption, finding that selective encryption performs well by comparison. Shine P. J. et al. [15] suggested that the selective audio encryption method is preferable to block ciphers such as DES and AES and to public-key systems because of implementation costs. After studying these papers, we found that two techniques can be used to encrypt audio data: full encryption and selective encryption. Selective encryption incurs the overhead of choosing which data to encrypt and compromises some security. So we found that AES
A Comparative Study of Audio Encryption Analysis
is already a strong algorithm in terms of security, as it resists almost all statistical and algebraic attacks. In this paper we introduce a Dynamic AES algorithm. Standard AES uses a fixed S-Box in all rounds; the Dynamic AES algorithm enhances the security of the existing AES against various algebraic attacks. Various algebraic attacks have been tested on AES to cryptanalyze the algorithm: linear cryptanalysis, differential cryptanalysis, boomerang attacks, interpolation attacks, slide attacks, multiset attacks on 4, 6, 7, 8 and 9 rounds, and XL and XSL attacks. The Dynamic AES algorithm makes all these attacks very difficult. We encrypted audio files with both algorithms and compared the results.
3
Proposed Dynamic AES Algorithm
In this paper we developed a new Dynamic AES in which the S-Boxes are not only key dependent but also depend on a dynamic irreducible polynomial and affine constant. The proposed dynamic key-dependent S-Box algorithm is a permutation of the existing AES S-Box. The Dynamic AES algorithm depends on three parameters, i.e. the key values, the irreducible polynomial and the affine constant [17].

• Key: The construction of the S-Box depends on the key values. The S-Box is highly sensitive to the key: if a single bit of the key changes, all values of the S-Box are permuted.
• Irreducible Polynomial: In the existing AES, only one irreducible polynomial is used to generate the S-Box, but there are other irreducible polynomials of degree 8 over GF(2) (30 in total) which may be used to construct S-Boxes. In Dynamic AES we use all possible irreducible polynomials, chosen randomly, to generate dynamic S-Boxes.
• Affine Constant: In the existing AES, only one affine constant, i.e. 63, is used to generate the S-Box, but there are 256 affine constants in total which may be used to construct S-Boxes. In Dynamic AES we use all possible affine constants, chosen randomly, to generate dynamic S-Boxes.

Using the above methodology, the Dynamic AES algorithm generates dynamic S-Boxes which are highly sensitive to these three inputs, and is capable of generating 256! dynamic S-Boxes. Dynamic AES for encryption and decryption of audio files is shown in Fig. 1 and Fig. 2.
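As a rough illustration of how such parameterized S-Boxes can be built, the sketch below (our own Python sketch, not the paper's implementation; the key-dependent permutation step is omitted) computes the multiplicative inverse in GF(2^8) modulo a chosen irreducible polynomial and applies an AES-style affine transformation with a chosen constant. With the standard parameters (0x11B, 0x63) it reproduces the original AES S-Box.

```python
def gf_mul(a, b, poly):
    """Multiply two bytes in GF(2^8) modulo the given irreducible polynomial."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return r & 0xFF

def gf_inv(a, poly):
    """Multiplicative inverse in GF(2^8) by brute force (0 maps to 0)."""
    if a == 0:
        return 0
    for x in range(1, 256):
        if gf_mul(a, x, poly) == 1:
            return x
    raise ValueError("polynomial is not irreducible over GF(2)")

def make_sbox(poly=0x11B, affine_const=0x63):
    """Build an AES-style S-Box from an irreducible polynomial and affine constant."""
    sbox = []
    for a in range(256):
        inv = gf_inv(a, poly)
        # AES affine map: XOR the inverse with four of its left rotations, then the constant
        b = inv
        for shift in (1, 2, 3, 4):
            b ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(b ^ affine_const)
    return sbox
```

Varying `poly` over the irreducible degree-8 polynomials and `affine_const` over 0-255 yields a family of distinct, invertible S-Boxes, which is the kind of parameter space the dynamic construction draws on.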
4
Sound Files Analysis with AES and Dynamic AES
Here, we implemented and analyzed the new Dynamic AES algorithm to encrypt and decrypt sound files. The performances of standard AES and Dynamic AES are compared on the basis of parameters like histogram analysis, correlation analysis, peak signal to noise ratio (PSNR) and audio entropy. In this test we use two sound files in wav format, named 'welcome.wav' and 'days.wav', of size 473 kbytes and 339 kbytes respectively. Both files are encrypted with standard AES and Dynamic AES.
A. Singh et al.

Fig. 1. Dynamic AES algorithm to encrypt audio files (original audio file, then Add Round Key, then Nr-1 rounds of Dynamic S-Box, Shift Row, Mix Column and Add Round Key, then a last round of Dynamic S-Box, Shift Row and Add Round Key, giving the encrypted audio file)

Fig. 2. Dynamic AES algorithm to decrypt audio files (encrypted audio file, then Add Round Key, then Nr-1 rounds of Inverse Dynamic S-Box, Inverse Shift Row, Inverse Mix Column and Add Round Key, then a last round of Inverse Dynamic S-Box, Inverse Shift Row and Add Round Key, giving the decrypted/original audio file)
4.1
Histogram Analysis
Histogram analysis of 'welcome.wav' is presented in Figs. 3, 4 and 5: Fig. 3 presents the audio signal of the original 'welcome.wav', Fig. 4 presents the audio signal encrypted using the standard AES algorithm, and Fig. 5 shows the audio signal encrypted using Dynamic AES. By analyzing both encrypted histograms we can see that neither shows any sign of the original audio signal. The histogram analysis of 'days.wav' is presented in Figs. 6, 7 and 8: Fig. 6 shows the audio signal of the original 'days.wav', Fig. 7 shows the audio signal encrypted using the standard AES algorithm, and Fig. 8 shows the audio signal encrypted
Fig. 3. Original audio file (audio signal of 'welcome.wav', amplitude vs. time in seconds)

Fig. 4. Encrypted audio file by standard AES algorithm

Fig. 5. Encrypted audio file by dynamic AES algorithm
using Dynamic AES. By analyzing both encrypted histograms we can again see that neither shows any sign of the original audio signal. From the histogram analysis of both audio files we found that both algorithms work well to resist statistical attacks, because the encrypted audio signal does not reveal any information about the original audio signal in either file.
Fig. 6. Original audio file (audio signal of 'days.wav', amplitude vs. time in seconds)

Fig. 7. Encrypted audio file by standard AES algorithm
4.2
Correlation Analysis
The correlation represents a positive, negative or no relationship between the original data and the encrypted data. The value of the correlation ranges between -1 and +1. If the correlation is close to 0 there is little or no relation between the original and encrypted data; a value of +1 means a strong positive relationship, and -1 a strong negative relationship. Ideally, for good encryption quality, it should be equal or close to 0. For both audio files the correlation between the original data and the encrypted data is close to 0, so both algorithms performed well. The correlation r between two audio files stored in vectors X and Y respectively is given by Eq. 1. The correlation of 'welcome.wav' and 'days.wav' with standard AES and Dynamic AES is presented in Table 1.

r = \frac{\sum_{i=1}^{n} (X[i]-\bar{X})(Y[i]-\bar{Y})}{\sqrt{\sum_{i=1}^{n} (X[i]-\bar{X})^{2} \sum_{i=1}^{n} (Y[i]-\bar{Y})^{2}}}.   (1)
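Eq. 1 is the standard Pearson correlation. A minimal sketch of how the coefficients in Table 1 could be reproduced, assuming the audio samples are already loaded into two equal-length sample vectors (function name is illustrative):

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient r between two sample vectors (Eq. 1)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
    return num / den
```

For a good cipher the encrypted samples should be statistically independent of the originals, so r computed between the plain and encrypted signals should be near 0, as observed in Table 1.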
Fig. 8. Encrypted audio file by dynamic AES algorithm

Table 1. Correlation coefficient analysis

Sound files    AES      Dynamic AES
Welcome.wav    -0.0017  -0.0022
Days.wav       -0.0164  -0.0124

4.3
Randomness Analysis
The randomness of encrypted data is measured by calculating its entropy. A higher entropy value indicates higher randomness in the encrypted data, which helps it resist statistical attacks. Entropy is calculated by Eq. 2.
En(y) = -\sum_{i=1}^{N} p_i(y) \log_2 p_i(y).   (2)
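Eq. 2 can be sketched for byte-valued data (N = 256 possible symbol values; the function name is illustrative) as:

```python
import math
from collections import Counter

def byte_entropy(data):
    """Shannon entropy in bits per byte of a byte sequence (Eq. 2)."""
    counts = Counter(data)
    total = len(data)
    # sum over symbols that actually occur; p*log2(p) -> 0 as p -> 0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A uniformly distributed byte stream gives the maximum value of 8 bits per byte, which is why well-encrypted data scores close to 8.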
The entropy value should be close to 8 [16]; the closer the value is to 8, the higher the quality of the encrypted data. Table 2 presents the entropy analysis of the original and encrypted 'welcome.wav' and 'days.wav' files, encrypted by standard AES and Dynamic AES. The results are close to 8, which shows the higher encryption complexity.

4.4
Peak Signal to Noise Ratio (PSNR) Analysis
Peak Signal to Noise Ratio (PSNR) is calculated to check the ratio of signal to noise in data. In the case of data encryption, the encrypted data should have a low PSNR to ensure good encryption quality and more resistance to statistical attacks. To calculate PSNR we first need the MSE (Mean Square Error). The MSE of two audio files X and Y, where X is the original audio file vector and Y is the encrypted audio file vector, is calculated with Eq. 3, and the PSNR for the same vectors is calculated with Eq. 4. The PSNR values for both audio files are shown in Table 2.

MSE = \frac{1}{n}\sum_{i=1}^{n} (X[i]-Y[i])^{2},   (3)

PSNR = 10 \log_{10} \frac{MAX^{2}}{MSE}.   (4)
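Eqs. 3 and 4 can be sketched as below. Here MAX is the peak possible sample value, taken as 1.0 for normalized audio (it would be 255 for 8-bit samples); names are illustrative:

```python
import math

def psnr(x, y, max_val=1.0):
    """Peak signal to noise ratio in dB between original x and encrypted y (Eqs. 3 and 4)."""
    mse = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)
    return 10.0 * math.log10(max_val ** 2 / mse)
```

A lower PSNR means the encrypted signal looks like noise relative to the original, which is the desirable outcome for encryption.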
Table 2. Sound encryption quality analysis

Algorithm      Sound files    Entropy (original files)  Entropy (encrypted files)  PSNR (peak signal to noise ratio)
AES            Welcome.wav    3.1059                    7.9998                     5.9638
AES            Days.wav       1.4750                    6.7017                     2.7644
Dynamic AES    Welcome.wav    3.1059                    7.9998                     5.9728
Dynamic AES    Days.wav       1.4750                    6.7111                     2.9750
In Table 2 the PSNR values for 'welcome.wav' with standard AES and Dynamic AES are 5.9638 and 5.9728 respectively, and for 'days.wav' they are 2.7644 and 2.9750. These values are very low, indicating resistance to statistical attacks.
5
Conclusion
We introduced an algorithm which generates dynamic key-dependent S-Boxes with a dynamic irreducible polynomial and affine constant. The new Dynamic AES and standard AES algorithms were used to encrypt and decrypt audio files. It is observed that both algorithms performed well in the histogram, correlation, entropy and PSNR tests, and their results are comparable. However, Dynamic AES provides more security because it uses dynamic S-Boxes which are not known to attackers in advance.
References

1. Forouzan, B.: Traditional symmetric-key cipher. In: Introduction to Cryptography and Network Security, 1st edn., pp. 55-96. McGraw-Hill, New York (2008)
2. Forouzan, B.: Traditional asymmetric-key cryptography. In: Introduction to Cryptography and Network Security, 1st edn., pp. 293-335. McGraw-Hill, New York (2008)
3. Diaa, S., Hatem, A.K., Mohiy, M.H.: Evaluating the effects of symmetric cryptography algorithms on power consumption for different data types. IJNS 11(2), 78-87 (2010)
4. Diaa, S., Hatem, A.K., Mohiy, M.H.: Evaluating the performance of symmetric encryption algorithms. IJNS 10(3), 213-219 (2010)
5. Akash, K.M., Chandra, P., Archana, T.: Performance evaluation of cryptographic algorithms: DES and AES. In: IEEE Students' Conference on Electrical, Electronics and Computer Science (2011)
6. Verma, O.P., Ritu, A., Dhiraj, D., Shobha, T.: Performance analysis of data encryption algorithms. IEEE (2011)
7. Radha, A.N., Venkatesulu, M.: A complete binary tree structure block cipher for real-time multimedia. In: Science and Information Conference (2013)
8. Sruthi, B.A., Karthigaikumar, P., Sandhya, R., Naveen, J.K., Siva Mangai, N.M.: A secure cryptographic scheme for audio signals (2013)
9. Ganesh Babu, S., Ilango, P.: Higher dimensional chaos for audio encryption. IEEE (2013)
10. Pavithra, S., Ramadevi, E.: Throughput analysis of symmetric algorithms. IJANA 4(2), 1574-1577 (2012)
11. Bismita, G., Chittaranjan, P.: Selective encryption on MP3 compression. MES J. Technol. Manag. (2011)
12. Raghunandhan, K.R., Radhakrishna, D., Sudeepa, K.B., Ganesh, A.: Efficient audio encryption algorithm for online applications using transposition and multiplicative non-binary system. IJERT 2(6) (2013)
13. Majdi, A., Lin, Y.H.: Simple encryption/decryption application. IJCSS 1(1) (2007)
14. Saurabh, S., Pushpendra, K.P.: A study on different approaches of selective encryption technique. IJCSCN 2(6), 658-662 (2012)
15. Shine, P.J., Sudhish, N.G., Deepthi, P.P.: An audio encryption technique based on LFSR based alternating step generator. IEEE Connect (2014)
16. Shannon, C.E.: Communication theory of secrecy systems. Bell Syst. Tech. J. 28(4), 656-715 (1949)
17. Agarwal, P., Singh, A., Kilicman, A.: Development of key dependent dynamic S-Boxes with dynamic irreducible polynomial and affine constant. Adv. Mech. Eng. 10(7), 1-18 (2018)
Feasibility Study of CSP Tower with Receiver Installed at Base

Chirag Rawat1(B), Sarth Dubey2, Dipankar Deb3, and Ajit Kumar Parwani3

1 Indian Institute of Technology, Bombay 400076, Maharashtra, India
[email protected]
2 Indian Institute of Technology, Gandhinagar 382355, Gujarat, India
[email protected]
3 Institute of Infrastructure Technology Research and Management, Ahmedabad 380026, Gujarat, India
{dipankardeb,ajitkumar.parwani}@iitram.ac.in
Abstract. In this paper, an alternate design of a concentrated solar power (CSP) tower is proposed, with a receiver installed at the base of the tower and a central tower with curved mirrors mounted at the top. A plurality of heliostats surrounds the tower, and heat absorbers and exchangers are situated around the base. The heat from the Sun received by the heliostats is reflected towards the top of the tower and is reflected back to the base by specially designed curved conic mirrors mounted on the top. The receiver collects and transfers heat to the working fluid, which provides heat to a thermodynamic cycle. This arrangement can also act as a heat source to a furnace in the thermolysis of water for hydrogen production, or for industrial heating.

Keywords: CSP tower · Solar energy · Thermolysis · Heat receiver

1

Introduction
The Sun is the ultimate source of energy. A vast amount of energy, of the order of 10^15 watts, is received by the Earth, of which a small fraction reaches the surface. Maximum normal surface irradiance is approximately 1000 W/m^2 at sea level, and the daily average insolation for the Earth is approximately 6 kWh/m^2 = 21.6 MJ/m^2. Solar energy is a versatile source, with megawatt-level power plants for commercial use and domestic applications like solar cookers, solar water heaters and photovoltaic (PV) lamps. It is useful in desalination, drying, refrigeration and air conditioning, and water pumping. The Sun directly provides energy to plants for photosynthesis. Low-cost fossil fuel-based energy is used for its cost competitiveness, but due to growing concern about the greenhouse effect and other negative impacts, there is widespread demand for solar energy [12]. It is unsteady in nature and fluctuates heavily with time [19,20]: with changes in day and night, summer and winter, or clear and cloudy sky, the amount of energy received varies. A number of methods can be used to utilize solar power. Using PV cells is the most common method, which uses the photovoltaic effect to generate electricity

© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 250-262, 2021. https://doi.org/10.1007/978-3-030-52190-5_18
from the light falling on the surface. Concentrated solar power (CSP) is another method, in which the heat flux received over a large area is concentrated onto a small area or point. Hybrid systems form the future generation of energy sources, in which solar power is combined with other forms of generation like diesel, wind or biogas [17]. Concentrated solar power uses mirrors or lenses to converge solar heat received over large areas onto a specific point or area. Heat received at the converging point by heat absorbers can then be transferred to a working fluid like water, molten salts or air. These working fluids act as the heat source in a thermodynamic cycle, as used in power plants, to generate electricity with varying degrees of conversion efficiency. The first use of this technology dates back to around 212 BC, when Archimedes used burning mirrors to protect Syracuse from the invading Roman fleet: he lined up a number of Greek soldiers, each holding a mirror oriented so that they received rays from the sun and reflected them towards the Roman ships. There are various types of CSP technology in wide use, with varying efficiencies: line-focus parabolic troughs, central receiver tower systems, enclosed troughs and linear Fresnel concentrators [29]. A remarkable rise in the use of concentrated solar power technology has been witnessed around the globe due to a large increase in electricity demand. CSP plants can be used for large-scale production of electricity at an economical rate. This study analyzes the performance of a solar power tower (SPT) with the receiver at the base [13]. The early development of SPT technology originated from Planta Solar 10 (PS10) and Planta Solar 20 (PS20), using water as the Heat Transfer Fluid (HTF). In 2011, Gemasolar became the first SPT to use molten salt as HTF. India has large potential for CSP [25].
A multi-criteria method for comparing different CSP technologies was employed in [6], and it was deduced that solar tower technology is the most promising CSP system. This type of CSP system provides flexibility in operation by providing high temperatures and low energy losses. On the other hand, it requires a large area of land and a large network of heliostats, which needs high maintenance. In this paper, a feasibility study of a CSP tower with the receiver at the base is performed. A detailed description of the modified arrangement of a solar power tower, including its design considerations and the receiver specifications of the novel arrangement, is discussed in detail. A comparison of this arrangement with existing CSP technologies is made in Sect. 3. This arrangement has applications in fields other than power generation, which are discussed in detail in Sect. 4. Modified equations for power input and output are analyzed and the results discussed in Sect. 5.
2
A Modified Arrangement of Solar Power Tower
The concentrated solar power tower design with the receiver installed at the base is based on converging the widespread solar radiation falling on a large area onto a very specific area using bi-axial mirror-based reflectors called heliostats. These are spread over a large area around the central receiver tower and are oriented, based on the position of the sun in the sky, to collect maximum radiation
C. Rawat et al.
and reflect it to the top of the tower. Large, specially designed curved mirrors mounted at the top of the tower receive the radiation sent from the heliostats. They further converge this heat flux and reflect it towards the base of the tower, where the flux is absorbed by heat absorbers and transferred to heat exchangers, which pass this heat to the heat transfer fluids that are transported to the storage or energy conversion system. This system is capable of attaining high transfer efficiency at the receiver situated at ground level and transfers heat to compatible heat transfer fluids like water, oils or molten salts, which can further act as the heat source for a thermodynamic cycle in conversion [13] or can be stored for later use when further generation cannot take place [21]. Such a modified arrangement of a solar power tower with the receiver installed at the base is shown in Fig. 1.
Fig. 1. Modified arrangement of solar power tower with receiver installed at the base
This novel arrangement also helps with hydrogen generation using a solar power tower. Heat received by the absorbers at the base is used to raise the temperature of a furnace in which the reactions of a metal oxide cycle, such as the cerium oxide cycle, copper chloride cycle or iron oxide cycle [15,26], which give H2 as one of the products, can be sustained. The raw materials for a reaction, like metal oxides and catalysts, can be easily delivered, and the reaction products can be removed easily; the reactions of the respective metal oxides take place in the presence of suitable catalysts. Therefore, this arrangement of the solar tower provides a comprehensive solution to the problem of transporting raw materials and products before and after the reaction.

2.1
Plant Design Considerations
In a solar tower with the receiver at the base, special design consideration is needed for the secondary reflector at the top of the tower. The size of the heliostat field is the deciding factor for the size of the mirror and its curvature, which in turn is decided by the security distance and other factors constituting
Feasibility Study of CSP Tower with Receiver Installed at Base
253
the shading and blocking of the field [11]. The interaction of secondary reflector with respective bands of the heliostat field is shown in Fig. 2 and the variables are defined subsequently.
Fig. 2. Interaction of secondary reflector with respective bands of heliostat field
The curvature of the mirror depends on (i) the radial distance of the receiver from the base of the tower (ra): the linear distance between the main receiver surface and the central axis of the tower; (ii) the height of the tower (h): a larger height corresponds to a higher angle of reflection for each band of heliostats; (iii) the radial distance of the assigned heliostat from the base of the tower (ri): a larger distance corresponds to larger convection losses and a smaller reflection angle; (iv) the concentration ratio of the collector and the planned capacity of the plant: the higher the planned capacity of the plant, the higher the maximum (ri), i.e. (rn), should be. The curved section of the mirror is divided into a number of bands m1, m2, ..., mn, according to the size of the heliostat field. Similarly, the heliostat field is divided into a number of bands f1, f2, ..., fn. This division of mirror and heliostats depends on the size of the plant. Each heliostat band is assigned a corresponding mirror band, and they are so oriented that the reflected light rays from the ith band of heliostats reach the mirror only at the corresponding ith band of the mirror. Due to the defined curvature of the mirror, the light ray gets reflected towards the receiver at the base with the concentration ratio as per the design specifications of the plant.
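A toy 2-D sketch of this band geometry (our own illustration, not the paper's design method: the secondary reflector is treated as a single point at height h, and spillage and the conic profile are ignored) computes the tilt of the mirror band that redirects the ray from a heliostat band at radius r_i down to the receiver ring at radius r_a:

```python
import math

def mirror_band_tilt(h, r_a, r_i):
    """Tilt (degrees from horizontal) of the secondary-reflector band that
    redirects the ray from a heliostat band at radius r_i down to the
    receiver ring at radius r_a (simplified 2-D point-reflector model)."""
    # unit direction of the ray arriving at the tower top from the heliostat band
    a = math.atan2(h, r_i)
    d_in = (-math.cos(a), math.sin(a))
    # unit direction of the ray leaving the tower top towards the receiver ring
    b = math.atan2(h, r_a)
    d_out = (math.cos(b), -math.sin(b))
    # mirror normal lies along d_in - d_out; the surface is perpendicular to it
    nx, nz = d_in[0] - d_out[0], d_in[1] - d_out[1]
    normal_angle = math.degrees(math.atan2(nz, nx))
    return normal_angle - 90.0
```

Evaluating this for each band radius r_1, ..., r_n gives one tilt per mirror band, which is the discretized analogue of the continuous curvature described above.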
2.2
Receiver Specification
The receiver converts the sun's low-grade energy to high-grade heat. The efficiency of the collector and receiver directly decides the efficiency of the complete system. Properties like high thermal conductivity, high absorptivity, higher thermal inertia of the core, excellent resistance to thermal fatigue and a low emissivity factor help in increasing the efficiency of a receiver [2,23]. A receiver consists of the following: (i) Transparent cover: a cover of transparent material used to reduce losses. It is designed to trap the absorbed heat and reduce radiative and convective losses. This material should be able to withstand high temperatures, show high thermal fatigue resistance, and protect the absorber from UV rays. Any diathermanous material with UV-inhibiting capacity can be used [2]. (ii) Absorber: the junction where the primary heat transfer takes place. It is a thin layer of material having high thermal conductivity, thermal fatigue resistance and corrosion resistance. The exposed surface can be coated black to reduce reflection and increase the absorption of heat. (iii) Heat transfer fluid: the heat is finally transferred to the HTF, which carries it to the storage or energy conversion system. This fluid should have a very low solidification temperature, high evaporation temperature, low viscosity, high thermal conductivity, high heat capacity and density, and high storage potential, and should be non-corrosive. Sodium and potassium salts, water (steam), pressurized air and noble gases meet these requirements [18]. (iv) Insulation: this prevents heat loss to the surroundings due to convection. Insulation around the receiver surface prevents direct contact between the atmosphere and the high-temperature surface [27]. (v) Casing: it encapsulates the complete assembly of the receiver and prevents any external damage. It also adds to the insulation of the core (Fig. 3).
Fig. 3. Design of a flat plate receiver
3
Comparisons with Existing CSP Technologies
Around 2% of the total capital cost of the plant is spent on annual repair and maintenance of CSP plant [25]. In Conventional solar power towers, the heat transfer fluids are transported up to the top of the tower using heavy pumps which increases the parasitic consumption of power (in-plant use of power). More piping, insulation, extra maintenance are the other issues that arise. Multicriteria analysis [6] can provide a technical-scientific decision making support tool that can justify its choices clearly and consistently [7]. There are 10 criteria which can be used as a tool to compare the alternative methods of concentrated solar power. These criteria are selected by the industrial and technical feasibility, economic viability and environmental concerns: (i) C1 : Investment costs - This includes all costs relating to the purchase of mechanical equipment, technological installations, engineering services, drilling and other incidental construction work. (ii) C2 : Operating and maintenance (O&M) cost which includes all the costs relating to plant, employees’ wages, materials and installations, transport and hire charges, and any ground rentals payable. (iii) C3 : Levelized cost of electricity (LCOE) - This measures the industrial production cost per kW h of the electricity generated by the plant, expressed as cost/MWh. This is an important and useful parameter for assessing how commercially competitive the system is compared with conventional energy production technologies. (iv) C4 : Efficiency of the process - This criterion accounts for the efficiency of the system. This includes the efficiency of receiving solar energy, transfer of energy to HTF and transport losses. (v) C5 : Electricity production - This criterion quantifies the level of electricity production (GWh) by the system proposed. (vi) C6 : Infrastructural requirements - this criterion quantifies the level of infrastructure required for the system. 
It accounts for the complexity of the structures involved. (vii) C7 : State of knowledge of innovative technology - This criterion represents the degree of reliability of the technology adopted, as well as how widespread the technology is. (viii) C8 : Environmental risk and safety - This criterion considers the environmental risk arising from accidental leaks of the HTF from tanks and the hydraulic plant. (ix) C9 : Land use - This criterion quantifies the area occupied by the CSP plants and not therefore available for possible alternative uses (i.e. agriculture or other commercial activities). (x) C10 : Robustness and versatility - this criterion accounts for the ability of a system to perform in varying conditions including weather changes or solar flux variations. It also takes into consideration the versatility in terms of usage in various fields like hydrogen generation, industrial heating, thermal reactions, etc. It also quantifies the system and its components’ resistance to failure.
Table 1. Weighted comparative analysis of different CSP technologies (Pugh matrix: the parabolic trough, the solar tower and the solar tower with base receiver are each scored -1, 0 or +1 against criteria C1-C10, with weightages 5, 5, 5, 4, 4, 3, 2, 2, 1, 1 respectively; the solar tower with base receiver obtains the highest weighted total, 11)
A comparison of the parabolic trough system, the conventional solar tower and a solar tower with the receiver at the base using Pugh matrix analysis [8], with evaluation based on criteria C1 to C10, is performed. This is a relative comparison where -1 means the lowest level of fulfilment of a criterion, 0 means intermediate fulfilment, and +1 means that the system being evaluated is better than the other systems for that criterion. A weightage out of 5 is given to each criterion by the analyst [6] based on its relative importance and relevance: a criterion weighing 5 is a major factor in deciding the quality of the system, while a weightage of 1 means the criterion has comparatively less influence on the quality of the system. The weightage of a criterion is subject to the purpose of the study and is assigned by an analyst based on scientific and industrial priorities. The comparative scores are given to each technology based on research performed in these fields [9,16,23,29], while for the proposed method the score was given based on its assumed functions and applications. According to the Pugh matrix analysis, as shown in Table 1, the solar power tower with the receiver at the base stands out as the most viable option for future solar thermal power plants.
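The weighted Pugh scoring described above can be sketched as follows. The weightages match those used in Table 1, but the score rows here are placeholders for illustration, not the paper's actual cell values:

```python
def pugh_total(scores, weights):
    """Weighted Pugh-matrix total: each score is -1, 0 or +1."""
    assert len(scores) == len(weights)
    return sum(s * w for s, w in zip(scores, weights))

# Criteria weightages C1..C10
weights = [5, 5, 5, 4, 4, 3, 2, 2, 1, 1]

# Hypothetical score rows, for illustration only
candidates = {
    "parabolic trough":            [1, 0, -1, 0, -1, 1, 1, 0, -1, 0],
    "solar tower":                 [0, -1, 1, 1, 0, -1, 1, 0, 0, 1],
    "solar tower (base receiver)": [0, 1, 1, 1, 1, 0, -1, 1, 0, 1],
}
totals = {name: pugh_total(row, weights) for name, row in candidates.items()}
```

The technique ranking highest in `totals` would be selected, which is how the base-receiver tower emerges as the preferred option in Table 1.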
4
Application in Another Field: Thermolysis of Water
A CSP tower is capable of attaining higher temperatures than other solar power techniques, which makes it useful for industrial heating purposes. A special application of this heating is hydrogen production [24]. Thermolysis means breaking down a molecule by the application of heat. Thermolysis of water involves the reaction H2O + Δ → H2 + (1/2)O2. Hydrogen is the desired product of the reaction and oxygen a by-product. Direct thermolysis of H2O requires temperatures as high as 2500 K, which gives rise to several problems including material failure, low reaction rates, and difficulty in attaining the high temperature under changing atmospheric and climatic conditions. Baykara [4] discussed in detail the use of solar heat in the thermolysis of water and presented consistent results from the solar furnace at Odeillo, France. Nakamura [22] investigated the feasibility of producing H2 and O2 and studied the thermodynamic efficiency of the two-step thermolysis process, which came out much higher than that of direct thermolysis (Fig. 4). A variety of two-step metal oxide thermochemical cycles have been proposed for H2 production. These are based on the reduction and subsequent
Fig. 4. SPT arrangement for industrial heating
oxidation of metal ions. Steinfeld [28] described the zinc oxide cycle, in which Zn2+ is first reduced to Zn and then oxidized back to Zn2+, such that the metal oxide and catalysts regenerate and ideally no raw materials need to be supplied other than water. Flamant proposed the use of cerium oxide for H2 generation [1]. An iron oxide thermochemical cycle was proposed [10,26] in which Fe(III) is reduced to Fe(II) and then oxidized back to Fe(III). In a comparative study, Perkins found the two-step ZnO/Zn cycle to be the most promising for H2 generation [24]. The metal oxides and other reactants, including catalysts, need to be transported to the top of the tower for the reaction, and the products formed during the reaction need to be brought down for storage. The reactants involved are not completely regenerated, and a considerable amount remains reduced, which increases the demand for more reactants to be fed and for the unwanted chemical waste to be removed continuously, which may require a complex structure [3].
5
Analysis and Discussions
Since the modification only deals with changes in the tower design, the comparison is made with conventional designs [5,11] keeping the field input constant, without taking into account field optimization operations, and with the following assumptions: (i) Performance is steady-state; shading and blocking effects are neglected. (ii) Heat flow through a cover is one-dimensional and any absorption is neglected. (iii) The temperature drop through a cover is neglected, and the radiation trapped by the cover is subject to re-radiation. (iv) The sky is considered a black body, temperature gradients around tubes are neglected, and mean plate temperature conditions are assumed. (v) The field of heliostats is considered continuous throughout the area, spanning from rmin to rmax, and circular.
Consider the total area of the heliostat field to be

A_h = \pi (r_{max}^{2} - r_{min}^{2}).   (1)

For a circular field of heliostats from r_{min} to r_{max}, the energy reflected will be

E_1 = \rho_h I \, dA,   (2)

where dA = 2\pi r \, dr is an element of the continuous field distribution, \rho_h is the reflectivity, and I is the local solar irradiation in W/m^2. The cosine and atmospheric efficiencies are assumed constant for both arrangements, given by f_{cos} and f_{at} respectively. Further, the spillage efficiency is denoted by f_{sp} and assumed to be a function of the distance from the heliostat centre to the focal point and of other dimensional and material properties. The incident energy at the focal point for a conventional SPT is

P = \int_{r_{min}}^{r_{max}} E \, dA,   (3)
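With E uniform over the field, Eq. 3 reduces to the closed form P = E π(r_max^2 - r_min^2). A numerical sketch of the annular integration (function and parameter names are illustrative, not from the paper):

```python
import math

def incident_power(I, rho_h, f_sp, f_at, f_cos, r_min, r_max, n=10000):
    """Numerically integrate Eq. 3: P = ∫ E dA with dA = 2*pi*r dr and
    E = I * rho_h * f_sp * f_at * f_cos assumed uniform over the field."""
    E = I * rho_h * f_sp * f_at * f_cos
    dr = (r_max - r_min) / n
    # midpoint rule over thin annular rings of the heliostat field
    return sum(E * 2.0 * math.pi * (r_min + (k + 0.5) * dr) * dr for k in range(n))
```

For example, with I = 1000 W/m^2, reflectivity 0.9 and efficiency factors of 0.95 each over a 50-300 m field, this matches the closed-form value; radius-dependent efficiencies would simply move inside the sum.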
where $E = I \rho_h f_{sp} f_{at} f_{cos}$ (the accumulation of efficiency factors). For the modified arrangement, $P$ is incident on the secondary reflector, which re-reflects this energy flux towards the receiver aperture. Assuming the net flux passes through a common focal point $(u, v)$ referenced to the central axis of the tower at ground level, the incident energy is

$$P' = P \rho_r f'_{sp}, \qquad (4)$$

where $\rho_r$ is the reflectivity of the secondary reflector, $f'_{sp}$ is the spillage efficiency for the re-reflected radiation, and $P$ is the incident energy at the secondary reflector. The energy flux to be considered as input for the receiver is
$$E_{conv} = f(P, (\rho, \tau)_{cover}), \qquad E_{mod} = f(P', (\rho, \tau)_{cover}), \qquad (5)$$
where $E_{conv}$ and $E_{mod}$ are the receiver input fluxes for the conventional and the proposed modified SPT arrangement respectively. Also, $\rho$ and $\tau$ are the reflectivity and transmissivity of the glass cover. This flux is expressed in terms of the incident radiation intensity and the (internal) reflection and transmission quality of the glass cover. Consider now a single-cover flat-plate solar collector as the receiver, subjected to the comparative analysis. For a basic configuration, we model the overall heat loss coefficient as

$$U_L = U_t + U_b + U_e, \qquad (6)$$
where $U_L$ is the overall heat loss coefficient, while $U_t$, $U_b$ and $U_e$ are the top, bottom and edge heat loss coefficients respectively. For the estimation of $U_t$, we define an iterative examination [14] for which the following heat transfer coefficients are

$$h_{c,pc} = f(Nu, Ra, k), \quad h_{r,pc} = f(T_p, T_c, \varepsilon_p, \varepsilon_c), \quad h_{r,ca} = f(T_c, T_a, \varepsilon_c). \qquad (7)$$

The value of $U_t$ is then estimated as

$$U_t = f(h_{c,pc}, h_{r,pc}, h_w, h_{r,ca}), \qquad (8)$$
Feasibility Study of CSP Tower with Receiver Installed at Base
259
where $h_{c,pc}$ and $h_{r,pc}$ are the convective and radiative heat transfer coefficients between plate and cover, and $h_{r,ca}$ is the radiative heat transfer coefficient between the cover and the ambient. $T_p$, $T_c$ and $T_a$ are the plate, cover and ambient temperatures respectively at steady state. Also, $Nu$, $Ra$ and $k$ are the Nusselt number, Rayleigh number and thermal conductivity of the plate-cover system. The bottom and edge losses $U_b$ and $U_e$ can be addressed [14] as

$$U_b = f(R_1, R_2), \qquad U_e = f(A_c), \qquad (9)$$

where $A_c$ is the collector area, $R_1$ is the conductive resistance to heat flow through the bottom insulation, and $R_2$ is the combined (convective and radiative) resistance to heat flow at the insulation-ambient junction. Thus, assuming a common sink temperature $T_a$, the value of $U_L$ is defined. Balancing the energy equation for the two arrangements, we get

$$E_{conv}(P, (\rho, \tau)_{cover}) = E_{mod}(P', (\rho, \tau)_{cover}) = U_L \, \Delta T \, A + X, \qquad (10)$$
where $X$ is the energy flux associated with the heat transfer involving the fluid flow in the tubes, such that $P_{Output} = f(X)$, where $P_{Output}$ is the overall SPT output, depending on $X$ and the overall efficiency of the process involved. As seen from Eqs. (6) to (10), the geometry of the receiver, the material properties of its components and the established temperatures have a direct effect on SPT efficiency. Placing the receiver at the base allows a more dynamic control over these variables. To sketch a comparative understanding between the two arrangements, we can model for (i) constant power production, or (ii) a given field dimension.

1. For a given heliostat field, the effect of the proposed modification on the net energy flux incident at the receiver can be approximated by a simple general expression. Using Eqs. (3) and (4), we can write $P'/P = \rho_r f'_{sp}$, or, rewritten,

$$\Delta P / P = (1 - \rho_r f'_{sp}), \qquad (11)$$

where $\Delta P / P$ denotes the fractional loss of energy flux observed in the modified arrangement. The heliostat field is assumed to be continuous with uniform contribution throughout; in a practical scenario, the above expression needs to be modified to account for the actual coverage due to the effects of heliostat distribution, geometry and design. It must also account for the shading and spillage factors. To illustrate the loss in effective energy flux, use $\rho_r = 0.94$ and $f'_{sp} = 0.9$ [14], which gives $\Delta P / P = 0.154$, i.e., at least a 15.4% loss for the above defined heliostat field.

2. To attain the same power output ($P = P'$) in the two arrangements, keeping $r_{min}$ the same for both, the $r_{max}$ for the conventional and the $r'_{max}$ for the modified arrangement can be related by an expression. Equating (3) and (4), and considering that the incident energy flux $E$ remains the same, gives $A'/A = 1/(\rho_r f'_{sp})$ ($= C_1$, say), where $A'/A = (r'^{2}_{max} - r^2_{min})/(r^2_{max} - r^2_{min})$. Using $r_{min} = 75$ m [14], for attaining the same power output at some $r_{max}$, the respective increase in field size is

$$r'_{max} = \sqrt{C_1 r^2_{max} + C_2}, \qquad (12)$$
where $C_2 = r^2_{min}(1 - C_1)$. It is to be noted that this expression also considers the field to be continuous and uniformly contributing, so the effect of field design and the decline in energy flux have to be considered as per local conditions (Table 2).

Table 2. Percentage change in radius

  rmax (m)    Δr/r (%)
  150         6.67
  175         7.17
  200         7.54
  225         7.79
  250         7.97
  275         8.1
  300         8.2
  325         8.27
  350         8.34
  375         8.39
  400         8.43
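The fractional-loss and field-radius relations above are easy to check numerically. A short sketch, assuming the example values quoted in the text (ρr = 0.94, f′sp = 0.9, rmin = 75 m); the small differences from Table 2 come from rounding in the printed values:

```python
import math

# Numerical check of Eqs. (11)-(12) with the example values from the text:
# rho_r = 0.94, f'_sp = 0.9, r_min = 75 m.

rho_r, f_sp_prime = 0.94, 0.9
dP_over_P = 1 - rho_r * f_sp_prime        # Eq. (11): fractional flux loss
C1 = 1 / (rho_r * f_sp_prime)             # A'/A needed for equal power output
r_min = 75.0
C2 = r_min ** 2 * (1 - C1)

def r_max_modified(r_max):
    """Eq. (12): enlarged maximum field radius for the same power output."""
    return math.sqrt(C1 * r_max ** 2 + C2)

print(round(dP_over_P, 3))                # 0.154, i.e. a 15.4% flux loss
for r in (150, 250, 400):
    print(r, round(100 * (r_max_modified(r) - r) / r, 2))  # % change, cf. Table 2
```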
The major criteria employed for analysis, such as investment cost, O&M costs and the levelized cost of electricity, will also be influenced. As the tower need not be a load-bearing structure, the construction cost and design complexity are reduced, although the additional cost of the secondary mirror is added. The heliostat field area also increases by the fraction calculated. The overall operation and maintenance cost reduces drastically: parasitic energy consumption falls (less pumping work is required), and maintenance effort and cost become minimal with the receiver at ground level. This reduction in costs and expenses reduces the levelized electricity cost. Since the proposed system is subject to dual reflection, the net energy flux at the receiver reduces owing to the losses illustrated above. But this modification provides the flexibility to optimize the receiver and reflector configuration and to establish an optimal aiming strategy over the life of the system, which helps with the associated maintenance and repair. This configuration of SPT adds scalability and flexibility to the system, and the output capacity of the plant can be scaled to a higher level. Furthermore, the reduction in output power, keeping the field area constant, was estimated to be nearly 15.4%, resulting from the losses and inefficiencies due to dual reflection in the new arrangement. This reduction in output can be compensated by increasing the heliostat field area, which in turn incurs an extra cost and is the singular disadvantage of the new system. Also, the percentage change in the maximum radius of the heliostat field was estimated to range from 6.67% to 8.43%, varying with the $r_{max}$ of the conventional system.
6
Conclusions
In this study, it is found that for the proposed system of a solar tower with the receiver at the base, there is an overall decrease in efficiency, but the system performs well in the economic aspects of CSP. It directly cuts O&M costs, parasitic electricity consumption is reduced, and the system is versatile and can be
easily used for industrial heating. An increase in the radius of the heliostat field is needed to match its working conditions with those of the conventional system. For the generated power to be economically viable, reductions in initial capital investment, O&M cost and LCOE are extremely important. Apart from power generation, hydrogen production is another application where this technology can be employed. Hydrogen, a clean and energy-dense fuel, has great economic value and can be used for power generation or industrial purposes. Further research is needed to estimate the efficiency of heat transfer and the convective losses in the proposed process. The design of the curved conic mirror for the tower remains a manufacturing challenge for industry. The quality of the mirrors also influences the overall efficiency of the process, and research is ongoing into improving their reflectivity. A number of unforeseen factors, such as uneven climatic conditions, cloudy weather and a dusty environment, also affect the working of the system.
References

1. Abanades, S., Flamant, G.: Thermochemical hydrogen production from a two-step solar-driven water-splitting cycle based on cerium oxides. Sol. Energy 80(12), 1611–1623 (2006)
2. Alghoul, M., Sulaiman, M., Azmi, B., Wahab, M.: Review of materials for solar thermal collectors. Anti-Corros. Methods Mater. 52(4), 199–206 (2005)
3. Zillmer, A.J., Cap, D.P.: Lifting system for solar power tower components (2013)
4. Baykara, S.: Experimental solar water thermolysis. Int. J. Hydrog. Energy 29(14), 1459–1469 (2004)
5. Bergene, T., Løvvik, O.M.: Model calculations on a flat-plate solar heat collector with integrated solar cells. Sol. Energy 55(6), 453–462 (1995)
6. Cavallaro, F.: Multi-criteria decision aid to assess concentrated solar thermal technologies. Renew. Energy 34(7), 1678–1685 (2009)
7. Cavallaro, F.: Fuzzy TOPSIS approach for assessing thermal-energy storage in concentrated solar power (CSP) systems. Appl. Energy 87(2), 496–503 (2010)
8. Cervone, H.F.: Applied digital library project management. OCLC Syst. Serv. Int. Digit. Libr. Perspect. 25(4), 228–232 (2009)
9. Chaanaoui, M., Vaudreuil, S., Bounahmidi, T.: Benchmark of concentrating solar power plants: historical, current and future technical and economic development. Procedia Comput. Sci. 83, 782–789 (2016)
10. Charvin, P., Abanades, S., Flamant, G., Lemort, F.: Two-step water splitting thermochemical cycle based on iron oxide redox pair for solar hydrogen production. Energy 32(7), 1124–1133 (2007)
11. Collado, F., Turégano, J.: Calculation of the annual thermal energy supplied by a defined heliostat field. Sol. Energy 42(2), 149–165 (1989)
12. Deb, D., Brahmbhatt, N.L.: Review of yield increase of solar panels through soiling prevention, and a proposed water-free automated cleaning solution. Renew. Sustain. Energy Rev. 82, 3306–3313 (2018). https://doi.org/10.1016/j.rser.2017.10.014
13. Deb, D., Rawat, C.: Concentrated solar power tower with receiver installed at the base. Indian Patent Office (2017)
14. Duffie, J.A., Beckman, W.A.: Solar Engineering of Thermal Processes. Wiley, Hoboken (2013)
15. Ehrhart, B.D., Muhich, C.L., Al-Shankiti, I., Weimer, A.W.: System efficiency for two-step metal oxide solar thermochemical hydrogen production – Part 1: thermodynamic model and impact of oxidation kinetics. Int. J. Hydrog. Energy 41(44), 19881–19893 (2016)
16. Gharbi, N.E., Derbal, H., Bouaichaoui, S., Said, N.: A comparative study between parabolic trough collector and linear Fresnel reflector technologies. Energy Procedia 6, 565–572 (2011)
17. Han, W., Jin, H., Lin, R., Liu, Q.: Performance enhancement of a solar trough power plant by integrating tower collectors. Energy Procedia 49, 1391–1399 (2014)
18. Heller, L.: Literature Review on Heat Transfer Fluids and Thermal Energy Storage Systems in CSP Plants. Solar Thermal Energy Research Group, Stellenbosch University
19. Kapoor, D., Sodhi, P., Deb, D.: A novel control strategy to simulate solar panels. In: 2012 International Conference on Signal Processing and Communications (SPCOM). IEEE (2012). https://doi.org/10.1109/spcom.2012.6290002
20. Kapoor, D., Sodhi, P., Deb, D.: Solar panel simulation using adaptive control. In: 2012 IEEE International Conference on Control Applications. IEEE (2012). https://doi.org/10.1109/cca.2012.6402674
21. Kuravi, S., Trahan, J., Goswami, D.Y., Rahman, M.M., Stefanakos, E.K.: Thermal energy storage technologies and systems for concentrating solar power plants. Prog. Energy Combust. Sci. 39(4), 285–319 (2013)
22. Nakamura, T.: Hydrogen production from water utilizing solar heat at high temperatures. Sol. Energy 19(5), 467–475 (1977)
23. Padilla, R.V., Demirkaya, G., Goswami, D.Y., Stefanakos, E., Rahman, M.M.: Heat transfer analysis of parabolic trough solar receiver. Appl. Energy 88(12), 5097–5110 (2011)
24. Perkins, C.: Likely near-term solar-thermal water splitting technologies. Int. J. Hydrog. Energy 29(15), 1587–1599 (2004)
25. Purohit, I., Purohit, P.: Techno-economic evaluation of concentrating solar power generation in India. Energy Policy 38(6), 3015–3029 (2010)
26. Roeb, M., Sattler, C., Klüser, R., Monnerie, N., de Oliveira, L., Konstandopoulos, A.G., Agrafiotis, C., Zaspalis, V.T., Nalbandian, L., Steele, A., Stobbe, P.: Solar hydrogen production by a two-step cycle based on mixed iron oxides. J. Sol. Energy Eng. 128(2), 125 (2006)
27. Singh, S.N.: Non Conventional Energy Resources. Pearson Education, New Delhi (2017)
28. Steinfeld, A.: Solar hydrogen production via a two-step water-splitting thermochemical cycle based on Zn/ZnO redox reactions. Int. J. Hydrog. Energy 27(6), 611–619 (2002)
29. Zhang, H., Baeyens, J., Degrève, J., Cacères, G.: Concentrated solar power plants: review and design methodology. Renew. Sustain. Energy Rev. 22, 466–481 (2013)
Bi-Level Optimization Using Improved Bacteria Foraging Optimization Algorithm

Gautam Mahapatra¹(B), Soumya Banerjee¹, and Ranjan Chattaraj²

¹ Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Off-Campus Deoghar, Jharkhand, India
[email protected], [email protected], [email protected]
² Department of Mathematics, Birla Institute of Technology, Mesra, Off-Campus Deoghar, Jharkhand, India
[email protected]
Abstract. Meta-heuristics are computational frameworks that mimic natural phenomena and are used to find robust, global solutions to complex problems. The Bacteria Foraging System (BFS) is one such recently developed model, based on the life cycle of single-cell bacteria, which follow basic computational steps such as chemotaxis and reproduction and, by applying these in sequence, survive in complex chemical environments. In this work, some new improvements to the reproduction part of BFS have been successfully experimented with and tested on Capacitated Vehicle Routing Problems formulated as Bi-Level Optimization Problems. Experimental results show the method's effectiveness in searching for robust and global solutions.

Keywords: Meta-heuristics · Bacteria Foraging Optimization Algorithm · Bi-Level Optimization Problem · Bi-Level Improved Bacteria Foraging Optimization Algorithm · Chemotaxis · Adaptive reproduction · Bacteria rank · Elimination-dispersal · Capacitated Vehicle Routing Problem
1
Introduction
In social management and administration activities, multi-level organizational structures are followed in most cases, and for study these can be modelled as hierarchical decision-making processes that maintain their efficiency, longevity, robustness and sustainable existence [1,2]. Since the early stages of the development of Linear Programming Problems (LPPs), which are mathematical models of such decentralized optimized planning problems with a high degree of conflicting objectives and targets, these multi-level decision-making processes

Supported by the Post Graduate Studies Section, Asutosh College, Kolkata. Email: [email protected]. Website: www.asutoshcollege.in.

© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 263–275, 2021. https://doi.org/10.1007/978-3-030-52190-5_19
264
G. Mahapatra et al.
are being studied, and different solution mechanisms are evolving for them. This experimental study focuses mainly on the Bi-Level mathematical structure of these systems. This structure differs from other decision-making optimization problems: here a particular order of taking decisions must be followed. In Bi-Level Optimization Problems (BLOPs), the entry level, called the Leader, first takes decisions for the decision parameters under its own control; given these decisions, the next level, called the Follower, takes decisions for its own parameters, so that both Leader and Follower perform optimally with respect to their own objectives and the system is maintained in the best possible working state. In real-world situations, several existing and new systems can be modelled as two- or more-level decision-making problems to study their optimal structure; these are discussed in a later section of this article. With the advancement of electronics, computing and communication systems, simulations of randomness and stochastic systems have become a reality, so different natural and otherwise hard-to-explain situations can now be simulated, and nature-inspired computations can be successfully modelled to solve difficult NP-Hard problems with optimal solutions in finite time and with finite computational resources. The Bacteria Foraging Optimization Algorithm is one relatively new technique introduced for finding global and robust solutions for large-scale systems [3]. Since its introduction, different improved variants have emerged as better problem-solving methods. In this study we have used one improved form of the Bacteria Foraging Optimization Algorithm (BFOA) to design a method for these multi-level optimal decision-making systems. To validate the proposed technique we have tested the algorithm on the very common Vehicle Routing Problem (VRP), and the results show that the method is effective and promising.
There is also scope to extend the method with other improved forms of BFOA; hybridizations with Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), Bee Colony Optimization (BCO), Differential Evolution (DE), etc. are commonly considered for optimization problems such as optimal electrical load distribution among power generation stations to keep the system functional [4]. These methods can also be used in parallel and ensemble forms to design efficient frameworks for solving complex decision-making problems, which remains challenging.
2
Bi-Level Optimization Problems (BLOPs)
In 1973, J. Bracken and J. McGill showed how real-life problems can be formulated in a Bi-Level Programming structure, and later, in 1977, W. Candler and R. Norton showed how such problems can also be structured as Bi-Level as well as Multi-Level Optimization problems [5–7]. Any BLOP is a non-cooperative, sequential decision-making process with a hierarchical Leader-Follower structure; the Follower can take decisions only after the decisions taken by the Leader. For the whole Leader-Follower system there is a fixed number of decision parameters, but the Leader takes decisions on its set of parameters first, and then the Follower takes decisions on the rest. Both
BLOP Using IBFOA
265
Leader and Follower have their own maximizing or minimizing objective functions over all the decision parameters of the system. The Leader must take optimized decisions such that all its constraints are satisfied, and within the Leader's feasible decision space the Follower's optimized decisions must also satisfy the Follower's constraints [7–10]. There is, however, no cooperation in the decision making of Leader and Follower; they are completely independent and competitive in nature, though the decisions of one affect the decisions of the other as well as its own objective. L.N. Vicente and P.H. Calamai proved in 1994 that such a complex, intrinsically non-convex interaction structure between two levels is an NP-Hard problem, different from common optimization problems [11].

2.1
Mathematical Models of BLOPs
The standard mathematical formulation of a BLOP is as follows (minimizing-type objectives):

$$
\begin{aligned}
\text{Leader:}\quad & \min_{X} F(X, Y) \\
\text{s.t.}\quad & G(X, Y) \le 0, \quad H(X, Y) = 0, \\
& \text{Follower:}\quad \min_{Y} f(X, Y) \\
& \quad\text{s.t.}\quad g(X, Y) \le 0, \quad h(X, Y) = 0,
\end{aligned}
\qquad (1)
$$

where

$X = \{x_i \mid \forall i = 1 \ldots m \wedge (L^l_i \le x_i \le U^l_i) \wedge L^l_i, U^l_i \in \mathbb{R}\}$ – Leader decision vector;
$Y = \{y_i \mid \forall i = 1 \ldots n \wedge (L^f_i \le y_i \le U^f_i) \wedge L^f_i, U^f_i \in \mathbb{R}\}$ – Follower decision vector;
$F(X, Y): \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ – objective function for the Leader;
$f(X, Y): \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ – objective function for the Follower;
$G(X, Y): \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^p$ – set of inequality constraints for the Leader;
$g(X, Y): \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^q$ – set of inequality constraints for the Follower;
$H(X, Y): \mathbb{R}^m \times \mathbb{R}^n \to 0^p$ – set of equality constraints for the Leader;
$h(X, Y): \mathbb{R}^m \times \mathbb{R}^n \to 0^q$ – set of equality constraints for the Follower.

For multi-objective problems, the Leader objective $\min_X F(X, Y)$ is replaced by $\min_X \{F_1(X, Y), F_2(X, Y), \ldots, F_k(X, Y)\}$ and, similarly, the Follower objective $\min_Y f(X, Y)$ is replaced by $\min_Y \{f_1(X, Y), f_2(X, Y), \ldots, f_r(X, Y)\}$ over these decision vectors, where $k$ and $r$ are the numbers of objectives for the Leader and Follower respectively.
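The nested Leader-Follower structure of Eq. (1) can be made concrete with a toy bilevel problem solved by brute-force nested search; the quadratic objectives and the grids below are illustrative assumptions, not a problem from the paper:

```python
import numpy as np

# Toy BLOP (illustrative data):
#   Leader:   min_x F(x, y) = (x - 5)^2 + y^2,   0 <= x <= 10
#   Follower: min_y f(x, y) = (y - x)^2,         0 <= y <= 10
# Analytically y*(x) = x, so the Leader solves min (x - 5)^2 + x^2,
# giving x* = y* = 2.5 and F* = 12.5.

xs = np.linspace(0, 10, 1001)
ys = np.linspace(0, 10, 1001)

def follower_response(x):
    """Follower optimizes its own objective for a fixed Leader decision."""
    return ys[np.argmin((ys - x) ** 2)]

# The Leader evaluates F only through the Follower's optimal reaction.
F_star, x_star = min(((x - 5) ** 2 + follower_response(x) ** 2, x) for x in xs)
y_star = follower_response(x_star)
print(x_star, y_star, F_star)     # close to 2.5 2.5 12.5
```

The key point is that the Leader never evaluates F on (x, y) pairs of its own choosing: y is always the Follower's optimal reaction, which is what makes this bilevel rather than a joint minimization.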
2.2
Types of BLOP
Depending on the mathematical and/or logical structure of the objective and constraint functions, BLOPs can be divided into different categories: Linear BLOPs, where the objective functions as well as the constraints of both Leader and Follower are linear, and Non-Linear BLOPs, where these functions are non-linear. For single-objective BLOPs such divisions are available in the literature [9,10,12,13]. In Multi-Objective BLOPs (MOBLOPs), both Leader and Follower have multiple conflicting objectives to meet. The Follower acts as a constraint for the Leader, in the sense that a decision is feasible for the Leader only if it lies on the Pareto-optimal front of the Follower [14]. Here too the objective functions and constraints may be either linear or non-linear, and the different combinations generate different sub-categories of MOBLOPs.

2.3
Applications of BLOPs
These multi-level optimal decision-making models are used in many fields of social development; Colson et al. [15] proposed seven major application fields: (1) agriculture and water resource planning – primarily water supply models, food quality assessment in wholesale markets [16], multi-lateral agricultural negotiations, allocation of water resources, and design of operation policies for multiple water reservoirs; (2) government policy framing – resource distribution, strategies for both subsidies and penalties; (3) economic systems – setting the optimal location of distribution centers, ceiling prices and price-based clearing in the dynamic structure of the oil industry; (4) finance models – bank asset portfolio problems in financial management; (5) protection systems – protection planning for homelands, critical infrastructure and internal security systems; (6) transport systems – setting the optimal location of tolls on highways, vehicle routing problems, location routing problems, highway network route management systems, and unmanned aircraft systems [17]; (7) engineering applications – control systems and combinatorial problems in engineering [18].
3
Meta-heuristics for Bi-Level Optimization Problems
In recent years, meta-heuristic computational techniques have been used in many fields for their inherent parallelism, robustness and efficient global-solution searching capabilities. Although BLOPs were identified long ago, in 1934, after the formulation of what is now the Stackelberg game [19], due to their complexity and practical infeasibility they gained little interest before the nineties; after 2000, with the advancement of nature-inspired computational methods and their practical implementation on efficient computational hardware, different new techniques have continuously been tested and introduced successfully for complex computational systems [1]. A taxonomy presented by Talbi on the application of meta-heuristics to solving BLOPs is notably significant in showing the
rapid growth of interest [20–22,24]. Other conventional techniques used for solving BLOPs are Vertex Enumeration, Penalty Methods and the Karush-Kuhn-Tucker (KKT) conditions used to convert them into single-level optimization problems [23], but these are not suitable for large-scale problems; hence stochastic methods are gaining momentum, and multi-level decision-making processes are being successfully implemented.

3.1
Bi-Level Bacteria Foraging Optimization Algorithm (BiBFOA)
The Bacteria Foraging Optimization (BFOA) technique was developed by Prof. K.M. Passino and is relatively new in the nature-inspired meta-heuristic intelligent computational framework [3]. BFOA was designed by mimicking the life cycle of a common single-cell elementary bacterium such as E. coli, which performs some very basic operations for the best possible survival in the complex bio-chemical environment of the human body. These operations are (i) exploitation-related (equivalent to local search): chemotaxis for food foraging, an alternation of tumbling for random direction selection and swimming some steps in that direction, and reproduction for the continuation of life; and (ii) exploration (equivalent to global search): elimination-dispersal due to unpredictable environmental interactions. Mahapatra et al. described the details of the implementation of this technique with formal descriptions, improvements and some possible variants [25]. It is an inherently parallel and robust computational technique, highly effective for high-dimensional complex systems, and is becoming a new state-of-the-art swarm-intelligence computation paradigm for research and development. Using the Stretching Technology of Parsopoulos et al. [26] for BLOPs, Mahapatra et al. [13] developed Bi-Level Bacteria Foraging Optimization (BiBFOA) as a new solution technique for BLOPs. In the present work we have used this computational technique, with the tested improvements in the reproduction phase, for the design of BiBFOA [25]. The BLOP form of the Capacitated Vehicle Routing Problem (CVRP) has been used to study the effectiveness of this new algorithm.

3.2
Bi-Level Improved Bacteria Foraging Optimization Algorithm (BiIBFOA)
We have already studied one improved form of BFOA in which a more natural biological behaviour for the bacterial reproduction step is introduced. Not all bacteria reproduce by simply making a copy at their own place, as in the classical form of BFOA; the reproduction process of a bacterium should depend on its health index, age and other biological parameters. Also, each reproducing bacterium produces at least one and up to a certain maximum number of children, instead of a fixed single copy. After considering these natural phenomena of the bacterial life cycle, we have shown that improvements on the classical form of the algorithm are possible, and the Improved BFOA (IBFOA) has been designed [25]. In the present work we have used this IBFOA to design a new form of BiBFOA to solve BLOPs. This new BiIBFOA considers the asymmetric co-operations
among the Leader and the Follower in a nested sequential optimization form. In BiIBFOA, the Leader's controlling decision vector X is sampled from the feasible space to find Nl candidate solutions (Sl = [X, F()]) for the Leader. With these candidate Leader decisions we use IBFOA, together with the Stretching Technology proposed by Parsopoulos et al. [27], to obtain the Follower's response in its own feasible space for every Leader decision. IBFOA is called as a sub-function with algorithmic parameters such as the number of follower bacteria, the values of the current decision parameters, iteration counts, etc. Here each individual bacterium represents a solution, so a pool of solutions for both Leader and Follower is generated, S = [X, F(), Y, f()], and based on the values of the respective objective functions the current best and hence the global best solution (S*) are both updated. Then, for each such solution, the Stretching Technique is used to avoid the possibility of trapping into a local solution. These steps are repeated for a predefined number of Leader loop iterations, and finally we obtain the optimal decision vector V* = (X*, Y*) and the corresponding optimal objective values for Leader and Follower, F* = F(X*, Y*) and f* = f(X*, Y*) respectively; together these form the optimal solution S* = [X*, F*, Y*, f*]. By the use of this improved form of BFOA, the number of computational iterations, and hence the number of computational steps, can be reduced, i.e. a speed-up is possible in comparison with using the classical form of BFOA for solving BLOPs [13].
Algorithm BiIBFOA

Step 1: [Generate Nl random leaders]
  SL ⇐ φ
  S* ⇐ φ
  for l ⇐ 1 to Nl do
  begin
    while (true) do
    begin
      Xl ⇐ L^L + rand * (U^L − L^L)
      if (G(Xl) ≤ 0) then break
    end
    SL ⇐ SL ∪ [Xl, F()]
  end

Step 2: [Process all the Leader's iterations]
  kl = 0
  repeat Step 3 thru Step 6 while kl ≤ ML

Step 3: [Evaluate Follower's response for each Leader using IBFOA with Stretching Technology]
  for l ⇐ 1 to Nl do
  begin
    S = S ∪ IBFOA(Nf, l, SL, f, g, MF)
  end

Step 4: [Update the best solution]
  S* = Best(S, S*)
Step 5: [Use Stretching Technology for Leader to avoid trapping into local optima]
  S'L = φ
  for l ⇐ 1 to Nl do
    S'L = S'L ∪ Stretch(Xl, F, G)
  SL = S'L

Step 6: [Update the Leader's iteration counter]
  kl = kl + 1

Step 7: [Output the solution]
  write S*

It is a swarm-based algorithm, so a fixed number of bacteria are created in the feasible space, each representing one possible upper-level solution; the lower level is treated as a single-level optimization problem using the improved bacteria-colony computation to find the best Leader-Follower combinations. Then, with the help of stretching the solution vectors in the feasible space, the same iterations are used repeatedly for finding the global optimum.
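The outer loop above can be sketched in Python with a simple random-search stand-in for the IBFOA sub-function and the stretching step; the toy Leader/Follower objectives and all parameter values are illustrative assumptions, not the paper's test problem:

```python
import random

# Structural sketch of Steps 1-7: sample N_l leaders, obtain each
# Follower's response via a sub-search, track the global best S*,
# and resample leaders for the next outer iteration.

def F(x, y):                      # Leader objective (illustrative)
    return (x - 5) ** 2 + y ** 2

def follower_search(x, n_iter=200):
    """Stand-in for IBFOA: min_y f(x, y) = (y - x)^2 by random search."""
    return min((random.uniform(0, 10) for _ in range(n_iter)),
               key=lambda y: (y - x) ** 2)

random.seed(1)
N_l, M_L = 20, 30                 # leader population size, outer iterations
leaders = [random.uniform(0, 10) for _ in range(N_l)]     # Step 1
S_star = None                     # global best (x*, F*, y*)
for _ in range(M_L):              # Step 2
    for i, x in enumerate(leaders):
        y = follower_search(x)                   # Step 3: follower response
        if S_star is None or F(x, y) < S_star[1]:
            S_star = (x, F(x, y), y)             # Step 4: update best
        leaders[i] = random.uniform(0, 10)       # stand-in for Step 5 moves
print(S_star)   # approaches the analytic optimum x = y = 2.5, F = 12.5
```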
4
Bi-Level Capacitated Vehicle Routing Problem (BiLCVRP)
The Vehicle Routing Problem (VRP), also known as the Capacitated Vehicle Routing Problem (CVRP), is a classical problem, modelled formally by Dantzig and Ramser [28]; later, with the advancement of technologies such as Global Positioning System (GPS) tracking and the Internet of Things (IoT), many variants have been studied to meet the requirements of different complex and integrated systems. In the literature, different deterministic, non-deterministic or probabilistic, and meta-heuristic solution techniques have been studied [29]. Here the problem is modelled in meta-heuristic form for solution. To test BiIBFOA, the BLOP form of the CVRP has been considered, and the results show its effectiveness.

4.1
Capacitated Vehicle Routing Problem (CVRP)
In the standard, simple model of the CVRP there is a depot with vehicles of a similar type and fixed, finite capacity. These vehicles are used to attend a number of service points at different geographical locations. Each vehicle must start from the depot, attend each of its assigned customers with the required services (such as school children or office staff pickup and drop, delivery of industrial, factory or daily-life commodities, etc.), and finally finish at the depot. Operators of such systems have different objectives: the total cost should be minimal, the maximum number of customers should be served, the service time should be optimal, etc. There are also a number of constraints to satisfy: the depot has a certain number of vehicles of a certain maximum capacity available at any particular point in time, clients must be visited exactly once, there may be restrictions on paths, and for
each vehicle the total demand serviced must never exceed the capacity defined for it, etc. This is a complex NP-Hard optimization problem, and in most cases statistical meta-heuristic techniques are the most suitable for solving practical CVRPs [30].

4.2
Bi-Level Formulation CVRP
In the formulation of the CVRP by Fisher and Jaikumar [31], the constraints of the problem can be divided into two sub-groups. The first, generalized, constraints require that for each route the vehicle starts and finishes at the depot, that at no time does the load of a vehicle exceed its defined capacity, and that the demands of every registered customer are fulfilled. The second, specialized, constraints are specific to a vehicle, which has to visit each of its pre-assigned geographical service points exactly once in a complete tour of minimum total cost, so that it is a Travelling Salesman Problem (TSP). Using these two constraint sets, a two-level decision-making or Bi-Level CVRP model can be formulated: the Upper Level, or Leader, takes decisions first and forms the routes by assigning the designated customers to particular routes; then the Lower Level, or Follower, takes decisions about the practical optimal routing of customers so that the cost for the Leader is optimal [20]. To model this problem as a graph, we take all customer service locations, including the depot, as vertices V, and the relations between these locations as edges E ⊆ (V × V) with assigned costs; this graph is represented as G = (V, E). The n customers, including the depot, are indexed i, j ∈ {1, 2, ..., n}, with i = 1 referring to the depot. The vehicles are indexed k ∈ {1, ..., K}, with a constant number K of vehicles in service. Qk is the capacity of the k-th vehicle; for vehicles of a similar type these capacities are equal to a fixed constant Q. cij is the cost entry representing the constant travel cost, and qj is the demand of customer vertex vj. A route is a sequence of customer service points to be visited by a vehicle, with the starting and ending points being the depot.
For every single route there is a maximum capacity limit: incoming customers are added to the route up to this limit, and the excess customers become the seed customers for newly opened routes. The main objective of this problem formulation is to identify the feasible solutions and then find the best among them, so that cost is minimized while customer demands are satisfied. A mathematical formulation of this Bi-Level CVRP is as follows:
BLOP Using IBFOA
271
Leader:   min_{X,Z} F(X,Y,Z) = Σ_{k=1}^{K} d_k z_k + Σ_{k=1}^{K} Σ_{j=1}^{n} c_kj x_kj + Σ_{i=1}^{n} Σ_{j=1}^{n} c_ij y_ij
s.t.      Σ_{k=1}^{K} x_kj = 1,   ∀ j = 1, ..., n
          Σ_{k=1}^{K} z_k = K
          Σ_{j=1}^{n} q_j x_kj ≤ (Q − q_k) z_k,   ∀ k = 1, ..., K
Follower: min_{Y} f(X,Y,Z) = Σ_{i=1}^{n} Σ_{j=1}^{n} c_ij y_ij
s.t.      Σ_{i=1}^{n} y_ij = 1,   ∀ j = 1, ..., n
          Σ_{j=1}^{n} y_ij = 1,   ∀ i = 1, ..., n
          Σ_{i∈S} Σ_{j∈S} y_ij ≤ |S| − 1,   ∀ S ⊂ V, S ≠ ∅
          y_ij ≤ x_ik,   ∀ i, j = 1, ..., n and k = 1, ..., K        (2)

where the variables are

z_k  = 1 if customer k is a seed customer, and 0 otherwise        (3)
x_kj = 1 when both seed customer k and customer j are in the same route, and 0 otherwise        (4)
y_ij = 1 if e_ij ∈ E is in the route, and 0 otherwise        (5)
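As a concreteness check, the Leader-level constraints of Eq. (2) can be evaluated directly on candidate assignments. The sketch below is illustrative only (the function name, representation, and toy data are ours, not from the paper): it verifies the single-assignment, fleet-size, and capacity constraints, where the capacity of a route seeded by customer k is reduced by the seed's own demand, (Q − q_k) z_k.

```python
# Sketch: feasibility check for the Leader level of the bilevel CVRP model.
# x[k][j] = 1 if customer j is assigned to the route seeded by customer k,
# z[k]    = 1 if seed k is used, q[j] = demand of customer j,
# q_seed[k] = demand of the seed customer of route k.
def leader_feasible(x, z, q, q_seed, Q):
    K, n = len(z), len(q)
    # Each customer is assigned to exactly one route: sum_k x[k][j] = 1.
    if any(sum(x[k][j] for k in range(K)) != 1 for j in range(n)):
        return False
    # All K vehicles are in service: sum_k z[k] = K.
    if sum(z) != K:
        return False
    # Capacity: sum_j q[j] * x[k][j] <= (Q - q_seed[k]) * z[k] for every k.
    return all(sum(q[j] * x[k][j] for j in range(n))
               <= (Q - q_seed[k]) * z[k] for k in range(K))

# Toy instance: 4 customers, 2 routes, vehicle capacity 10.
q = [4, 3, 5, 2]
x = [[1, 1, 0, 0],   # route 0 serves customers 0 and 1 (load 7)
     [0, 0, 1, 1]]   # route 1 serves customers 2 and 3 (load 7)
z = [1, 1]
print(leader_feasible(x, z, q, q_seed=[2, 1], Q=10))  # True
```

Swapping in an overloaded assignment (e.g. all four customers on route 0) makes the capacity check fail, which is exactly the situation in which the sweep construction opens a new route.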
The Leader problem decides the customer-service assignments under the required constraints (seeds, available vehicles and their capacities, the number of vehicles/operators to be assigned, etc.) so as to achieve maximum earnings and/or minimum cost. The Follower receives a sub-graph with multiple possible routes; the assigned vehicle must start from the depot, follow a sequence of vertices visiting each exactly once, and finally terminate at the depot, such that the total travel cost, time, etc. is minimized.

4.3 Bacteria Foraging Optimization Algorithm for CVRP
Classical Bacteria Foraging computation works in a continuous environment, i.e., in the domain of real numbers: the bacterium position vector, that is, a solution in the feasible space, is a vector of floating-point numbers. In the CVRP the customer locations are represented as pairs of floating-point numbers (x, y) in the plane. A mapping from the continuous space to the discrete space is therefore required, and the sigmoid function sigmoid(θ_j^i) = 1 / (1 + e^{−θ_j^i}) can be used for it, where θ_j^i is the j-th dimension of the n-dimensional position vector θ of the i-th bacterium. Each bacterium position in the feasible solution space represents one possible solution, in which all n customers are partitioned into K or fewer routes for the different vehicles while the capacity limit of each vehicle is maintained; this is the functionality of the Leader.

272
G. Mahapatra et al.

In our proposed work the sweep algorithm, proposed by Gillett and Miller [29], is used to generate the initial solution. This algorithm starts by randomly selecting a customer k as reference. The line connecting the depot and customer k then serves as the reference line for calculating the polar angles of all customers, after converting the 2-D Cartesian coordinates supplied as input into polar form. All customers are arranged in ascending order of polar angle magnitude. Customers, with their required loads, are then assigned sequentially from this sorted list to a route until the load capacity would be exceeded, at which point a new route is formed for the next customers. In this way the initial Upper-Level solutions are created. The polar coordinates, which are real numbers, are used to define the n-dimensional position vectors of the bacteria. For tumbling and swimming in the chemotaxis phase of the bacteria, the randomization used at the Follower level is performed only on the possible polar locations of the customers. For the stretching process, as mentioned for both BiBFOA and BiIBFOA, we also use these converted, sorted polar coordinates. In the elimination-dispersal phase of the bacteria system, one arbitrary bacterium is eliminated from the colony and, using the above sweeping technique, a new random solution is created and inserted into the colony so that a global solution can be obtained. After customers are assigned to a route, sequencing the customers within that route is a TSP-like problem, which is solved within the bacteria foraging system.
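The sweep construction described above can be sketched as follows. This is an illustrative implementation (the function and variable names are ours): customers are sorted by polar angle around the depot and packed greedily into routes up to the capacity limit. For determinism, the sketch measures angles from the positive x-axis rather than from a randomly chosen reference customer.

```python
import math

def sweep_initial_routes(depot, customers, demands, Q):
    """Gillett-Miller-style sweep: sort customers by polar angle about the
    depot, then fill routes greedily up to capacity Q.
    customers: list of (x, y) points; demands: parallel list of loads."""
    offsets = [(x - depot[0], y - depot[1]) for (x, y) in customers]
    # Polar angle of each customer, normalised to [0, 2*pi).
    angles = [math.atan2(dy, dx) % (2 * math.pi) for (dx, dy) in offsets]
    order = sorted(range(len(customers)), key=lambda i: angles[i])
    routes, route, load = [], [], 0
    for i in order:
        if load + demands[i] > Q:   # capacity would be exceeded:
            routes.append(route)    # close the current route and start a
            route, load = [], 0     # new one; customer i becomes its seed
        route.append(i)
        load += demands[i]
    if route:
        routes.append(route)
    return routes

routes = sweep_initial_routes((0, 0), [(1, 0), (0, 1), (-1, 0), (0, -1)],
                              [3, 3, 3, 3], Q=6)
print(routes)  # [[0, 1], [2, 3]]
```

Each resulting route is then an Upper-Level (Leader) assignment; the visiting order within a route is left to the Follower's TSP-like sequencing.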
5 Experimental Results
The BiBFOA and BiIBFOA were implemented in C# under the MS Visual Studio 15 framework on MS Windows 8, running on an Intel Core i3 processor based hardware system. The standard parameter settings for the bacteria foraging system proposed by K. M. Passino were used, and the algorithms were executed with thirty bacteria. To validate the proposed new meta-heuristic-based CVRP solver framed as a BLOP, the implementations were executed on fourteen benchmark CVRPs developed by Christofides et al. [32]. In these problems the coordinates, or positions, of customers are given in (x, y) coordinate form; in the design process we consider each customer as one dimension of the n-dimensional position vector of a bacterium, and since the planar points are distributed over a constant 360° the dimensional values are defined by the polar coordinates. Each problem of the benchmark set was executed 18 times, and in Table 1 we show one instance of the result for Problem No. 1 with its 5 routes. This problem is defined with fifty customers (n = 50), an equal, fixed capacity for each truck or vehicle (Q = 160), and two-dimensional customer locations. The result shows the sequence of customer service points, the load served by each route, and the corresponding cost involved. This problem has the known optimal solution with cost 524.61, and our algorithm has achieved it. In Table 2 the experimental results for all 14 benchmark problems are shown. Note that except for problems CMT6–CMT10 and CMT13, all have known optimal cost or distance values, and the proposed method achieves this target value in most cases. The BKS, i.e., Best Known Solution, column
is for the known optimal values. Table 2 also presents a comparative study of our implementations against other existing algorithms available in the literature.

Table 1. Results of BiBFOA for the benchmark CVRP Problem No. 1 (n = 50, Q = 160)

Route  Sequence of customers in the route                                  Load  Cost
R1     0 → 38 → 9 → 30 → 34 → 50 → 16 → 21 → 29 → 2 → 11 → 0              159   99.33
R2     0 → 32 → 1 → 22 → 20 → 35 → 36 → 3 → 28 → 31 → 26 → 8 → 0          149   118.52
R3     0 → 27 → 48 → 23 → 7 → 43 → 24 → 25 → 14 → 6 → 0                   152   98.45
R4     0 → 18 → 13 → 41 → 40 → 19 → 42 → 17 → 4 → 47 → 0                  157   109.06
R5     0 → 12 → 37 → 44 → 15 → 45 → 33 → 39 → 10 → 49 → 5 → 46 → 0        157   109.06
Table 2. Comparisons of BiBFOA & BiIBFOA with other meta-heuristic solutions for the Christofides benchmark instances (cost)

Prob.   n    Q    K    BKS       FJ     PSO       PSOBilevel   VRPBilevel   BiBFOA    BiIBFOA
CMT1    50   160  5    524.61    524    531.16    524.61       524.61       524.61    524.61
CMT2    75   140  10   835.26    857    835.26    835.26       835.26       835.32    835.26
CMT3    100  200  8    826.14    833    826.14    826.14       826.14       826.14    826.14
CMT4    150  200  12   1028.42   1014   1046.42   1028.42      1028.42      1028.42   1028.42
CMT5    199  200  17   1291.45   1420   1325.68   1291.38      1306.17      1291.61   1291.45
CMT6    50   160  6    555.43    560    555.43    555.43       555.43       555.43    555.43
CMT7    75   140  11   909.68    916    913.24    909.68       909.68       916.32    906.68
CMT8    100  200  9    865.94    885    865.94    865.94       865.94       866.75    865.94
CMT9    150  200  14   1162.55   1230   1173.25   1165.58      1177.76      1173.25   1165.56
CMT10   199  200  18   1395.85   1518   1431.16   1396.05      1404.75      1417.85   1404.75
CMT11   120  200  7    1042.11   –      1046.35   1043.28      1051.73      1046.35   1042.11
CMT12   100  200  10   819.56    824    819.56    819.56       825.57       819.56    819.56
CMT13   120  200  11   1541.07   –      1544.83   1544.07      1555.39      1545.98   1544.07
CMT14   100  200  11   866.37    848    866.37    866.37       875.35       890.67    866.37

6 Conclusions
The present work shows how the Improved Bacteria Foraging Optimization Algorithm can be used for the solution of BLOPs; to test this implementation, the popular VRP framed as a BLOP has been used. The results show that the use of the improved bacteria system is effective and that in most cases the tests outperform other commonly used algorithms as well as the classical form of the BFOA. The consistent performance of the proposed algorithm shows that it can be used effectively as a BLOP solving technique for these types of complex problems. There remain several possible local improvements for faster computation and quicker discovery of global solutions which may be incorporated into the implementation. In addition, further analyses such as execution time, convergence, and robustness should be considered in future studies. We need to incorporate additional
features to support limited time windows for the CVRP as well as dynamic behaviour. BFOA is inherently a parallel computation technique, and it may be implemented on a multiprocessor system, such as a GPU-based or distributed system, for faster solving. Moreover, finding the sequence of customer services within a particular route can also be done by different sets of bacteria colonies running in parallel, exploiting the inner-level, fine-grained parallelism. We would like to apply this algorithm to other, more recently developed benchmark problems to test its robustness and scalability, with different critical analyses to measure and compare the performance of the proposed method.
References

1. Beyer, I.: Information technology-based logistics planning: approaches to developing a coordination mechanism for decentralized planning. Commun. IIMA 6(3), 117–119 (2006)
2. Katsoulakos, N.M., Kaliampakos, D.C.: Mountainous areas and decentralized energy planning: insights from Greece. Energy Policy 91, 174–188 (2016)
3. Passino, K.M.: Biomimicry of bacterial foraging for distributed optimization and control. IEEE Control Systems 22(3), 52–67 (2002)
4. Simo, A., Barbulescu, C.: GA based multi-stage transmission network expansion planning. In: International Workshop Soft Computing Applications, pp. 47–59. Springer, Cham (2016)
5. Bracken, J., McGill, J.T.: Mathematical programs with optimization problems in the constraints. Oper. Res. 21(1), 37–44 (1973)
6. Candler, W., Norton, R.: Multi-level programming and development policy. The World Bank (1977)
7. Colson, B., Marcotte, P., Savard, G.: An overview of bilevel optimization. Ann. Oper. Res. 153(1), 235–256 (2007)
8. Migdalas, A., Pardalos, P.M., Värbrand, P. (eds.): Multilevel Optimization: Algorithms and Applications, vol. 20. Springer, Boston (2013)
9. Bard, J.F.: Practical Bilevel Optimization: Algorithms and Applications, vol. 30. Springer, Dordrecht (2013)
10. Dempe, S., Kalashnikov, V., Pérez-Valdés, G.A., Kalashnykova, N.: Bilevel Programming Problems. Energy Systems. Springer, Berlin (2015)
11. Vicente, L.N., Calamai, P.H.: Bilevel and multilevel programming: a bibliography review. J. Global Optim. 5(3), 291–306 (1994)
12. Mahapatra, G., Banerjee, S.: Bilevel optimization using firefly algorithm. In: 5th International Conference (IEMCON 2014) Proceedings, pp. 1–7. Elsevier Publication, Kolkata, October 2014
13. Mahapatra, G., Banerjee, S., Suganthan, P.N.: Bilevel optimization using bacteria foraging optimization algorithm. In: International Conference on Swarm, Evolutionary, and Memetic Computing, pp. 351–362. Springer, Bhubaneswar, December 2014
14. Jia, L., Wang, Y., Fan, L.: Multiobjective bilevel optimization for production-distribution planning problems using hybrid genetic algorithm. Integr. Comput.-Aided Eng. 21(1), 77–90 (2014)
15. Colson, B., Marcotte, P., Savard, G.: Bilevel programming: a survey. 4OR 3(2), 87–107 (2005)
16. Hou, X., Haijema, R., Liu, D.: A bilevel stochastic dynamic programming model to assess the value of information on actual food quality at wholesale markets. Math. Probl. Eng. (2017)
17. D'Amato, E., Notaro, I., Silvestre, F., Mattei, M.: Bi-level flight path optimization for UAV formations. In: 2017 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 690–697. IEEE, June 2017
18. Kalashnikov, V., Dempe, S., Mordukhovich, B., Kavun, S.V.: Bilevel optimal control, equilibrium, and combinatorial problems with applications to engineering. Math. Probl. Eng. (2017)
19. Stackelberg, H.V.: Marktform und Gleichgewicht. Springer, Vienna (1934)
20. Talbi, E.G.: A taxonomy of metaheuristics for bi-level optimization. In: Talbi, E.G. (ed.) Metaheuristics for Bi-level Optimization. Studies in Computational Intelligence, vol. 482. Springer, Heidelberg (2013)
21. Mathieu, R., Pittard, L., Anandalingam, G.: Genetic algorithm based approach to bi-level linear programming. RAIRO-Oper. Res. 28(1), 1–21 (1994)
22. Xu, J., Li, Z., Tao, Z.: Random-Like Bi-level Decision Making, vol. 688, pp. 1–38. Springer (2016)
23. Dempe, S., Zemkoho, A.B.: On the Karush-Kuhn-Tucker reformulation of the bilevel optimization problem. Nonlinear Anal. Theory Methods Appl. 75(3), 1202–1218 (2012)
24. Sinha, A., Malo, P., Deb, K.: A review on bilevel optimization: from classical to evolutionary approaches and applications. IEEE Trans. Evolut. Comput. 22(2), 276–295 (2017)
25. Mahapatra, G., Banerjee, S.: An object-oriented implementation of bacteria foraging system for data clustering application. In: 2015 International Conference and Workshop on Computing and Communication (IEMCON), Vancouver, Canada, pp. 1–7. IEEE, October 2015
26. Parsopoulos, K.E., Vrahatis, M.N.: Recent approaches to global optimization problems through particle swarm optimization. Natural Comput. 1(2–3), 235–306 (2002)
27. Pan, J., Manocha, D.: Bi-level locality sensitive hashing for k-nearest neighbor computation. In: 2012 IEEE 28th International Conference on Data Engineering (ICDE), pp. 378–389. IEEE, April 2012
28. Dantzig, G.B., Ramser, J.H.: The truck dispatching problem. Manag. Sci. 6(1), 80–91 (1959)
29. Gillett, B.E., Miller, L.R.: A heuristic algorithm for the vehicle-dispatch problem. Oper. Res. 22(2), 340–349 (1974)
30. Gendreau, M., Potvin, J.Y., Bräysy, O., Hasle, G., Løkketangen, A.: Metaheuristics for the vehicle routing problem and its extensions: a categorized bibliography. In: The Vehicle Routing Problem: Latest Advances and New Challenges, pp. 143–169. Springer, Boston (2008)
31. Fisher, M.L., Jaikumar, R.: A generalized assignment heuristic for vehicle routing. Networks 11(2), 109–124 (1981)
32. Christofides, N.: The traveling salesman problem. Comb. Optim., 131–149 (1979)
Cubic Hesitant Fuzzy Heronian Mean Operators and Their Application in Multi Criteria Decision Making Faisal Mehmood1(B) , Khizar Hayat2 , Tahir Mahmood3 , and Muhammad Arif4 1
Beijing Key Laboratory on MCAACI, School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 102488, China [email protected], [email protected] 2 School of Mathematics and Information Sciences, Guangzhou University, Guangzhou 510006, China [email protected] 3 Department of Mathematics and Statistics, International Islamic University, Islamabad 44000, Pakistan [email protected] 4 Department of Computer Science and Technology, Guangzhou University, Guangzhou 510006, China [email protected]
Abstract. This paper presents Heronian mean operators for cubic hesitant fuzzy sets. These operators include the cubic hesitant fuzzy Heronian mean, cubic hesitant fuzzy geometric Heronian mean, cubic hesitant fuzzy weighted Heronian mean, and cubic hesitant fuzzy weighted geometric Heronian mean operators, which help to aggregate the given information and show the association among the given arguments. In the end, a multi criteria decision making problem is solved through cubic hesitant fuzzy weighted geometric Heronian mean operators. Keywords: Cubic hesitant fuzzy sets · Heronian means · Cubic hesitant fuzzy weighted geometric Heronian means · Decision making
1 Introduction
L. A. Zadeh developed the theory of fuzzy sets [39], which helps to tackle ambiguous information. A fuzzy set assigns a membership value under a certain criterion, and that value belongs to [0, 1]. Since its beginning, many extensions of this theory have been developed, including hesitant fuzzy sets (HFSs) [24,25], interval valued fuzzy sets (IVFSs) [40], interval valued intuitionistic fuzzy sets (IVIFSs) [3], interval valued hesitant fuzzy sets (IVHFSs) [6,8], intuitionistic fuzzy sets (IFSs) [2], cubic sets (CSs) [12], and cubic hesitant fuzzy sets (CHFSs) [20,21]. V. Torra defined HFSs [24,25], a very vital generalization of fuzzy sets. HFSs are designed for situations in which someone hesitates or is unable to reach a final decision, so that one has a collection of values in [0, 1] that shows the hesitancy under the given criteria. L. A. Zadeh introduced IVFSs [40], which allow the membership value to be a subinterval of [0, 1]. N. Chen et al. [6] defined IVHFSs, which describe hesitation among subintervals of [0, 1]. Y. B. Jun [12] defined cubic sets (CSs), which have two membership values: one a subinterval of [0, 1] and the other a fuzzy value in [0, 1]. T. Mahmood et al. [20,21] defined CHFSs and also gave their generalized aggregation operators, which help to solve complicated real-life multi criteria decision making (MCDM) problems.

© Springer Nature Switzerland AG 2021
V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 276–288, 2021. https://doi.org/10.1007/978-3-030-52190-5_20

Many scientists and researchers have applied the MCDM method [1,5,7,9–11,13,14,17–20,22,23,26,32,33,36,38] to solve their problems using fuzzy techniques, so it is an important and widely used method in recent times. In view of this fact, many aggregation operators (AOs) [20,21,29,30] have been defined. These include hesitant fuzzy aggregation operators [27,28], intuitionistic fuzzy aggregation operators [31], and weighted averaging and weighted geometric aggregation operators and their generalizations for CHFSs [20,21]. The generalized aggregation operators introduced by Mahmood et al. [21] only aggregate the cubic hesitant fuzzy information and do not describe the relationship among the arguments in a MCDM problem. So there was a need to define an operator which has the characteristic of showing the relationship among the arguments in a MCDM problem. There is a productive aggregation operator which shows the association among the given subject matter, called the Heronian mean (HM) [34] operator. The HM lies between the arithmetic and geometric means and yields a most powerful result compared with the other defined operators due to its ability to capture the interrelationships among the given arguments. Heronian mean operators (HMOs) [15–18,35] have been widely used to aggregate fuzzy information. In this work we define HMOs for CHFSs and apply them to a MCDM problem. D. Yu and Y. Wu [34] introduced HMOs for IVIFSs. D. Yu [36] defined HMOs for HFSs and applied them to solve a MCDM problem. S. Yu et al. [37] introduced a MCDM method for linguistic hesitant fuzzy information through the HM. P. Liu and S. M. Chen [17] defined group MCDM based on Heronian aggregation operators (HAOs) of intuitionistic fuzzy numbers. P. Liu and L. Zhang [18] introduced a MCDM method based on neutrosophic hesitant fuzzy HAOs.

This paper is arranged as follows: Sect. 2 describes the fundamental concepts used in the proposed results. Section 3 contains the Heronian mean operators for CHFSs. In Sect. 4 an algorithm is defined in order to solve a MCDM problem through CHFWGHM operators. In Sect. 5 the proposed algorithm of Sect. 4 is used to solve a real-life example. Section 6 concludes the paper.
2 Preliminaries
This section contains the basic notions which are used to derive the new results of this paper.

Definition 2.1 [39]. Let G ≠ ∅ and Q = [0, 1]. A function f : G → Q is called a fuzzy set.
Definition 2.2 [40]. Let G ≠ ∅. A function p from G to the set of closed subintervals of [0, 1] is called an interval valued fuzzy set (IVFS).

Definition 2.3 [12]. Let G be a non-empty set. A cubic set on G is defined by A = {< g, p(g), f(g) > / g ∈ G}, where p(g) represents an interval valued fuzzy set (IVFS) on G and f(g) is a fuzzy set on G.

Definition 2.4 [24,25]. Let G be a non-empty set. A hesitant fuzzy set (HFS) is a mapping that, when applied to G, returns a finite subset of [0, 1]; it is defined by H = {< g, r(g) > / g ∈ G}, where r(g) is a set of distinct values in [0, 1] representing the possible membership values of the element g ∈ G. For convenience, Xia and Xu [27] called r(g) a hesitant fuzzy element (briefly, HFE).

Definition 2.5 [6]. Let G ≠ ∅ and let S[0, 1] denote the collection of closed subintervals of [0, 1]. An interval valued hesitant fuzzy set (IVHFS) on G is defined by M = {< g_e, c(g_e) > / g_e ∈ G, e = 1, 2, ..., n}, where c(g_e) : G → S[0, 1] gives all possible interval valued membership values of the element g_e ∈ G in M. For the sake of simplicity we call c(g_e) an interval valued hesitant fuzzy element (briefly, IVHFE).

Definition 2.6 [20,21]. Let G ≠ ∅. A cubic hesitant fuzzy set (for short, CHFS) is defined by J = {< g, c(g), r(g) > / g ∈ G}, where c(g) is an IVHFE and r(g) is an HFE.

Definition 2.7 [20,21]. Let J = {< g, c(g), r(g) > / g ∈ G} and L = {< g, d(g), s(g) > / g ∈ G} be any two CHFSs on a non-empty set G. The addition of J and L is defined by
J ⊕ L = {< g, α ∈ c(g) + d(g), β ∈ r(g) + s(g) / {[δ_i^- + τ_i^- − δ_i^- τ_i^-, δ_i^+ + τ_i^+ − δ_i^+ τ_i^+]}, {β_i ν_i} >}.

Definition 2.8 [20,21]. Let J = {< g, c(g), r(g) > / g ∈ G} and L = {< g, d(g), s(g) > / g ∈ G} be any two CHFSs on a non-empty set G. The multiplication of J and L is defined by
J ⊗ L = {< g, α ∈ c(g) × d(g), β ∈ r(g) × s(g) / {[δ_i^- τ_i^-, δ_i^+ τ_i^+]}, {β_i + ν_i − β_i ν_i} >}.
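The operations above act componentwise: for ⊕ the interval parts combine by the probabilistic sum while the hesitant fuzzy parts multiply, and for ⊗ the roles are swapped. A minimal numeric sketch, assuming single-interval, single-value CHFEs (the representation and function names are ours, not from the paper):

```python
# A CHFE is modelled here as ([lo, hi], beta): one interval-valued membership
# and one hesitant fuzzy value, in the spirit of Definitions 2.6-2.10.
def chfe_add(a, b):
    (al, au), ab = a
    (bl, bu), bb = b
    # Definition 2.7: intervals via probabilistic sum, fuzzy parts multiplied.
    return ([al + bl - al * bl, au + bu - au * bu], ab * bb)

def chfe_mul(a, b):
    (al, au), ab = a
    (bl, bu), bb = b
    # Definition 2.8: intervals multiplied, fuzzy parts via probabilistic sum.
    return ([al * bl, au * bu], ab + bb - ab * bb)

def chfe_scale(chi, a):          # Definition 2.9, law 1: chi * J
    (al, au), ab = a
    return ([1 - (1 - al) ** chi, 1 - (1 - au) ** chi], ab ** chi)

def chfe_pow(a, chi):            # Definition 2.9, law 2: J^chi
    (al, au), ab = a
    return ([al ** chi, au ** chi], 1 - (1 - ab) ** chi)

j, l = ([0.2, 0.4], 0.6), ([0.3, 0.5], 0.7)
s = chfe_add(j, l)
p = chfe_mul(j, l)
print([round(v, 2) for v in s[0]], round(s[1], 2))  # [0.44, 0.7] 0.42
print([round(v, 2) for v in p[0]], round(p[1], 2))  # [0.06, 0.2] 0.88
```

The scale and power laws are exactly the pieces combined later in the proofs of Theorems 3.2 and 3.4.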
Definition 2.9 [20,21]. Assume J = {< g, c(g), r(g) > / g ∈ G} is a CHFS on a non-empty set G and χ > 0. The operations on J are defined as follows:

1. χJ = {< g, χα = χ[α^-, α^+] ∈ (χc)(g), χβ ∈ (χr)(g) / {[1 − (1 − δ_i^-)^χ, 1 − (1 − δ_i^+)^χ]}, {β_i^χ} >}.
2. J^χ = {< g, α^χ = [α^-, α^+]^χ ∈ c^χ(g), β^χ ∈ r^χ(g) / {[(δ_i^-)^χ, (δ_i^+)^χ]}, {1 − (1 − β_i)^χ} >}.
3. J^c = {< g, α^c = [α^-, α^+]^c ∈ c^c(g), β^c ∈ r^c(g) / {[1 − δ_i^+, 1 − δ_i^-]}, {1 − β_i} >}.

Definition 2.10 [20,21]. Let G be a non-empty set and let J = {< g, c(g), r(g) > / g ∈ G} be a CHFS on G. A cubic hesitant fuzzy element (briefly, CHFE) on G is defined by
ch = {< δ_i = [δ_i^-, δ_i^+] ∈ c(g), β_i ∈ r(g) / {[δ_i^-, δ_i^+]}, {β_i} >},
where c(g) represents an IVHFE and r(g) represents an HFE.

Definition 2.11 [20,21]. Let ch = {< δ_i = [δ_i^-, δ_i^+] ∈ c(g), β_i ∈ r(g) / {[δ_i^-, δ_i^+]}, {β_i} >} be a CHFE on a non-empty set G. The score of the CHFE is given by

V(ch) = (1 / ℓ(ch)) Σ_{i=1}^{ℓ(ch)} ((δ_i^- + δ_i^+) / 2 + β_i),

where δ_i = [δ_i^-, δ_i^+] ∈ c(g) (an IVHFE), β_i ∈ r(g) (an HFE) for all g ∈ G, and ℓ(ch) denotes the number of elements in the CHFE.

Definition 2.12 [4]. Let e_l (l = 1, 2, ..., s) be a set of real numbers with e_l > 0, 1 ≤ l ≤ s. Then

BHM(e_1, e_2, ..., e_s) = (2 / (s(s+1))) Σ_{l=1}^{s} Σ_{m=l}^{s} √(e_l e_m),

where BHM is called the basic Heronian mean.

Definition 2.13 [34]. Let e_l (l = 1, 2, ..., s) be a set of real numbers with e_l > 0, 1 ≤ l ≤ s. Then

HM^{ĥ,ĵ}(e_1, e_2, ..., e_s) = ( (2 / (s(s+1))) Σ_{l=1}^{s} Σ_{m=l}^{s} e_l^{ĥ} e_m^{ĵ} )^{1/(ĥ+ĵ)},

where HM is called the Heronian mean.

Definition 2.14 [35]. Suppose e_l (l = 1, 2, ..., s) is a collection of real numbers with e_l > 0, 1 ≤ l ≤ s, having parameters ĥ and ĵ (ĥ ≥ 0, ĵ ≥ 0, not both zero at the same time). The geometric Heronian mean (GHM) is defined by

GHM^{ĥ,ĵ}(e_1, e_2, ..., e_s) = (1 / (ĥ+ĵ)) Π_{l=1,m=l}^{s} (ĥ e_l + ĵ e_m)^{2/(s(s+1))}.
Definition 2.15 [36]. Assume r_l (l = 1, 2, ..., s) is a collection of HFEs. The hesitant fuzzy Heronian mean (HFHM) is defined by

HFHM(r_1, r_2, ..., r_s) = ( (2 / (s(s+1))) ⊕_{l=1}^{s} ⊕_{m=l}^{s} (r_l^{ĥ} ⊗ r_m^{ĵ}) )^{1/(ĥ+ĵ)}.

When ĥ = ĵ = 1/2, the HFHM becomes the basic HFHM (BHFHM):

BHFHM(r_1, r_2, ..., r_s) = (2 / (s(s+1))) ⊕_{l=1}^{s} ⊕_{m=l}^{s} √(r_l ⊗ r_m).
Definition 2.16 [36]. Let r_l (l = 1, 2, ..., s) be a collection of HFEs. The hesitant fuzzy geometric Heronian mean (HFGHM) is defined by

HFGHM(r_1, r_2, ..., r_s) = (1 / (ĥ+ĵ)) ⊗_{l=1,m=l}^{s} (ĥ r_l ⊕ ĵ r_m)^{2/(s(s+1))}.

When ĥ = ĵ = 1/2, the HFGHM becomes the basic HFGHM (BHFGHM):

BHFGHM(r_1, r_2, ..., r_s) = ⊗_{l=1,m=l}^{s} ( (1/2)(r_l ⊕ r_m) )^{2/(s(s+1))}.
Definition 2.17 [36]. Let r_l (l = 1, 2, ..., s) be a set of HFEs with weight vector ψ = (ψ_1, ψ_2, ..., ψ_s)^T, ψ_l > 0 and Σ_{l=1}^{s} ψ_l = 1. The weighted HFHM (HFWHM) and the weighted HFGHM (HFWGHM) are defined as follows:

HFWHM(r_1, r_2, ..., r_s) = ( (2 / (s(s+1))) ⊕_{l=1}^{s} ⊕_{m=l}^{s} (ψ_l r_l)^{ĥ} ⊗ (ψ_m r_m)^{ĵ} )^{1/(ĥ+ĵ)},

HFWGHM(r_1, r_2, ..., r_s) = (1 / (ĥ+ĵ)) ⊗_{l=1,m=l}^{s} ( (ĥ r_l)^{ψ_l} ⊕ (ĵ r_m)^{ψ_m} )^{2/(s(s+1))}.
Theorem 2.18 [20,21]. Let G be a non-empty set, let ch, ch_1, ch_2 be CHFEs on G, and let χ ≥ 0. Then ch_1 ⊕ ch_2, ch_1 ⊗ ch_2, χch and ch^χ are also CHFEs.
3 Cubic Hesitant Fuzzy Heronian Means (CHFHMs)
In this section we define Heronian mean operators for CHFSs, which help to aggregate cubic hesitant fuzzy information.

Definition 3.1. Assume ch_l (l = 1, 2, ..., s) is a collection of CHFEs. The cubic hesitant fuzzy Heronian mean (CHFHM) is defined by

CHFHM(ch_1, ch_2, ..., ch_s) = ( (2 / (s(s+1))) ⊕_{l=1}^{s} ⊕_{m=l}^{s} (ch_l^{ĥ} ⊗ ch_m^{ĵ}) )^{1/(ĥ+ĵ)}.

When ĥ = ĵ = 1/2, the CHFHM operator becomes the basic CHFHM (BCHFHM):

BCHFHM(ch_1, ch_2, ..., ch_s) = (2 / (s(s+1))) ⊕_{l=1}^{s} ⊕_{m=l}^{s} √(ch_l ⊗ ch_m).
Theorem 3.2. Suppose ch_l (l = 1, 2, ..., s) is a collection of CHFEs with two parameters ĥ and ĵ, where ĥ > 0 and ĵ > 0. The aggregated result obtained by applying the CHFHM operator is

CHFHM(ch_1, ch_2, ..., ch_s) = {< {[ (1 − Π_{l=1,m=l}^{s} (1 − (δ_{i_l}^-)^{ĥ} (δ_{i_m}^-)^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)}, (1 − Π_{l=1,m=l}^{s} (1 − (δ_{i_l}^+)^{ĥ} (δ_{i_m}^+)^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)} ]}, {1 − (1 − Π_{l=1,m=l}^{s} (1 − (1 − β_{i_l})^{ĥ} (1 − β_{i_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)}} >}.
Proof. By applying operational law 2 of Definition 2.9 we have

ch_l^{ĥ} = {< {[(δ_{i_l}^-)^{ĥ}, (δ_{i_l}^+)^{ĥ}]}, {1 − (1 − β_{i_l})^{ĥ}} >},
ch_m^{ĵ} = {< {[(δ_{i_m}^-)^{ĵ}, (δ_{i_m}^+)^{ĵ}]}, {1 − (1 − β_{i_m})^{ĵ}} >}.

By applying Definition 2.8 we have

ch_l^{ĥ} ⊗ ch_m^{ĵ} = {< {[(δ_{i_l}^-)^{ĥ} (δ_{i_m}^-)^{ĵ}, (δ_{i_l}^+)^{ĥ} (δ_{i_m}^+)^{ĵ}]}, {1 − (1 − β_{i_l})^{ĥ} + 1 − (1 − β_{i_m})^{ĵ} − (1 − (1 − β_{i_l})^{ĥ})(1 − (1 − β_{i_m})^{ĵ})} >}
= {< {[(δ_{i_l}^-)^{ĥ} (δ_{i_m}^-)^{ĵ}, (δ_{i_l}^+)^{ĥ} (δ_{i_m}^+)^{ĵ}]}, {1 − (1 − β_{i_l})^{ĥ} (1 − β_{i_m})^{ĵ}} >}.

By applying Definition 2.7 we have

⊕_{l=1}^{s} ⊕_{m=l}^{s} ch_l^{ĥ} ⊗ ch_m^{ĵ} = {< {[1 − Π_{l=1,m=l}^{s} (1 − (δ_{i_l}^-)^{ĥ} (δ_{i_m}^-)^{ĵ}), 1 − Π_{l=1,m=l}^{s} (1 − (δ_{i_l}^+)^{ĥ} (δ_{i_m}^+)^{ĵ})]}, {Π_{l=1,m=l}^{s} (1 − (1 − β_{i_l})^{ĥ} (1 − β_{i_m})^{ĵ})} >}.

By applying operational law 1 of Definition 2.9 we have

(2 / (s(s+1))) ⊕_{l=1}^{s} ⊕_{m=l}^{s} ch_l^{ĥ} ⊗ ch_m^{ĵ} = {< {[1 − Π_{l=1,m=l}^{s} (1 − (δ_{i_l}^-)^{ĥ} (δ_{i_m}^-)^{ĵ})^{2/(s(s+1))}, 1 − Π_{l=1,m=l}^{s} (1 − (δ_{i_l}^+)^{ĥ} (δ_{i_m}^+)^{ĵ})^{2/(s(s+1))}]}, {Π_{l=1,m=l}^{s} (1 − (1 − β_{i_l})^{ĥ} (1 − β_{i_m})^{ĵ})^{2/(s(s+1))}} >}.

Now by applying operational law 2 of Definition 2.9 with the exponent 1/(ĥ+ĵ) we obtain the expression stated in the theorem. Hence we obtained the required result.
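For single-valued CHFEs the closed form of Theorem 3.2 can be evaluated directly. A numeric sketch (the representation and names are ours): each CHFE is modelled as one interval plus one hesitant value, and the check at the end exercises idempotency, i.e., aggregating s copies of the same CHFE should return it (approximately, up to floating-point error).

```python
def chfhm(chfes, h, j):
    """Closed form of Theorem 3.2 for CHFEs modelled as ([lo, hi], beta),
    each with a single interval and a single hesitant value."""
    s = len(chfes)
    e = 2.0 / (s * (s + 1))            # exponent 2/(s(s+1))
    p_lo = p_hi = p_b = 1.0
    for l in range(s):
        for m in range(l, s):          # all pairs with l <= m
            (ll, lu), lb = chfes[l]
            (ml, mu), mb = chfes[m]
            p_lo *= (1 - ll ** h * ml ** j) ** e
            p_hi *= (1 - lu ** h * mu ** j) ** e
            p_b *= (1 - (1 - lb) ** h * (1 - mb) ** j) ** e
    r = 1.0 / (h + j)                  # outer exponent 1/(h+j)
    return ([(1 - p_lo) ** r, (1 - p_hi) ** r], 1 - (1 - p_b) ** r)

ch = ([0.3, 0.6], 0.4)
agg = chfhm([ch, ch, ch], h=1, j=2)
print(agg)  # approximately ([0.3, 0.6], 0.4): the CHFHM is idempotent
```

Idempotency follows because the s(s+1)/2 pair exponents of 2/(s(s+1)) sum to 1, exactly as for the real-valued HM.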
Definition 3.3. Assume ch_l (l = 1, 2, ..., s) is a collection of CHFEs. The cubic hesitant fuzzy geometric Heronian mean (CHFGHM) is given by

CHFGHM(ch_1, ch_2, ..., ch_s) = (1 / (ĥ+ĵ)) ⊗_{l=1,m=l}^{s} (ĥ ch_l ⊕ ĵ ch_m)^{2/(s(s+1))}.
2 1 ˆ l ⊗ jˆchm )) s(s+1) = {< {[1 − (1 − (⊗s (hch ˆ + jˆ l=1,m=l h 1
2
ˆ
ˆ
s s ˆ j (1 − (1 − δi− )h (1 − δi−m )jˆ) s(s+1) ) h+ˆ , 1 − (1 − Πl=1,m=l (1 − (1 − δi+ )h (1 − Πl=1,m=l l
2
l
1
2
ˆ
1
s ˆ j ˆ j δi+m )jˆ) s(s+1) ) h+ˆ ]}, {(1 − Πl=1,m=l (1 − βihl βijˆm ) s(s+1) ) h+ˆ } >}.
Proof. By applying operational law 1 of Definition 2.9 we have

ĥ ch_l = {< {[1 − (1 − δ_{i_l}^-)^{ĥ}, 1 − (1 − δ_{i_l}^+)^{ĥ}]}, {β_{i_l}^{ĥ}} >},
ĵ ch_m = {< {[1 − (1 − δ_{i_m}^-)^{ĵ}, 1 − (1 − δ_{i_m}^+)^{ĵ}]}, {β_{i_m}^{ĵ}} >}.

By applying Definition 2.7 we obtain

ĥ ch_l ⊕ ĵ ch_m = {< {[1 − (1 − δ_{i_l}^-)^{ĥ} (1 − δ_{i_m}^-)^{ĵ}, 1 − (1 − δ_{i_l}^+)^{ĥ} (1 − δ_{i_m}^+)^{ĵ}]}, {β_{i_l}^{ĥ} β_{i_m}^{ĵ}} >}.

By applying Definition 2.8 we have

⊗_{l=1,m=l}^{s} (ĥ ch_l ⊕ ĵ ch_m) = {< {[Π_{l=1,m=l}^{s} (1 − (1 − δ_{i_l}^-)^{ĥ} (1 − δ_{i_m}^-)^{ĵ}), Π_{l=1,m=l}^{s} (1 − (1 − δ_{i_l}^+)^{ĥ} (1 − δ_{i_m}^+)^{ĵ})]}, {1 − Π_{l=1,m=l}^{s} (1 − β_{i_l}^{ĥ} β_{i_m}^{ĵ})} >}.

By applying operational law 2 of Definition 2.9 we get

⊗_{l=1,m=l}^{s} (ĥ ch_l ⊕ ĵ ch_m)^{2/(s(s+1))} = {< {[Π_{l=1,m=l}^{s} (1 − (1 − δ_{i_l}^-)^{ĥ} (1 − δ_{i_m}^-)^{ĵ})^{2/(s(s+1))}, Π_{l=1,m=l}^{s} (1 − (1 − δ_{i_l}^+)^{ĥ} (1 − δ_{i_m}^+)^{ĵ})^{2/(s(s+1))}]}, {1 − Π_{l=1,m=l}^{s} (1 − β_{i_l}^{ĥ} β_{i_m}^{ĵ})^{2/(s(s+1))}} >}.

Now by applying operational law 1 of Definition 2.9 with the scalar 1/(ĥ+ĵ) we obtain the expression stated in the theorem. Hence we obtained the required result.
Definition 3.5. Suppose ch_l (l = 1, 2, ..., s) is a set of CHFEs with two parameters ĥ and ĵ, where ĥ > 0 and ĵ > 0, and weight vector ψ = (ψ_1, ψ_2, ..., ψ_s)^T, ψ_l > 0, with Σ_{l=1}^{s} ψ_l = 1. The cubic hesitant fuzzy weighted Heronian mean (CHFWHM) is defined by

CHFWHM(ch_1, ch_2, ..., ch_s) = ( (2 / (s(s+1))) ⊕_{l=1}^{s} ⊕_{m=l}^{s} (ψ_l ch_l)^{ĥ} ⊗ (ψ_m ch_m)^{ĵ} )^{1/(ĥ+ĵ)}.
Theorem 3.6. Suppose ch_l (l = 1, 2, ..., s) is a set of CHFEs with two parameters ĥ > 0 and ĵ > 0 and weight vector ψ = (ψ_1, ψ_2, ..., ψ_s)^T, ψ_l > 0, with Σ_{l=1}^{s} ψ_l = 1. The aggregated result obtained by applying the CHFWHM operator is

CHFWHM(ch_1, ch_2, ..., ch_s) = {< {[ (1 − Π_{l=1,m=l}^{s} (1 − (1 − (1 − δ_{i_l}^-)^{ψ_l})^{ĥ} (1 − (1 − δ_{i_m}^-)^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)}, (1 − Π_{l=1,m=l}^{s} (1 − (1 − (1 − δ_{i_l}^+)^{ψ_l})^{ĥ} (1 − (1 − δ_{i_m}^+)^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)} ]}, {1 − (1 − Π_{l=1,m=l}^{s} (1 − (1 − β_{i_l}^{ψ_l})^{ĥ} (1 − β_{i_m}^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)}} >}.
Proof. The proof is similar to those of Theorem 3.2 and Theorem 3.4, so we omit it.

Definition 3.7. Suppose ch_l (l = 1, 2, ..., s) is a set of CHFEs with two parameters ĥ and ĵ, where ĥ > 0 and ĵ > 0, and weight vector ψ = (ψ_1, ψ_2, ..., ψ_s)^T, ψ_l > 0, with Σ_{l=1}^{s} ψ_l = 1. The cubic hesitant fuzzy weighted geometric Heronian mean (CHFWGHM) is defined by

CHFWGHM(ch_1, ch_2, ..., ch_s) = (1 / (ĥ+ĵ)) ⊗_{l=1,m=l}^{s} ( (ĥ ch_l)^{ψ_l} ⊕ (ĵ ch_m)^{ψ_m} )^{2/(s(s+1))}.
Theorem 3.8. Suppose ch_l (l = 1, 2, ..., s) is a set of CHFEs with two parameters ĥ > 0 and ĵ > 0 and weight vector ψ = (ψ_1, ψ_2, ..., ψ_s)^T, ψ_l > 0, with Σ_{l=1}^{s} ψ_l = 1. The aggregated result obtained by applying the CHFWGHM operator is

CHFWGHM(ch_1, ch_2, ..., ch_s) = {< {[1 − (1 − Π_{l=1,m=l}^{s} (1 − (1 − (δ_{i_l}^-)^{ψ_l})^{ĥ} (1 − (δ_{i_m}^-)^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)}, 1 − (1 − Π_{l=1,m=l}^{s} (1 − (1 − (δ_{i_l}^+)^{ψ_l})^{ĥ} (1 − (δ_{i_m}^+)^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)}]}, {(1 − Π_{l=1,m=l}^{s} (1 − (1 − (1 − β_{i_l})^{ψ_l})^{ĥ} (1 − (1 − β_{i_m})^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)}} >}.
Proof. The proof is similar to Theorem 3.2 and Theorem 3.4, so we omit the proof.
4 MCDM Based on Cubic Hesitant Fuzzy Weighted Geometric Heronian Mean Operator

The following steps describe a method which solves a MCDM problem based on the CHFWGHM operator. We have a set of criteria R_l, 1 ≤ l ≤ p, against each
alternative Pn , 1 ≤ n ≤ q that forms a decision matrix Dnl having CHFEs chnl = {< {[δi−nl , δi+nl ]}, {βinl } >}, here nth entry represents row and lth entry Tˇ
represents column in the matrix. We also have weight s vector ψ = (ψ1 , ψ2 , ..., ψs ) which represents weight of each criteria with l=1 ψl = 1. We are giving the notation to aggregated results of CHFEs by applying CHFWGHM operator as Ωn , 1 ≤ n ≤ q. Note that the aggregated result of CHFE is again a CHFE (see Theorem 2.18). The steps used in method are presented in following algorithm;
Algorithm 1. Algorithm of proposed method
Input: A set of criteria associated with alternatives.
Output: Best alternative.
1: Obtain a set of criteria associated with a set of alternatives.
2: Find the aggregated result of the CHFEs by applying the CHFWGHM operator (see Definition 3.7), aggregating the nth rows and lth columns of the matrix D_nl.
3: Calculate the scores of the CHFEs by using the score function (see Definition 2.11).
4: Rank the alternatives P_n, 1 ≤ n ≤ q, by applying the ranking method to the score values.
5: Select the alternative with the maximum score as the best choice.
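The control flow of Algorithm 1 can be sketched as follows. For brevity, hypothetical crisp criterion scores and a plain weighted average stand in for the CHFE aggregation of Step 2 and the score function of Definition 2.11; the decision-matrix values below are illustrative only:

```python
weights = [0.2, 0.3, 0.5]          # criterion weights, summing to 1

# Hypothetical decision matrix D_nl: one row of crisp criterion
# scores per alternative (stand-ins for aggregated CHFE scores).
D = {
    "P1": [0.30, 0.65, 0.35],
    "P2": [0.45, 0.25, 0.75],
    "P3": [0.80, 0.55, 0.50],
}

# Steps 2-3: aggregate each row, then treat the result as its score.
agg = {p: sum(w * x for w, x in zip(weights, row)) for p, row in D.items()}

# Steps 4-5: rank by score and pick the maximum.
ranking = sorted(agg, key=agg.get, reverse=True)
best = ranking[0]
```

In the real method the aggregation in Step 2 is the CHFWGHM operator acting on CHFEs, so the intermediate objects remain CHFEs until the score function collapses them to real numbers.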
In the next section, an example of a real-life MCDM problem is solved using the method described in Sect. 4. The CHFWGHM operator is used to aggregate the CHFEs.
5 Illustrative Example
Suppose a welfare organization wants to build a cancer hospital in Beijing, China. Three alternatives are suggested: (P_1) Haidian District, (P_2) Miyun District and (P_3) Yanqing District, evaluated against three criteria: (R_1) population of the surrounding area, (R_2) cost of land and (R_3) number of hospitals in the surrounding area. To select the best district for the hospital, we apply Algorithm 1 to the CHFS information. The matrix containing the CHFS information is given in Table 1.

Suppose ψ = (ψ_1, ψ_2, ψ_3)^T = (0.2, 0.3, 0.5)^T is the weighting vector of the criteria R_1, R_2 and R_3, and take ĥ = 1 and ĵ = 2. We use the method defined in Sect. 4 as follows.

Step 1. The CHFWGHM operators are calculated as follows:
ch_11 = {< {[0.1, 0.3], [0.4, 0.5]}, {0.2, 0.6} >},
ch_12 = {< {[0.4, 0.6], [0.7, 0.9]}, {0.5, 0.8} >},
ch_13 = {< {[0.2, 0.5]}, {0.6} >},
Cubic Hesitant Fuzzy Heronian Mean Operators and Their Application
285
Table 1. Matrix containing CHFS values

      R1                                          R2                                          R3
P1    {< {[0.1, 0.3], [0.4, 0.5]}, {0.2, 0.6} >}   {< {[0.4, 0.6], [0.7, 0.9]}, {0.5, 0.8} >}   {< {[0.2, 0.5]}, {0.6} >}
P2    {< {[0.3, 0.5], [0.4, 0.7]}, {0.4, 0.5} >}   {< {[0.1, 0.4]}, {0.3} >}                    {< {[0.7, 0.8], [0.9, 1]}, {0.6, 0.8} >}
P3    {< {[0.7, 0.9]}, {0.4} >}                    {< {[0.3, 0.7], [0.8, 1]}, {0.4, 0.9} >}     {< {[0.2, 0.6], [0.7, 0.9]}, {0.7, 0.8} >}

CHFWGHM(ch_11, ch_12, ch_13) = Ω_1 = (1/(ĥ+ĵ)) (⊗_{l=1,m=l}^{s} ((ĥ ch_1l^{ψ_l}) ⊕ (ĵ ch_1m^{ψ_m}))^{2/(s(s+1))}) =

{< {[1 − (1 − Π_{l=1,m=l}^{s} (1 − (1 − (δ_{i_1l}^−)^{ψ_l})^{ĥ} (1 − (δ_{i_1m}^−)^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)},
1 − (1 − Π_{l=1,m=l}^{s} (1 − (1 − (δ_{i_1l}^+)^{ψ_l})^{ĥ} (1 − (δ_{i_1m}^+)^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)}]},
{(1 − Π_{l=1,m=l}^{s} (1 − (1 − (1 − β_{i_1l})^{ψ_l})^{ĥ} (1 − (1 − β_{i_1m})^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)}} >},

CHFWGHM(ch_11, ch_12, ch_13) = Ω_1 = {< {[0.5782, 0.7707], [0.6014, 0.7897], [0.6187, 0.7889], [0.6440, 0.8096]}, {0.2474, 0.3062, 0.2686, 0.3275} >},

ch_21 = {< {[0.3, 0.5], [0.4, 0.7]}, {0.4, 0.5} >},
ch_22 = {< {[0.1, 0.4]}, {0.3} >},
ch_23 = {< {[0.7, 0.8], [0.9, 1]}, {0.6, 0.8} >},
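As a numeric cross-check, one component of Theorem 3.8's closed form can be evaluated directly: the combination of lower membership values δ⁻ = (0.1, 0.4, 0.2) with ψ = (0.2, 0.3, 0.5), ĥ = 1, ĵ = 2 reproduces the first lower bound 0.5782 of Ω_1 (the function name `ghm_lower` is our own):

```python
from math import prod

def ghm_lower(d, w, h, j):
    """Lower-membership component of the CHFWGHM closed form:
    1 - (1 - prod over l<=m of
        (1 - (1 - d_l^w_l)^h * (1 - d_m^w_m)^j)^(2/(s(s+1))))^(1/(h+j))."""
    s = len(d)
    e = 2.0 / (s * (s + 1))
    p = prod(
        (1 - (1 - d[l] ** w[l]) ** h * (1 - d[m] ** w[m]) ** j) ** e
        for l in range(s) for m in range(l, s)
    )
    return 1 - (1 - p) ** (1.0 / (h + j))

v = ghm_lower([0.1, 0.4, 0.2], [0.2, 0.3, 0.5], h=1, j=2)  # ≈ 0.5782
```

The upper bounds and the hesitant part β follow the same pattern, with δ⁺ in place of δ⁻ and 1 − β in place of δ, producing the four interval pairs and four β values of Ω_1.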
CHFWGHM(ch_21, ch_22, ch_23) = Ω_2 = (1/(ĥ+ĵ)) (⊗_{l=1,m=l}^{s} ((ĥ ch_2l^{ψ_l}) ⊕ (ĵ ch_2m^{ψ_m}))^{2/(s(s+1))}) =

{< {[1 − (1 − Π_{l=1,m=l}^{s} (1 − (1 − (δ_{i_2l}^−)^{ψ_l})^{ĥ} (1 − (δ_{i_2m}^−)^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)},
1 − (1 − Π_{l=1,m=l}^{s} (1 − (1 − (δ_{i_2l}^+)^{ψ_l})^{ĥ} (1 − (δ_{i_2m}^+)^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)}]},
{(1 − Π_{l=1,m=l}^{s} (1 − (1 − (1 − β_{i_2l})^{ψ_l})^{ĥ} (1 − (1 − β_{i_2m})^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)}} >},

CHFWGHM(ch_21, ch_22, ch_23) = Ω_2 = {< {[0.6698, 0.8313], [0.6808, 0.8412], [0.6792, 0.8439], [0.6899, 0.8542]}, {0.2384, 0.3303, 0.2435, 0.3495} >},

ch_31 = {< {[0.7, 0.9]}, {0.4} >},
ch_32 = {< {[0.3, 0.7], [0.8, 1]}, {0.4, 0.9} >},
ch_33 = {< {[0.2, 0.6], [0.7, 0.9]}, {0.7, 0.8} >},
CHFWGHM(ch_31, ch_32, ch_33) = Ω_3 = (1/(ĥ+ĵ)) (⊗_{l=1,m=l}^{s} ((ĥ ch_3l^{ψ_l}) ⊕ (ĵ ch_3m^{ψ_m}))^{2/(s(s+1))}) =

{< {[1 − (1 − Π_{l=1,m=l}^{s} (1 − (1 − (δ_{i_3l}^−)^{ψ_l})^{ĥ} (1 − (δ_{i_3m}^−)^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)},
1 − (1 − Π_{l=1,m=l}^{s} (1 − (1 − (δ_{i_3l}^+)^{ψ_l})^{ĥ} (1 − (δ_{i_3m}^+)^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)}]},
{(1 − Π_{l=1,m=l}^{s} (1 − (1 − (1 − β_{i_3l})^{ψ_l})^{ĥ} (1 − (1 − β_{i_3m})^{ψ_m})^{ĵ})^{2/(s(s+1))})^{1/(ĥ+ĵ)}} >},

CHFWGHM(ch_31, ch_32, ch_33) = Ω_3 = {< {[0.6197, 0.8526], [0.7975, 0.9415], [0.6664, 0.8740], [0.8881, 1]}, {0.2929, 0.3513, 0.3955, 0.4413} >}.

Step 2. By applying Definition 2.11 we obtain the scores V(Ω_1) = 0.3439, V(Ω_2) = 0.4065, V(Ω_3) = 0.5151.

Step 3. Since V(Ω_3) > V(Ω_2) > V(Ω_1), the ranking method gives Ω_3 > Ω_2 > Ω_1. Hence P_3 is the best choice, and the organization will build the hospital in Yanqing District, which best satisfies the given criteria.
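The final ranking in Step 3 amounts to sorting the alternatives by the score values obtained in Step 2:

```python
# Scores V(Omega_n) from Step 2 of the illustrative example.
scores = {"P1": 0.3439, "P2": 0.4065, "P3": 0.5151}

# Rank alternatives by descending score; the first entry is the best choice.
ranking = sorted(scores, key=scores.get, reverse=True)
best = ranking[0]  # "P3", i.e. Yanqing District
```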
6 Conclusions

Aggregation of information is an important technique for tackling an MCDM problem, and many aggregation operators for different sets have been introduced. The cubic hesitant fuzzy set proposed by Mahmood et al. [20] is a productive and important tool which handles fuzziness and ambiguous information that cannot be handled by other tools such as HFSs or IVHFSs. Mahmood et al. [21] also defined some generalized aggregation operators for CHFSs, but these only aggregate the given information in an MCDM problem and do not capture the relationships amongst the arguments. The Heronian mean operator aggregates the given information while reflecting the association amongst the arguments. In our work, we proposed HM operators for CHFSs, namely CHFHM, CHFGHM, CHFWHM and CHFWGHM. Finally, we applied the CHFWGHM operator to an MCDM problem and found the best alternative with respect to the given criteria. The operators defined in this paper are well suited to evaluating uncertainty. This study can be extended to picture hesitant fuzzy sets, and some construction engineering risk problems can be analyzed.
References

1. Ali, M.I., Feng, F., Liu, X., Min, W.K., Shabir, M.: On some new operations in soft set theory. Comput. Math. Appl. 57, 1547–1553 (2009)
2. Atanassov, K.: Intuitionistic fuzzy sets. Fuzzy Sets Syst. 20, 87–96 (1986)
3. Atanassov, K., Gargov, G.: Interval valued intuitionistic fuzzy sets. Fuzzy Sets Syst. 31, 343–349 (1989)
4. Beliakov, G., Pradera, A., Calvo, T.: Aggregation Functions: A Guide for Practitioners. Springer, Berlin (2007)
5. Chakravarthi, M.K., Venkatesan, N.: Adaptive type-2 fuzzy controller for nonlinear delay dominant MIMO systems: an experimental paradigm in LabVIEW. Int. J. Adv. Intell. Parad. 10, 354–373 (2018)
6. Chen, N., Xu, Z.S., Xia, M.M.: Interval valued hesitant preference relations and their applications to group decision making. Knowl. Based Syst. 37, 528–540 (2013)
7. Chu, Y., Liu, P.: Some two-dimensional uncertain linguistic Heronian mean operators and their application in multiple attribute decision making. Neural Comput. Appl. 26, 1461–1480 (2015)
8. Fernandez, R.P., Alonso, P., Bustince, H., Diaz, I., Montes, S.: Applications of finite interval-valued hesitant fuzzy preference relations in group decision making. Inf. Sci. 326, 89–101 (2016)
9. Hayat, K., Ali, M.I., Alcantud, J.C.R., Cao, B.Y., Tariq, K.U.: Best concept selection in design process: an application of generalized intuitionistic fuzzy soft sets. J. Intell. Fuzzy Syst. 35, 5707–5720 (2018)
10. Hong, D.H., Choi, C.H.: Multi criteria fuzzy decision making problems based on vague set theory. Fuzzy Sets Syst. 21, 1–17 (2000)
11. Hwang, C.L., Yoon, K.: Multiple Attribute Decision Making Methods and Application. Springer, New York (1981)
12. Jun, Y.B., Kim, C.S., Yang, K.O.: Cubic sets. Ann. Fuzzy Math. Inform. 4, 83–98 (2012)
13. Karaaslan, F., Hayat, K.: Some new operations on single-valued neutrosophic matrices and their applications in multi-criteria group decision making. Appl. Intell. 48, 4594–4614 (2018)
14. Li, D.: Multi attribute decision making models and methods using intuitionistic fuzzy sets. J. Comput. Syst. Sci. 70, 73–85 (2005)
15. Li, Y., Liu, P.: Some Heronian mean operators with 2-tuple linguistic information and their application to multiple attribute group decision making. Technol. Econ. Dev. Econ. 21, 797–814 (2015)
16. Liu, P., Liu, Z., Zhang, X.: Some intuitionistic uncertain linguistic Heronian mean operators and their application to group decision making. Appl. Math. Comput. 230, 570–586 (2014)
17. Liu, P., Chen, S.M.: Group decision making based on Heronian aggregation operators of intuitionistic fuzzy numbers. IEEE Trans. Cybern. 47(9), 2514–2530 (2017)
18. Liu, P., Zhang, L.: Multiple criteria decision making method based on neutrosophic hesitant fuzzy Heronian mean aggregation operators. J. Intell. Fuzzy Syst. 32, 303–319 (2017)
19. Lu, Z., Ye, J.: Cosine measures of neutrosophic cubic sets for multiple attribute decision-making. Symmetry 9(7), 121, 1–10 (2017)
20. Mahmood, T., Mehmood, F., Khan, Q.: Cubic hesitant fuzzy sets and their applications to multi criteria decision making. Int. J. Algebra Stat. 5, 19–51 (2016)
21. Mahmood, T., Mehmood, F., Khan, Q.: Some generalized aggregation operators for cubic hesitant fuzzy sets and their applications to multi criteria decision making. Punjab Univ. J. Math. 49, 31–49 (2017)
22. Nayagam, V.L.G., Muralikrish, S., Sivaraman, G.: Multi criteria decision making method based on interval valued intuitionistic fuzzy sets. Expert Syst. Appl. 38, 1464–1467 (2011)
23. Sudha, V.K., Sudhakar, R., Balas, V.E.: Fuzzy rule-based segmentation of CT brain images of hemorrhage for compression. Int. J. Adv. Intell. Parad. 4, 256–267 (2014)
24. Torra, V., Narukawa, Y.: On hesitant fuzzy sets and decision. In: The 18th IEEE International Conference on Fuzzy Systems, Jeju Island, Korea, pp. 1378–1382 (2009)
25. Torra, V.: Hesitant fuzzy sets. Int. J. Intell. Syst. 25, 529–539 (2010)
26. Viedma, H.E., Alonso, E., Chiclana, S., Herrera, F.: A consensus model for group decision making with incomplete fuzzy preference relations. IEEE Trans. Fuzzy Syst. 15, 863–877 (2007)
27. Xia, M.M., Xu, Z.S.: Hesitant fuzzy information aggregation in decision making. Int. J. Approx. Reason. 52, 395–407 (2011)
28. Xia, M.M., Xu, Z.S., Chen, N.: Some hesitant fuzzy aggregation operators with their application in group decision making. Group Decis. Negot. 22, 259–279 (2013)
29. Xu, Z.S., Da, Q.L.: An overview of operators for aggregating information. Int. J. Intell. Syst. 18, 953–969 (2003)
30. Xu, Z.S., Yager, R.R.: Some geometric aggregation operators based on intuitionistic fuzzy sets. Int. J. Gen. Syst. 35, 417–433 (2006)
31. Xu, Z.S.: Intuitionistic fuzzy aggregation operators. IEEE Trans. Fuzzy Syst. 15, 1179–1187 (2007)
32. Ye, J.: Linguistic neutrosophic cubic numbers and their multiple attribute decision-making method. Information 8(3), 110, 1–11 (2017)
33. Ye, J.: Multiple attribute decision-making method based on linguistic cubic variables. J. Intell. Fuzzy Syst. (2018). https://doi.org/10.3233/JIFS-171413
34. Yu, D., Wu, Y.: Interval valued intuitionistic fuzzy Heronian mean operators and their application in multi criteria decision making. Afr. J. Bus. Manag. 6, 4158 (2012)
35. Yu, D.: Intuitionistic fuzzy geometric Heronian mean aggregation operators. Appl. Soft Comput. 13, 1235–1246 (2013)
36. Yu, D.: Hesitant fuzzy multi criteria decision making methods based on Heronian mean. Technol. Econ. Dev. Econ., 1–20 (2015)
37. Yu, S.M., Zhou, H., Chen, X.H., Wang, J.Q.: A multi criteria decision making method based on Heronian mean operators under a linguistic hesitant fuzzy environment. Asia Pac. J. Oper. Res. 32, 1550035 (2015)
38. Yordanova, S., Jain, L.C.: Design of supervisor-based adaptive process fuzzy logic control. Int. J. Adv. Intell. Parad. 9, 385–401 (2016)
39. Zadeh, L.A.: Fuzzy sets. Inf. Control 8, 338–353 (1965)
40. Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning-I. Inf. Sci. 8, 199–249 (1975)
A Comparative Study of Fuzzy Logic Regression and ARIMA Models for Prediction of Gram Production

Shafqat Iqbal1, Chongqi Zhang1(✉), Muhammad Arif2, Yining Wang3, and Anca Mihaela Dicu4

1 School of Economics and Statistics, Guangzhou University, Guangzhou 510006, China
[email protected], [email protected]
2 Department of Computer Science and Technology, Guangzhou University, Guangzhou, China
[email protected]
3 School of Mathematics and Information Sciences, Guangzhou University, Guangzhou, China
[email protected]
4 Aurel Vlaicu University of Arad, Arad, Romania
[email protected]
Abstract. This study compares fuzzy logic regression methodology and Autoregressive Integrated Moving Average (ARIMA) models to determine an appropriate forecasting model for predicting the yield of food crops. Due to multifactor environmental changes, uncertainty in the yield of different crops has increased around the globe, so in these circumstances a novel forecasting method is needed to obtain precise information ahead of time. Nowadays many robust methodologies are used for forecasting; the fuzzy logic and regression model is one of them and performs well under uncertainty. In this study, a fuzzy time series forecasting model is applied along with a traditional forecasting tool to gram production data of Pakistan to determine a suitable prediction model. Initially, 7 fuzzy intervals are constructed using fuzzy logic computations with second- and third-degree relationships, and the fuzzified values are then evaluated with different regression models. ARIMA models with different orders of p, d and q are formulated, with the orders identified through the autocorrelation function (ACF) and partial autocorrelation function (PACF). The Akaike Information Criterion, the Bayesian Information Criterion and other accuracy measures are applied to select the appropriate model for forecasting gram production. Overall, the model evaluation demonstrates that the fuzzy logic and regression model performs better than the ARIMA model in forecasting gram production. Precise information about yield production will help policy makers take decisions about import and export, management, planning and other related issues.

Keywords: Fuzzy logic and regression · ARIMA · Forecasting · Production · Time series
© Springer Nature Switzerland AG 2021 V. E. Balas et al. (Eds.): SOFA 2018, AISC 1222, pp. 289–299, 2021. https://doi.org/10.1007/978-3-030-52190-5_21
290
S. Iqbal et al.
1 Introduction

Pulses, as a rich source of protein, vitamins, fiber and minerals, meet the requirements of a major part of the world population, and demand for pulses has increased internationally over the last few decades. In developing countries about 80% of pulses are used as human food; in developed countries about 40% are used for humans and 50% for animal feed (Global Pulse Confederation). Pulses play a key role in agriculture, which is contending with challenges of land resources, environmental change, increasing food demand and inconsistent food commodity markets. According to the Food and Agriculture Organization (FAO) of the United Nations, Pakistan is the 5th largest pulse-growing country in the world; agriculture contributes 18.50% of total GDP (Pakistan Economic Survey 2019-20, Ministry of Finance) and employs 42.02% of the labor force (TheGlobalEconomy.com), so it is important to have sound figures about agricultural commodities. For forecasting purposes, different conventional statistical and mathematical approaches are used in medicine, agriculture, industry, computer science and many other domains. Rahman [1] studied forecasting of the cultivated area and production of black gram, grass pea and mungbean in Bangladesh, applying statistical tests such as the autocorrelation function, partial autocorrelation function and Phillips-Perron unit root test to select an appropriate ARIMA model for prediction. Vishwajith [2] forecast pulse production in India using ARIMA and GARCH statistical forecasting models, and Lecerf [3] examined forecasting models for the prediction of crop yield in Europe.

Besides many developments in other fields such as privacy and security [4–6], parametric prediction models [7] and sensor studies in aquaculture [8], fuzzy time series have been broadly utilized in many domains. In some regions and fields where uncertainty is very high, statistical methods do not perform well. For such situations of uncertainty, Zadeh [9] introduced the concept of fuzzy sets as a mathematical technique. Different fuzzy time series methodologies, such as combined approaches based on fuzzy soft sets, fuzzy logic and regression models, fuzzy membership functions, intuitionistic fuzzy time series and fuzzy ARIMA models, are used as alternatives to conventional statistical time series models. Several studies have compared and evaluated forecasting models to determine an appropriate prediction model. Garg [10] developed fuzzy logic and regression models for the prediction of crop production, using 5, 7, 9 and 11 fuzzy intervals of equal length based on second- and third-degree fuzzy logic relationships. In the present study, a comparison is conducted to select an appropriate forecasting model for gram production by analyzing the Garg [10] and ARIMA [11] models.
2 Related Work

Due to the significance of FTS in tackling real-world problems, many models of different types have been developed. Chen [12] proposed a two-factor time-variant model, and Huarng [13] integrated heuristic knowledge into Chen's model. Yu [14] proposed a novel weighted fuzzy time series approach with average- and distribution-based partitioning.
In another study, Singh [15, 16] developed robust computational methods to cope with highly fluctuating time series data. Ghosh [17] presented an improved fuzzy method using L-R fuzzy sets for the prediction of total food yield production in India. Iqbal [18] proposed a new fuzzy time series algorithm to forecast crop yield production. Tseng [19] combined a seasonal ARIMA model and a fuzzy regression model to develop a new method to forecast Taiwan's machinery products and soft drink sales volume.

2.1 Fuzzy Logic Regression
Fuzzy sets for the time series data on gram production of Pakistan are constructed with interval-based partitioning. The proposed fuzzy forecasting method is based on the following steps.

Step 1: Define the universe of discourse and divide the data set into intervals of equal length. For time series data the universe of discourse is defined as U = [E_min − E1, E_max + E2], where E1 and E2 are two positive numbers.

Step 2: Define the fuzzy sets F_i on 7 equally spaced intervals, and further divide these partitions according to the observations lying within each interval. For example, in our study the fuzzy sets are defined as

F1: poor production
F2: below average production
F3: average production
F4: good production
F5: very good production
F6: excellent production
F7: bumper production
Step 3: Fuzzify the data. After the specification in Step 2, fuzzy logic relationships (FLRs) are established; i.e., the FLRs specify the relationships among the fuzzified partitions of the data set.

Step 4: Take the average over each FLR specified in Step 3. For example, for F6 ← F3, F2 in an FLR, if M = midpoint of F3 and N = midpoint of F2, then L = (M + N)/2 is the average value for F6.

Step 5: Defuzzify the fuzzified values by regression analysis. Results are compared using accuracy measures such as Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE).
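Steps 1, 2 and 4 can be sketched as follows. The function names `fuzzify` and `flr_average` are our own, and the example values assume the interval bounds (U = [260, 890], 7 intervals) and the midpoints of F2 and F3 used later in Sect. 2.4:

```python
def fuzzify(series, k=7, e1=0.0, e2=0.0):
    """Partition U = [min - e1, max + e2] into k equal intervals and
    return the 0-based interval index of each observation (Steps 1-2)."""
    lo, hi = min(series) - e1, max(series) + e2
    width = (hi - lo) / k
    return [min(int((x - lo) // width), k - 1) for x in series]

def flr_average(midpoints):
    """Step 4: the forecast for an FLR such as F6 <- F3, F2 is the
    average of the midpoints of the antecedent fuzzy sets."""
    return sum(midpoints) / len(midpoints)

# Gram data extremes 284.3 and 868.3 with e1 = 24.3, e2 = 21.7
# give U = [260, 890], placing them in the first and last intervals.
fuzzify([284.3, 868.3], k=7, e1=24.3, e2=21.7)  # -> [0, 6]
```

With the Table 1 bounds of Sect. 2.4, the midpoints of F3 and F2 are 485 and 395, so `flr_average([485, 395])` returns 440, matching the L = (M + N)/2 rule.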
2.2 ARIMA Model
In this study the Autoregressive Integrated Moving Average (ARIMA) model is applied for forecasting. The ARIMA model consists of an autoregressive part, an integration (differencing) part and a moving average part.

(1) An autoregressive (AR) model predicts the variable of interest through a linear relationship between its present and past values. The model of order p can be written as

x̂_t = θ + φ_1 x_{t−1} + φ_2 x_{t−2} + ... + φ_p x_{t−p} + e_t    (1)
(2) A way of making the data stationary is to compute differences between consecutive observations. Transformations help stabilize the variance of the values, while differencing helps stabilize the mean by removing changes in the level of the time series, thereby removing trend and seasonality.

(3) Moving average (MA) is the process in which future values of the variable are predicted from past error terms. These error terms are called white noise, and q denotes the order of the moving average process. Its equation can be written as

x̂_t = u + v_1 e_{t−1} + v_2 e_{t−2} + ... + v_q e_{t−q}    (2)

Combining the three parts, autoregression, differencing and moving average, the ARIMA model can be written as

x̂_t = θ + φ_1 x_{t−1} + φ_2 x_{t−2} + ... + φ_p x_{t−p} − v_1 e_{t−1} − v_2 e_{t−2} − ... − v_q e_{t−q} + e_t    (3)

2.3 Model Identification Process
In the model identification stage, the autocorrelation function (ACF), the partial autocorrelation function (PACF) and a unit root test are applied to the data set. A unit root test is used to determine the order of differencing of the time series. There are many unit root tests, but the most popular is the Augmented Dickey-Fuller (ADF) test, which is applied to check whether the data are stationary. The order d is given by the number of times the series is differenced, and the orders of the AR and MA parts are determined by plotting the ACF and PACF.

Model Specification and Diagnostic Checking. The main part of the forecasting process is the selection of a specific model. The appropriate model was selected on the basis of Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and the following diagnostic criteria.

Akaike Information Criterion (AIC). A technique used to select the appropriate forecasting model from a set of candidates; the model with the minimum AIC value is preferred over the others. It is written as AIC = n log(MSE) + 2k, where n is the sample size, MSE is the mean squared error and k is the number of parameters to be estimated.

Bayesian Information Criterion (BIC). This method is also used in the model selection process; a lower BIC value indicates a more appropriate model. It is written as BIC = n log(MSE) + k log n.
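Using the formulas above, candidate ARIMA(p, d, q) fits can be compared directly from their MSE and parameter count. The candidate orders and MSE values below are hypothetical, for illustration only:

```python
from math import log

def aic(n, mse, k):
    """AIC = n log(MSE) + 2k, as defined in the text."""
    return n * log(mse) + 2 * k

def bic(n, mse, k):
    """BIC = n log(MSE) + k log n, as defined in the text."""
    return n * log(mse) + k * log(n)

# Hypothetical candidates: order (p, d, q) -> (MSE, number of parameters k)
candidates = {
    (1, 1, 0): (0.042, 2),
    (1, 1, 1): (0.039, 3),
    (2, 1, 1): (0.038, 4),
}
n = 30  # sample size
best = min(candidates, key=lambda o: aic(n, *candidates[o]))  # -> (1, 1, 1)
```

Note how (2, 1, 1) has the lowest MSE but loses under AIC: the 2k term penalizes the extra parameter more than the small MSE gain is worth.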
2.4 Fuzzy Time Series Method
Step 1: First, define the universe of discourse and divide it into seven intervals of equal length. In our gram production data, the minimum value is 284.3 (000 tons) and the maximum value is 868.3 (000 tons). The universe of discourse is defined as U = [284.3 − E1, 868.3 + E2]; with E1 = 24.3 and E2 = 21.7, U becomes [260, 890].

Step 2: We define the fuzzy sets F_i on 7 equal intervals associated with yield production:

F1: poor production
F2: below average production
F3: average production
F4: good production
F5: very good production
F6: excellent production
F7: bumper production
The universe of discourse divided into 7 equal intervals is presented in Table 1, and these partitions are further divided according to the observations (frequencies) lying within each interval, as shown in Table 2.

Table 1. Frequency distribution of production data

Fuzzy sets  Lower bound  Upper bound  Frequency
F1          260          350          2
F2          350          440          4
F3          440          530          6
F4          530          620          6
F5          620          710          3
F6          710          800          3
F7          800          890          2
Table 2. Frequency based intervals

Fuzzy sets  Lower bound  Upper bound  New sets
F1          260          305          A1
            305          350          A2
F2          350          372.5        A3
            372.5        395          A4
            395          417.5        A5
            417.5        440          A6
F3          440          455          A7
            455          470          A8
            470          485          A9
            485          500          A10
            500          515          A11
            515          530          A12
F4          530          545          A13
            545          560          A14
            560          575          A15
            575          590          A16
            590          605          A17
            605          620          A18
F5          620          650          A19
            650          680          A20
            680          710          A21
F6          710          740          A22
            740          770          A23
            770          800          A24
F7          800          845          A25
            845          890          A26
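The equal-length partition of Table 1 follows directly from U = [260, 890] and k = 7, giving interval width (890 − 260)/7 = 90. A minimal sketch (`fuzzy_set_of` is our illustrative helper, not from the paper):

```python
lo, hi, k = 260, 890, 7
width = (hi - lo) / k  # 90.0, matching the interval width of Table 1
intervals = [(lo + i * width, lo + (i + 1) * width) for i in range(k)]

def fuzzy_set_of(x):
    """1-based index of the fuzzy set F1..F7 containing production x."""
    return min(int((x - lo) // width), k - 1) + 1

# e.g. the 1989 production of 493 (000 tons) falls in F3 = [440, 530)
fuzzy_set_of(493)  # -> 3
```

The sub-partition of Table 2 then splits each F_i into finer sets A1–A26 so that intervals holding more observations receive more sub-intervals.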
Step 3: In this step, fuzzy logic relationships are formed among the fuzzified intervals. This can be explained with reference to Table 3: the 1989 yield belongs to A10 and the 1990 yield to A3, so by FLR A8 ← A3, A10. This process is carried out for all values with second and third degree, as presented in Table 3.

Table 3. FLR with second and third degree with averages

Year  Production (000 tons)  FLR 2nd degree
1989  493                    –
1990  371.5                  –
1991  456                    A8 ← A3, A10
1992  561.9
1993  531
1994  512.8
1995  347.3
1996  410.7
1997  679.6
1998  594.4
1999  767.1
2000  697.9
2001  564.5
2002  397
2003  362.1
2004  675.2