Soft Computing for Biomedical Applications and Related Topics [1st ed.] 9783030495350, 9783030495367

This book presents innovative intelligent techniques, with an emphasis on their biomedical applications. Although many m


English Pages IX, 329 [322] Year 2021


Table of contents:
Front Matter ....Pages i-ix
Front Matter ....Pages 1-1
Multiagent Smart Communication Based on CI Technology (Kaoru Hirota, Fangyan Dong, Jesus A. Garcia Sanchez)....Pages 3-9
Develop a Prediction Model for Nonmelanoma Skin Cancer Using Deep Learning in EHR Data (Chih-Wei Huang, Alex P. A. Nguyen, Chieh-Chen Wu, Hsuan-Chia Yang, Yu-Chuan (Jack) Li)....Pages 11-18
VectorDefense: Vectorization as a Defense to Adversarial Examples (Vishaal Munusamy Kabilan, Brandon Morris, Hoang-Phuong Nguyen, Anh Nguyen)....Pages 19-35
Artificial Intelligence in Infection Control—Healthcare Institutions Need Intelligent Information and Communication Technologies for Surveillance and Benchmarking (Walter Koller, Andrea Rappelsberger, Birgit Willinger, Gabriel Kleinoscheg, Klaus-Peter Adlassnig)....Pages 37-48
Why Some Non-classical Logics Are More Studied? (Olga Kosheleva, Vladik Kreinovich, Hoang Phuong Nguyen)....Pages 49-59
Why h-Index (Vladik Kreinovich, Olga Kosheleva, Hoang Phuong Nguyen)....Pages 61-65
Accuracy of Data Fusion: Interval (and Fuzzy) Case (Christian Servin, Olga Kosheleva, Vladik Kreinovich)....Pages 67-79
Imputing Missing Values: Reinforcement Bayesian Regression and Random Forest (Shahriar Shakir Sumit, Junzo Watada, Fatema Nasrin, Nafiz Ishtiaque Ahmed, D. R. A. Rambli)....Pages 81-88
From Machine Learning to Knowledge-Based Decision Support—A Predictive-Model-Markup-Language-to-Arden-Syntax Transformer for Decision Trees (Julia Zeckl, Matthias Wastian, Dominik Brunmeir, Andrea Rappelsberger, Sergei B. Arseniev, Klaus-Peter Adlassnig)....Pages 89-99
Front Matter ....Pages 101-101
Semantic Similarity Between Adjectives and Adverbs—The Introduction of a New Measure (Moreno Colombo, Edy Portmann)....Pages 103-116
Evaluation of Robotic Navigation Model Considering Group Norms of Personal Space in Human–Robot Communities (Yotaro Fuse, Hiroshi Takenouchi, Masataka Tokumaru)....Pages 117-125
ESYNDIAG: A Fuzzy Expert System for Eight Syndrome Diagnosis in Traditional Vietnamese Medicine (Hoang Phuong Nguyen, Lam Tung Vu, Thuy Hong Truong, Kaoru Hirota)....Pages 127-139
Manifold Based Data Refinement for Biological Analysis (Dao Nam Anh)....Pages 141-152
Prediction of Wart Treatment Using Deep Learning with Implicit Feature Engineering (Khang Nguyen, Nga Mai, An Nguyen, Binh P. Nguyen)....Pages 153-168
Visualization of Musical Emotions by Colors of Images (Dao Nam Anh)....Pages 169-182
Saliency Guided Image Watermarking for Anti-forgery (Pham Quang Huy, Dao Nam Anh)....Pages 183-195
Big Data for Public Health Surveillance in Vietnam: Challenges and Opportunities (Long Bui, Thai Son Ha, Ngoc Ha Nguyen, Khandmaa Sukhbaatar, My Hanh Bui)....Pages 197-204
A New Computational Method Based on Heterogeneous Network for Predicting MicroRNA-Disease Associations (Thanh Van Thai, Duong Hung Bui, Xuan Tho Dang, Thanh-Phuong Nguyen, Dang Hung Tran, The Dung Luong)....Pages 205-219
An Approach for Building Effective Real Estate Chatbots in Vietnamese (Tuan-Dung Cao, Quang H. Nguyen)....Pages 221-229
Ensemble of Convolutional Neural Networks for the Detection of Prostate Cancer in Multi-parametric MRI Scans (Quang H. Nguyen, Mengnan Gong, Tao Liu, Ou Yang Youheng, Binh P. Nguyen, Matthew Chin Heng Chua)....Pages 231-241
Adaptive Fuzzy Backstepping Sliding Mode Control for Omni Mobile Robot Over Network Control System (Minh Ngoc Pham, Vinh Quang Thai, Duyen Kim Thi Ha, Tien Manh Ngo, Cuong Manh Nguyen, Manh Van Tran et al.)....Pages 243-255
Fuzzy Adaptive Dynamic Surface Control for Omnidirectional Robot (Duyen Kim Thi Ha, Tien Manh Ngo, Minh Ngoc Pham, Vinh Quang Thai, Minh Xuan Phan, Dung Tien Pham et al.)....Pages 257-268
Traffic Light Control at Isolated Intersections in Case of Heterogeneous Traffic (Phan Duy Hung, Do Thai Giang)....Pages 269-280
Block Difference of Inverse Probabilities Features for Chromoendoscopy Image Classification (Viet Dung Nguyen, Thanh Hien Truong, Ha Anh Pho, Le Thu Thao Dao)....Pages 281-288
Hybrid Convolutional Neural Network Ensemble for Activity Recognition in Mobile Phones (Jyh-Huah Chan, Hui-Juin Lim, Ngoc-Son Hoang, Jeong-Hoon Lim, Khang Nguyen, Binh P. Nguyen et al.)....Pages 289-299
Text Mining Analysis of Comments in Thai Language for Depression from Online Social Networks (Pornpimol Chaiwuttisak)....Pages 301-313
Building a X-ray Database for Mammography on Vietnamese Patients and automatic Detecting ROI Using Mask-RCNN (Nguyen Duc Thang, Nguyen Viet Dung, Tran Vinh Duc, Anh Nguyen, Quang H. Nguyen, Nguyen Tu Anh et al.)....Pages 315-329

Studies in Computational Intelligence 899

Vladik Kreinovich Nguyen Hoang Phuong   Editors

Soft Computing for Biomedical Applications and Related Topics

Studies in Computational Intelligence Volume 899

Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. The books of this series are submitted to indexing to Web of Science, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink.

More information about this series at http://www.springer.com/series/7092


Editors Vladik Kreinovich Department of Computer Science University of Texas at El Paso El Paso, TX, USA

Nguyen Hoang Phuong Informatics Division Thang Long University Hanoi, Vietnam

ISSN 1860-949X ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-3-030-49535-0 ISBN 978-3-030-49536-7 (eBook)
https://doi.org/10.1007/978-3-030-49536-7

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Preface

Most biomedical research papers are about applying statistical techniques to numerical data. On the other hand, if you look at medical textbooks, or observe how medical doctors make decisions, you see that they do not simply perform numerical computations: they adjust their recommendations based on many non-numerical factors, such as how the patient feels and how they perceive the patient's state. This makes sense: if diagnosing or prescribing a medicine were an exact algorithmic task, we would not need medical doctors; we could use simple calculators instead.

How can we take all these non-numerical factors, factors best described by words from natural language like "somewhat better", into account? How can we incorporate this knowledge into computer models so that these models become more accurate? This incorporation is the main topic of soft computing, which attempts to transform imprecise ("soft") knowledge into a precise, computer-understandable form, so that we can combine this knowledge with "hard" numerical knowledge.

There are many directions in soft computing. Fuzzy logic directly translates imprecise knowledge into precise terms. Neural networks (and other machine learning techniques) perform this transformation indirectly: first by observing how people make decisions based on their imprecise knowledge, and then by looking for a precise model that leads to the same decisions. There are other directions as well.

All these directions are represented in this book. Some of the book's chapters directly deal with applications of soft computing to biomedical problems. Other chapters deal with technical aspects of soft computing, thus eventually making the corresponding techniques better and, hopefully, more adequate for biomedical applications. Several chapters deal with applications outside biomedicine, with the hope that successful applications to different areas can help find good applications to biomedical problems as well.
We hope that this volume will help practitioners and researchers learn more about soft computing techniques and further develop this important research direction.


We want to thank all the authors for their contributions and all the anonymous referees for their thorough analysis and helpful comments. The publication of this volume was partly supported by Thang Long University, Hanoi, Vietnam. Our thanks to the leadership and staff of Thang Long University for providing crucial support. Our special thanks go to Prof. Hung T. Nguyen for his valuable advice and constant support. We would also like to thank Prof. Janusz Kacprzyk (Series Editor) and Dr. Thomas Ditzinger (Senior Editor, Engineering/Applied Sciences) for their support and cooperation in this publication.

El Paso, USA
Hanoi, Vietnam
January 2020

Vladik Kreinovich Nguyen Hoang Phuong

Contents

Invited Papers

Multiagent Smart Communication Based on CI Technology (Kaoru Hirota, Fangyan Dong, and Jesus A. Garcia Sanchez) .... 3
Develop a Prediction Model for Nonmelanoma Skin Cancer Using Deep Learning in EHR Data (Chih-Wei Huang, Alex P. A. Nguyen, Chieh-Chen Wu, Hsuan-Chia Yang, and Yu-Chuan (Jack) Li) .... 11
VectorDefense: Vectorization as a Defense to Adversarial Examples (Vishaal Munusamy Kabilan, Brandon Morris, Hoang-Phuong Nguyen, and Anh Nguyen) .... 19
Artificial Intelligence in Infection Control—Healthcare Institutions Need Intelligent Information and Communication Technologies for Surveillance and Benchmarking (Walter Koller, Andrea Rappelsberger, Birgit Willinger, Gabriel Kleinoscheg, and Klaus-Peter Adlassnig) .... 37
Why Some Non-classical Logics Are More Studied? (Olga Kosheleva, Vladik Kreinovich, and Hoang Phuong Nguyen) .... 49
Why h-Index (Vladik Kreinovich, Olga Kosheleva, and Hoang Phuong Nguyen) .... 61
Accuracy of Data Fusion: Interval (and Fuzzy) Case (Christian Servin, Olga Kosheleva, and Vladik Kreinovich) .... 67
Imputing Missing Values: Reinforcement Bayesian Regression and Random Forest (Shahriar Shakir Sumit, Junzo Watada, Fatema Nasrin, Nafiz Ishtiaque Ahmed, and D. R. A. Rambli) .... 81
From Machine Learning to Knowledge-Based Decision Support—A Predictive-Model-Markup-Language-to-Arden-Syntax Transformer for Decision Trees (Julia Zeckl, Matthias Wastian, Dominik Brunmeir, Andrea Rappelsberger, Sergei B. Arseniev, and Klaus-Peter Adlassnig) .... 89

Regular Papers

Semantic Similarity Between Adjectives and Adverbs—The Introduction of a New Measure (Moreno Colombo and Edy Portmann) .... 103
Evaluation of Robotic Navigation Model Considering Group Norms of Personal Space in Human–Robot Communities (Yotaro Fuse, Hiroshi Takenouchi, and Masataka Tokumaru) .... 117
ESYNDIAG: A Fuzzy Expert System for Eight Syndrome Diagnosis in Traditional Vietnamese Medicine (Hoang Phuong Nguyen, Lam Tung Vu, Thuy Hong Truong, and Kaoru Hirota) .... 127
Manifold Based Data Refinement for Biological Analysis (Dao Nam Anh) .... 141
Prediction of Wart Treatment Using Deep Learning with Implicit Feature Engineering (Khang Nguyen, Nga Mai, An Nguyen, and Binh P. Nguyen) .... 153
Visualization of Musical Emotions by Colors of Images (Dao Nam Anh) .... 169
Saliency Guided Image Watermarking for Anti-forgery (Pham Quang Huy and Dao Nam Anh) .... 183
Big Data for Public Health Surveillance in Vietnam: Challenges and Opportunities (Long Bui, Thai Son Ha, Ngoc Ha Nguyen, Khandmaa Sukhbaatar, and My Hanh Bui) .... 197
A New Computational Method Based on Heterogeneous Network for Predicting MicroRNA-Disease Associations (Thanh Van Thai, Duong Hung Bui, Xuan Tho Dang, Thanh-Phuong Nguyen, Dang Hung Tran, and The Dung Luong) .... 205
An Approach for Building Effective Real Estate Chatbots in Vietnamese (Tuan-Dung Cao and Quang H. Nguyen) .... 221
Ensemble of Convolutional Neural Networks for the Detection of Prostate Cancer in Multi-parametric MRI Scans (Quang H. Nguyen, Mengnan Gong, Tao Liu, Ou Yang Youheng, Binh P. Nguyen, and Matthew Chin Heng Chua) .... 231
Adaptive Fuzzy Backstepping Sliding Mode Control for Omni Mobile Robot Over Network Control System (Minh Ngoc Pham, Vinh Quang Thai, Duyen Kim Thi Ha, Tien Manh Ngo, Cuong Manh Nguyen, Manh Van Tran, and Hiep Quang Do) .... 243
Fuzzy Adaptive Dynamic Surface Control for Omnidirectional Robot (Duyen Kim Thi Ha, Tien Manh Ngo, Minh Ngoc Pham, Vinh Quang Thai, Minh Xuan Phan, Dung Tien Pham, Dinh Duc Nguyen, and Hiep Quang Do) .... 257
Traffic Light Control at Isolated Intersections in Case of Heterogeneous Traffic (Phan Duy Hung and Do Thai Giang) .... 269
Block Difference of Inverse Probabilities Features for Chromoendoscopy Image Classification (Viet Dung Nguyen, Thanh Hien Truong, Ha Anh Pho, and Le Thu Thao Dao) .... 281
Hybrid Convolutional Neural Network Ensemble for Activity Recognition in Mobile Phones (Jyh-Huah Chan, Hui-Juin Lim, Ngoc-Son Hoang, Jeong-Hoon Lim, Khang Nguyen, Binh P. Nguyen, Chee-Kong Chui, and Matthew Chin-Heng Chua) .... 289
Text Mining Analysis of Comments in Thai Language for Depression from Online Social Networks (Pornpimol Chaiwuttisak) .... 301
Building a X-ray Database for Mammography on Vietnamese Patients and automatic Detecting ROI Using Mask-RCNN (Nguyen Duc Thang, Nguyen Viet Dung, Tran Vinh Duc, Anh Nguyen, Quang H. Nguyen, Nguyen Tu Anh, Nguyen Ngoc Cuong, Le Tuan Linh, Bui My Hanh, Phan Huy Phu, and Nguyen Hoang Phuong) .... 315

Invited Papers

Multiagent Smart Communication Based on CI Technology Kaoru Hirota, Fangyan Dong, and Jesus A. Garcia Sanchez

Abstract The presenters' group has been studying humans-robots interaction based on Computational Intelligence in the framework of a multiagent smart society, where the concept of the Fuzzy Atmosfield (FA) is proposed to express the atmosphere in humans-robots communication. The FA is characterized by a 3D fuzzy cubic space with "friendly-hostile", "lively-calm", and "casual-formal" axes, based on a cognitive science experiment and PCA. To make such movements of the atmosphere easy to understand, a graphical representation method is also proposed. To illustrate the FA and its visualization method, a demonstration scenario, "enjoying home party by five eye robots and four humans", is introduced and demonstrated. Then, a visualization method for users' emotion information is proposed for long distance interaction such as telecommuting and distance learning, where 3D emotion vectors in the affinity pleasure-arousal space are illustrated by a shape-brightness-size (SBS) figure. It gives users easily understandable emotional profile information and provides the administrator with strategic suggestions to improve the interaction between the users and the system. In a matching experiment between 7 basic emotions and 7 SBS figures with 8 subjects, 83.93% matching is achieved, and the administrator finds content improvement hints in the questionnaire on emotions for 5 reading-text-tasks by 8 subjects. The method is planned to be implemented in a language learning application to provide a more comfortable learning experience through content selection based on the user's emotion.

K. Hirota (B) School of Automation, Beijing Institute of Technology, Beijing, China e-mail: [email protected]; [email protected] Tokyo Institute of Technology, Tokyo, Japan F. Dong Ningbo Institute of Intelligent Manufacturing Industry, Yuyao, China e-mail: [email protected] J. A. Garcia Sanchez Customer Solutions Development Co., Ltd., Kawasaki, Japan

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_1


1 Concept of Fuzzy Atmosfield "FA"

The term Atmosfield (i.e., atmos' field, composed of "atmosphere" and "field") is proposed to describe the atmosphere in the space surrounding us. The Atmosfield not only reflects the atmosphere states during interactive communication, but also affects the emotional states of individuals to some extent. Therefore, it is defined as a kind of psychological field, producing psychological feelings that can influence the process and results of humans' behavior. Due to its psychological characteristics, the Atmosfield cannot be calculated as accurately as classical fields such as the electrical field and the magnetic field; it is therefore defined as the Fuzzy Atmosfield, which needs subjective comprehension to determine the attributes of the FA and fuzzy logic to handle the reasoning from related factors to the FA. The FA is characterized by a 3D fuzzy cubic space with three attributes (i.e., the three axes). Questionnaire surveys were carried out with ten people on twenty different occasions to determine the three axes of the FA, where some common atmosphere-related factors were enumerated as candidates, e.g., friendly, lively, casual, harmonious, peaceful, noisy, warm, relaxed, and so on, whose ranges are defined as the fuzzy domain, i.e., from −1 to 1. The results of the questionnaires were analyzed using principal component analysis, and finally the three most important axes, i.e., "friendly-hostile", "lively-calm", and "casual-formal", were accepted to represent the FA. The resulting FA is illustrated in Fig. 1.

Fig. 1 Fuzzy Atmosfield, characterized by 3D fuzzy cubic space
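The axis-selection procedure described above (principal component analysis over questionnaire ratings, then projection into the [−1, 1] cube) can be sketched with synthetic data. The ratings below are randomly generated stand-ins, not the authors' survey data, and the number of candidate factors is an assumption:

```python
import numpy as np

# Illustrative stand-in data: 10 raters x 20 occasions, 8 candidate
# atmosphere factors, each rated in the fuzzy domain [-1, 1].
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 8))

# Principal component analysis via SVD of the centered data matrix;
# the paper keeps the 3 strongest components as the FA axes.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
axes = Vt[:3]                      # loading vectors of the 3 retained axes
explained = s**2 / np.sum(s**2)    # variance share of each component

# Project one occasion's ratings into the 3D Fuzzy Atmosfield cube,
# clipping back into the fuzzy domain [-1, 1].
fa_point = np.clip(Xc[0] @ axes.T, -1.0, 1.0)
print(fa_point.shape)              # one coordinate per FA axis
```

On real questionnaire data the three leading components would be inspected and named ("friendly-hostile", "lively-calm", "casual-formal"); here the axes are purely synthetic.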


2 Visualization of Fuzzy Atmosfield

The FA aims to be a tool that expresses the real-time atmosphere during communication with quantitative analysis; however, the dynamic states of the atmosphere plotted in the 3D fuzzy cubic space (i.e., points in 3D coordinates) are not easily observed and understood by humans. Compared to the coordinate values of atmosphere states, graphics, as one kind of visual presentation, are easier to use and understand because of their functional, artistic, and easy-to-perceive characteristics. Following the elements of graphics, shape, color, and size are employed to describe the atmosphere states on the "friendly-hostile", "lively-calm", and "casual-formal" axes, respectively: the shape changes from a circle to a cross as the value of the "friendly-hostile" axis goes from 1 to −1; the color varies with the value of the "lively-calm" axis by using a color bar; and the size (i.e., thickness) describes the variation along the "casual-formal" axis. To associate the "friendly-hostile", "lively-calm", and "casual-formal" axes with shape, color, and size, respectively (cf. Figs. 2, 3, and 4), the fuzzy domain of each axis is defined from 1 to −1, where the fuzzy linguistic variables "highly", "medium", and "low" are adopted as the extent of each attribute. Several examples of points in the Fuzzy Atmosfield are shown in Fig. 5.

Fig. 2 The shape for “friendly-hostile” axis

Fig. 3 The color for “lively-calm” axis

6

K. Hirota et al.

Fig. 4 The size for “casual-formal” axis
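The shape/color/size mapping described above can be sketched as follows. The threshold breakpoints for the linguistic grades and the discrete circle/cross shape switch are illustrative assumptions; the paper's figures suggest a gradual morph rather than a hard threshold:

```python
# Hedged sketch of the FA graphics mapping; breakpoints are assumptions.

def linguistic(v: float) -> str:
    """Crude linguistic grade over the fuzzy domain [-1, 1]."""
    a = abs(v)
    if a > 2 / 3:
        return "highly"
    if a > 1 / 3:
        return "medium"
    return "low"

def fa_to_graphics(friendly: float, lively: float, casual: float) -> dict:
    """Map one FA point to the shape / color / size channels."""
    shape = "circle" if friendly >= 0 else "cross"  # circle (1) -> cross (-1)
    color_pos = (lively + 1) / 2                    # position on the color bar
    size = (casual + 1) / 2                         # thickness / size channel
    return {
        "shape": (shape, linguistic(friendly)),
        "color_bar_position": round(color_pos, 2),
        "size": (round(size, 2), linguistic(casual)),
    }

print(fa_to_graphics(0.3, -0.6, 0.6))
```

For example, the point FA(0.3, −0.6, 0.6) from Fig. 5(d) maps to a weakly friendly circle, a color-bar position of 0.2, and a medium-large size.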

Fig. 5 Examples of graphical representation of FA: (a) FA(1, 1, 1); (b) FA(0, 0, 0); (c) FA(−1, −1, −1); (d) FA(0.3, −0.6, 0.6)

Fig. 6 Enjoying home party by 5 robots and 4 humans

3 Enjoying Home Party by Five Eye Robots and Four Humans

A Mascot Robot System has been developed by the authors' group as part of the "Development Project for a Common Basis of Next-Generation Robots" (2005–2007), sponsored by NEDO (New Energy and Industrial Technology Development Organization), and "Development of Mascot Robot System with Casual Communication Ability" (2009–2012), sponsored by JSPS (Japan Society for the Promotion of Science). The main purpose of the projects is to perform casual communication between robots and humans. The system is implemented as multiple robots connected by RT middleware (RTM) over the internet. It consists of 5 robots, i.e., 4 fixed robots (placed on a TV, a darts game machine, an information terminal, and a mini-bar) and 1 mobile robot. Each of them includes an eye robot, a speech recognizer/synthesizer, a web camera, and a notebook PC. These robots are connected together with a server through the internet by RTM, where fuzzy interruption technology makes it possible to perform smooth communication among plural robots. The Mascot Robot System's functioning is demonstrated in an ordinary living room, where casual communication between 5 robots and 4 human beings (1 host, 2 guests, and 1 walk-in) is conducted in an "enjoying home party" scenario. Examples of demonstration scenes are shown in Fig. 7, where the fuzzy atmosphere is shown by the shape-color-size figure at the upper right.

Fig. 7 Example of demonstration scenes with FA information: (a) FA(0.1, 0.2, −0.3); (b) FA(0.3, 0.5, 0.3)

4 Visualization Method of Emotion Information for Long Distance Interaction

In long distance interaction, telecommuting and distance learning are the main tasks, but the lack of automatic communication between the users and the administrator of the system makes such systems less appealing to users. Data visualization is a central issue in confirming achievements, not only for expert system administrators but also for non-expert users. A simple and effective emotion visualization method, from an affective computing point of view, needs to be developed for the sake of realizing smooth interaction. A few approaches to emotion visualization have been studied, in which the visualization of emotion plays an important role in the design and understanding of the system, but those methods do not make it easy to understand emotions intuitively and/or completely.

A visualization method is proposed to represent users' emotions in long distance interaction and to improve the interaction between users and administrators, where the emotion is characterized by a 3D vector in the affinity pleasure-arousal space [−1, 1]. In accordance with any non-invasive emotion recognition method, such as those based on voice, face, body gestures, or a combination of them, the visualization is carried out using a newly proposed shape-brightness-size (SBS) model to make users' emotions easy to understand intuitively. The shape changes from an X-shape to a square and finally to a circle to represent −1, 0, and 1 on the pleasure-displeasure axis of the affinity pleasure-arousal space, respectively. The brightness stands for the value on the arousal-sleep axis, starting from black (= −1) and passing through all gradients of gray up to white (= 1). The size indicates the position on the affinity-no-affinity axis, where the smallest size means no affinity and the full size implies complete affinity.

The proposed visualization method provides the SBS figure as information about each user's emotions successively on the user's terminal screen. By observing the easily understandable SBS figures displayed on the screen one after another, each user can make the best use of his/her emotional transitional states for further self-improvement. All users' emotional personal histories are recorded in the system storage, and the system administrator puts this information to practical use for establishing business strategies, such as brushing up the contents and the graphical user interface, and supporting users who are faced with hard tasks.
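The SBS encoding described above can be sketched directly from the stated endpoints. The interpolation endpoints follow the text; the concrete encodings (shape thresholds, 8-bit gray level, size as a percentage) are illustrative assumptions:

```python
# Minimal sketch of the shape-brightness-size (SBS) mapping; thresholds
# and output encodings are assumptions, not the authors' specification.

def sbs_figure(pleasure: float, arousal: float, affinity: float) -> dict:
    """Encode a 3D emotion vector in [-1, 1]^3 as an SBS figure."""
    for v in (pleasure, arousal, affinity):
        if not -1.0 <= v <= 1.0:
            raise ValueError("emotion components must lie in [-1, 1]")
    # shape: X-shape (-1) -> square (0) -> circle (1)
    shape = ("circle" if pleasure > 0.5
             else "square" if pleasure >= -0.5
             else "x-shape")
    # brightness: black (-1) through grays to white (1), as 8-bit gray
    gray = round((arousal + 1) / 2 * 255)
    # size: smallest = no affinity, full size = complete affinity
    size_pct = round((affinity + 1) / 2 * 100)
    return {"shape": shape, "gray": gray, "size_pct": size_pct}

print(sbs_figure(1.0, 0.0, -1.0))
```

For instance, an emotion with full pleasure, neutral arousal, and no affinity renders as a mid-gray circle of minimal size.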
Two experiments were carried out: one to validate the correct correspondence between the fundamental emotions (anger, disgust, fear, neutral, happiness, sadness, and surprise) and the SBS figures, and another to confirm the usefulness of users' emotional personal histories. The first experiment consists of displaying the 7 fundamental emotions as shape-brightness-size figures to 5 non-expert users, who are asked to select the SBS figure that best matches the given emotion; the same test is applied to 3 expert users. The second experiment involves giving 5 reading-text-tasks (one page of English text in the fields of chemistry, neuroscience, biology, mathematics, and quantum computing) to the same 8 users, and the system administrator detects the trends of users' preferences for the tasks by asking each user about his/her emotion after finishing each task.
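For scale: with 8 subjects each matching 7 emotions, there are 56 subject-emotion trials, and the 83.93% matching rate reported in the abstract is consistent with 47 correct matches out of 56. A toy reconstruction of the computation (the individual responses below are fabricated purely for illustration):

```python
# Fabricated example: computing a matching rate over 8 subjects x 7 emotions.
emotions = ["anger", "disgust", "fear", "neutral",
            "happiness", "sadness", "surprise"]

# responses[subject][shown_emotion] = SBS figure (named by emotion) picked
responses = {s: {e: e for e in emotions} for s in range(8)}
for s, e in [(0, "fear"), (0, "anger"), (1, "disgust"), (2, "anger"),
             (3, "sadness"), (4, "surprise"), (5, "fear"), (6, "neutral"),
             (7, "disgust")]:
    responses[s][e] = "happiness"   # 9 fabricated mismatches -> 47/56 correct

trials = [(e, responses[s][e]) for s in responses for e in emotions]
rate = sum(shown == picked for shown, picked in trials) / len(trials)
print(f"{rate:.2%}")
```

Under these fabricated responses the script prints 83.93%, matching the granularity of the reported figure.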

References

1. Liu, Z.T., Wu, M., Li, D., Chen, L.F., Dong, F.Y., Yamazaki, Y., Hirota, K.: Concept of fuzzy atmosfield for representing communication atmosphere and its application to humans-robots interaction. J. Adv. Comput. Intell. Intell. Inf. 17(1), 3–17 (2013)
2. Dong, F.Y., Hirota, K.: Concept of fuzzy atmosfield and its visualization. In: Seising, R., Trillas, E., Moraga, C., Termini, S. (eds.) On Fuzziness. A Homage to Lotfi A. Zadeh Vol. 1. Studies in Fuzziness and Soft Computing, vol. 216, pp. 257–263. Springer, Heidelberg (2013)
3. Garcia-Sanchez, J.A., Ohnishi, K., Shibata, A., Dong, F.Y., Hirota, K.: Visualization method of emotion information for long distance interaction. In: 7th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (IEEE-HNICEM 2014) joint with 6th International Conference on Computational Intelligence and Intelligent Informatics (ISCIII 2014), Palawan, Philippines, DSP-09, November 2014

Develop a Prediction Model for Nonmelanoma Skin Cancer Using Deep Learning in EHR Data Chih-Wei Huang, Alex P. A. Nguyen, Chieh-Chen Wu, Hsuan-Chia Yang, and Yu-Chuan (Jack) Li

Abstract We aimed to develop deep learning models for predicting the risk of advanced nonmelanoma skin cancer (NMSC) in Taiwanese adults. We collected the data of 9494 patients from Taiwan National Health Insurance claim data from 1999 to 2013. All patients' diseases and medications were included in the development of the convolutional neural network (CNN) model. We used the 3-year medical data of all patients before the NMSC diagnosis as the time dimension of the model. The area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were computed to measure the performance of the model. The results showed that the mean (SD) AUC of the model was 0.894 (0.007). The model achieved a sensitivity of 0.83, a specificity of 0.82, and a positive predictive value (PPV) of 0.57. Our study utilized a CNN to develop a prediction model for NMSC based on non-image, multi-dimensional medical records.

C.-W. Huang · A. P. A. Nguyen · C.-C. Wu · H.-C. Yang · Y.-C. Li (B) International Center for Health Information Technology, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_2

1 Introduction

Nonmelanoma skin cancer (NMSC), comprising squamous cell carcinoma and basal cell carcinoma, is the most common type of malignant neoplasm in white individuals [1]. Incidence rates of NMSC can exceed 100 per 100 000 person-years in many fair-skinned populations around the world [4], while the NMSC incidence among Asian individuals is 2.3–9.2 per 100 000 population [8]. However, the incidence of NMSC in Hispanic and Asian individuals has continued to increase [3]. Several skin cancer risk prediction models used for communicating risk factors have been proposed for risk stratification and for assisting prevention interventions.

Artificial intelligence (AI) and machine learning are already widespread and have been applied in all fields of science, including clinical medicine [5]. Deep learning, a highly flexible machine learning approach, has emerged as a groundbreaking algorithm that enhances the performance of current machine learning techniques and solves complex problems [2]. In this study, we aim to develop a deep learning algorithm for the detection of onset nonmelanoma skin cancer (NMSC) using patients' phenotypes.

2 Methods

2.1 Data Sets

In this study, we used reimbursement data from the Bureau of National Health Insurance (NHI) system in Taiwan, which has registered all medical records since 1996 [6]. We obtained a random sample of two million individuals from the NHI claims data covering January 01, 1999 to December 31, 2013. The study was approved by the institutional review board committee at Taipei Medical University, and the need for informed consent was waived.

2.2 Study Population

We identified patients who received a primary diagnosis of NMSC (International Classification of Disease, Ninth Revision [ICD-9] code 173) between January 01, 1999, and December 31, 2013. The accuracy of the NMSC diagnosis was confirmed by both the specific ICD-9 codes and inclusion in the Registry for Catastrophic Illness Patients (RCIP), a subpart of the NHI database [6]. Surgical pathological confirmation or undergoing NMSC treatment is required for patients to be registered in the RCIP. The index date was defined as the date of diagnosis. Patients were excluded if they were younger than 20 years of age at the date of NMSC diagnosis or if they did not have any outpatient claims in each of the four calendar years before the index date. We randomly selected 7665 individuals without any cancer diagnosis during 15 years in the NHI claims database. All of their medical records were also collected. We defined the date of the last visit observed in the NHI database as the index date for cancer-free patients. We then used the same conditions as for NMSC patients to cross-check their age and medical history. Because our purpose was to predict NMSC one year ahead, we used the data of patients with and without a cancer diagnosis from the fourth year to the first year before the index date and left out one year of data as the gap before NMSC development (see Fig. 1).


Fig. 1 Observation window and lag time

2.3 Features

Feature Definition

We first used the index date to determine the age and sex of all individuals (i.e. with and without a cancer diagnosis). We included the ICD9-CM 3-digit codes 001–999 and V01–V91 as features in the development of the model. The supplementary classification of external causes of injury and poisoning codes, E000–E999, were excluded from the feature list. For the medications, the Taiwan medication codes were collected and mapped to WHO-ATC (Anatomical Therapeutic Chemical Classification System) codes. The ATC-4 (i.e. ATC 5-digit) codes were used in the further development of the model. In total, 1931 features (i.e. age, sex, 1099 ICD9-CM codes, and 830 ATC-4 codes) were used in the development of the NMSC prediction model.

Time Definition and Data Processing

We collected the data of all individuals over 1095 days (i.e. 3-year data). For each disease, a patient was considered to have the diagnosis if at least three visits with its ICD9 code were observed during the 3-year period. For each medication, a patient was defined as a user if its ATC code was observed in the prescriptions, and the medication observation time was the number of days prescribed by physicians. For each patient, the data were summed over consecutive 7-day windows, deriving 157 grids from the 1095-day data set (see Fig. 2). The value of each feature of each patient was then normalized to the range between 0 and 1 so that similar severities are comparable across individuals. This is expressed mathematically in (1) as:

xnew = (x − xmin)/(xmax − xmin),

(1)


Fig. 2 The processing of time data

Fig. 3 An example of normalized data

where x = (x1, …, xn) denotes the feature values, xmin and xmax are their minimum and maximum, and xnew is the normalized value. Figure 3 shows an example of the normalized data.
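The weekly aggregation and the min-max normalization in (1) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the array shapes (1095 days by a handful of features) and the Poisson toy data are assumptions for the demo:

```python
import numpy as np

def to_weekly_grids(daily, n_grids=157):
    """Sum 1095 days of per-feature counts into 157 consecutive 7-day grids."""
    n_days, n_features = daily.shape            # e.g. (1095, 1931) in the paper
    grids = np.zeros((n_grids, n_features))
    for g in range(n_grids):
        # The final grid covers the remaining 3 days (157 * 7 = 1099 > 1095).
        grids[g] = daily[g * 7:(g + 1) * 7].sum(axis=0)
    return grids

def min_max_normalize(x):
    """xnew = (x - xmin) / (xmax - xmin), per feature, guarding constant columns."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (x - x_min) / span

daily = np.random.default_rng(0).poisson(0.1, size=(1095, 4)).astype(float)
grids = to_weekly_grids(daily)
normed = min_max_normalize(grids)
print(grids.shape, float(normed.min()), float(normed.max()))
```

Note that 157 windows of 7 days cover 1099 days, so the last grid simply absorbs the 3 leftover days of the 1095-day record.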

2.4 Development of the Model

Deep learning is the process of training a neural network (a large mathematical function with millions of parameters) to perform a given task. Figure 4 shows the convolutional neural network (CNN) structure of the study. We created the function


Fig. 4 CNN structure of the study

that computed the NMSC severity from a large matrix representation of each individual. During the training process, the parameters of the neural network are initially set to random values. The 1929 disease and medication features, comprising 1099 ICD9 codes and 830 ATC codes, were categorized into 19 groups (i.e. 18 groups for ICD9, for example, the 001–139 infectious and parasitic disease group, and one ATC group) for all patients. We created two convolutional layers with 32 feature maps for each category. Two max-pooling layers with filters of size 1 × 3 were applied to reduce the number of parameters and to control overfitting. After that, we flattened the pooled feature maps and, with the addition of age and sex, fed them into a fully connected artificial neural network. A hidden layer of 400 neurons was used in this fully connected part. The network weights were trained with distributed stochastic gradient descent using a mini-batch size of 32, and the model was optimized using the Adadelta optimizer [2]. The input and hidden layers used the Rectified Linear Unit (ReLU) activation function, while the output layer used the Softmax activation function. A dropout value of 0.1 was applied to the input and hidden layers to prevent overfitting [2]. The average k-fold cross-validation accuracy, with k = 5, was used as the metric to determine the best performance, optimizer, and loss function for the model. We computed the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive predictive value (PPV) to assess model performance. The software was implemented in Python v.3.7, and the model was created and trained with the TensorFlow framework v.1.9.
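The architecture described above can be sketched in modern tf.keras roughly as follows. This is a sketch, not the authors' implementation (which used TensorFlow 1.9); the per-group feature counts, kernel sizes, and the exact placement of the dropout layers are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_group_branch(n_feat, name):
    """One branch per feature group: two Conv(32) layers, two 1x3 max-pools."""
    inp = layers.Input(shape=(157, n_feat), name=name)   # 157 weekly grids
    x = layers.Conv1D(32, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling1D(3)(x)
    x = layers.Conv1D(32, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(3)(x)
    return inp, layers.Flatten()(x)

# 19 groups: 18 ICD9 chapters + 1 ATC group (feature counts here are placeholders).
group_sizes = [61] * 18 + [830]
inputs, flats = zip(*[build_group_branch(n, f"group_{i}")
                      for i, n in enumerate(group_sizes)])
demo = layers.Input(shape=(2,), name="age_sex")   # age and sex joined after flattening
x = layers.concatenate(list(flats) + [demo])
x = layers.Dropout(0.1)(x)
x = layers.Dense(400, activation="relu")(x)       # the 400-neuron hidden layer
x = layers.Dropout(0.1)(x)
out = layers.Dense(2, activation="softmax")(x)    # NMSC vs. cancer-free
model = tf.keras.Model(list(inputs) + [demo], out)
model.compile(optimizer="adadelta", loss="categorical_crossentropy")
```

The model takes 20 inputs (19 feature-group grids plus the age/sex vector) and outputs a two-class softmax.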

3 Results

Patient demographics of the cancer and non-cancer groups are summarized in Table 1. Following these conditions, 9494 patients were included in the study, of whom 1829 were diagnosed with NMSC and 7665 were without a cancer diagnosis.


Table 1 Characteristics of patients in this study

                                               Case (n = 1829)   Control (n = 7665)
Age, mean (SD), y                              65.3 (15.7)       47.5 (17.3)
Male, No. (%)                                  906 (49.5)        3714 (48.5)
Diagnosis count, No.                           551033            211672
Number of ICD9 counts per patient per year     43.4              26.8
Number of medications per patient per year     863.4             388

The mean (SD) age of patients with an NMSC diagnosis was 65.3 (15.7) years, compared with 47.5 (17.3) years for patients without cancer. Slightly more than half of the patients diagnosed with NMSC were female (50.5%). In addition, the average number of diagnoses of an NMSC patient before the index date was significantly higher than that of a non-cancer patient (e.g. 43.4 vs. 26.8 in the first observed year). Figure 5 shows the AUROC of the model with 5-fold cross-validation. The mean (SD) AUC of the model was 0.894 (0.007). The model achieved a sensitivity of 0.831, a specificity of 0.82, and a PPV of 0.57.

Fig. 5 Area Under the Receiver Operating Characteristic Curve (AUROC) of the Model


4 Discussion

The results demonstrate that the CNN model can appropriately predict NMSC based on non-image, multi-dimensional medical records (i.e. phenotype, diseases, and medications). The results showed very good sensitivity and specificity. This automated system for the detection of NMSC offers several advantages, including consistency of interpretation; it could serve as a good screening tool for NMSC before a person decides to undergo further examinations. In this study, we examined the feature weights by using odds ratios (ORs) and stepwise feature extraction (i.e. an ANN model). Most of the top 10 features were related to carcinoma in situ of skin, degenerative osteopathy, hypertension, chronic kidney insufficiency, cirrhosis, age, and COPD. Compared with other studies [7, 9, 10], our study showed a high AUC (0.894). Skin cancer is complex, with many factors associated with its development. Instead of focusing on a single risk factor, the present CNN prediction model can weigh all diagnoses and medications to make an accurate prediction. Our study used clinical visits, diagnostic information, and prescriptions along with sequential temporal information. Future studies may further incorporate procedures, laboratory test results, radiographic test results, pathology reports, medical expenditures, and accessible personal health information. The main strength of this study lies in the use of a population-based cohort with a large and nationally representative sample, which increases its generalizability in Taiwan. However, we have to acknowledge several limitations. First, this is a retrospective cohort study, so selection bias and misclassification may be introduced. Second, inherent limitations of the NHIRD hinder our ability to obtain some information related to NMSC, such as smoking habits, alcohol consumption, body mass index (BMI), family history of NMSC, diet, and physical activity.
However, we included comorbidities as surrogates for some risk factors of NMSC, such as COPD and stop-smoking clinic visits for smoking, alcohol-related illness for alcohol, and obesity for BMI. Finally, many chronic diseases such as diabetes mellitus (DM) and hypertension are not coded at every visit, which may lessen the apparent importance of those diseases. Further external validation using more recent NHIRD data sets or other Asian databases should be performed.

5 Conclusion

Our study utilized a CNN to develop a prediction model for NMSC based on non-image, multi-dimensional medical records (diagnoses and medications). In traditional studies, time-dependent variables could not capture the time dimension and were either dichotomized (with or without hypertension) or averaged into a single value. In the real world, however, the time sequence matters. Our model was fed a so-called "time-disease map", hence the evolution of diseases can be considered.


The ultimate goal of our study is to develop a web service for the public, helping people predict their risk of developing NMSC one year ahead based on their prior three-year diagnosis and medication records. This model therefore provides a cheap and instant first-line screening for NMSC that does not require a medical examination in the first place. In the future, we would like to include more features in our study, including lifestyle, physical characteristics, laboratory examinations, and genetic information, to help predict the risk more accurately.

Acknowledgements This research is sponsored in part by the Ministry of Science and Technology (MOST) under grants MOST 107-2634-F-038-002, MOST 106-2634-F-038-001-CC2, and MOST 108-2410-H-038-010-SSS; the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan (TMU DP2-107-21121-01A-06); and Taipei Medical University under grant TMU107-AE1B18. We would like to thank Mr. Chia-Wei Liang for his assistance in this study.

References

1. Apalla, Z., Lallas, A., Sotiriou, E., et al.: Epidemiological trends in skin cancer. Dermatol. Pract. Concept. 7, 1–6 (2017)
2. Lecun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521, 436 (2015)
3. Loh, T.Y., Ortiz, A., Goldenberg, A., et al.: Prevalence and clinical characteristics of nonmelanoma skin cancers among Hispanic and Asian patients compared with white patients in the United States: a 5-year, single-institution retrospective review. Dermatol. Surg. 42, 639–645 (2016)
4. Lomas, A., Leonardi-Bee, J., Bath-Hextall, F.: A systematic review of worldwide incidence of nonmelanoma skin cancer. Br. J. Dermatol. 166, 1069–1080 (2012)
5. Nguyen, P.A., Jack Li, Y.C.: Artificial intelligence in clinical implications. Comput. Methods Programs Biomed. 166, A1 (2018)
6. Nguyen, P.A., Syed-Abdul, S., Iqbal, U., et al.: A probabilistic model for reducing medication errors. PLoS ONE 8, e82401 (2013)
7. Olsen, C.M., Neale, R.E., Green, A.C., et al.: Independent validation of six melanoma risk prediction models. J. Invest. Dermatol. 135, 1377–1384 (2015)
8. Sng, J., Koh, D., Siong, W.C., et al.: Skin cancer trends among Asians living in Singapore from 1968 to 2006. J. Am. Acad. Dermatol. 61, 426–432 (2009)
9. Vuong, K., Armstrong, B.K., Weiderpass, E., et al.: Development and external validation of a melanoma risk prediction model based on self-assessed risk factors. JAMA Dermatol. 152, 889–896 (2016)
10. Vuong, K., Mcgeechan, K., Armstrong, B.K., et al.: Risk prediction models for incident primary cutaneous melanoma: a systematic review. JAMA Dermatol. 150, 434–444 (2014)

VectorDefense: Vectorization as a Defense to Adversarial Examples Vishaal Munusamy Kabilan, Brandon Morris, Hoang-Phuong Nguyen, and Anh Nguyen

Abstract Training deep neural networks on images represented as grids of pixels has brought to light an interesting phenomenon known as adversarial examples. Inspired by how humans reconstruct abstract concepts, we attempt to codify the input bitmap image into a set of compact, interpretable elements to avoid being fooled by the adversarial structures. We take a first step in this direction by experimenting with image vectorization as an input transformation step to map adversarial examples back onto the natural manifold of MNIST handwritten digits. We compare our method with state-of-the-art input transformations and further discuss the trade-offs between hand-designed and learned transformation defenses.

1 Introduction

Deep neural networks (DNNs) have been shown to be extremely vulnerable to adversarial examples (AXs)—inputs specifically optimized to fool them [25, 36]. An imperceptible perturbation can be crafted such that, when added to the input of a deep image classifier, it changes the prediction entirely [36]. These adversarial perturbations can be as minute as a single pixel [35], and can be computed inexpensively with a single backward pass through the DNN [11] or via black-box optimization algorithms [7, 13, 27]. The inability to resist AXs is a general, task- and dataset-agnostic weakness across most machine learning models [1]. Importantly, AXs generated to fool one target model transfer to fool other unknown models [27]. This has serious implications for the security and reliability of real-world applications, especially safety-critical domains like autonomous driving.

V. M. Kabilan · B. Morris · A. Nguyen (B) Auburn University, Auburn, USA e-mail: [email protected]
H.-P. Nguyen Thang Long University, Hanoi, Vietnam e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_3


Fig. 1 The top row shows the adversarial examples b-f crafted for a real test image, here “3” a via state-of-the-art crafting methods (top labels). The bottom row shows the results of vectorizing the respective images in the row above. Below each image are the predicted label and confidence score from the classifier. All the attacks are targeted to label “1” except for DeepFool (which is an untargeted attack). Vectorization substantially washes out the adversarial artifacts and pulls the images back to the correct labels. See Sect. 5 for more examples

Fig. 2 The workflow of VectorDefense. Given a real image in the bitmap space a, an attacker crafts an adversarial example b. Via vectorization (i.e. image tracing), we transform the input image b into a vector graphic c in SVG format, which is an XML tree with geometric primitives such as contours and ovals. The vector graphic is then rasterized back to bitmap d before being fed to a classifier for prediction. VectorDefense effectively washes out the adversarial artifacts and pulls the perturbed image b back closer to the natural manifold d

Recent research suggests that the adversarial problem is an artifact of classifiers being imperfectly trained, i.e. having a non-zero error on test data [9, 33]. However, perfect generalization is impractical to obtain in real-world settings, where the input space is high-dimensional and our models are trained to minimize the empirical risk over a limited training set. Despite a large body of recent research, the adversarial problem remains largely unsolved [1, 40]. A state-of-the-art, effective defense


mechanism is adversarial training [11, 21]—directly training a model on both clean1 and adversarial examples—but it is not a general solution. Furthermore, we would like the ideal defense layer to be (1) attack-agnostic and (2) model-agnostic (e.g. not having to alter or impose additional constraints on an existing image classifier). In this paper, we explore harnessing vectorization (i.e. image tracing) [3, 31]—a simple image transformation method from computer graphics—for defending against AXs. We refer to our method as VectorDefense. Specifically, we transform each input bitmap image into a vector graphic (SVG format)—composed of simple geometric primitives (e.g. ovals and strokes)—via Potrace [31], and then rasterize it back into bitmap form before feeding it to the classifier (Fig. 2). VectorDefense employs vectorization to smooth out the adversarial perturbations, which are often minute and local at the pixel level (Fig. 1). Here, we attempt to decompose (i.e. vectorize) an input image (e.g. a handwritten digit "9") into layers of compact, interpretable elements (e.g. a stroke and an oval in Fig. 2c) that are resolution-independent. We therefore test vectorization on MNIST data, which are composed of simple stroke structures, to purify AXs without having to re-train the classifiers. We make the following contributions:

1. We show that VectorDefense is a viable input transformation defense against AXs. We validate our hypothesis (1) on classifiers trained on MNIST [18]; (2) across 6 state-of-the-art attack methods; and (3) under both white- and gray-box threat models (Sect. 4.2).
2. We compare and show that VectorDefense performs competitively with state-of-the-art hand-designed input transformation methods, including bit-depth reduction [39] and image quilting [12] (Sect. 4.2).
3. We evaluate the effectiveness of input transformation techniques over the number of pixels allowed to be perturbed (i.e. a budget).
That is, we propose budget-aware variants of JSMA [28] and C&W L0 [5] that craft an AX within a given budget (Sect. 4.3).
4. We compare and contrast VectorDefense with Defense-GAN [30], a state-of-the-art input transformation method with a learned prior. Interestingly, we found that these two types of approaches can perform distinctly in many cases (Sect. 4.5).

2 Adversarial Attacks and Defenses

Consider a classifier f(·) that attempts to predict the true label y given an input image x. An AX is a slight perturbation x′ of the input such that f(x′) ≠ y. The AX can be made visually indistinguishable from the original image. AX crafting algorithms can also be constrained by a norm ‖·‖p of the perturbation [5]. In this paper, we evaluate our defense under two strong threat models: white-box and gray-box (characterized by the level of knowledge the adversary has about the

1 We use clean and real images interchangeably to refer to real dataset examples (without any perturbations).


Fig. 3 Given clean MNIST images a which are correctly classified with a label l (here, 0–9 respectively), we add perturbations b to produce adversarial examples c that are misclassified as l + 1 (i.e. digit 0 is misclassified as 1, etc.) via I-FGSM [17]. VectorDefense effectively purifies the adversarial perturbations in the final results e, which are correctly classified. Row d shows the difference between the original a vs. purified examples e. Below each image we show its predicted label and confidence score. See Sect. 5 for more examples of VectorDefense vs. other attack algorithms

victim). White-box: the attacker has access to the network weights, and also any defense method (here, the input transformation layer). Gray-box: the attacker has access to the network parameters, but not the defense method.

2.1 Attacks

Many methods for crafting adversarial examples have been proposed [1]. We briefly describe the most relevant gradient-based attacks, which we use to evaluate VectorDefense (Sect. 4). A computationally inexpensive procedure is the fast gradient sign method (FGSM) [11]. This attack computes the adversarial input through a single backward pass of the network with respect to the input, amounting to a single-step maximization of the loss function. If the classifier f(·) has a differentiable loss function J(·, ·), the adversarial input is calculated as:

x′ = x + ε · sign(∇x J(x, y))

(1)


where y is the true label and ε determines the strength of the attack (i.e. how much perturbation FGSM is allowed to make). The process can be adapted into an iterative procedure [17] that is more effective with less perturbation:

x(i+1) = clip_{x,ε}( x(i) + α · sign(∇x(i) J(x(i), y)) )

(2)

where x(0) = x is the original image. We refer to this adaptation as the iterative fast gradient sign method (I-FGSM). In each iteration, we take a step of size α in the adversarial direction, always staying within ε distance of the original input, measured by the L∞ norm. That is, a cap is imposed on the maximum perturbation that can be made at any pixel. Optimization continues until the image is misclassified or the maximum perturbation has been reached. I-FGSM is similar to the projected gradient descent (PGD) attack [19], except that PGD starts from a random point within an ε-norm ball. Another gradient-based method is the Jacobian Saliency Map Algorithm (JSMA) [28], which relies on the saliency map computed via backpropagation. The saliency map shows how sensitive the network's prediction is to each input pixel. In each step, JSMA selects two pixels to perturb, and repeats the process until a perturbation limit is reached or the DNN prediction becomes incorrect. The DeepFool algorithm [22] finds an AX by iteratively projecting x onto a linearized approximation of the decision boundary of classifier f(·). In each iteration, DeepFool performs the update:

x(i+1) = x(i) − ε · [ f(x(i)) / ‖∇x(i) f(x(i))‖₂² ] · ∇x(i) f(x(i))

(3)

where again x(0) = x. The Carlini and Wagner (C&W) attack [6] instead uses the Adam optimizer [15] and incorporates a constraint on the adversarial perturbation in three different ways: (1) limiting the amount of change in L2 distance from the original image; (2) limiting the maximum change to any pixel via L∞; (3) limiting the number of pixels that can be perturbed (i.e. a budget) via L0. To study the effectiveness of the input transformation defenses over perturbation budget settings, we modify JSMA and C&W L0 to take an additional input parameter that specifies the number of pixels that may be modified. We refer to these algorithms as (1) budget-aware JSMA and (2) budget-aware C&W L0, respectively:
• In every step, budget-aware JSMA selects a pair of pixels to perturb until the budget limit is reached, regardless of whether the image is already misclassified (i.e. f(x′) ≠ y).
• In contrast, budget-aware C&W L0 maintains a set of all pixels that can be perturbed (initially the entire image), and then iteratively shrinks this set until its size equals a given budget. When it is not possible to find an AX at the given budget, the original image is returned.
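The single-step and iterative updates in (1) and (2) can be sketched in a few lines of NumPy. The logistic-regression "classifier" below is a toy stand-in (w and b are random placeholders), not the victim DNN used in the paper:

```python
import numpy as np

# Toy differentiable classifier: p(y=1|x) = sigmoid(w.x + b).
rng = np.random.default_rng(0)
w, b = rng.normal(size=16), 0.0

def grad_loss(x, y):
    """Gradient of the cross-entropy loss J(x, y) with respect to the input x."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - y) * w                         # dJ/dx for logistic regression

def fgsm(x, y, eps):
    """Single-step attack (1): x' = x + eps * sign(grad_x J(x, y))."""
    return x + eps * np.sign(grad_loss(x, y))

def i_fgsm(x, y, eps, alpha=0.01, steps=20):
    """Iterative attack (2): steps of size alpha, clipped to the eps L-inf ball."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_loss(x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay within eps of the original
    return x_adv

x = rng.normal(size=16)
x_fgsm = fgsm(x, y=1, eps=0.1)
x_ifgsm = i_fgsm(x, y=1, eps=0.1)
print(np.abs(x_fgsm - x).max(), np.abs(x_ifgsm - x).max())   # both roughly eps
```

The clipping step is what distinguishes I-FGSM: many small steps are taken, but the total per-pixel change never exceeds ε.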


2.2 Defenses

A number of defense strategies have been proposed to stymie the effectiveness of adversarial perturbations [1]. Here we describe existing input transformation schemes, which we later compare with VectorDefense. Input transformation defenses seek to "purify" an AX, i.e. to remove the adversarial perturbations (often small and local) while maintaining the features necessary for correct classification. Input transformations can be categorized into those with hand-designed vs. learned priors. Under hand-designed priors, many methods have been proposed [1]. An effective technique is bit-depth reduction [39], which reduces the degrees of perturbation freedom within each pixel by quantizing the image. Image quilting [12] instead replaces patches of an image with similar clean patches extracted from a database. Other input transformation defenses harness a learned prior in the form of Generative Adversarial Networks (GANs) [14, 20, 30, 32] or PixelCNN [34]. Among those, we directly compare VectorDefense with a state-of-the-art method called Defense-GAN [30], which harnesses a GAN [10] to purify AXs. The idea is similar to conditional iterative image generation techniques [23, 24, 41]. First, a generator G(·) is trained to map a latent code z from a simple prior distribution (e.g. Gaussian) to an image x in a target distribution (e.g. handwritten digits in MNIST) on which the classifier is also trained. Then, we search for a latent code z∗ such that the resulting image matches a target (adversarial) image x, i.e. G(z∗) ≈ x. The final image G(z∗) is then fed to the classifier for prediction. Interestingly, this method is not completely robust: since G is not guaranteed to be perfectly trained, one can find AXs directly in the generated distribution [2, 14]. Overall, these defenses, and others that do not explicitly use input transformations, are still imperfect at resisting AXs [1, 2, 4, 5].
This work explores a novel, intuitive method that translates an input image into contours and other simple geometric shapes in an attempt to purify AXs. We view VectorDefense as a stepping stone towards decomposing images into compact, interpretable elements to solve the adversarial problem.
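To make the latent-code search behind Defense-GAN concrete, here is a minimal NumPy sketch: a toy linear "generator" stands in for a trained GAN generator, and the reconstruction loss ‖G(z) − x‖² is minimized by plain gradient descent. All sizes, the step size, and the iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 8))       # stand-in for a trained generator's weights

def G(z):
    return A @ z                   # toy generator: 8-dim latent code -> 64-dim "image"

def invert(x, steps=500, lr=0.002):
    """Gradient descent on z for the reconstruction loss ||G(z) - x||^2."""
    z = rng.normal(size=8)
    for _ in range(steps):
        residual = G(z) - x
        z = z - lr * 2.0 * (A.T @ residual)   # gradient of the loss w.r.t. z
    return z

x_clean = G(rng.normal(size=8))               # an image on the generator's manifold
x_adv = x_clean + 0.3 * rng.normal(size=64)   # perturbation pushes it off the manifold
z_star = invert(x_adv)
# G(z_star) is the purified image fed to the classifier; projecting onto the
# generator's range brings the perturbed input back toward the clean image.
print(np.linalg.norm(G(z_star) - x_clean) < np.linalg.norm(x_adv - x_clean))
```

A real Defense-GAN additionally restarts the search from several random z initializations and keeps the best reconstruction.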

3 Methods

VectorDefense is an input transformation defense that aims to remove the adversarial artifacts in an AX while preserving the class-specific features needed for classification. Specifically, we utilize the Potrace vectorization algorithm [31] to translate an input bitmap image into a vector graphic. Then we rasterize the vector graphic back to a bitmap before classification (Fig. 2). We first describe Potrace (Sect. 3.1) and then how the defense works (Sect. 3.2).


3.1 The Potrace Algorithm

Potrace is an algorithm that converts a black-and-white bitmap image (i.e. a binary image) into an algebraic description of its contours, typically in Bézier curves [31]. A vector image is therefore often smooth-edged and does not suffer from pixelation effects (Fig. 2c vs. d). This description is preferred in graphic design since it is scale-invariant. We chose to experiment with Potrace because the algorithm is simple, efficient, and has been shown effective for handwritten images [31]. At a high level, the algorithm works by tracing polygons onto the bitmap image based on intensity differences, then optimizing curves to match the contours of those polygons [31]. There are four steps in Potrace:

Step 1: An input bitmap image is decomposed into a set of paths formed by drawing boundaries that separate black and white regions.
Step 2: Potrace approximates each path found by a polygon. The polygons are constructed with the constraint of having the fewest number of edges while matching their respective paths.
Step 3: Potrace transforms each polygon into a smooth vector outline (i.e. a set of contours).
Step 4: Potrace joins adjacent Bézier curve segments together into a more compact vector file, though this often has imperceptible visual effects on the final output.

3.2 Vectorization as a Defense

We describe the components of the VectorDefense pipeline that are hypothesized to be effective in purifying AXs.

Binarization. Vectorization is an image tracing algorithm based on a color palette [31]. Here, we choose to use only black and white for tracing, i.e. we binarize an image before feeding it into Potrace. The intuition that binarizing images effectively reduces the input-space dimension in which adversarial perturbations could be made has also been confirmed in concurrent work on bit-depth reduction [39] (here, we reduce images to 1-bit).

Despeckling. An adversarial example often has many speckles, i.e. small color blobs that are not part of the digit but injected to fool the classifier (Fig. 1, top row). Despeckling attempts to remove these speckles by dropping all the paths (returned from Step 1 in Potrace) that consist of fewer than t pixels. We empirically find that t = 5 works best in removing adversarial speckles (Fig. 4).

Smoothing. Real handwritten digits often possess a large amount of smoothness (Fig. 3a), which is corrupted by adversarial perturbations (Fig. 3c). Steps 1–3 of Potrace vectorize an image into a set of contours, intuitively smoothing out the noisy perturbations (Fig. 4a vs. b). Vectorization is therefore also effectively a hand-designed smoothness prior, akin to total variation [12].
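The first two components can be illustrated directly on a pixel grid. Note this is only a sketch of the effect: Potrace's own despeckling drops small traced paths, whereas the version below removes small connected pixel blobs:

```python
import numpy as np

def binarize(img, threshold=0.5):
    """1-bit quantization of a grayscale array in [0, 1]."""
    return (img > threshold).astype(np.uint8)

def despeckle(binary, t=5):
    """Remove 4-connected foreground components smaller than t pixels."""
    out = binary.copy()
    seen = np.zeros_like(binary, dtype=bool)
    h, wd = binary.shape
    for i in range(h):
        for j in range(wd):
            if binary[i, j] and not seen[i, j]:
                stack, blob = [(i, j)], []
                seen[i, j] = True
                while stack:                      # flood-fill one component
                    a, b = stack.pop()
                    blob.append((a, b))
                    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        na, nb = a + da, b + db
                        if 0 <= na < h and 0 <= nb < wd and binary[na, nb] and not seen[na, nb]:
                            seen[na, nb] = True
                            stack.append((na, nb))
                if len(blob) < t:                 # a speckle: erase it
                    for a, b in blob:
                        out[a, b] = 0
    return out

img = np.zeros((10, 10))
img[2:8, 4] = 0.9          # a 6-pixel "stroke"
img[0, 0] = 0.8            # a 1-pixel adversarial speckle
clean = despeckle(binarize(img))
print(int(clean.sum()))    # the stroke survives; the speckle is gone
```

With t = 5, any isolated blob of fewer than five pixels is erased while the digit's strokes survive intact.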


Fig. 4 Despeckling removes small color blobs by dropping all the paths that consist of fewer than t pixels. We show the results of Potrace vectorization on an adversarial example crafted by the C&W L0 method [5] a with different values of t = 0, 5, 10, 20. t = 0: no despeckling

4 Experiments and Results

We evaluate VectorDefense as an input transformation layer that helps a state-of-the-art DNN classifier trained on MNIST correctly classify AXs created by 6 state-of-the-art gradient-based attack methods: I-FGSM, C&W L2, PGD, DeepFool, JSMA, and C&W L0. We compare VectorDefense against two hand-designed input transformation methods, image quilting [12] and bit-depth reduction [39], for the following reasons. Image quilting was shown to be one of the most effective input transformation techniques on natural images, i.e. ImageNet [12]; here, we implement image quilting for MNIST. Bit-depth reduction obtained state-of-the-art results on MNIST, especially by converting the images to 1-bit (rather than other bit-depth levels) [39]. In addition, comparing VectorDefense (which includes binarization) with binarization alone enables us to highlight the effectiveness of the entire vectorization transformation. We also compare and contrast with a learned input transformation method, Defense-GAN [30], to shed more light on the pros and cons of hand-designed vs. learned priors.

4.1 Experiment Setup

Datasets and Networks. The victim model is a DNN classifier from [37] trained on the MNIST dataset [18]. The network architecture is described in Table 1. We chose MNIST for two reasons: (1) research is still needed to understand and defend against AXs even for simple, low-dimensional problems [1, 9], which can inform defenses in high-dimensional spaces (e.g. ImageNet) [9]; (2) the handwritten digits are made of simple strokes, which can be vectorized well by Potrace into decomposable geometric primitives.


Implementation. All of the attack algorithms are from the cleverhans library [26], except C&W L0, which is from the code released by [6]. The defense methods image quilting, Defense-GAN, and BPDA are from the code released by [2]. We implemented the rest of the algorithms, and the code to reproduce our results is at https://github.com/VishaalMK/VectorDefense.

VectorDefense. To vectorize the input image, we used the open-source implementation of the Potrace algorithm2. In our experiments, we make two changes to the default Potrace settings:
• We set the turn policy [31] to black, which prefers to connect the black components in Step 1 (Sect. 3.1).
• We increased the default maximum size of removed speckles, the despeckling value, to 5 (Sect. 3.2).
With the image converted to the scalable vector graphics (SVG) format, we render it with the open-source Inkscape3 program at the same width and height as the original image.

Bit-depth reduction. We reduce the images to 1-bit in our experiments.

Table 1 Neural network architecture used for the MNIST dataset. Conv: convolutional layer, FC: fully connected layer

  Conv(64, 5, 5) + ReLU
  Conv(64, 5, 5) + ReLU
  Dropout(0.25)
  FC(128) + ReLU
  Dropout(0.5)
  FC + Softmax
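For illustration, the feature-map sizes and parameter counts implied by Table 1 can be worked out with a few lines of arithmetic. The table does not state stride or padding, so the sketch below assumes 5×5 valid convolutions with stride 1; under different assumptions the flattened size and the FC parameter counts change accordingly.

```python
# Shape and parameter arithmetic for the Table 1 network, assuming 5x5
# convolutions with stride 1 and no padding (an assumption for
# illustration; the table does not state stride or padding).

def conv_out(size, kernel=5, stride=1):
    return (size - kernel) // stride + 1

h = conv_out(28)        # 24 after the first Conv(64, 5, 5)
h = conv_out(h)         # 20 after the second Conv(64, 5, 5)
flat = 64 * h * h       # flattened input to FC(128)

params = {
    "conv1": (1 * 5 * 5 + 1) * 64,     # (weights + bias) per filter
    "conv2": (64 * 5 * 5 + 1) * 64,
    "fc1":   flat * 128 + 128,
    "fc2":   128 * 10 + 10,            # FC + Softmax over the 10 digits
}
```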

Table 2 Input transformation defenses against gray-box attacks. We compare our VectorDefense method against state-of-the-art input transformation defenses on 6 attack algorithms (b-h). Entries are classification accuracy (%)

                             (a)     (b)      (c)      (d)       (e)     (g)      (h)
                             Clean   I-FGSM   C&W L2   DeepFool  PGD     C&W L0   JSMA
                                     [17]     [5]      [22]      [19]    [5]      [28]
  No defense                 99.45    0.40     0.00     0.30      0.10    0.00     0.00
  Defense-GAN [30]           98.30   96.00    97.50    97.60     97.00   94.00    93.00
  Bit-depth reduction [39]   99.30   97.40    96.10    99.30     98.20   31.00    32.00
  Quilting [12]              99.30   93.10    96.90    99.00     96.00   47.20    65.00
  VectorDefense (ours)       98.60   95.60    91.80    97.40     96.20   60.00    94.00



Image quilting. In our implementation of image quilting on MNIST, we use a quilting patch size of 4×4 and a database of 60,000 images from the MNIST training set. Each patch is selected at random from one of K = 10 nearest neighbours.

Defense-GAN. We set the parameters L and R to 200 and 10, respectively.
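A minimal sketch of the quilting operation described above, assuming numpy is available: each non-overlapping 4×4 patch is replaced by a random one of its K nearest database patches. The function name `quilt` is illustrative; the actual implementation follows Guo et al. [12].

```python
import numpy as np

# Minimal sketch of patch-based quilting: every non-overlapping 4x4 patch
# of the input is replaced by one of its K nearest neighbours (L2 distance)
# from a patch database, chosen at random.

def quilt(img, patch_db, patch=4, k=10, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    out = img.copy()
    flat_db = patch_db.reshape(len(patch_db), -1)        # (N, patch*patch)
    for i in range(0, img.shape[0], patch):
        for j in range(0, img.shape[1], patch):
            p = img[i:i + patch, j:j + patch].reshape(-1)
            d = np.linalg.norm(flat_db - p, axis=1)      # distance to each database patch
            nn = np.argsort(d)[:k]                       # indices of the K nearest patches
            out[i:i + patch, j:j + patch] = patch_db[rng.choice(nn)]
    return out
```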

4.2 VectorDefense in a Gray-Box Setting

We evaluate VectorDefense under the gray-box threat model, which makes strong assumptions about the attacker: the attacker has knowledge of the victim model, but not of the defense mechanism being employed.

Experiment. For each of the 6 attack methods, we compute 1,000 AXs on the target DNN from 1,000 random test-set images. We feed the AXs through four defense methods: (1) bit-depth reduction; (2) image quilting; (3) VectorDefense; and (4) Defense-GAN. The purified images are then fed to the victim DNN for classification. All hyperparameters for the 6 attacks are the cleverhans defaults, and the crafted AXs reduce the victim DNN to at most 0.4% accuracy (Table 2; No defense).

Results. We found that VectorDefense performed similarly to the other three defense methods across four attack methods (I-FGSM, C&W L2, DeepFool, and PGD). Table 2 reports the accuracy scores for all 4 defenses vs. all 6 attack methods. For the C&W L0 and JSMA attacks, VectorDefense substantially outperformed the existing hand-designed transformation methods (Table 2h). Qualitative comparisons also confirmed our quantitative results (Fig. 5c). Compared to the other attack methods, C&W L0 poses a distinct challenge for all input transformation methods by explicitly deleting input features (here, setting many pixels to black in the image; see Fig. 9c and Fig. 1, top row). Under such a strong attack, Defense-GAN substantially outperformed all hand-designed methods (Table 2g). This result highlights a large performance gap between hand-designed and strong learned input transformation methods, and it informed our experiment in the next section. See Sect. 5 for more qualitative results of VectorDefense defending against each attack method considered.
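The purify-then-classify protocol above can be sketched as follows, with 1-bit bit-depth reduction [39] standing in for the defense and a stub in place of the victim DNN; the function names are illustrative.

```python
import numpy as np

# Sketch of the purify-then-classify evaluation protocol: adversarial
# images pass through a defense before classification. Bit-depth reduction
# to 1 bit [39] serves as the example defense; `classify` stands in for
# the victim DNN of Table 1.

def bit_depth_reduce(x, bits=1):
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels      # 1 bit: threshold at 0.5

def accuracy(images, labels, classify, defense):
    preds = [classify(defense(x)) for x in images]
    return float(np.mean([p == y for p, y in zip(preds, labels)]))
```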

4.3 Gray-Box: Budget-Aware Attack Algorithms

To study the performance of input transformations against state-of-the-art L0 attacks, we make use of (1) budget-aware JSMA and (2) budget-aware C&W L0 (described in Sect. 2.1). These attacks explicitly take as input a budget: the number of pixels the algorithm is allowed to modify.
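The budget interface can be illustrated with a toy sketch: given a saliency map (supplied directly here rather than derived from network gradients, as the real JSMA of [28] would), exactly `budget` pixels are pushed to the extreme positive value.

```python
import numpy as np

# Toy sketch of a budget-constrained L0 perturbation in the spirit of
# budget-aware JSMA: the `budget` most salient pixels are set to the
# extreme positive value (solid white). The saliency map is given as an
# input; this illustrates the attack interface, not the full saliency-map
# computation of Papernot et al. [28].

def perturb_topk(img, saliency, budget):
    idx = np.argsort(saliency.ravel())[-budget:]   # the `budget` most salient pixels
    adv = img.ravel().copy()
    adv[idx] = 1.0                                 # push to solid white
    return adv.reshape(img.shape)
```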


Experiment. We sweep across 6 budget settings, from 8 to 48 pixels, in increments of 8 pixels. The existence of an AX is not guaranteed for every (original image, target label) pair at every budget setting. Across all budget settings, we therefore evaluate the defenses on a subset of 80 successful, targeted AXs, chosen from a set of one thousand AXs generated via each budget-aware algorithm. A successful AX is one that (1) is misclassified as the target class; and (2) was crafted using the exact given budget.

JSMA results. Under budget-aware JSMA, VectorDefense outperformed bit-depth reduction and image quilting by a large margin (≥17%; Fig. 7a). JSMA poses a challenge to input transformation methods by setting the perturbed pixels to the extreme positive value (here, solid white; Fig. 5 heatmaps). We hypothesize that the despeckling process helps VectorDefense remove adversarial perturbations more explicitly and effectively than bit-depth reduction and quilting do (Fig. 5). However, Defense-GAN is still the most effective method for recovering from JSMA attacks.

Fig. 5 Qualitative comparison of input transformations on budget-aware JSMA. Clean images (clean) are added to adversarial perturbations (row 2) to produce adversarial images (adv). Each column a–g corresponds to a specific budget setting denoted on top. Rows 4–7 show the results of VectorDefense (vec), bit-depth reduction [39] (bit), image quilting [12] (quilt) and Defense-GAN [30] (dgan) respectively. VectorDefense effectively maps adversarial images back to the natural manifold


C&W L0 results. Under the budget-aware C&W L0 attack, VectorDefense performs similarly to the existing hand-designed input transformations across increasing budget settings (Fig. 7b). Note that as the number of perturbed pixels increases, the average perturbation per pixel decreases, moving farther from the extreme values. This makes it easier for all hand-designed methods to recover, leading to (1) increasing accuracy scores as the budget increases (Fig. 7b); and (2) similar performance across VectorDefense, bit-depth reduction, and image quilting (Fig. 7b). Overall, Defense-GAN outperformed the hand-designed defenses under general L0 attacks with extreme positive and negative perturbations (Table 2g).

4.4 White-Box: Bypassing Input Transformations

Recent research showed that no state-of-the-art defense is completely robust to white-box attacks, in which the attacker has access to the defense mechanism [2].

Fig. 6 We show 6 cases a–f to highlight the qualitative difference in how hand-designed input transformations (VectorDefense (vec), bit-depth reduction (bit), image quilting (quilt)) purify adversarial examples (AXs) compared to Defense-GAN (dgan). AXs were crafted by perturbing 40 pixels of the original images via budget-aware JSMA. While VectorDefense transforms AXs into arguably human-interpretable 4, 7, 9, or 0 (vec; a–d), Defense-GAN interestingly pulls the AXs back into 7, 1, 8, and 6 (bottom row)



Fig. 7 Classification accuracy of the MNIST classifier (Table 1) on purified adversarial examples, crafted using a the budget-aware JSMA algorithm and b the budget-aware C&W L0 algorithm, and purified by four input transformations: (1) VectorDefense; (2) image quilting [12]; (3) bit-depth reduction [39]; and (4) Defense-GAN [30]. Black dotted lines show the classifier accuracy on clean images, which serves as an upper bound on the effectiveness of a defense. A budget setting of zero corresponds to the classification accuracy on clean images. Higher is better

While VectorDefense is no exception under the white-box Backward Pass Differentiable Approximation (BPDA) attack [2], we observed that a large amount of distortion is required to fool the target DNN when VectorDefense is in place.

Experiment. Here we employ BPDA [2] to bypass VectorDefense. In essence, BPDA treats the input transformation method (here, VectorDefense) as a black box and uses it to approximate the gradient when computing AXs. We compare the results of running BPDA against the target DNN with no defense vs. with VectorDefense under the same experimental setup. Optimization runs for 50 steps. The L2 dissimilarity is commonly used to describe the average distortion necessary to fool a classifier [2]. Here, we report the average L2 dissimilarity over the successfully crafted AXs.

Results. Although the adversary ultimately succeeds, VectorDefense forces BPDA to distort the images much more than when there is no defense (L2 distortion of 5.30 vs. 2.97).
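The core idea of BPDA can be illustrated with a toy example: the forward pass runs the input through the non-differentiable defense, while the backward pass approximates the defense's Jacobian by the identity. The rounding "defense" and linear "model" below are stand-ins, not the actual VectorDefense or victim DNN.

```python
import numpy as np

# Toy illustration of the BPDA idea of Athalye et al. [2]: the gradient of
# model(defense(x)) w.r.t. x is taken to be the model gradient evaluated at
# defense(x), treating the defense's Jacobian as the identity.

w = np.array([1.0, -2.0, 3.0])

def defense(x):
    return np.round(x)                  # piecewise constant: true gradient is 0 a.e.

def model(x):
    return float(w @ x)                 # toy linear logit; its gradient is w

def true_grad_fd(x, eps=1e-4):
    # Finite differences through the full pipeline: zero almost everywhere,
    # which is exactly the obfuscated-gradient situation BPDA circumvents.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (model(defense(x + e)) - model(defense(x - e))) / (2 * eps)
    return g

def bpda_grad(x):
    x_hat = defense(x)                  # forward pass through the defense
    return w.copy()                     # backward pass: identity through the defense
```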

4.5 Hand-Designed vs. Learned Prior

Throughout the previous results, we have seen the superiority of Defense-GAN, an input transformation method with a learned prior, compared to hand-designed methods including VectorDefense. We further compare and contrast the process of purifying AXs by these two types of input transformation approaches. Under strong attacks with 40-pixel perturbations, Defense-GAN substantially outperformed all hand-designed transformations (Fig. 7; yellow lines at 40).


To shed more light on this result, we qualitatively examine cases in which Defense-GAN succeeded but the other methods failed; Fig. 6 shows interesting observations. Here, the images are so strongly perturbed that the AXs actually change qualitatively into the target class (Fig. 6, row 3; the digit "6" after perturbation does look like a "0"). However, Defense-GAN's prior is so strong that it pulls the image back into the correct class "6" (Fig. 6d). In contrast, VectorDefense removes the speckles around the digit and turns the image into a "0", a label that may make sense to humans but is not the original one. Similar opposite behaviors of Defense-GAN vs. VectorDefense can be observed in the other cases (Fig. 6).

5 Additional Results

Below we show additional figures demonstrating how VectorDefense purifies AXs created by three different attack algorithms: I-FGSM (Fig. 3), JSMA (Fig. 8), and C&W L0 (Fig. 9).

Fig. 8 JSMA: Given starting images a, adversarial images c are generated using JSMA. VectorDefense effectively purifies the adversarial images e. The perturbations introduced by the adversary b and the purification process d can be obtained by subtracting a from c and a from e, respectively. All images start out correctly classified with label l and are targeted to have label l + 1. The images were chosen as the first of their class from the MNIST test set


6 Discussion and Future Work

We present VectorDefense, a novel defense against AXs that transforms a bitmap input image into the space of vector graphics and back prior to classification. We showed how this transformation smooths out the visual artifacts created by state-of-the-art attack algorithms. Under strong perturbations, VectorDefense and the other hand-designed input transformation methods underperformed Defense-GAN, a method with a strong, learned prior. While this result may not be too surprising, we view VectorDefense as a stepping stone towards decomposing images into compact, interpretable elements to solve the AX problem. Even the simple use of vectorization in this work reduces the effectiveness of AXs and opens up promising extensions towards more robust machine learning:

• The vector representation (SVG) could be further compacted via the Ramer-Douglas-Peucker algorithm [8, 29] or constrained to be composed of elements from a strict set of geometric primitives (e.g. only straight strokes).
• Since vector images are resolution-independent, one could rasterize them back into much smaller images. In a lower-dimensional space, it can be easier to defend against AXs [9].
• Another extension of VectorDefense would be to learn a generative model on this vector space.
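The Ramer-Douglas-Peucker simplification mentioned in the first bullet can be sketched in a few lines: points lying within `eps` of the chord between the segment endpoints are discarded.

```python
# Minimal recursive Ramer-Douglas-Peucker polyline simplification [8, 29].

def _dist(p, a, b):
    """Perpendicular distance of point p from the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * px - dx * py + bx * ay - by * ax) / (dx * dx + dy * dy) ** 0.5

def rdp(points, eps):
    if len(points) < 3:
        return list(points)
    i, dmax = 0, 0.0
    for k in range(1, len(points) - 1):
        d = _dist(points[k], points[0], points[-1])
        if d > dmax:
            i, dmax = k, d
    if dmax <= eps:
        return [points[0], points[-1]]       # all interior points near the chord: drop them
    left = rdp(points[:i + 1], eps)          # keep the farthest point and recurse
    return left[:-1] + rdp(points[i:], eps)
```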

Fig. 9 C&W L0: Given starting images a, adversarial images c are generated using C&W L0. VectorDefense effectively purifies the adversarial images e. The perturbations introduced by the adversary b and the purification process d can be obtained by subtracting a from c and a from e, respectively. All images start out correctly classified with label l and are targeted to have label l + 1. The images were chosen as the first of their class from the MNIST test set


• In a high-dimensional color image space, simple image tracing may not yield compact and interpretable elements; therefore, we did not explore vectorizing natural images further. Instead, it might be interesting to de-render an image into a scene graph [38] and train a prior over the graphs [16].

Acknowledgements We thank Zhitao Gong and Chengfei Wang for feedback on the drafts, and Nicholas Carlini and Nicolas Papernot for helpful discussions.

References

1. Akhtar, N., Mian, A.: Threat of adversarial attacks on deep learning in computer vision: a survey. arXiv preprint arXiv:1801.00553 (2018)
2. Athalye, A., Carlini, N., Wagner, D.: Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420 (2018)
3. Birdal, T., Bala, E.: A novel method for vectorization. arXiv preprint arXiv:1403.0728 (2014)
4. Carlini, N., Wagner, D.: Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311 (2016)
5. Carlini, N., Wagner, D.: Adversarial examples are not easily detected: bypassing ten detection methods. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. ACM (2017)
6. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE (2017)
7. Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., Hsieh, C.-J.: ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26. ACM (2017)
8. Douglas, D.H., Peucker, T.K.: Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: Int. J. Geographic Inf. Geovisualization 10(2), 112–122 (1973)
9. Gilmer, J., et al.: Adversarial spheres. arXiv preprint arXiv:1801.02774 (2018)
10. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
11. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
12. Guo, C., Rana, M., Cissé, M., van der Maaten, L.: Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117 (2017)
13. Ilyas, A., Engstrom, L., Athalye, A., Lin, J.: Query-efficient black-box adversarial examples. arXiv preprint arXiv:1712.07113 (2017)
14. Ilyas, A., Jalal, A., Asteri, E., Daskalakis, C., Dimakis, A.G.: The robust manifold defense: adversarial training using generative models. arXiv preprint arXiv:1712.09196 (2017)
15. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
16. Krishna, R.: Visual Genome: connecting language and vision using crowdsourced dense image annotations (2016)
17. Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236 (2016)
18. LeCun, Y.: The MNIST database of handwritten digits (1998). http://yann.lecun.com/exdb/mnist/
19. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
20. Meng, D., Chen, H.: MagNet: a two-pronged defense against adversarial examples. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 135–147. ACM (2017)
21. Miyato, T., Maeda, S.-i., Koyama, M., Nakae, K., Ishii, S.: Distributional smoothing with virtual adversarial training. arXiv preprint arXiv:1507.00677 (2015)
22. Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
23. Nguyen, A., Clune, J., Bengio, Y., Dosovitskiy, A., Yosinski, J.: Plug & play generative networks: conditional iterative generation of images in latent space. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2017)
24. Nguyen, A., Dosovitskiy, A., Yosinski, J., Brox, T., Clune, J.: Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In: NIPS 29 (2016)
25. Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 427–436 (2015)
26. Papernot, N., et al.: cleverhans v2.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768 (2016)
27. Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519. ACM (2017)
28. Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. IEEE (2016)
29. Ramer, U.: An iterative procedure for the polygonal approximation of plane curves. Computer Graphics and Image Processing 1(3), 244–256 (1972)
30. Samangouei, P., Kabkab, M., Chellappa, R.: Defense-GAN: protecting classifiers against adversarial attacks using generative models (2018)
31. Selinger, P.: Potrace: a polygon-based tracing algorithm (2003). http://potrace.sourceforge.net/potrace.pdf. Accessed 01 July 2009
32. Shen, S., Jin, G., Gao, K., Zhang, Y.: APE-GAN: adversarial perturbation elimination with GAN. ICLR submission, available on OpenReview (2017)
33. Smith, L., Gal, Y.: Understanding measures of uncertainty for adversarial example detection. arXiv preprint arXiv:1803.08533 (2018)
34. Song, Y., Kim, T., Nowozin, S., Ermon, S., Kushman, N.: PixelDefend: leveraging generative models to understand and defend against adversarial examples. arXiv preprint arXiv:1710.10766 (2017)
35. Su, J., Vargas, D.V., Kouichi, S.: One pixel attack for fooling deep neural networks. arXiv preprint arXiv:1710.08864 (2017)
36. Szegedy, C., et al.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
37. Tramèr, F., Kurakin, A., Papernot, N., Boneh, D., McDaniel, P.: Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204 (2017)
38. Wu, J., Tenenbaum, J.B., Kohli, P.: Neural scene de-rendering. In: Proceedings CVPR, vol. 2 (2017)
39. Xu, W., Evans, D., Qi, Y.: Feature squeezing: detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155 (2017)
40. Yuan, X., He, P., Zhu, Q., Bhat, R.R., Li, X.: Adversarial examples: attacks and defenses for deep learning. arXiv preprint arXiv:1712.07107 (2017)
41. Zhu, J.-Y., Krähenbühl, P., Shechtman, E., Efros, A.A.: Generative visual manipulation on the natural image manifold. In: European Conference on Computer Vision, pp. 597–613. Springer (2016)

Artificial Intelligence in Infection Control—Healthcare Institutions Need Intelligent Information and Communication Technologies for Surveillance and Benchmarking

Walter Koller, Andrea Rappelsberger, Birgit Willinger, Gabriel Kleinoscheg, and Klaus-Peter Adlassnig

Abstract Modern healthcare and medicine depend on the implementation of best practice, which includes surveillance of, and benchmarking with, predefined quality indicators. Given the automated analysis of microbiological findings and automated surveillance of healthcare-associated infections (HAIs), we put forward arguments in favor of the increasing use of intelligent information and communication technologies for the assessment and surveillance of infection. With MOMO, a modern microbiology analytics software, as well as with MONI, a fully automated detection and monitoring system for HAIs, we registered a much greater precision of analytics and surveillance. The time taken by these systems was much less than that needed for conventional surveillance. We registered the need for timely amendments and adaptations concerning new input categories or new reporting outputs as desired by clinicians, administrators, and health authorities. Intelligent information and communication technologies are thus becoming indispensable in the construction of affordable "safety nets" for quality assurance and benchmarking, based on fully automated and intelligent data and knowledge management. These, in turn, constitute the backbone of high-level healthcare, patient safety, and error prevention.

W. Koller
Clinical Institute of Hospital Hygiene, Medical University of Vienna and Vienna General Hospital, Währinger Strasse 18–20, 1090 Vienna, Austria

A. Rappelsberger · K.-P. Adlassnig
Section for Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria

B. Willinger
Division of Clinical Microbiology, Department of Laboratory Medicine, Medical University of Vienna and Vienna General Hospital, Währinger Strasse 18–20, 1090 Vienna, Austria

G. Kleinoscheg · K.-P. Adlassnig (B)
Medexter Healthcare GmbH, Borschkegasse 7/5, 1090 Vienna, Austria
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_4


1 Introduction

What Is Infection and What Does It Mean for Our Society?
Infection is a disease that is transmissible from one individual to another. The outbreak of an infectious disease is liable to affect a very large number of persons. In history—and in low-income regions even today—infection has killed vast numbers of people and considerably reduces the life expectancy of human beings [1, 2]. With the advent of penicillin and other antibiotics, the menace of bacterial infection has faded from our minds, and new options have emerged. These include extended surgical procedures and highly invasive treatments for otherwise disabling or even deadly conditions, industrial livestock farming, and many other blessings of modern science that we now take for granted in many regions of the world. Regrettably, but inevitably, resistant microorganisms have appeared and are threatening the success achieved thus far. The development of new drugs has always been followed by the emergence of new resistances. Today we are confronted with the global spread of multi- and even omni-resistant pathogens.

The Role of Microbiological Diagnostic Procedures in the Treatment and Prevention of Infectious Disease
Targeted and effective antibiotic therapy is a cornerstone of infection therapy and of the prevention of its spread. This concerns the spread of pathogens as well as of genes conferring resistance. The diagnostic methods of microbiology are needed to identify pathogens as well as their resistance potential: specimen collection; microscopic investigation, culture, and/or molecular techniques such as polymerase chain reaction (PCR) for the identification of pathogens and their toxins or resistance genes; and reporting to the physician. Microbiology reports are not only indispensable for the individual patient's treatment. The information they provide is invaluable for epidemiology, surveillance, and other population-based services in human and veterinary medicine as well as in food and water hygiene. Such information includes in-house reporting; the regional, national, and international spread of pathogens and their resistances; evidence-based support for the prudent use of antibiotics (antibiotic stewardship programs); epidemiological networking; and quality benchmarking.

The Role of Electronic Data and Knowledge Management
Due to the growing complexity of modern health management, it depends to an increasing extent on data-driven guidance and decisions. Electronic data and medical knowledge management have become indispensable in these fields [3, 4].

Implications for Medical Staff
The management of the large body of data required for documentation places a substantial workload on doctors, nurses, and other experts in health institutions. These obligations interfere with the actual medical and humanitarian services we expect from caregivers, and they contribute to the exhaustion of medical staff. Patients,

in turn, suffer from distracted, overtired, or apathetic health personnel. All of these factors exist in a climate of growing economic and personnel restrictions [5].

The Role of Intelligent Information and Communication Technologies in Infection Control
Well-designed and intelligent information and communication technologies (ICTs) are those that avoid unnecessary manual data entry, can be integrated smoothly into clinical workflows, and provide self-explanatory, relevant information. By fulfilling these requirements, such technologies help in providing high-quality healthcare [6]. For more than twenty years now, we have been developing highly automated intelligent tools for the analysis of diagnostic microbiology reports [7] as well as for the detection and surveillance of healthcare-associated infections (HAIs) [8–10]. The aim of this paper is to present arguments in favor of the increasing use of intelligent information and communication technologies for the prevention and control of infection. Two applications—MOMO (Monitoring of microorganisms) and MONI (Monitoring of nosocomial infections; nosocomial refers to originating in a hospital)—which support daily patient care will be presented to emphasize these arguments. First, the respective scope of each application is explained. Second, the positive effects of these two systems on the treatment of patients are described. The report concludes with a critical discussion of the use of intelligent ICTs in healthcare.

2 Methods

MOMO and MONI Are Intelligent ICT Systems for Surveillance, Reporting, and Benchmarking in Healthcare Institutions
MOMO is a microbiology analytics and clinical tool for analyzing and reporting pathogens and antimicrobial resistances (AMR) [7, 11]. MONI is an ICT system for the automated detection and surveillance of HAIs in intensive care medicine [12, 13]. Both tools are currently in use (MONI only test-wise) at the Vienna General Hospital (VGH), the hospital of the Medical University of Vienna (MUV).

MOMO
MOMO is a microbiology analytics software package that provides complete information on the occurrence, frequency distribution, and resistance of pathogens in a hospital. It accommodates all data items available in modern microbiological laboratories: bacteria and fungi including AMR profiles, PCR results, serology results, toxins, microscopy results, remarks accompanying the request or the results, the time of collection of the material and of its entry into the laboratory, and several others. MOMO's QuickScan functionality provides immediate overviews of the data of an individual patient; it serves as a rapidly usable clinical tool for the attending physician. The analyzed data are updated on a daily basis. Via FlexScan, 58 different query parameters provide a high degree of flexibility for clinical quality management

(QM), administrative queries, and research questions. User-defined query templates facilitate detailed customized reporting and benchmarking. MOMO is embedded into the clinical and laboratory environment: new data are provided once a day by the microbiology laboratory information system (microLIS) and by the administrative hospital information system (HIS) (patient denominators). Sources of terminology for text processing include the codes for the requester, the specimen type, and antibiotic/antimycotic drugs. As a special feature of MOMO, microbiological results are managed by text rather than by code. A dedicated MOMO thesaurus management system provides this thesaurus service, thus keeping pace with unexpected changes or additional entities in microbiological terminology [7]. Specific artificial intelligence (AI) components of the software include the text processing of microbiological terms as well as knowledge-based AI for alerts or interpretations of microbiological findings. Figure 1 shows the overall structure of MOMO.

Changes in the Management of Microbiology Terms
In using MOMO, we were confronted with the fact that microbiology reports contain terms unknown to the thesaurus. These could not be sorted in automatically, rendering MOMO analyses partly incomplete. This apparently simple problem was complicated by the fact that the thesaurus codes of the microLIS were sometimes overridden by free-text entries of laboratory staff. These free-text entries were displayed as readable text only in PDF microbiology reports to physicians. We modified the analytics software MOMO so that microbiology terms could be managed by text processing rather than by code. Microbiological results containing textual elements

Fig. 1 Position of MOMO as a comprehensive tool for automated processing of microbiology lab report data into surveillance or alert information, which can be provided as output for patient-specific monitoring/alerting as well as population-specific services, such as ward and hospital analytics, QM, internal or external benchmarking, or mandatory legal reporting

unknown to the thesaurus that could not be processed automatically were flagged for subsequent assignment. With this change in place, a human expert assigned the flagged elements to the respective existing thesaurus elements or created new ones. We measured the positive effects of this change in a prospective study (see [7]).

MONI
MONI is a fusion of several methodologies drawn from AI, fuzzy set theory and logic, and medical knowledge engineering [8–10]. In general, MONI stands for two separate systems, MONI-ICU and MONI-NICU, which are clinical detection and monitoring systems for HAIs. MONI-ICU monitors intensive care units (ICUs) for adult patients at the approximately 1,800-bed VGH. MONI-NICU is the corresponding system for neonatal intensive care medicine: it monitors neonatal ICUs (NICUs) at the VGH and differs from MONI-ICU in that it uses a different knowledge base developed specifically for neonatal patients. The MONI systems use automated access to several data sources that provide the day-to-day clinical, laboratory, and care data of patients:

• the intensive care medical information and patient data management system (PDMS),
• the microLIS of the hospital, which provides microbiological laboratory findings,
• the HIS, from which both the PDMS and the microLIS acquire administrative patient data to uniquely identify the patient and his/her hospital stay.

Figure 2 shows the overall structure of MONI.

Fig. 2 Position of MONI as an intelligent tool for automated processing of specified electronic clinical and laboratory raw data into surveillance or alert information, which—if required—is outputted in appropriate formats for QM, for internal or external benchmarking, or for mandatory legal reporting. ICU: intensive care unit for adults, NICU: neonatal ICU

Medical Knowledge Bases
Both the MONI-ICU and the MONI-NICU knowledge bases are Arden Syntax 2.9 representations [14] of the respective HAI definitions used by the ECDC [15], Stockholm, and by KISS [16, 17]. The Arden Syntax for Medical Logic Modules (MLMs) is a language for encoding medical knowledge; it consists of independent modules and is maintained and developed by Health Level Seven (HL7) International. Clinical knowledge is captured in Arden Syntax rules or procedures and can be accessed and evaluated through a processing engine. Within the Arden Syntax, individual rules or procedures are organized in MLMs, each of which contains sufficient knowledge for at least a single medical decision [18–20].

The majority of the clinical concept thresholds in MONI's knowledge bases, such as fever, leukopenia, and leukocytosis, are fuzzy. In other words, they permit a gradual transition from one clinical concept to the next. By making thresholds fuzzy, we permit clinical borderline cases to be evaluated in a more gradual way than with the usual binary inclusion or exclusion of a patient with respect to a sharply defined condition. What we formally capture here is the inherent linguistic uncertainty of clinical terms. In addition, propositional uncertainty—characteristic of uncertain clinical assertions—is captured by using fuzzy logic.

Architecture and Processing
The MONI systems are implemented in an automated, data-driven Arden Syntax framework as described in [21]. Data from the above-mentioned sources are downloaded overnight and stored in the systems' data warehouse. Once the data transfer has been completed, the MLMs are executed by the Arden Syntax processing engine. Results and reports can be accessed through web services and displayed in a web-page frontend. Data processing in MONI is a stepwise procedure, starting with raw data and advancing from one knowledge level to the next, as shown in Fig. 3.
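Such a fuzzified clinical threshold can be sketched as a piecewise-linear membership function. The cut-offs below (38.0–38.5 °C for "fever") are illustrative only, not the values of MONI's actual knowledge base.

```python
# Sketch of a fuzzified clinical threshold of the kind MONI uses. A crisp
# rule would jump from 0 to 1 at a single cut-off; the fuzzy version
# grades borderline temperatures. The cut-offs are illustrative.

def fever_degree(temp_c, lo=38.0, hi=38.5):
    """Degree of membership in the fuzzy concept 'fever', in [0, 1]."""
    if temp_c <= lo:
        return 0.0
    if temp_c >= hi:
        return 1.0
    return (temp_c - lo) / (hi - lo)   # linear transition between the cut-offs
```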

3 Results

Modification of MOMO Thesaurus Management With the modification of the management of microbiological ontologies as described in the Methods section, MOMO delivered concise analyses. In a prospective set of 196,714 laboratory results, approximately 2.1% contained unassignable textual elements that would otherwise have been ignored in code-based MOMO analyses [7] (see Table 1). The modification of thesaurus management is incorporated in the latest release of MOMO. In current use, the thesaurus is continually substantiated by synonyms, syntactic deviations, misspellings, and entries not contained earlier, with a man-machine interaction effort of 2–3 h per week [7]. This approach helps to accommodate both up-to-date clinical reporting for immediate patient care and up-to-date queries for infection surveillance and epidemiology, outbreak management, quality control, benchmarking, and antimicrobial stewardship.

Fig. 3 Data processing layers in MONI, explaining the pathway from raw data input (electronic bedside sensors, such as pulse, blood pressure, and body temperature; the biochemical laboratory, such as leukocyte count, erythrocyte sedimentation rate, and C-reactive protein; microbiology; and routine bedside data entries by the ICU staff) to the required specific outputs. HAI: healthcare-associated infection, ICU: intensive care unit, NICU: neonatal intensive care unit

Table 1 Figures for 4,191 manually allocated microbiology concepts between October 30, 2018 and August 20, 2019 (approximately 2.1% of 196,714 microbiology results)

Category        Number   Fraction (%)
Culture            580       13.8
PCR                251        6.0
Serology         1,603       38.3
Toxins             264        6.3
Microscopy         792       18.9
Miscellaneous      701       16.7
Total            4,191      100.0


Table 2 Effectiveness of manual HAI surveillance and electronic surveillance based on the number of detection events (95% CI). PPV, positive predictive value; NPV, negative predictive value

Surveillance method       Total events generated   Sensitivity %   Specificity %   PPV %         NPV %
Manual surveillance       108                      40 (23–59)      94 (86–98)      71 (44–90)    80 (71–88)
Electronic surveillance   106                      87 (69–96)      99 (93–100)     96 (81–100)   95 (88–99)

MONI Performance Studies Studies assessing the MONI systems revealed a high degree of precision in surveillance (Table 2). With the automated systems, the time spent on surveillance was reduced by 85% compared to conventional methods [10, 12].

Inputs from Clinical Users of the ICT Systems At the VGH, MOMO as well as the two MONI systems are fully integrated into the health IT landscape. Through continuous guidance and observation, we noticed that clinical users (infection control personnel, clinicians, research study personnel) started to address new issues. For instance, clinicians asked for possibilities to start MOMO queries from the intensive care PDMS at their wards or at the bedside. Furthermore, intensive care specialists for preterm infants wished to use MONI in order to exchange their benchmark data with the German NEO-KISS network [17]. We observed that hospitals tend to exchange their HAI benchmark data not only with one surveillance network (such as the ECDC-based Austrian Nosocomial Infection Surveillance System [22]), but also with others, like the German counterpart KISS [16, 17] or the Austrian quality indicator project AUQIP [23].

4 Discussion

Having observed the surveillance of pathogens, AMR, and infections for more than two decades, we summarize our conclusions in the following.

Automated vs. Manual Data and Knowledge Management First, analysis and surveillance systems that depend on manual data acquisition are laborious to establish and maintain, as well as vulnerable. They depend on specifically trained and dedicated personnel, and cease to function when such personnel are no longer available. Since substantial effort is needed to keep manually operated analysis or surveillance systems alive, their potential for extension or change is limited. However, in hospitals equipped with efficient analysis and surveillance systems, users tend to ask for extensions of the systems in terms of additional parameters or connections to other networks. This reflects the growing demand for IT assistance in worldwide clinical benchmarking activities.


Second, in our experience, electronic PDMSs constitute an important field of application for intelligent ICT, especially in intensive care medicine. When PDMSs are able to provide comprehensive clinical, laboratory, and denominator information on a timely basis, ICT can translate the data into the specific formats required by different networks or applications. New or modified data fields may be added, and more than one MLM (or package of MLMs) for automated expert interpretation of the same data set may be implemented. Two examples of this are MONI-NICU and MONI-ICU. MONI-NICU interprets clinical data according to the sets of rules specified in NEO-KISS [17] and Vermont-Oxford [24] as well as "clinical alert" criteria, whereas MONI-ICU is designed to provide data interpretation in accordance with ECDC [15] as well as KISS [16] or CDC/NHSN [25] criteria. Thus, MONI users may share their surveillance data with different surveillance networks, nationally and internationally.

Third, surveillance systems for HAIs may be regarded as mere precursors of newer and much more comprehensive surveillance systems. Wider entities have been introduced recently, including "ventilator-associated events" instead of "pneumonia" [26], and "readmissions, complications and deaths", which include HAIs, timely microbiological investigation, and appropriate antibiotic therapy [27]. Figures 1 and 2 provide a system view, emphasizing the various outcomes, be they patient-specific, such as AMR and HAI alerts, or population-related, such as information on pathogen transmission, surveillance, or benchmarking.

Arguments Against the Use of ICT in Healthcare ICT has been suspected of importing "Big Brother" methods into healthcare systems. In fact, even protected ICT systems (not only those of healthcare institutions) are endangered by computer hackers. The problems of fraudulent intrusion and loss of confidentiality are yet to be conclusively resolved.
Others warn against dependency on ICT systems: some medical experts express concerns about being challenged or even overruled by an advanced computer system. The expert's inability to understand the processes behind ICT output may be the reason for a perceived loss of control over medical decisions. Regardless of whether such opposition is based on plain fear or on facts, it needs to be addressed. In part, it is reminiscent of the arguments brought forward against steam engines, the railroad, or motorcars at the time of their introduction into daily life.

In health and medicine, and especially in hospitals, ICT support pertains mainly to hospital administration, logistics, and billing. This is in contrast to other fields, such as the production of technical devices, state administration, business, or commerce. Medical expertise appears to have been shielded from modern ICT appliances for a long time, but this is changing rapidly. Present-day medical experts still lack appreciation of ICT. Here we face a field of work that cannot be accomplished by technical devices alone. We may assume that the future generation of medical experts will be much more willing to use ICT appliances. Our current task is to convince the "digital non-natives". From our own experience we know that individual skepticism, unfamiliarity with ICT terminology, and unwillingness to dive into the complexity of computer-based ICT decision-making may render even powerful ICT tools useless. A significant lack of awareness of the added value provided by surveillance, benchmarking, and related ICT-supportable activities remains to be overcome [5]. Finally, as stated in a review on electronic surveillance for HAIs by Freeman et al. [28]:

"… electronic surveillance systems should be developed to maximize the efficacy of abundant electronic data sources existing within hospitals;"

and furthermore: “Electronic surveillance systems should be seen as an opportunity to enhance current surveillance practices. Staff involved in surveillance activities should not feel threatened by advances in this area, but should recognize that these methods can reduce the burdens associated with traditional surveillance methodologies, which will only increase as the emphasis on transparency and public reporting causes increased demand for more information to be reported.”

5 Conclusion

Our aim is to implement intelligent ICT systems in health and medicine as supporting tools to handle an increasing body of knowledge that has long exceeded the mental capacity of a single human being. We need such tools for maintaining and updating health and medical knowledge, as well as for applying this knowledge to the individual patient comprehensively, concisely, and in a timely manner. These tools make it possible to provide knowledgeable proposals and alerts to caregivers, and to support growing surveillance, reporting, and benchmarking duties. The MOMO and MONI systems are examples of this modern ICT approach. They serve as intelligent tools that can be adapted to varying or newly emerging inputs, as well as to changing output demands. Thus, they are "living" intelligent ICT systems, responsive to environmental changes.

References

1. Carmichael, A.G.: Infectious disease and human agency: an historical overview. In: Andreae, M.O., Confalonieri, U., McMichael, A.J. (eds.) Global Environmental Change and Human Health, vol. 106, pp. 3–46. The Pontifical Academy of Sciences, Vatican City (2006)
2. Brachman, P.S.: Infectious diseases—past, present, and future. Int. J. Epidemiol. 32(5), 684–686 (2003)
3. Kawamoto, K., Houlihan, C.A., Balas, E.A., Lobach, D.F.: Improving clinical practice using clinical decision support systems: a systematic review of trials to identify features critical to success. BMJ 330, 765 (2005). https://doi.org/10.1136/bmj.38398.500764.8F
4. Garg, A.X., Adhikari, N.K.J., McDonald, H., Rosas-Arellano, M.P., Devereaux, P.J., Beyene, J., Sam, J., Haynes, R.B.: Effects of computerized clinical decision support systems on practitioner performance and patient outcomes. J. Am. Med. Assoc. 293(10), 1223–1238 (2005)


5. Koller, W., Adlassnig, K.-P., Rappelsberger, A., Blacky, A.: Plea for use of intelligent information and communication technologies in infection surveillance and benchmarking by healthcare institutions. In: Bienkiewicz, M., Verdier, C., Plantier, G., Schultz, T., Fred, A., Gamboa, H. (eds.) Proceedings of the 7th International Conference on Health Informatics (HEALTHINF 2014), pp. 399–404. SCITEPRESS, Portugal (2014)
6. Rhoads, D.D., Sintchenko, V., Rauch, C.A., Pantanowitz, L.: Clinical microbiology informatics. Clin. Microbiol. Rev. 27(4), 1025–1047 (2014)
7. Koller, W., Kleinoscheg, G., Willinger, B., Rappelsberger, A., Adlassnig, K.-P.: Augmenting analytics software for clinical microbiology by man-machine interaction. In: Ohno-Machado, L., Séroussi, B. (eds.) MEDINFO 2019: Health and Wellbeing e-Networks for ALL – Proceedings of the 17th World Congress on Medical and Health Informatics. Studies in Health Technology and Informatics, vol. 264, pp. 1243–1247. IOS Press, Amsterdam (2019)
8. Adlassnig, K.-P., Blacky, A., Koller, W.: Fuzzy-based nosocomial infection control. In: Nikravesh, M., Kacprzyk, J., Zadeh, L. (eds.) Forging New Frontiers: Fuzzy Pioneers II, pp. 343–349. Springer, Berlin (2008)
9. Adlassnig, K.-P., Blacky, A., Koller, W.: Artificial-intelligence-based hospital-acquired infection control. In: Bushko, R. (ed.) Strategy for the Future of Health. Studies in Health Technology and Informatics, vol. 149, pp. 103–110. IOS Press, Amsterdam (2009)
10. Blacky, A., Mandl, H., Adlassnig, K.-P., Koller, W.: Fully automated surveillance of healthcare-associated infections with MONI-ICU – a breakthrough in clinical infection surveillance. Appl. Clin. Inform. 2(3), 365–372 (2011)
11. Medexter Healthcare GmbH [Internet]. Medexter Healthcare, Vienna; c2019. Momo. https://www.medexter.com/products-and-services/clinical-solutions/microbiology-and-amr. Accessed 12 July 2019
12. de Bruin, J., Adlassnig, K.-P., Blacky, A., Mandl, H., Fehre, K., Koller, W.: Effectiveness of an automated surveillance system for intensive care unit-acquired infections. J. Am. Med. Inform. Assoc. 20(2), 369–372 (2013)
13. Medexter Healthcare GmbH [Internet]. Medexter Healthcare, Vienna; c2019. Moni. https://www.medexter.com/products-and-services/clinical-solutions/infection-control. Accessed 12 July 2019
14. Health Level Seven International. The Arden Syntax for Medical Logic Systems, Version 2.9, March 2013 [Internet]. Health Level Seven International, Ann Arbor (2013). https://www.hl7.org/implement/standards/product_brief.cfm?product_id=290. Accessed 1 July 2019
15. European Centre for Disease Prevention and Control [Internet]. Solna: European Centre for Disease Prevention and Control; c2019. http://www.ecdc.europa.eu/en/Pages/home.aspx. Accessed 10 July 2019
16. German National Center for Surveillance of Nosocomial Infections. Institute of Hygiene and Environmental Medicine at Charité University Medicine Berlin, Berlin. ITS-KISS. https://www.nrz-hygiene.de/surveillance/kiss/its-kiss/. Accessed 10 July 2019
17. German National Center for Surveillance of Nosocomial Infections. Institute of Hygiene and Environmental Medicine at Charité University Medicine Berlin, Berlin. NEO-KISS. http://www.nrz-hygiene.de/surveillance/kiss/neo-kiss/. Accessed 10 July 2019
18. Hripcsak, G.: Writing Arden Syntax medical logic modules. Comput. Biol. Med. 24(5), 331–363 (1994)
19. Samwald, M., Fehre, K., de Bruin, J., Adlassnig, K.-P.: The Arden Syntax standard for clinical decision support: experiences and directions. J. Biomed. Inform. 45(4), 711–718 (2012)
20. Adlassnig, K.-P., Rappelsberger, A.: Medical knowledge packages and their integration into health-care information systems and the world wide web. In: Andersen, S.K., Klein, G.O., Schulz, S., Aarts, J., Mazzoleni, M.C. (eds.) eHealth Beyond the Horizon – Get IT There. Proceedings of the 21st International Congress of the European Federation for Medical Informatics (MIE 2008). Studies in Health Technology and Informatics, vol. 136, pp. 121–126. IOS Press, Amsterdam (2008)
21. Adlassnig, K.-P., Fehre, K.: Service-oriented fuzzy-Arden-Syntax-based clinical decision support. Indian J. Med. Inform. 8(2), 75–79 (2014)


22. Universitätsklinik für Krankenhaushygiene und Infektionskontrolle der Medizinischen Universität Wien [Internet]. Medizinische Universität Wien, Wien. ANISS – Surveillance. https://www.meduniwien.ac.at/hp/krankenhaushygiene/forschung-lehre/aniss-surveillance/. Accessed 10 July 2019
23. Institut für Pflege und Gesundheitssystemforschung [Internet]. Johannes Kepler Universität Linz, Linz. AUQIP. http://www.ipg.uni-linz.ac.at/fr_leiste_proj.htm. Accessed 10 July 2019
24. Vermont-Oxford Network [Internet]. Vermont-Oxford Network, Burlington. https://public.vtoxford.org/. Accessed 10 July 2019
25. Centers for Disease Control and Prevention [Internet]. Centers for Disease Control and Prevention; National Healthcare Safety Network (NHSN), Atlanta. https://www.cdc.gov/nhsn/index.html. Accessed 10 July 2019
26. Centers for Disease Control and Prevention [Internet]. Centers for Disease Control and Prevention; Surveillance for Ventilator-Associated Events, Atlanta. https://www.cdc.gov/nhsn/acute-care-hospital/vae/index.html. Accessed 10 July 2019
27. Centers for Medicare & Medicaid Services [Internet]. Centers for Medicare & Medicaid Services Medicare.gov, The Official U.S. Government Site for Medicare, Baltimore. https://www.medicare.gov/. Accessed 10 July 2019
28. Freeman, R., Moore, L.S.P., Garcia Alvarez, L., Charlett, A., Holmes, A.: Advances in electronic surveillance for healthcare-associated infections in the 21st century: a systematic review. J. Hosp. Infect. 84(2), 106–119 (2013)

Why Some Non-classical Logics Are More Studied? Olga Kosheleva, Vladik Kreinovich, and Hoang Phuong Nguyen

Abstract It is well known that the traditional 2-valued logic is only an approximation to how we actually reason. To provide a more adequate description of how we actually reason, researchers have proposed and studied many generalizations and modifications of the traditional logic, in which some rules of the traditional logic are no longer valid. Interestingly, for some of these rules (e.g., for the law of excluded middle), we have a century of research into logics that violate the rule, while for others (e.g., commutativity of "and"), practically no research has been done. In this paper, we show that fuzzy ideas can help explain why some non-classical logics are more studied and some less studied: namely, it turns out that the most studied violations are those that can be implemented by the simplest expressions (specifically, by polynomials of the lowest order).

1 Formulation of the Problem

Commonsense Reasoning and Formal Logic. We all use logic in real life; we use phrases containing "and", "or", and "not" in our reasoning. Since ancient times, researchers have been trying to describe such reasoning in precise terms, i.e., trying to transform commonsense reasoning into formal logic.

Traditional Logic: A Brief Reminder. The most widely used formalization of logic is the traditional 2-valued logic, which was formally described by Boole in the 19th century. Operations of this logic have many well-known properties.

O. Kosheleva · V. Kreinovich (B), University of Texas at El Paso, El Paso, TX 79968, USA; e-mail: [email protected]
O. Kosheleva, e-mail: [email protected]
H. P. Nguyen, Division Informatics, Math-Informatics Faculty, Thang Long University, Nghiem Xuan Yem Road, Hoang Mai District, Hanoi, Vietnam; e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_5


For example, the "and"-operation a & b satisfies the following properties:
• "false" (0) and any a is equivalent to "false": 0 & a ⇔ 0;
• similarly, a and "false" is equivalent to "false": a & 0 ⇔ 0;
• "true" (1) and any a is equivalent to a, as is a and "true": 1 & a ⇔ a & 1 ⇔ a;
• this operation is idempotent, i.e., a & a is equivalent to a: a & a ⇔ a;
• this operation is commutative, i.e., a & b is equivalent to b & a: a & b ⇔ b & a;
• this operation is associative, i.e., a & (b & c) is equivalent to (a & b) & c: a & (b & c) ⇔ (a & b) & c.

Similarly, the "or"-operation a ∨ b satisfies similar properties:
• 0 ∨ a ⇔ a ∨ 0 ⇔ a;
• 1 ∨ a ⇔ a ∨ 1 ⇔ 1;
• it is idempotent, i.e., a ∨ a is equivalent to a;
• it is commutative, i.e., a ∨ b is equivalent to b ∨ a; and
• it is associative, i.e., a ∨ (b ∨ c) is equivalent to (a ∨ b) ∨ c.

These two operations are distributive with respect to each other:

a & (b ∨ c) ⇔ (a & b) ∨ (a & c); a ∨ (b & c) ⇔ (a ∨ b) & (a ∨ c).

Many properties of the traditional logic involve negation:
• the rule that ¬0 is 1 and ¬1 is 0;
• the law of excluded middle, according to which a ∨ ¬a is always true;
• the double negation rule, according to which ¬¬a ⇔ a; and
• de Morgan laws: ¬(a & b) ⇔ ¬a ∨ ¬b; ¬(a ∨ b) ⇔ ¬a & ¬b.
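As a quick illustration (not part of the original argument), all of these laws can be checked exhaustively over the two classical truth values:

```python
from itertools import product

AND = lambda a, b: a and b
OR  = lambda a, b: a or b
NOT = lambda a: 1 - a

for a, b, c in product((0, 1), repeat=3):
    # idempotence, commutativity, associativity
    assert AND(a, a) == a and OR(a, a) == a
    assert AND(a, b) == AND(b, a) and OR(a, b) == OR(b, a)
    assert AND(a, AND(b, c)) == AND(AND(a, b), c)
    assert OR(a, OR(b, c)) == OR(OR(a, b), c)
    # distributivity
    assert AND(a, OR(b, c)) == OR(AND(a, b), AND(a, c))
    assert OR(a, AND(b, c)) == AND(OR(a, b), OR(a, c))
    # excluded middle, double negation, de Morgan
    assert OR(a, NOT(a)) == 1
    assert NOT(NOT(a)) == a
    assert NOT(AND(a, b)) == OR(NOT(a), NOT(b))
    assert NOT(OR(a, b)) == AND(NOT(a), NOT(b))
print("all classical laws hold on {0, 1}")
```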

The Traditional 2-Valued Logic Is Only an Approximation to Commonsense Reasoning. It is well known that Boolean logic is only an approximation to how we actually reason. Our actual use of "and" and "or" is more complex. For example, while in formal logic "and" is commutative, the phrases "I studied and I took the test" and "I took the test and I studied" clearly mean two different things. Because of this difference, researchers have been trying to come up with extensions and modifications of Boolean logic that would better capture commonsense reasoning. As a result, we have a plethora of different non-classical logics; see, e.g., [3, 11].


Non-classical Logics: Examples. Since the early 20th century, many extensions and generalizations of classical logic have appeared:
• in some of these logics, the propositional operations "and", "or", and "not" do not satisfy all the usual properties;
• other logics introduce additional propositional operations;
• yet other logics do both.

One of the first examples of non-classical logics was intuitionistic logic, developed early in the 20th century. This logic rejects the law of excluded middle, the law of double negation, and de Morgan laws. Another example of a non-classical logic, known since Aristotle but formalized only in the early 20th century, is modal logic, which introduces the additional unary operations "necessary" and "possible". Linear logic rejects the rule a & a ⇔ a. Some logics use two different negation operations: the usual negation and a strong negation, etc.

Challenge. But why were some extensions developed and others not? For example, as we mentioned earlier, in commonsense reasoning "and" is not always commutative, but no mainstream logic seriously considers non-commutative "and"-operations. So why have logics been developed that reject some laws of Boolean logic but not others?

What We Do in This Paper. At first glance, it may seem that the above challenge has no good answer; it is like asking why Picasso moved to a blue period and not to some other period. However, surprisingly, we do present an answer. To come up with such an answer, we take into account yet another non-classical logic that was developed specifically to capture important features of commonsense reasoning, namely fuzzy logic (see, e.g., [1, 4, 7–10, 14]), in which:
• in addition to the traditional two truth values "true" (1) and "false" (0),
• we also allow intermediate truth values, which are represented by numbers from the interval [0, 1].

It turns out that fuzzy ideas can indeed explain why some extensions of Boolean logic have been well studied and some have not yet been thoroughly explored.

2 Main Idea Behind Our Explanation

Background. When we extend propositional operations from the 2-valued set {0, 1} to the interval [0, 1], each operation becomes a continuous function of the corresponding real-valued variables. In general, such functions can be expanded in Taylor series and can thus be approximated by polynomials, which correspond to keeping the first few terms of this expansion.


As we increase the order of the corresponding polynomials, we get a more and more accurate representation of the corresponding functions, but at the same time this representation becomes more complex.

Our Main Idea. It is natural to expect that:
• violations of logical laws that can be attained by the simplest (smallest-order) functions will be explored first, while
• violations that require much higher-order (and thus more complex) polynomials will be studied much later.

This Idea Helps Explain Why Some Non-classical Logics Are More Studied. We will show that this natural idea helps explain why:
• some non-classical logics have been thoroughly studied, while
• others remain largely unexplored.

3 Let Us Start Our Analysis: Simplest Polynomial Representations of "and", "or", and "not"

Let Us Start with "and". Let us consider the very first properties of an "and"-operation. Based on these properties, we have the following simple (and known) result.

Proposition 1
• There is no linear function f(a, b) for which f(0, a) = f(a, 0) = 0 and f(1, a) = f(a, 1) = a.
• The only quadratic function with these properties is f(a, b) = a · b.

Proof. Let us first consider the case of linear functions. A general linear function of two variables a and b has the form

f(a, b) = c0 + c1 · a + c2 · b.   (1)

From f(0, a) = 0 we conclude that c0 + c2 · a = 0 for all a. Thus, c0 = 0 and c2 = 0. Similarly, from f(a, 0) = 0, we conclude that c1 = 0 and thus that f(a, b) = 0 for all a and b, while we wanted to have, e.g., f(1, 1) = 1 ≠ 0.
Similarly, a general quadratic function of two variables has the form

f(a, b) = c0 + c1 · a + c2 · b + c11 · a² + c12 · a · b + c22 · b².   (2)


The requirement f(0, a) = 0 implies that c0 + c2 · a + c22 · a² = 0 for all a, hence c0 = c2 = c22 = 0. Similarly, the requirement f(a, 0) = 0 implies that c1 = c11 = 0. Thus, f(a, b) = c12 · a · b. The condition f(1, a) = a now implies that c12 = 1 and hence that f(a, b) = a · b. The proposition is proven.

Comment. A similar result holds for "or"-operations.

Proposition 2
• There is no linear function f(a, b) for which f(0, a) = f(a, 0) = a and f(1, a) = f(a, 1) = 1.
• The only quadratic function with these properties is f(a, b) = a + b − a · b.

Proof. Let us first consider the case of linear functions (1). From f(0, a) = a we conclude that c0 + c2 · a = a for all a. Thus, c0 = 0 and c2 = 1. Similarly, from f(a, 0) = a, we conclude that c1 = 1 and thus that f(a, b) = a + b for all a and b. So, for a = b = 1, we get f(1, 1) = 1 + 1 = 2, while we wanted to have f(1, 1) = 1 ≠ 2.
Similarly, for a quadratic function (2), the requirement f(0, a) = a implies that c0 + c2 · a + c22 · a² = a for all a, hence c0 = c22 = 0 and c2 = 1. Similarly, the requirement f(a, 0) = a implies that c11 = 0 and c1 = 1. Thus, f(a, b) = a + b + c12 · a · b. The condition f(1, a) = 1 now implies that 1 + a + c12 · a = 1 and thus that c12 = −1. So, f(a, b) = a + b − a · b. The proposition is proven.

Comment. For negation, a linear operation is possible.

Proposition 3. The only linear function f(a) for which f(0) = 1 and f(1) = 0 is the function f(a) = 1 − a.

Proof. The proof is straightforward: it is based on the general form of a linear function of one variable:

f(a) = c0 + c1 · a.   (3)

Discussion. What properties are satisfied by these simple operations: a · b for "and", a + b − a · b for "or", and 1 − a for "not"?
• The "and"- and "or"-operations are commutative and associative.
• The law of double negation is satisfied.
• de Morgan laws are satisfied.
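These properties are easy to confirm numerically; a small sketch (illustrative only, checking random points of [0, 1]):

```python
import random

AND = lambda x, y: x * y              # product "and"
OR  = lambda x, y: x + y - x * y      # probabilistic-sum "or"
NOT = lambda x: 1 - x                 # standard negation

random.seed(0)
for _ in range(1000):
    a, b, c = (random.random() for _ in range(3))
    # commutativity and associativity of both operations
    assert abs(AND(a, b) - AND(b, a)) < 1e-12
    assert abs(OR(a, b) - OR(b, a)) < 1e-12
    assert abs(AND(a, AND(b, c)) - AND(AND(a, b), c)) < 1e-12
    assert abs(OR(a, OR(b, c)) - OR(OR(a, b), c)) < 1e-12
    # double negation and de Morgan laws
    assert abs(NOT(NOT(a)) - a) < 1e-12
    assert abs(NOT(AND(a, b)) - OR(NOT(a), NOT(b))) < 1e-12
    assert abs(NOT(OR(a, b)) - AND(NOT(a), NOT(b))) < 1e-12
print("the listed laws hold (numerically) on [0, 1]")
```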


However, already for this simple example, we can see that two major laws are not satisfied:
• we do not have the law of excluded middle: a ∨ ¬a = a + (1 − a) − a · (1 − a) = 1 − a · (1 − a), so this law is not always true; and
• in general, we have a & a = a² ≠ a, so a & a is not always equivalent to a.
Not surprisingly, the logics based on these violations—intuitionistic logic and linear logic—have been actively studied.
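Concretely, at the "most uncertain" truth value a = 0.5 (an illustrative evaluation):

```python
a = 0.5
or_ = a + (1 - a) - a * (1 - a)   # a OR (NOT a) with the simplest operations
and_ = a * a                      # a AND a
print(or_)   # 0.75, not 1: the law of excluded middle fails
print(and_)  # 0.25, not 0.5: idempotence of "and" fails
```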

4 General Case of Quadratic Functions

Possible Quadratic Negation Operations. In the previous section, we considered linear negation operations. However, since the only possible "and"- and "or"-operations are quadratic anyway, why not consider quadratic negation operations as well?

Proposition 4. For quadratic functions f(a), the following two conditions are equivalent to each other:
• f(0) = 1 and f(1) = 0, and
• f(a) = 1 − a − c11 · a · (1 − a).

Proof. For a general quadratic function of one variable

f(a) = c0 + c1 · a + c11 · a²,   (4)

the condition f(0) = 1 implies that c0 = 1, and the condition f(1) = 0 implies that 1 + c1 + c11 = 0, i.e., that c1 = −1 − c11. Thus, the expression (4) takes the form

f(a) = 1 − a − c11 · a + c11 · a² = 1 − a − c11 · a · (1 − a).

The proposition is proven.

Discussion
• When c11 ≠ 0, we have f(f(a)) ≠ a. This explains why logics without the double negation property have been actively studied.
• We can have several different such operations, corresponding to different values of c11. This explains why logics with several different negation operations have been considered.
• The resulting function is not necessarily monotonic; this explains why non-monotonic logics have also been actively studied.
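A small sketch of this family of negations (the parameter value 1 below is arbitrary, chosen only for illustration):

```python
def negation(a, c11):
    # quadratic negation family from Proposition 4
    return 1 - a - c11 * a * (1 - a)

a = 0.5
print(negation(a, 0))               # 0.5: c11 = 0 gives the standard, involutive negation
print(negation(negation(a, 1), 1))  # 0.5625: for c11 = 1, double negation fails
```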


Other Quadratic Operations. What are the general quadratic extensions of the identity function, i.e., of a function for which f(0) = 0 and f(1) = 1?

Proposition 5
• The only linear function f(a) for which f(0) = 0 and f(1) = 1 is the trivial function f(a) = a.
• There exist non-trivial quadratic functions with the above properties; they all have the form f(a) = a − c11 · a · (1 − a).

Proof. The case of the linear function (3) is straightforward. Let us now consider the general case of a quadratic function (4). In this case, the requirement that f(0) = 0 implies that c0 = 0, and the requirement that f(1) = 1 implies that c1 + c11 = 1, thus c1 = 1 − c11, and the expression (4) takes the form

f(a) = a − c11 · a + c11 · a² = a − c11 · a · (1 − a).

The proposition is proven.

Discussion. When c11 > 0, we have f(a) ≤ a. This can be identified with the unary operation "necessary", for which the usual intuition is that:
• if something is absolutely true, it is also absolutely necessarily true: f(1) = 1;
• if something is absolutely false, it is also absolutely necessarily false: f(0) = 0;
• and in general, if something is necessarily true, then it is true, but not vice versa; so our degree of confidence that a statement is necessarily true can be smaller than our degree of confidence that it is true—it could be true accidentally.

When c11 < 0, we always get a ≤ f(a). This can be identified with the unary operation "possible" in modal logic. So it is not surprising that modal logic has been actively developed. It should also be noticed that, in general, for c11 ≠ 0, we have f(f(a)) ≠ a. This explains why modal logic actively studies logics in which an iteration of necessity is not equivalent to a single necessity operation.
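For illustration (parameter values ±0.5 chosen arbitrarily):

```python
def modal(a, c11):
    # Proposition 5 family: c11 > 0 behaves like "necessary", c11 < 0 like "possible"
    return a - c11 * a * (1 - a)

a = 0.5
print(modal(a, 0.5))    # 0.375 <= a: "necessarily true" gets a smaller degree than "true"
print(modal(a, -0.5))   # 0.625 >= a: "possibly true" gets a larger degree than "true"
print(modal(modal(a, 0.5), 0.5))  # iterated necessity differs from a single necessity
```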

5 What About Cubic Operations?

To explore the possibility of violating other rules, let us consider cubic "and"- and "or"-operations, i.e., functions of the type

f(a, b) = c0 + c1 · a + c2 · b + c11 · a² + c12 · a · b + c22 · b² + c111 · a³ + c112 · a² · b + c122 · a · b² + c222 · b³.   (5)


Proposition 6. The only cubic function for which f(0, a) = f(a, 0) = 0 and f(1, a) = f(a, 1) = a is f(a, b) = a · b.

Proof. For the expression (5), the condition f(0, b) = 0 implies that c0 + c2 · b + c22 · b² + c222 · b³ = 0 for all b. Thus, c0 = c2 = c22 = c222 = 0. Similarly, the condition f(a, 0) = 0 implies that c1 = c11 = c111 = 0. Thus, the expression (5) takes the form

f(a, b) = c12 · a · b + c112 · a² · b + c122 · a · b².

For this formula, the requirement that f(1, b) = b implies that c12 · b + c112 · b + c122 · b² = b for all b; hence c122 = 0. Similarly, the requirement that f(a, 1) = a implies that c112 = 0. Thus, all cubic terms are 0, so f(a, b) is actually a quadratic function, and for quadratic functions, we already know that the only operation with the desired properties is f(a, b) = a · b. The proposition is proven.

Proposition 7. The only cubic function for which f(0, a) = f(a, 0) = a and f(1, a) = f(a, 1) = 1 is f(a, b) = a + b − a · b.

Proof. We can prove this result by taking into account that, as one can easily check, a function f(a, b) satisfies the corresponding conditions if and only if its "dual" g(a, b) = 1 − f(1 − a, 1 − b) satisfies the conditions of Proposition 6. Since we already know, from Proposition 6, that g(a, b) = a · b, we can thus conclude that

f(a, b) = 1 − g(1 − a, 1 − b) = 1 − (1 − a) · (1 − b) = a + b − a · b.

The proposition is proven.

Discussion. So, even if we allow cubic terms, we still get only commutative and associative "and"- and "or"-operations. In view of our general approach, this explains why in the vast majority of logics studied so far, these operations are indeed commutative and associative. To find examples of non-commutative and/or non-associative logics, we therefore need to go to polynomials of even higher orders.
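An informal numerical illustration of Proposition 6 (not a proof): the product satisfies all four boundary conditions, while a hypothetical candidate with a genuine cubic term, here a·b + 0.3·a·b², breaks at least one of them:

```python
def product_and(a, b):
    return a * b

def cubic_candidate(a, b):
    # hypothetical cubic perturbation of the product "and"-operation
    return a * b + 0.3 * a * b**2

for f, name in ((product_and, "a*b"), (cubic_candidate, "a*b + 0.3*a*b^2")):
    ok = all(
        abs(f(0, x)) < 1e-12 and abs(f(x, 0)) < 1e-12
        and abs(f(1, x) - x) < 1e-12 and abs(f(x, 1) - x) < 1e-12
        for x in (0.0, 0.25, 0.5, 0.75, 1.0)
    )
    print(f"{name}: boundary conditions {'hold' if ok else 'violated'}")
```

Here the cubic candidate fails because f(1, b) = b + 0.3·b² ≠ b, matching the step of the proof in which the condition f(1, b) = b eliminates the cubic coefficients.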

Why Some Non-classical Logics Are More Studied?


6 Case of 4th Order Operations

Let us consider general 4th order functions

f(a, b) = c0 + c1·a + c2·b + c11·a² + c12·a·b + c22·b² + c111·a³ + c112·a²·b + c122·a·b² + c222·b³ + c1111·a⁴ + c1112·a³·b + c1122·a²·b² + c1222·a·b³ + c2222·b⁴.  (6)

Proposition 8. For 4th order functions, the following two conditions are equivalent to each other:
• for all a, we have f(0, a) = f(a, 0) = 0 and f(1, a) = f(a, 1) = a, and
• the function f(a, b) has the form f(a, b) = a·b − c·a·(1 − a)·b·(1 − b) for some c.

Proof. The condition f(0, b) = 0 implies that for all b, we have c0 + c2·b + c22·b² + c222·b³ + c2222·b⁴ = 0. Thus, we have c0 = c2 = c22 = c222 = c2222 = 0. Similarly, the condition that f(a, 0) = 0 for all a implies that c1 = c11 = c111 = c1111 = 0. Thus, the general formula (6) gets the following simplified form:

f(a, b) = c12·a·b + c112·a²·b + c122·a·b² + c1112·a³·b + c1122·a²·b² + c1222·a·b³.

For this function, the requirement that f(1, b) = b for all b implies that

c12·b + c112·b + c122·b² + c1112·b + c1122·b² + c1222·b³ = b.

Thus, c1222 = 0. Similarly, the condition f(a, 1) = a implies that c1112 = 0. Thus, the above equality takes the following simplified form:

(c12 + c112)·b + (c122 + c1122)·b² = b.


So, c12 + c112 = 1, hence c112 = 1 − c12, and c122 + c1122 = 0, hence c1122 = −c122. Similarly, the condition that f(a, 1) = a implies that c12 + c122 = 1, i.e., that c122 = 1 − c12 = c112. So, if we introduce the notation c = 1 − c12, we get c112 = c122 = c, c12 = 1 − c, and c1122 = −c. Thus, we get the desired expression for the function:

f(a, b) = a·b − c·(a·b − a²·b − a·b² + a²·b²) = a·b·(1 − c·(1 − a − b + a·b)) = a·b·(1 − c·(1 − a)·(1 − b)) = a·b − c·a·(1 − a)·b·(1 − b).

The proposition is proven.

Comment. For "or"-operations, we have a similar result.

Proposition 9. For 4th order functions, the following two conditions are equivalent to each other:
• for all a, we have f(0, a) = f(a, 0) = a and f(1, a) = f(a, 1) = 1, and
• the function f(a, b) has the form f(a, b) = a + b − a·b − c·a·(1 − a)·b·(1 − b) for some c.

Proof. Similar to the proof of Proposition 7, we can use the fact that the dual function g(a, b) = 1 − f(1 − a, 1 − b) satisfies the conditions of Proposition 8 and thus has the form g(a, b) = a·b − c·a·(1 − a)·b·(1 − b) for some c. Thus,

f(a, b) = 1 − g(1 − a, 1 − b) = 1 − (1 − a)·(1 − b) + c·a·(1 − a)·b·(1 − b).

By taking into account that 1 − (1 − a)·(1 − b) = a + b − a·b and that the constant c is arbitrary (so we can rename −c into c), we get the desired expression. The proposition is proven.

Discussion. We assumed neither commutativity nor associativity. Interestingly, we got operations which are commutative but, in general, not associative. This explains why at least some research has been done on non-associative logics (see, e.g., [2, 5, 6, 12, 13] and references therein), while much fewer results are known for non-commutative ones. To get non-commutative operations, we need to consider at least 5th order polynomials. For 5th order polynomials, it is already possible to have a non-commutative operation: for example, we can take


f(a, b) = a·b − (ca·a + cb·b)·a·(1 − a)·b·(1 − b)

for any ca ≠ cb.

Acknowledgements This work was supported in part by the National Science Foundation via grants 1623190 (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science) and HRD-1242122 (Cyber-ShARE Center of Excellence).

References

1. Belohlavek, R., Dauben, J.W., Klir, G.J.: Fuzzy Logic and Mathematics: A Historical Perspective. Oxford University Press, New York (2017)
2. Bouchon-Meunier, B., Kreinovich, V., Nguyen, H.T.: Non-associative operations. In: Proceedings of the Second International Conference on Intelligent Technologies InTech 2001, Bangkok, Thailand, 27–29 November 2001, pp. 39–46 (2001)
3. Gabbay, D.M., Guenthner, F. (eds.): Handbook of Philosophical Logic. Springer, Cham (2018)
4. Klir, G., Yuan, B.: Fuzzy Sets and Fuzzy Logic. Prentice Hall, Upper Saddle River (1995)
5. Kreinovich, V.: Towards more realistic (e.g., non-associative) 'and'- and 'or'-operations in fuzzy logic. Soft Comput. 8(4), 274–280 (2004)
6. Martinez, J., Macias, L., Esper, A., Chaparro, J., Alvarado, V., Starks, S.A., Kreinovich, V.: Towards more realistic (e.g., non-associative) and- and or-operations in fuzzy logic. In: Proceedings of the 2001 IEEE Systems, Man, and Cybernetics Conference, Tucson, Arizona, 7–10 October 2001, pp. 2187–2192 (2001)
7. Mendel, J.M.: Uncertain Rule-Based Fuzzy Systems: Introduction and New Directions. Springer, Cham (2017)
8. Nguyen, H.T., Kreinovich, V.: Nested intervals and sets: concepts, relations to fuzzy sets, and applications. In: Kearfott, R.B., Kreinovich, V. (eds.) Applications of Interval Computations, pp. 245–290. Kluwer, Dordrecht (1996)
9. Nguyen, H.T., Walker, C., Walker, E.A.: A First Course in Fuzzy Logic. Chapman and Hall/CRC, Boca Raton (2019)
10. Novák, V., Perfilieva, I., Močkoř, J.: Mathematical Principles of Fuzzy Logic. Kluwer, Boston (1999)
11. Priest, G.: An Introduction to Non-Classical Logic: From If to Is. Cambridge University Press, Cambridge (2008)
12. Trejo, R., Kreinovich, V., Goodman, I.R., Martinez, J., Gonzalez, R.: A realistic (non-associative) logic and a possible explanation of 7 ± 2 law. Int. J. Approximate Reasoning 29, 235–266 (2002)
13. Xiang, G., Kreinovich, V.: Towards improved trapezoidal approximation to intersection (fusion) of trapezoidal fuzzy numbers: specific procedure and general non-associativity theorem. In: Proceedings of the IEEE World Congress on Computational Intelligence WCCI 2010, Barcelona, Spain, 18–23 July 2010, pp. 3120–3125 (2010)
14. Zadeh, L.A.: Fuzzy sets. Inf. Control 8, 338–353 (1965)

Why h-Index Vladik Kreinovich, Olga Kosheleva, and Hoang Phuong Nguyen

Abstract At present, one of the main ways to gauge the quality of a researcher is to use his or her h-index, which is defined as the largest integer n such that the researcher has at least n publications each of which has at least n citations. The fact that this quantity is widely used indicates that the h-index indeed reasonably adequately describes the researcher's quality. So, this notion must capture some intuitive idea. However, the above definition is not intuitive at all; it sounds like a somewhat convoluted mathematical exercise. So why is the h-index so efficient? In this paper, we use known mathematical facts about the h-index—in particular, the results of its fuzzy-related analysis—to come up with an intuitive explanation for the h-index's efficiency.

1 Formulation of the Problem

h-index is ubiquitous. At present, one of the main criteria used to evaluate the quality of a researcher is the h-index, a concept first proposed in [5] that then spread like wildfire. We will explain what it is in the next paragraph, but we want to mention right away that while everyone seems to agree that it is an imperfect characteristic of a person's research status, the h-index is what is cited in recommendation letters for promotion or for hiring, what is cited when nominating someone for awards, what is cited everywhere.

V. Kreinovich (B) · O. Kosheleva University of Texas at El Paso, El Paso, TX 79968, USA e-mail: [email protected] O. Kosheleva e-mail: [email protected] H. P. Nguyen Division Informatics, Math-Informatics Faculty, Thang Long University, Nghiem Xuan Yem Road, Hoang Mai District, Hanoi, Vietnam e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_6


What is h-index? The definition of an h-index is somewhat convoluted—actually so convoluted that people who are not familiar with this notion (e.g., new students) get surprised to learn that such a non-intuitive notion is so ubiquitous. Specifically, an h-index is defined as the largest integer n such that a researcher has at least n publications each of which has at least n citations.

Comment. It should be taken into account that the numerical value of an h-index depends on how we count citations. One possibility is to use only Web of Science citations—in which case we miss many citations in conference papers, although in computer science, many important results are published in peer-refereed archival conference proceedings. Another possibility is to use Google citations—in which case conference papers count, but so do citations in less reputable journals. In both cases, however, the same general definition is used.

But why h-index? One could think of many ways to gauge a researcher's productivity and its effect. Why has namely the h-index—whose definition does not seem to be intuitive at all—spread so widely? The very fact that this index is widely used is an indication that it does reflect the researcher's quality. So, a natural question is: why is this seemingly non-intuitive definition so efficient?

What we do in this paper: general idea. In this paper, we provide an intuitive explanation for the h-index.

People (kind of) tried to answer this question. The notion of the h-index has been actively studied from the mathematical and computational viewpoint. In particular, fuzzy researchers may be aware that almost immediately after the h-index was invented, a paper [13] showed that the h-index is a particular case of the Sugeno integral—a notion closely related to fuzzy logic (see, e.g., [2, 6, 8–11, 14]) and thus very familiar in the fuzzy community. The Sugeno integral was first introduced in [12]; for a recent overview, see, e.g., [1]. Does this explain the ubiquity of the h-index?
Maybe for a few die-hard fans of Sugeno integral it does, but not for others – Sugeno integral is just one of the many different “fuzzy integrals”, different ways to combine different estimates into a single value. Even the paper [13] mentions that we could use other fuzzy integrals—and get other bibliometric characteristics. So the question remains: why h-index? What we do in this paper: some more details. In contrast to h-index, the Sugeno integral does have an intuitive understanding. So what we did is borrow this understanding and transform it into an understanding of why h-index is so ubiquitous and so efficient.

2 Our Explanation

Analysis of the problem: towards an intuitive understanding of what an influential researcher is. The researcher's output is usually his or her publications. Each publication contains some results. Some of these results are influential, some are not that


influential. From the commonsense viewpoint, an influential researcher is one who produces many influential results, i.e., in other words, a researcher who has published a large number of influential papers. How can we gauge whether an idea described in a paper is influential? By definition, an idea is influential if it influences others and leads to other ideas—i.e., to new papers that use the original idea—and thus cite the original paper. So, a natural way to gauge how influential a given paper is, is to consider how many other papers cite it. If a paper has a large number of citations, this means that this paper is influential. Substituting this (informal) definition of "influential paper" into the above (informal) definition of an influential researcher, we arrive at the following informal definition: an influential researcher is a researcher who has published a large number of papers each of which has a large number of citations.

How to formalize the above intuitive (informal) definition? The wording "a large number of" is informal; it means different things to different people, and a proper formal description of this notion would indeed require the use of fuzzy logic (or some other techniques appropriate for describing informal notions).

The simplest formalization of this notion. Intuitively, the notion "a large number of" is imprecise. So, to get an adequate formalization of this notion, we should use techniques for formalizing such imprecise terms, such as fuzzy logic. However, for simplicity, let us see what happens if we use the simplest possible formalization of this notion: namely, we select some threshold value n0 and then:
• if we have n ≥ n0 items, we say that we have a large number of items, and
• if we have n < n0 items, then we say that we do not have a large number of items.
With respect to formalizing the notion of "an influential paper"—which we interpreted as "a paper with a large number of citations"—we thus get the following formalization, which we will call n0-influential: a paper is n0-influential if it has at least n0 citations.

The resulting formalization of the notion of an influential researcher. The notion of an n0-large number leads to the following formalization of the above intuitive notion of an influential researcher: a researcher is n0-influential if this researcher has published at least n0 papers each of which has at least n0 citations.

Analysis of this notion leads to the desired explanation. If we have two possible thresholds n0 > n0′, then clearly each n0-influential researcher is also n0′-influential. Thus, to properly gauge the quality of a researcher, it makes sense to consider the largest possible value n0 for which this researcher is n0-influential. This largest value is the largest number n0 such that the researcher has published at least n0 papers each of which has at least n0 citations. This is exactly the definition of the h-index. So, we have indeed come up with an intuitive explanation of the h-index.
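The reformulation above is easy to turn into code. In the sketch below (ours; the function names are for illustration only), the h-index is computed directly from the definition, via the "largest n0 for which the researcher is n0-influential" reformulation, and via the max-min (Sugeno-integral-style) form mentioned in Sect. 1; all three agree:

```python
# Sketch (ours): three equivalent ways to compute the h-index.

def h_index_definition(citations):
    """Largest n such that at least n papers have at least n citations each."""
    return max((n for n in range(len(citations) + 1)
                if sum(c >= n for c in citations) >= n), default=0)

def h_index_threshold(citations):
    """Largest n0 for which the researcher is n0-influential, i.e., has
    published at least n0 papers, each with at least n0 citations."""
    n0 = 0
    while sum(c >= n0 + 1 for c in citations) >= n0 + 1:
        n0 += 1
    return n0

def h_index_sugeno(citations):
    """Max-min form: sort citation counts in decreasing order and take
    the maximum over i of min(i, i-th largest count)."""
    ranked = sorted(citations, reverse=True)
    return max((min(i, c) for i, c in enumerate(ranked, start=1)), default=0)

papers = [10, 8, 5, 4, 3, 0]
print(h_index_definition(papers),
      h_index_threshold(papers),
      h_index_sugeno(papers))  # prints: 4 4 4
```

Note that a researcher with a single 100-citation paper and a researcher with a single 1-citation paper both get h = 1 here; this is exactly the limitation discussed in Sect. 3.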


3 What Next?

There are many ways in which the above explanation can be used to improve the notion of the h-index. Let us list the two main ideas.

A first natural idea is to replace the simplified formalization of the notion "a large number of" with a more adequate fuzzy notion, in which, for each n, we have a degree to which this n is large—i.e., to which extent this number n corresponds to the above notion. This will hopefully allow us to distinguish between the cases between which the h-index does not distinguish: e.g., between two researchers each of whom has exactly one published paper, but whose papers have different numbers of citations: 100 for the first paper, 1 for the second one. For both researchers, the h-index is the same—equal to 1—but clearly the first researcher is more influential. This idea may lead to some of the fuzzy modifications proposed in [13] or to yet other characteristics, depending on what "and"- and "or"-operations we use.

Another natural idea is to take into account that not all citations are equal: a citation by an influential paper (which is itself highly cited) should be valued more than a citation by a paper which was not cited at all. There should be a weight, e.g., proportional to the number of citations of the citing paper. These citations should also be similarly weighted—as a result, we end up with the notion of an eigenvalue, similar to Google's PageRank (see, e.g., [3, 4, 7]) or to the eigenvalues used to estimate the quality of a journal. So, the second idea is to replace the number of citations with such an eigenvalue in the definition of the h-index.

Acknowledgements This work was supported in part by the National Science Foundation via grants 1623190 (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science) and HRD-1242122 (Cyber-ShARE Center of Excellence).

References

1. Beliakov, G., James, S., Wu, J.-Z.: Discrete Fuzzy Measures: Computational Aspects. Springer, Cham (2020)
2. Belohlavek, R., Dauben, J.W., Klir, G.J.: Fuzzy Logic and Mathematics: A Historical Perspective. Oxford University Press, New York (2017)
3. Brin, S., Page, L.: The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 30(1–7), 107–117 (1998)
4. Gleich, D.F.: PageRank beyond the Web. SIAM Rev. 57(3), 321–363 (2015)
5. Hirsch, J.E.: An index to quantify an individual's scientific research output. Proc. Natl. Acad. Sci. U.S.A. 102(45), 16569–16572 (2005)
6. Klir, G., Yuan, B.: Fuzzy Sets and Fuzzy Logic. Prentice Hall, Upper Saddle River (1995)
7. Langville, A.N., Meyer, C.D.: Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, Princeton (2006)
8. Mendel, J.M.: Uncertain Rule-Based Fuzzy Systems: Introduction and New Directions. Springer, Cham (2017)
9. Nguyen, H.T., Kreinovich, V.: Nested intervals and sets: concepts, relations to fuzzy sets, and applications. In: Kearfott, R.B., Kreinovich, V. (eds.) Applications of Interval Computations, pp. 245–290. Kluwer, Dordrecht (1996)


10. Nguyen, H.T., Walker, C., Walker, E.A.: A First Course in Fuzzy Logic. Chapman and Hall/CRC, Boca Raton (2019)
11. Novák, V., Perfilieva, I., Močkoř, J.: Mathematical Principles of Fuzzy Logic. Kluwer, Boston (1999)
12. Sugeno, M.: Theory of Fuzzy Integrals and Its Applications. Ph.D. Dissertation, Tokyo Institute of Technology, Tokyo, Japan (1974)
13. Torra, V., Narukawa, Y.: The h-index and the number of citations: two fuzzy integrals. IEEE Trans. Fuzzy Syst. 16(3), 795–797 (2008)
14. Zadeh, L.A.: Fuzzy sets. Inf. Control 8, 338–353 (1965)

Accuracy of Data Fusion: Interval (and Fuzzy) Case Christian Servin, Olga Kosheleva, and Vladik Kreinovich

Abstract The more information we have about a quantity, the more accurately we can estimate this quantity. In particular, if we have several estimates of the same quantity, we can fuse them into a single more accurate estimate. What is the accuracy of this estimate? The corresponding formulas are known for the case of probabilistic uncertainty. In this paper, we provide similar formulas for the cases of interval and fuzzy uncertainty.

1 Formulation of the Problem Need for Estimation. In the grand scheme of things, the main objectives of science and engineering are: to get a good understanding of the current state of the world and how this state will change, and to come up with recipes of how to make sure that this change will go in favorable directions. To describe the state of the world, we need to describe the numerical values of all the physical quantities. To describe the corresponding recommendations, we need to describe the numerical values of all the parameters of the corresponding designs and/or of the appropriate controls—and these values usually depend on the parameters describing the current state of the world. We can estimate the corresponding physical quantities directly, by measuring them or by relying on experts who provide the related estimates. If this is not possible, we can estimate the desired quantities indirectly, by measuring and/or estimating related C. Servin Computer Science and Information Technology Systems Department, El Paso Community College, 919 Hunter, El Paso, TX 79915, USA e-mail: [email protected] O. Kosheleva · V. Kreinovich (B) University of Texas at El Paso, El Paso, TX 79968, USA e-mail: [email protected] O. Kosheleva e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_7


quantities and using the known relation between all these quantities to compute the corresponding estimates of the desired quantities; see, e.g., [11]. Need to Improve Accuracy and Need for Data Fusion. Sometimes, the available estimates already have the needed accuracy. However, in many other cases, even the measurements by state-of-the-art measuring instruments are not accurate enough. For example, in geophysics, we can use different techniques to estimate the density and other characteristics several kilometers beneath the Earth surface; however, in many cases, these estimates are not sufficient to predict the location of mineral deposits or to predict an earthquake. In such situations, a natural way to improve accuracy is to perform more estimations of the same quantity, and then to combine (“fuse”) the resulting estimates into a single more accurate one. How Accurate Is the Result of Data Fusion? A natural question is: how accurate is the result of the data fusion? The answer to this question is known for probabilistic uncertainty; we briefly describe this answer in Sect. 2. However, for important cases of interval and fuzzy uncertainty, no such general formulas have been known. In Sect. 3 of this paper, we show how to derive the corresponding formulas.

2 Accuracy of Data Fusion: Case of Probabilistic Uncertainty

Towards Formulating the Problem in Precise Terms. Let us assume that we have n estimates x1, ..., xn of the same quantity x. Probabilistic uncertainty means that we know the probability distributions of the corresponding approximation errors Δxi = xi − x. We are interested in values xi obtained by state-of-the-art measurements—otherwise, there is no big need for fusion: we could simply use more accurate measuring instruments. State-of-the-art means, in particular, that all the usual ways to improve accuracy have already been applied. For example, if a measuring instrument has a bias, i.e., if the mean value of the approximation error is different from 0, then we can detect this bias if we calibrate the instrument—i.e., if we repeatedly compare its results with the results of measuring the same quantity by a stationary "standard" (maximally accurate) measuring instrument. Once the bias is known, we can simply subtract this bias from all the measurement results produced by this instrument. Thus, when we deal with state-of-the-art measuring instruments, we can safely assume that the bias is 0, i.e., that the mean value of the estimation error is 0. Similarly, we can safely assume that all known major sources of possible measurement errors have been taken care of. For example, many accurate measurements


are affected by the periodic 50 or 60 Hz electromagnetic signals emitted by the usual electric outlets—so we can safely assume that these signals have been screened away. In general, we can safely assume that all reasonably major sources of measurement errors have been eliminated and thus, that the remaining estimate comes from the joint effect of a large number of small error components. It is known that in such situations, the resulting error distribution is close to Gaussian—this follows from the Central Limit Theorem (see, e.g., [12]), according to which, when the number N of such small components tends to infinity, the distribution of the sum of these components tends to Gaussian. So, we can safely assume that each estimation error Δxi is normally distributed. As we have mentioned, probabilistic uncertainty means that we know the probability distribution of each estimation error. A normal distribution is uniquely determined by its mean and its standard deviation. We know that the mean of Δxi is 0, so knowing the distribution simply means that we know the standard deviation σi of the corresponding estimate. It is also reasonable to assume that measurement errors corresponding to different measurements are independent. In these terms, the problem has the following form.

Formulation of the Problem
• We have n estimates x1, ..., xn of the same quantity x;
• we know that for each i, the measurement error xi − x is normally distributed with 0 mean and known standard deviation σi;
• we also know that the measurement errors corresponding to different measurements are independent.
Based on this information, we would like to find a combined estimate x̂ and determine how accurate this estimate is.

Fusing Measurement Results: Derivation of the Formula. Since each measurement error xi − x is normally distributed, the corresponding probability density function (pdf) ρi(x) has the form

ρi(x) = (1/(√(2π)·σi)) · exp(−(xi − x)²/(2σi²)).  (1)

Since the measurement errors corresponding to different measurements are independent, the overall pdf is equal to the product of the corresponding probability densities:

ρ(x) = ∏_{i=1}^{n} ρi(x) = ∏_{i=1}^{n} (1/(√(2π)·σi)) · exp(−(xi − x)²/(2σi²)).  (2)

As the desired fused estimate x̂, it is reasonable to select the most probable value, i.e., the value x for which the expression (2) attains its largest possible value; this idea is known as the Maximum Likelihood Method.


This maximization problem can be simplified if we take into account that the function f(z) = −ln(z) is strictly decreasing. Thus, maximizing the expression (2) is equivalent to minimizing its negative logarithm

−ln(ρ(x)) = const + ∑_{i=1}^{n} (xi − x)²/(2σi²),  (3)

where const denotes terms that do not depend on x. Differentiating the expression (3) with respect to the unknown x and equating the derivative to 0, we conclude that

∑_{i=1}^{n} (x − xi)/σi² = 0,  (4)

i.e., equivalently, that

x · ∑_{i=1}^{n} σi⁻² = ∑_{i=1}^{n} xi · σi⁻².  (5)

Thus, we arrive at the following formula:

Fusion in the Probabilistic Case: Final Formula. When we fuse measurement results x1, ..., xn measured with accuracies σ1, ..., σn, we get the estimate

x̂ = (∑_{i=1}^{n} xi · σi⁻²) / (∑_{i=1}^{n} σi⁻²).  (6)

How Accurate Is the Fused Estimate? The probability distribution for different values x is given by the formula (2), i.e., equivalently, by the formula

ρ(x) = (1/((√(2π))ⁿ · ∏_{i=1}^{n} σi)) · exp(−∑_{i=1}^{n} (xi − x)²/(2σi²)).  (7)

One can easily see that the expression under the exponent is a quadratic function of x, and thus, the distribution for x is also Gaussian, i.e., has the form

ρ(x) = (1/(√(2π)·σ)) · exp(−(x − x̂)²/(2σ²)).  (8)


Comparing the coefficients at x² under the exponential function in expressions (7) and (8), we conclude that

∑_{i=1}^{n} 1/σi² = 1/σ²,  (9)

i.e., that

σ² = 1 / (∑_{i=1}^{n} 1/σi²).  (10)

In particular, for n = 2, we get

σ² = 1 / (1/σ1² + 1/σ2²) = (σ1² · σ2²) / (σ1² + σ2²).  (11)

Another important case is when all the measurements have the same accuracy, i.e., when σ1 = ... = σn. In this case, according to (10), we have 1/σ² = n/σ1², so σ² = σ1²/n and σ = σ1/√n.

3 Case of Interval Uncertainty

Need for Interval Uncertainty. The traditional probabilistic approach to measurement processing is based on the assumption that we know the probability distribution of measurement errors Δxi = xi − x. Usually, this distribution is obtained by calibrating the measuring instrument, i.e., by comparing the values xi measured by this instrument with the values measured by a much more accurate ("standard") instrument—whose measurement results are so much more accurate that we can safely ignore the difference between these results and the actual values x of the corresponding quantity. However, there are important cases when calibration is not done. The first such case is the case of state-of-the-art measurements, when we use the most accurate measuring instruments. In this case, there are simply no more-accurate instruments which can be used for calibration, so calibration is not possible. At best, we can provide an upper bound Δi on the corresponding measurement error Δxi = xi − x: |Δxi| ≤ Δi. In this case, once we know the measurement result xi, the only information that we can conclude about the actual value x is that this value belongs to the interval [xi − Δi, xi + Δi].


Another case is measurements on the shop floor, during the manufacturing process. In this case, theoretically, we could calibrate every single sensor, every single measuring instrument. However, calibration is a rather expensive procedure—since it involves the use of a complex standard measuring instrument. As a result, in manufacturing, such calibration is not done, and often, the only information that we have about a measuring instrument is the upper bound on its measurement error. (And if we do not even know any such upper bound, then this is not a measuring instrument at all—since the actual value can be anything, no matter what the instrument shows.) How to Fuse Measurement Results in Case of Interval Uncertainty. Based on each measurement in which we get the value xi with accuracy Δi , we conclude that the actual value x belongs to the interval [xi − Δi , xi + Δi ]; see, e.g., [2, 5, 7, 11]. Thus, if we have n such measurement results, we can conclude that x belongs to the intersection of all these n intervals, i.e., to the interval [x, x] =

  n  [xi − Δi , xi + Δi ] = max(xi − Δi ), min(xi + Δi ) . i=1

i

i

(12)
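This intersection rule is equally easy to implement. In the sketch below (ours; the empty-intersection guard is our addition, since formula (12) presupposes consistent measurements), the fused interval is computed from the results x_i and the bounds Δ_i:

```python
# Sketch (ours): interval fusion by intersection, implementing formula (12).

def fuse_intervals(estimates, deltas):
    """Given results x_i with guaranteed error bounds Delta_i (|x_i - x| <= Delta_i),
    return the intersection of the intervals [x_i - Delta_i, x_i + Delta_i]."""
    lower = max(x - d for x, d in zip(estimates, deltas))
    upper = min(x + d for x, d in zip(estimates, deltas))
    if lower > upper:
        # Inconsistent data: at least one stated error bound must be violated.
        raise ValueError("empty intersection: the measurements are inconsistent")
    return lower, upper

# Intersection of [9.7, 10.7] and [10.4, 11.2]:
print(fuse_intervals([10.2, 10.8], [0.5, 0.4]))  # approximately (10.4, 10.7)
```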

We Need to Estimate Average Accuracy. Each value xi is the result of measuring the desired quantity x with accuracy Δi. Thus, each value xi can take any value from the interval [x − Δi, x + Δi]. For the same accuracy bounds, different measurement errors can lead to different accuracies of the fusion result. For example, if we fuse the results of two measurements performed with the same accuracy Δ, then we can have two extreme situations:
• We can have the exact same measurement result in both cases: x1 = x2. In this case, the corresponding intervals are the same, and their intersection is the exact same interval [x1 − Δ, x1 + Δ]. Thus, in this case, fusion does not improve the accuracy at all.
• On the other hand, we may have x1 = x + Δ and x2 = x − Δ. In this case, based on the first measurement result, we conclude that the actual (unknown) value x belongs to the interval [x1 − Δ, x1 + Δ] = [x, x + 2Δ], and based on the second measurement result, we conclude that the actual (unknown) value x belongs to the interval [x2 − Δ, x2 + Δ] = [x − 2Δ, x]. The intersection of these two intervals is a single point x—i.e., in this case, by fusing two measurement results, we get the exact value of the measured quantity.
Since the accuracy of the fused result depends on the actual measurement results, the only thing that we can estimate is the average value of the corresponding estimation error.

What Probability Distribution Should We Use? For each measurement, all we know about the measurement error is that it is located somewhere between −Δi and Δi. There is no reason to believe that some values from this interval [−Δi, Δi] are more probable than others. Thus, it is reasonable to conclude that all these values


should have the exact same probability—i.e., that the measurement error should be uniformly distributed on the corresponding interval. This natural conclusion is known as the Laplace Indeterminacy Principle; see, e.g., [3]. Similarly, if we have n measurements, then all we know about the n corresponding measurement errors Δx1, ..., Δxn is that the corresponding vector (Δx1, ..., Δxn) is located somewhere in the box [−Δ1, Δ1] × ... × [−Δn, Δn]. Thus, it is reasonable to conclude that we have a uniform distribution on this box—i.e., in other words, that all measurement errors are independent random variables.

What Is Known and What We Will Do in This Paper. It is known (see, e.g., [13]) that if we fuse several interval estimates with the same accuracy Δ1 = ... = Δn, then the average accuracy of the fused result is, for large n, asymptotically equal to Δ1/n. We can see that, in this case, the average measurement error decreases faster than in the probabilistic case, where the average measurement error decreases as 1/√n (much slower). In this paper, we consider the general case of possibly different accuracies Δi.

Case of Two Fused Measurements n = 2: Analysis of the Problem. Let us start with the simplest case when we fuse two measurement results x1 and x2, measured with accuracies, correspondingly, Δ1 and Δ2. Without losing generality, let us assume that the measurements are numbered from the most accurate one to the least accurate one, i.e., in this case, that Δ1 ≤ Δ2. According to the formula (12), based on the results of these two measurements, we can compute the following upper bound u on the actual value x of the measured quantity:

u = min(x1 + Δ1, x2 + Δ2).  (13)

The accuracy of the measurement can be described by the difference u − x between this upper bound and the actual (unknown) value x. We are interested in the average (expected) value Δ of the difference u − x. Due to symmetry with respect to the change x → −x that swaps lower and upper bounds, we will get the exact same average value for the difference between x and the lower bound ℓ = max(x1 − Δ1, x2 − Δ2). Thus, it is sufficient to compute the expected value Δ of the upper-bound difference u − x. To compute this average value, let us find the probability distribution of the difference Δx := u − x. Here, for both i, we have xi = x + Δxi; thus,

u = min(x + Δx1 + Δ1, x + Δx2 + Δ2) = x + min(Δx1 + Δ1, Δx2 + Δ2). (14)


C. Servin et al.

Thus,

Δx = u − x = min(Δx1 + Δ1, Δx2 + Δ2). (15)

Let us compute, for each real number z, the probability

Prob(z ≤ Δx) = Prob(z ≤ min(Δx1 + Δ1, Δx2 + Δ2)). (16)

It is easy to see that z is smaller than the minimum of the two numbers if and only if it is smaller than both of them, so

Prob(z ≤ Δx) = Prob(z ≤ Δx1 + Δ1 & z ≤ Δx2 + Δ2). (17)

Let us rewrite each of the resulting inequalities so that it has Δxi on one side of the corresponding inequality and all other terms on the other side. Then, we get

Prob(z ≤ Δx) = Prob(Δx1 ≥ z − Δ1 & Δx2 ≥ z − Δ2). (18)

Since we assumed that the measurement errors Δx1 and Δx2 are independent, the probability that both inequalities (related to both measurement errors) hold is simply equal to the product of the probabilities corresponding to the individual measurement errors:

Prob(z ≤ Δx) = Prob(Δx1 ≥ z − Δ1) · Prob(Δx2 ≥ z − Δ2). (19)

Each of the measurement errors Δxi is uniformly distributed on the corresponding interval [−Δi, Δi] of width 2Δi. Uniform distribution means that the probability to be in each subinterval is proportional to the width of this subinterval—to be precise, it is equal to the ratio of the width of the subinterval to the width of the original interval. For each threshold zi, the inequality Δxi ≥ zi is satisfied on the subinterval [zi, Δi] of width Δi − zi. Thus, the probability that this inequality is satisfied is equal to the ratio (Δi − zi)/(2Δi). In particular, for zi = z − Δi, we get

Prob(Δxi ≥ z − Δi) = (Δi − (z − Δi))/(2Δi) = (2Δi − z)/(2Δi). (20)

Thus, based on the formula (19), we have

Prob(z ≤ Δx) = ((2Δ1 − z)/(2Δ1)) · ((2Δ2 − z)/(2Δ2)) = ((2Δ1 − z) · (2Δ2 − z))/(4Δ1 · Δ2). (21)


So, the cumulative distribution function (cdf) F(z) = Prob(Δx ≤ z) is equal to

F(z) = 1 − Prob(z ≤ Δx) = 1 − ((2Δ1 − z) · (2Δ2 − z))/(4Δ1 · Δ2). (22)

The corresponding probability density function ρ(z) can be obtained if we differentiate the cdf:

ρ(z) = dF(z)/dz = ((2Δ1 − z) + (2Δ2 − z))/(4Δ1 · Δ2) = (2Δ1 + 2Δ2 − 2z)/(4Δ1 · Δ2) = (Δ1 + Δ2 − z)/(2Δ1 · Δ2). (23)

The difference Δx = u − x is always greater than or equal to 0—since u is the upper bound for x. This difference cannot exceed the value 2Δ1 corresponding to the worst possible case of the first measurement, when x1 = x + Δ1. Thus, the difference Δx takes all possible values from the interval [0, 2Δ1]. The average (expected) value Δ of this difference can thus be computed as

Δ = ∫₀^{2Δ1} z · ρ(z) dz = (1/(2Δ1 · Δ2)) · ∫₀^{2Δ1} [z · (Δ1 + Δ2) − z²] dz
  = (1/(2Δ1 · Δ2)) · [(Δ1 + Δ2) · z²/2 − z³/3]₀^{2Δ1}
  = (1/(2Δ1 · Δ2)) · [(Δ1 + Δ2) · (4Δ1²/2) − 8Δ1³/3]
  = (2Δ1³ + 2Δ1² · Δ2 − (8/3) · Δ1³)/(2Δ1 · Δ2)
  = (2Δ1² · Δ2 − (2/3) · Δ1³)/(2Δ1 · Δ2) = Δ1 − (1/3) · (Δ1²/Δ2). (24)

Thus, we arrive at the following conclusion.

Case of Two Fused Measurements n = 2: Conclusion. When we fuse two measurements with interval uncertainties Δ1 ≤ Δ2, then the average accuracy Δ of the fused result is

Δ = Δ1 − (1/3) · (Δ1²/Δ2). (25)

In particular, when Δ2 → ∞, we get Δ → Δ1, which makes perfect sense: very inaccurate measurements do not add any information, so accuracy is not improved. When Δ1 = Δ2, we get Δ = (2/3) · Δ1. In other words, the average inaccuracy decreases by a factor of 1.5—as opposed to the probabilistic case, when it only decreases by a smaller factor of √2 ≈ 1.41. Let us show that for n = 2, interval uncertainty always leads to a larger decrease of the average measurement error.

For n = 2, the Result of Interval Fusion Is, on Average, More Accurate than the Result of the Probabilistic Fusion: A Proof. Let us start with the same accuracy values Δ1 = σ1 and Δ2 = σ2, and let us compare the results of applying the formulas (11) and (25). We want to prove that the result of applying the formula (25) is always smaller, i.e., that


Δ1 − (1/3) · (Δ1²/Δ2) < √(Δ1² · Δ2²/(Δ1² + Δ2²)) = (Δ1 · Δ2)/√(Δ1² + Δ2²). (26)

If we divide both sides of this inequality by Δ1 and express both sides in terms of the ratio r := Δ2/Δ1 ≥ 1, then the inequality (26) gets the following equivalent form:

1 − 1/(3r) < r/√(1 + r²).

Multiplying both sides by r and by √(1 + r²), and then squaring both non-negative sides, we get the equivalent inequality

(2/3) · r³ − (10/9) · r² + (2/3) · r − 1/9 > 0. (31)

Let us prove that this inequality holds for all r. Indeed, this inequality clearly holds for r = 1: then, the left-hand side is equal to

2/3 − 10/9 + 2/3 − 1/9 = (6 − 10 + 6 − 1)/9 = 1/9 > 0. (32)

Let us prove that the left-hand side of (31) is increasing and thus, it is positive for all r > 1 as well. Indeed, the derivative of this left-hand side is equal to

2r² − (20/9) · r + 2/3. (33)


The discriminant of this quadratic expression is equal to

(20/9)² − 4 · 2 · (2/3) = 400/81 − 16/3 = (400 − 16 · 27)/81 = (400 − 432)/81 = −32/81 < 0. (34)

So, the quadratic expression (33) for the derivative is always non-negative. Thus, the inequality (31) holds for all r ≥ 1—and since it is equivalent to the desired inequality (26), the desired inequality also holds always. The statement is proven.

Case When All the Measurements Are Equally Accurate Δ1 = ... = Δn. In this case, similar to the case n = 2, we conclude that

F(z) = 1 − ((2Δ1 − z)/(2Δ1)) · ... · ((2Δn − z)/(2Δn)) = 1 − ((2Δ1 − z)/(2Δ1))^n. (35)

Here, z can take any value from 0 to 2Δ1, so the ratio y := z/(2Δ1) takes values from the interval [0, 1]. In terms of y, we have

F(y) = 1 − (1 − y)^n, (36)

so

ρ(y) = dF(y)/dy = n · (1 − y)^(n−1). (37)

Thus, the average value E[y] of y is equal to

E[y] = ∫₀¹ n · (1 − y)^(n−1) · y dy. (38)

Introducing an auxiliary variable u = 1 − y, for which y = 1 − u and for which u also changes between 0 and 1, we get

E[y] = n · ∫₀¹ (1 − u) · u^(n−1) du = n · ∫₀¹ (u^(n−1) − u^n) du = n · [u^n/n − u^(n+1)/(n + 1)]₀¹
     = n · (1/n − 1/(n + 1)) = n · ((n + 1) − n)/(n · (n + 1)) = n · (1/(n · (n + 1))) = 1/(n + 1). (39)

Thus, for the average accuracy Δ = E[y] · (2Δ1) of the fusion result, we get

Δ = (2/(n + 1)) · Δ1. (40)

For the case of n = 2, this is exactly what we got based on our general formula.
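These closed-form results are easy to sanity-check in code. The sketch below (our illustration; the function names are ours) encodes formulas (25) and (40), plus the probabilistic accuracy from the right-hand side of inequality (26), and verifies the claims made above:

```python
import math

def interval_two(d1, d2):
    """Formula (25): average fusion error for two interval measurements, d1 <= d2."""
    return d1 - d1 ** 2 / (3 * d2)

def interval_equal(d, n):
    """Formula (40): average fusion error for n equally accurate measurements."""
    return 2 * d / (n + 1)

def probabilistic_two(s1, s2):
    """Probabilistic fusion accuracy, the right-hand side of inequality (26)."""
    return s1 * s2 / math.sqrt(s1 ** 2 + s2 ** 2)

# n = 2 with equal accuracies reproduces the 2/3 factor
assert math.isclose(interval_equal(1.0, 2), interval_two(1.0, 1.0))

# interval fusion is, on average, more accurate than probabilistic fusion
for r in [1.0, 1.5, 2.0, 10.0, 100.0]:
    assert interval_two(1.0, r) < probabilistic_two(1.0, r)
print("all checks passed")
```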


General Case. In the general case, one can actually also get an explicit formula for each n: indeed, here, we have, for

Δ1 ≤ ... ≤ Δn, (41)

that for the values z ∈ [0, 2Δ1], we get the following cumulative distribution function:

F(z) = 1 − ((2Δ1 − z)/(2Δ1)) · ... · ((2Δn − z)/(2Δn)). (42)

The right-hand side of this formula is a polynomial. Thus, by differentiation, we can get an explicit polynomial formula for the derivative ρ(z) = dF(z)/dz and thus, an explicit polynomial formula for the resulting value

Δ = ∫₀^{2Δ1} ρ(z) · z dz. (43)
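Instead of expanding the polynomial by hand, formula (43) can also be evaluated numerically. The sketch below (our illustration) uses the identity E[Δx] = ∫₀^{2Δ1} (1 − F(z)) dz, valid for the non-negative random variable Δx, with 1 − F(z) taken from formula (42):

```python
def avg_fusion_error(deltas, steps=200_000):
    """Average fusion error Delta for interval accuracies `deltas`,
    via midpoint-rule integration of Prob(dx > z) over [0, 2*Delta_1]."""
    d = sorted(deltas)
    top = 2 * d[0]
    h = top / steps
    total = 0.0
    for k in range(steps):
        z = (k + 0.5) * h
        p = 1.0
        for di in d:  # 1 - F(z), cf. formula (42)
            p *= (2 * di - z) / (2 * di)
        total += p * h
    return total

print(avg_fusion_error([1.0, 1.0, 1.0]))  # close to 2/(3 + 1) = 0.5, cf. (40)
print(avg_fusion_error([1.0, 2.0]))       # close to 1 - 1/6, cf. (25)
```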

Case of Fuzzy Estimates. It is known that in fuzzy logic (see, e.g., [1, 4, 6, 8–10, 14]), the usual way of processing fuzzy estimates—by using Zadeh's extension principle—is equivalent to processing α-cut intervals for all α ∈ [0, 1]; see, e.g., [1, 4, 6, 8–10]. Thus, the above formulas can be applied to the fuzzy case as well.

Acknowledgements This work was supported in part by the National Science Foundation via grants 1623190 (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science) and HRD-1242122 (Cyber-ShARE Center of Excellence).

References

1. Belohlavek, R., Dauben, J.W., Klir, G.J.: Fuzzy Logic and Mathematics: A Historical Perspective. Oxford University Press, New York (2017)
2. Jaulin, L., Kieffer, M., Didrit, O., Walter, E.: Applied Interval Analysis, with Examples in Parameter and State Estimation, Robust Control, and Robotics. Springer, London (2001)
3. Jaynes, E.T., Bretthorst, G.L.: Probability Theory: The Logic of Science. Cambridge University Press, Cambridge (2003)
4. Klir, G., Yuan, B.: Fuzzy Sets and Fuzzy Logic. Prentice Hall, Upper Saddle River (1995)
5. Mayer, G.: Interval Analysis and Automatic Result Verification. De Gruyter, Berlin (2017)
6. Mendel, J.M.: Uncertain Rule-Based Fuzzy Systems: Introduction and New Directions. Springer, Cham (2017)
7. Moore, R.E., Kearfott, R.B., Cloud, M.J.: Introduction to Interval Analysis. SIAM, Philadelphia (2009)
8. Nguyen, H.T., Kreinovich, V.: Nested intervals and sets: concepts, relations to fuzzy sets, and applications. In: Kearfott, R.B., Kreinovich, V. (eds.) Applications of Interval Computations, pp. 245–290. Kluwer, Dordrecht (1996)


9. Nguyen, H.T., Walker, C., Walker, E.A.: A First Course in Fuzzy Logic. Chapman and Hall/CRC, Boca Raton (2019)
10. Novák, V., Perfilieva, I., Močkoř, J.: Mathematical Principles of Fuzzy Logic. Kluwer, Boston, Dordrecht (1999)
11. Rabinovich, S.G.: Measurement Errors and Uncertainties: Theory and Practice. Springer, New York (2005)
12. Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures. Chapman and Hall/CRC, Boca Raton (2011)
13. Walster, G.W., Kreinovich, V.: For unknown-but-bounded errors, interval estimates are often better than averaging. ACM SIGNUM Newsl. 31(2), 6–19 (1996)
14. Zadeh, L.A.: Fuzzy sets. Inf. Control 8, 338–353 (1965)

Imputing Missing Values: Reinforcement Bayesian Regression and Random Forest Shahriar Shakir Sumit, Junzo Watada, Fatema Nasrin, Nafiz Ishtiaque Ahmed, and D. R. A. Rambli

Abstract Imputing missing data plays a pivotal role in minimizing the biases of knowledge in computational data. The principal purpose of this paper is to establish a better approach to dealing with missing data. Clinical data often contain erroneous entries, which cause major drawbacks for analysis. In this paper, we present a new dynamic approach for managing missing data in biomedical databases in order to improve overall modeling accuracy. We propose a reinforcement Bayesian regression model. Furthermore, we compare the Bayesian regression and the random forest dynamically under a reinforcement approach to minimize the ambiguity of knowledge. Our results indicate that the imputation method of the random forest scores better than the Bayesian regression in several cases. At best, the reinforcement Bayesian regression scores over 85% under the condition of 5% missing data. The reinforcement Bayesian regression performs with over 70% accuracy for imputing missing medical data in the overall condition. Moreover, the values imputed by the proposed reinforcement Bayesian regression model are, in over 70% of cases, exactly identical to the missing values, which is a remarkable advantage of the study. This approach significantly improves the accuracy of imputing missing data for clinical research. Keywords Bioinformatics · Missing data · Imputation methods · Reinforcement approach · MCAR · Machine Learning · Clinical data. S. S. Sumit · J. Watada (B) · D. R. A. Rambli Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, 32610 Perak, Malaysia e-mail: [email protected] S. S. Sumit e-mail: [email protected] D. R. A. Rambli e-mail: [email protected] F. Nasrin · N. I. Ahmed Department of Computer Science, American International University-Bangladesh, Dhaka 1229, Bangladesh e-mail: [email protected] N. I. Ahmed e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. 
Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_8


S. S. Sumit et al.

1 Introduction

Scientific computational research always faces a common problem: dealing with missing data [1]. Missing data include mishandled data samples, absence of data, nonresponsive data, deleted data, etc. Furthermore, missing data problems occur due to sensor failure, non-response, and unsaved data. Imputing missing data in research involves multiple approaches that depend on the situation. Depending on the missingness mechanism of the data, three situations are most relevant [2]. First, data missing partially at random, which depends on the specific data domain; generic mining algorithms can resolve this kind of problem. Second, data missing not entirely at random, like the step-by-step prediction in playing a game of chess. Finally, data missing completely at random, which is indeed the most costly problem in real situations. Missing values are a common manifestation in several real-world databases. Numerous statistical methods have been developed to address that problem, referred to as missing data imputation [3]. Missing data increase biases in data analysis. Missing data are common in every research work, and they can produce a significant influence on research results. They can affect the resulting knowledge of data analysis by producing statistically wrong variances and means. The current procedure relies on applying only a single algorithm chosen according to the missing data. Moreover, imputing faulty values in a different domain shows varying robustness to different percentages of missing values contained in a dataset [4]. The lack of an adaptive and dynamic approach to mimic the missing data causes inaccurate information analysis in clinical research. The aim of this paper is to offer a new method to impute missing clinical data with a high confidence level and provide significant accuracy for clinical research. Furthermore, missing data are a severe issue in every field of research. It is not appropriate to analyze anything without obtaining complete data and information. In clinical data analysis, missing data may create trouble in the treatment of various diseases. On the other hand, this issue can generate difficulties for patients, doctors, and management, and it can restrict further processing of any big step in clinical observations. Missing data are classified into three categories: MAR (Missing at Random), MCAR (Missing Completely at Random), and NMAR (Not Missing at Random). Here, the ambition of this research is to work with MCAR data [5]. The remainder of the paper consists of the following sections: Sect. 2 explains the proposed methodology, and Sect. 3 shows the experimental results and discusses them. Finally, Sect. 4 draws the conclusion of the paper.
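The MCAR setting used later in this paper (a fraction of values hidden completely at random) can be sketched in a few lines; the function name and data below are illustrative, not the paper's code:

```python
import random

def mask_mcar(values, frac=0.05, seed=42):
    """Hide a fraction of entries completely at random (MCAR).
    Returns the masked list (None = missing) and the hidden indices;
    frac=0.05 mirrors the 5% held-out missing data used in this study."""
    rng = random.Random(seed)
    n_hide = max(1, int(len(values) * frac))
    hidden = set(rng.sample(range(len(values)), n_hide))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    return masked, sorted(hidden)

data = [float(i) for i in range(100)]
masked, hidden = mask_mcar(data)
print(len(hidden))  # 5 entries hidden
```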

2 Methods

Random forest is a mixture of random tree models. Each individual tree depends on the values of a random vector, which is dealt out to all trees in the forest. The random forest runs efficiently on an extensive database. It produces immensely precise


predictions. The random forest can maintain accuracy even when a large quantity of data is missing [6]. In the experimental period, the random forest makes many determinations, and most of the selections form the ultimate decision of the random forest tree. The random forest can handle a significant amount of data without discarding any of it. It is fast at cleaning data and at finding a result for missing data, and it is able to handle data without any change or modification [7]. Moreover, when analyzing extensive medical data, a prominent dataset faces a high chance of missing some data. To avoid this complication, the random forest algorithm can mitigate the complexity and accuracy issues [8]. For calculating missing data, the random forest works with decision trees. Here, a decision tree assumes a value for a missing data entry, and the forest provides the most appropriate decision from the tree. Each tree in the random forest depends on a set of predictor values that are chosen independently, and this is the same for all distribution trees. The most elegant size of the predictor variable is log2 N + 1, where N represents the number of inputs. For regression, the random forest works as a group of simple trees; each tree produces a numerical response value, and the predictor data set is randomly selected from the same distribution for all trees. The random forest prediction averages the responses of all trees, as given in Eq. 1:

R = (1/T) · Σ_{t=1}^{T} T_t (1)
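In code, the averaging in Eq. 1 is a one-liner; the sketch below (ours, with made-up tree responses) shows a forest's regression prediction as the mean of its trees' responses:

```python
def forest_predict(tree_responses):
    """Random forest regression output: the average of the individual
    tree responses (Eq. 1 in spirit; the responses here are hypothetical)."""
    return sum(tree_responses) / len(tree_responses)

# three hypothetical trees voting on one missing value
prediction = forest_predict([0.82, 0.86, 0.84])
print(prediction)  # averages to about 0.84
```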

Fig. 1 Hierarchical Bayesian Regression



Note that plotted data are normalized between 0 and 1, where the green wavelet indicates original data, and the red wavelet indicates imputed data.

Fig. 2 Plotted figures of the experiment: Random Forest and reinforcement Bayesian Regression models' imputation accuracy for blood serums S2, S3, and S5; the red wavelet indicates the imputed values.

where t runs over the individual trees in the forest, R is the random forest prediction, T stands for the number of trees, and T_t stands for the response of the t-th tree. The Bayesian regression can easily be adapted to cope with missing data in partially observed events. For missing computational data, the Bayesian regression is well suited for constructing a complex model by connecting minor sub-models into a comprehensible combined pattern for the complete data [9]. The Bayesian regression analyzes different choices before distributing and converting assumptions about the missing data. The mechanisms of both the Bayesian regression and the random forest [10] use regression in imputing missing data. The random forest finds missing data by predicting them and picking the highest-standard value from the data set. Likewise, the Bayesian regression works with ecological data. Both approaches impute missing data in order to generate more accurate information. Here, to compare the performance of the two approaches, a regression diabetes data set from the Annals of Statistics [11] is used. The data set provides ten baseline variables of n = 442 diabetes

Table 1 R square score

Type of variables    R square (RF)    R square (BR)
Age                  0.8629           0.7431
Sex                  0.0210           0.0397
BI                   0.5207           0.3429
BP                   0.2871           0.3429
S1                   0.4532           0.0534
S2                   0.9585           0.8439
S3                   0.9348           0.8534
S4                   0.8629           0.7431
S5                   0.8886           0.8091
S6                   0.1304           0.0676

patients. The baseline includes six blood serums: S1, S2, S3, S4, S5, and S6. Patients' general information, such as average blood pressure, body mass index, age, and sex, is also measured with the test. As the response of interest, a quantifiable measure of disease development is obtained one year after baseline. For the study, we have used the six blood serums, average blood pressure, body mass index, age, and sex of the 442 patients. The reinforcement approach has been introduced into the Bayesian regression and random forest in this study. For both the Bayesian regression and the random forest, the reinforcement approach was applied to the unknown or missing values. First, 95% of the data set is randomly selected to train the model as a learner data set. Then the learner data set is utilized for training both missing data imputation methods. According to the regression P-value and t-value in the Bayesian regression, an instance of learning data is split from the data set. This mechanism avoids the null hypothesis adaptively. For every iteration, the Bayesian regression compares each individual missing-value row with all the data, and as a reward, it only counts data from columns with high P-value scores and discards columns with lower P-value scores from the imputation model. Figure 1 gives more details of the approach. The reinforcement approach shows better accuracy for learning from the data model of the Bayesian regression. When the model is ready, the remaining 5% of the data, selected at random, is considered as missing data. Then both models, Bayesian regression and random forest, predict the missing data utilizing the learning data set. The original data set is then used to score both missing data imputation models according to the confidence level. Here we use a 95% confidence level model. That means 5% of the data is missing from the dataset. The method is applied to impute missing data which are missing completely at random (MCAR).
MCAR is the most challenging situation for missing data mechanisms.
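The column-selection idea behind the reinforcement loop can be sketched roughly as follows. This is our simplified stand-in: it scores predictor columns by correlation strength (instead of the full P-value/t-value machinery described above), keeps the strongest column, and imputes from a least-squares line; all names and data are illustrative:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def impute_one(columns, target, missing_row):
    """Score each predictor column against the target, keep the strongest
    (a crude stand-in for the P-value-based reward), and impute the target
    for `missing_row` from a least-squares fit on that single column."""
    best = max(columns, key=lambda name: abs(pearson_r(columns[name], target)))
    xs = columns[best]
    n = len(xs)
    mx, my = sum(xs) / n, sum(target) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, target))
             / sum((x - mx) ** 2 for x in xs))
    return best, my - slope * mx + slope * missing_row[best]

cols = {"bp": [1.0, 2.0, 3.0, 4.0], "age": [30.0, 35.0, 40.0, 50.0]}
target = [2.1, 4.0, 6.2, 7.9]  # roughly 2 * bp
best, value = impute_one(cols, target, {"bp": 3.5, "age": 45.0})
print(best, round(value, 3))
```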


3 Result and Discussion

Missing data cause a significant amount of bias in clinical research. Besides, they make the analysis more arduous and decrease analysis efficiency. The random forest is considered the most effective method for imputing missing data, whereas the Bayesian regression is mostly practiced on ecological data [12]. However, both methods are applied dynamically in this study to impute the missing data of the diabetes data set. Not all of the data set's regression values are the same, owing to data quality [13]. Both the Bayesian regression and random forest models were tuned repeatedly to adapt to the state of the data set so as to maximize the expected likelihood. The Bayesian regression reacts more positively to the reinforcement approach. After several simultaneous runs of the reinforcement approach, the most stable values of R square were acquired from both the Bayesian regression (BR) and the random forest (RF), as shown in Table 1. The random forest scores are slightly higher than the Bayesian regression in most


Note that plotted data are normalized between 0 and 1, where the green wavelet indicates original data, and the red wavelet indicates imputed data.

Fig. 3 Plotted figures of the experiments: Random Forest and reinforcement Bayesian Regression models' imputation accuracy for age, blood pressure, and blood serum S4.


cases. However, in particular cases, the Bayesian regression achieved a more favorable score. The random forest scored over 85% in five cases, where the Bayesian regression scored over 70% in those five cases. The Bayesian regression matched the random forest score in two cases, but in two other cases it fell far short of the random forest's score. Both models perform better at imputing blood serums S1, S2, and S3, as shown in Fig. 2. These data are more dependent on additional features. From the results in Fig. 2, we can perceive that the random forest model is more inclined to form a baseline and attempts to follow that baseline. This occurs because it traverses the whole tree in each repetition and focuses on obtaining the most competent comparable pathway. Consequently, all of the random forest's imputed values, though adjacent to the missing values, were not identical to them. Surprisingly, our reinforcement Bayesian regression's imputed values were exactly identical to the missing values. Still, in several cases, the Bayesian regression's imputed values were not adjacent to the precise missing value. This circumstance happens when the model does not receive a column with a significantly higher P-value score that is relevant to the missing-value row. Moreover, the Bayesian regression did not create a baseline pathway. Blood serum S4 provides a sharp observation of imputing precise values for the missing values. Though the R square score indicates that the random forest scored higher than the Bayesian regression, the Bayesian regression predicts with absolute accuracy in many cases, as shown in Fig. 3. Furthermore, using traditional and convenient methods, imputing a value identical to the missing one has never been achieved. So far, predicting the missing value with absolute accuracy is still a challenging issue. Again, the same model does not achieve good accuracy in fault diagnosis, pipeline diagnosis, and the biomedical field [14]. We have given careful attention to nominal data [15]. With that focal point, the study provides a functional observation of the two missing data imputation methods. Our results also explain that the reinforcement Bayesian regression can produce precisely identical values when imputing missing data on an interrelated, dependent data set such as clinical data.
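For reference, the R square score reported in Table 1 can be computed from actual and imputed values in a few lines (our sketch; the study's scoring pipeline itself is not published with the text):

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# perfect imputation scores 1.0; always predicting the mean scores 0.0
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))      # 1.0
print(r_squared([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5]))      # 0.0
```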

4 Conclusion

The impact of missing data is significant for any study and research. Through this study, we have revealed the influence of missing data and the imputation of missing data. We have also disclosed how to mitigate missing data and the best possible way to deal with them. The study performed a neutral comparison between the Bayesian regression and the random forest in a dynamic reinforcement approach on the diabetes data set under an MCAR situation. Both the random forest and the Bayesian regression were applied to a Missing Completely at Random data environment and produced accurate output, which is challenging for an imputation method and regression to validate against the data set. Neurological research


can benefit from this model to accurately recover missing data. In patients with neural disorders, some brain cells do not work or are dead, which can be regarded as missing data. In further research, we will focus on neurological data to impute dead neurons' data.

Conflicts of Interests The authors declare that there is no conflict of interest in relation to this work.

References

1. Schmitt, P., Mandel, J., Guedj, M.: A comparison of six methods for missing data imputation. J. Biometrics Biostatistics 6(1), 1–6 (2015)
2. Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley Series in Probability and Statistics. Wiley, New York (2002)
3. Watada, J., Shi, C., Yabuuchi, Y., Yusof, R., Sahri, Z.: A rough set approach to data imputation and its application to a dissolved gas analysis dataset. In: 2016 Third International Conference on Computing Measurement Control and Sensor Network, pp. 24–27 (2016)
4. Sahri, Z., Yusof, R., Watada, J.: FINNIM: iterative imputation of missing values in dissolved gas analysis dataset. IEEE Trans. Ind. Informatics 10(4), 2093–2102 (2014)
5. Bennett, D.A.: How can I deal with missing data in my study? Australian New Zealand J. Public Health 25(5), 464–469 (2001)
6. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
7. Pantanowitz, A., Marwala, T.: Evaluating the impact of missing data imputation through the use of the random forest algorithm. arXiv:0812.2412 (2008)
8. Saravana, R.: Medical big data classification using a combination of random forest classifier and k-means clustering. Int. J. Intell. Syst. Appl. (IJISA) 10(11), 11–19 (2018)
9. Mason, A., Richardson, S., Plewis, I., Best, N.: Strategy for modelling nonrandom missing data mechanisms in observational studies using Bayesian methods. J. Official Stat. 28(2), 279–302 (2012)
10. Liaw, A., Wiener, M.: Classification and regression by randomForest. R News 2(3), 18–22 (2002)
11. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)
12. Studies, T.E.: Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81(11), 3178–3192 (2000)
13. de la Fuente, A., Doménech, R.: Human capital in growth regressions: how much difference does data quality make? An update and further results. J. Eur. Econ. Assoc. 4, 1–36 (2006)
14. State, T.P.: Toward best practices in analyzing datasets with missing data: comparisons and recommendations. J. Marriage Fam. 73(October), 926–945 (2011)
15. Taylor, P., Horton, N.J., Kleinman, K.P.: Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. Am. Stat. 61(1), 79–90 (2007)

From Machine Learning to Knowledge-Based Decision Support—A Predictive-Model-Markup-Language-to-Arden-Syntax Transformer for Decision Trees Julia Zeckl, Matthias Wastian, Dominik Brunmeir, Andrea Rappelsberger, Sergei B. Arseniev, and Klaus-Peter Adlassnig Abstract Arden Syntax is an HL7 International standard for the representation and execution of clinical knowledge in knowledge-based clinical decision support (CDS) systems. The predictive model markup language (PMML) specifies a file format for the representation and exchange of statistical and data mining models. To use those machine-learned models in Arden-Syntax-based CDS systems, the PMML files have to be transformed into an Arden Syntax representation. A PMML-to-Arden-Syntax transformer was created to process PMML structures and generate the Arden Syntax code. It employs the extensible stylesheet language transformation to create Arden Syntax medical logic modules (MLMs) out of PMML files. The transformer may create multiple MLM files from a single PMML file. Currently, the transformer is able to transform decision tree models only. Its transforming capabilities may be extended to additional models available in PMML format in the future. This approach generated a new way of creating MLMs based on machine learning results, in addition to the traditional method of knowledge design with clinical experts.

J. Zeckl · K.-P. Adlassnig (B) Medexter Healthcare GmbH, Borschkegasse 7/5, 1090 Vienna, Austria e-mail: [email protected] M. Wastian · D. Brunmeir dwh GmbH, Neustiftgasse 57-59, 1070 Vienna, Austria A. Rappelsberger · K.-P. Adlassnig Section for Artificial Intelligence and Decision Support, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria S. B. Arseniev Clinical and Research Institute for Emergency Pediatric Surgery and Trauma, Bolshaya Polyanka 22, 119180 Moscow, Russian Federation © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_9



J. Zeckl et al.

1 Introduction Medical decisions are usually based on a combination of clinical knowledge and experience, results of medical research, and personal judgement [1, 2]. The increasing numbers of solved clinical cases will facilitate medical decisions and probably also render them more reliable. Clinical decision-making is becoming increasingly complex because of the consistently growing quantity of data that needs to be taken into account. Moreover, time slots for making decisions are becoming tighter. Thus, clinical decision-making can and will be assisted by clinical decision support (CDS) systems [3–6]. Arden Syntax [7–9] is a medical knowledge representation and processing standard for CDS system building offered by Health Level Seven International [10]. It defines the way clinical and scientific knowledge can be represented, computerized, and processed. Arden Syntax was first approved by the American Society for Testing and Materials in 1992 [8]. Several extensions have followed since. The current version (v2.10) was released in November 2014 [11]. In Arden Syntax, decision support and logic are contained and separated from each other in so-called medical logic modules (MLMs). Each of these MLMs contains enough clinical knowledge for at least one medical decision. MLMs can receive input data via a direct argument, with its call from other MLMs, or via so-called read/curly brace expressions. The latter may be defined as expressions to access external data sources such as an SQL database or an FHIR resource. MLMs are split into four sections: maintenance, library, knowledge, and resources. The maintenance and library sections contain information about the MLM itself and some medical background data. The knowledge section contains the medical logic, and the resources section comprises localized messages in different languages if applicable. The ArdenSuite software [12, 13] is a commercial CDS authoring and processing platform based on Arden Syntax. 
It was developed by Medexter Healthcare, Vienna, Austria. As seen in the middle part of Fig. 1 (processing), it consists of two main components: the ArdenSuite integrated development environment (IDE), used to create, compile, and test MLMs; and the ArdenSuite server, which grants external client applications access to uploaded MLMs via a REST or SOAP interface. The top portion of Fig. 1 (integration) shows external software that may be integrated with or into the ArdenSuite. The lower part of Fig. 1 (data sources) shows the different types of data sources the ArdenSuite is able to communicate with.

The predictive model markup language (PMML) [14] is an XML-based specification for the representation of statistical and data mining models. It was developed and is continuously supported by the Data Mining Group [15]. Its latest version (v4.3) was released in August 2016 [16]. The terms “predictive analytic models” and “data mining models” refer to mathematical models that use statistics, clustering, and similar approaches to learn patterns from given data with a known outcome. The learned models use the acquired knowledge to predict patterns in new data. The basic purpose of PMML is to create a vendor-independent method of exchanging models between applications. This approach is adopted by over 20 vendors, such as

From Machine Learning to Knowledge-Based Decision Support …


Fig. 1 The ArdenSuite’s two main components IDE and server, as well as its interoperability components: the SOAP and REST webservice APIs, the database, FHIR, and OpenEHR connectors, the ArdenML and PMML transformers, the CDS hooks API, and the Activiti workflow extension

IBM, KNIME, Python, LigaData, R, Java, and JavaScript [15]. As shown in Fig. 2, PMML files may also include data pre-processing and post-processing methods and parameters, in addition to the models themselves. The following models are currently supported by the PMML specification [16]:

• Association rules
• Baseline models
• Bayesian network
• Cluster models
• Gaussian process
• General regression
• K-nearest neighbors
• Naïve Bayes
• Neural network
• Regression
• Rule set
• Scorecard
• Sequences
• Text models
• Time series
• Trees
• Vector machine

Fig. 2 PMML content. Source [14]

A single PMML file can contain several models of the same or different types; a PMML file may even contain no model at all, in which case it is used to carry metadata alone. Each model in a PMML file may have the attribute “isScorable”, which indicates that the model is valid for processing. If this attribute is set to “false”, the model is purely informative and is not meant to be applied to any data.

Decision trees [17] have been developed for statistical and machine learning purposes, and are successfully applied to various medical tasks [18]. They are extensively used for the classification of data. A decision tree has a “tree-like” structure with a single root node at the top. Each node may have any number of child nodes, which in turn may have any number of further child nodes. Nodes without child nodes are known as leaf nodes. Each node defines a condition used to select the child node to be checked next. The leaf nodes define the outcome of the tree evaluation. The first regression tree algorithm was published over fifty years ago [19]. Classification and regression trees (CART) is an algorithm first described by Breiman et al. [20] in 1984. It introduced several novel improvements, including the ability to control the size of the tree being generated.

The goal of this project was to create a PMML-to-Arden-Syntax transformer that is able to produce well-formatted Arden Syntax MLMs out of PMML files. A bridge has thus been created between the Arden Syntax standard for developing knowledge-based decision support systems and machine learning methods that autonomously learn from data. The PMML-to-Arden-Syntax transformer has been implemented as an add-on of the ArdenSuite IDE. The completed PMML-to-Arden-Syntax transformer will be useful for development teams that wish to integrate their machine learning results into the established ArdenSuite CDS platform environment of a hospital.
The results of machine learning can be directly converted into Arden Syntax MLMs and processed within this environment. The present PMML-to-Arden-Syntax transformer generates Arden Syntax MLMs from decision tree models only.
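The core of such a transformation, mapping a decision tree onto nested Arden Syntax if-then logic, can be illustrated with a schematic Python sketch. The dict-based tree, the function, and the emitted Arden-like snippet are illustrative stand-ins only; they do not reproduce the actual XSLT implementation.

```python
# Schematic sketch: emit Arden-Syntax-style nested if/else logic from a
# decision-tree node given as nested dicts (illustrative, not the real XSLT).
def tree_to_arden(node, indent=0):
    pad = "    " * indent
    if "score" in node:                       # leaf node: outcome of the tree
        return f'{pad}conclusion := "{node["score"]}";\n'
    out = ""
    for k, child in enumerate(node["children"]):
        keyword = "if" if k == 0 else "elseif"
        out += f'{pad}{keyword} {child["predicate"]} then\n'
        out += tree_to_arden(child, indent + 1)
    out += f"{pad}endif;\n"
    return out

# Toy tree inspired by the "play golf" example from the PMML specification
golf = {
    "children": [
        {"predicate": 'outlook = "sunny"', "score": "no play"},
        {"predicate": 'outlook <> "sunny"', "score": "may play"},
    ],
}
print(tree_to_arden(golf))
```

The recursion mirrors the tree structure directly: each inner node becomes an if/elseif chain over its children, and each leaf becomes an assignment of the predicted outcome.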


2 Methods

PMML version 4.3 and Arden Syntax version 2.9 were used to transform PMML files into Arden Syntax MLMs [21]. First, a Java project was created with the Eclipse Oxygen 3a release. The program takes a PMML file as well as an output folder location as input. It starts by validating the PMML file against an XML Schema Definition (XSD) file to ensure that the file conforms to the PMML specification. This XSD file can be downloaded free of cost from the PMML website [16].

Once the validity of the input file has been confirmed, the transformation into Arden Syntax takes place. This is done using the extensible stylesheet language transformation (XSLT) method. The rules for the transformation are written into XSL files. In XSLT, the transformation rules are split into templates. Each template matches a specific XML tag and defines how it should be transformed. Using the XPath syntax, data from other XML tags can be accessed. The main template matches the “PMML” tag, which contains the entire PMML file. This template creates the basic structure of the MLM file and calls other templates to fill the file with the necessary data.

Some of the basic XSLT elements used for the creation of the MLM are: “text”, “value-of”, “call-template”, “apply-templates”, “if”, “for-each”, and “choose”. The “text” element writes an arbitrary text into the result file. With a “value-of” element, the content or an attribute of an XML element is written into the result file. “Call-template” and “apply-templates” are used to call other templates: “call-template” executes a specific template, while “apply-templates” iterates through all XML tags given to the method and executes a matching template. The “if” and “for-each” elements have the functionalities common to these constructs in other programming languages, while the “choose” element corresponds to an “if-elseif-else” block.
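XSLT's match-and-dispatch mechanism can be mimicked in plain Python to make the idea concrete. In the sketch below, each handler function plays the role of a template matching one tag, and a default rule recurses into children like XSLT's built-in templates; the tags and output are illustrative stand-ins, not the transformer's real XSL rules.

```python
# Minimal mimic of XSLT template dispatch using the standard library:
# one handler ("template") per tag; apply_templates recurses over children
# like <xsl:apply-templates/>. Tags and output are illustrative only.
import xml.etree.ElementTree as ET

def apply_templates(elem, templates):
    handler = templates.get(elem.tag)
    if handler:
        return handler(elem, templates)
    # default rule: recurse into children, like XSLT's built-in templates
    return apply_templates_children(elem, templates)

def apply_templates_children(elem, templates):
    return "".join(apply_templates(c, templates) for c in elem)

def pmml_template(elem, templates):
    # matches the root "PMML" tag and creates the basic output structure
    return "maintenance:\nknowledge:\n" + apply_templates_children(elem, templates)

def node_template(elem, templates):
    # "value-of"-like access to an attribute of the matched element
    return f'node score={elem.get("score")}\n' + apply_templates_children(elem, templates)

templates = {"PMML": pmml_template, "Node": node_template}
doc = ET.fromstring('<PMML><TreeModel><Node score="yes"/></TreeModel></PMML>')
print(apply_templates(doc, templates))
```

The dispatch dictionary corresponds to the set of templates in an XSL file: adding support for another PMML tag means adding one more handler, just as the real transformer adds another template.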
In addition to the named elements, a number of further XSLT elements were employed to create a well-formatted MLM out of the PMML file.

We used three PMML files for the purpose of testing. The first was taken from the examples given in the PMML specification [16]. It defines a tree to evaluate the suitability of weather conditions for playing golf. The second tree uses the “iris” data set (also known as Fisher's Iris data set), which is publicly available and often used for demonstrating machine learning algorithms (see UCI Machine Learning Repository [22]). The third tree uses the breast cancer data set (also from the UCI Machine Learning Repository [22]), which includes 286 cases described by ten attributes. The data were imported in R, and the rpart package was used to train decision trees with the CART algorithm for both the breast cancer and the iris data sets. The tree was generated by the R script in Fig. 3, which uses the patient's age, the size of the tumor, as well as other features to predict whether there will be any recurrence of events.

The three PMML files and unit tests were used for testing. The PMML files were fed as input into the PMML-to-Arden-Syntax transformer. The resulting MLM files were copied into the ArdenSuite IDE and then compiled. In the subsequent tests,


Fig. 3 R script for the creation of a test PMML

library(pmml)
library(rpart)
[remainder of the R script and intervening pages lost in extraction]

With a threshold of 0, the smallest order i of the semantic similarity measure satisfying the criterion of coverage is i = 2. With i = 2, the number n of words li[j] with similarity sim2(li[0], li[j]) = 0, j ∈ [1, 3], is n = 0, compared to n = 60 with i = 0 and n = 21 with i = 1, out of 60 comparisons on the dataset. With i = 2, the number of elements li[j] with similarity sim2(li[0], li[j]) = 1 is 0. This ensures that the criterion of completeness is satisfied. Moreover, selecting the smallest possible order i such that the requirement of coverage is fulfilled for simi ensures that the criterion of simplicity is also satisfied.
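The order-selection rule described above can be sketched as follows. Here `sim` is a hypothetical stand-in for the order-i measure simi, and the toy score table (with words represented by indices) is invented for illustration only.

```python
# Sketch of the order-selection rule: pick the smallest order i whose
# similarity scores satisfy coverage (no score at or below the threshold)
# and completeness (no score of exactly 1 for distinct words).
def smallest_adequate_order(extreme, words, sim, max_order=5, threshold=0.0):
    for i in range(max_order + 1):
        scores = [sim(i, extreme, w) for w in words]
        if all(s > threshold for s in scores) and all(s < 1.0 for s in scores):
            return i
    return None

# Toy stand-in for sim_i: orders 0 and 1 leave some word pairs unconnected
# (score 0), so coverage first holds at order 2, mirroring the text above.
toy_scores = {0: [0.0, 0.2, 0.1], 1: [0.1, 0.0, 0.2], 2: [0.05, 0.02, 0.01]}
sim_stub = lambda i, e, w: toy_scores[i][w]
print(smallest_adequate_order("never", [0, 1, 2], sim_stub, max_order=2))  # → 2
```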

110

M. Colombo and E. Portmann

4 Evaluation

Since the main goal of this article is to develop a good-quality spectral semantic similarity measure for adjectives and adverbs, a well-suited evaluation strategy for this specific case had to be elaborated. Indeed, the most common methods for the evaluation of semantic similarity [20, 21] focus on other types of similarities, more representative of relations between nouns. In this section, the chosen evaluation methodology is presented and motivated.

4.1 Methodology

Since semantic similarity is a quite fuzzy and subjective concept, the creation of a sharp ground truth (e.g., a list containing the exact semantic similarity value between couples of words) to which the outcomes of the proposed algorithm can be directly compared is impossible. Indeed, the value of a semantic similarity measure per se does not have a meaning for humans. It starts making sense only when put on a scale with a minimum and a maximum and their corresponding meanings. In our case, the maximum (1) and minimum (0) values correspond to the similarity of a word with itself and with a perfect antonym, respectively. Moreover, being able to compare a semantic similarity measure with another is crucial for understanding the effective closeness of two words. Inspired by this concept, we elaborated a new evaluation methodology that uses human perceptions to create rankings of words based on their relative similarity. These rankings can then be compared with the results of the same task performed by the proposed semantic similarity measure.

4.2 Data Collection

To build a good data set describing perception-based rankings of adjectives and adverbs, the involvement of humans is fundamental. For this reason, a survey asking people to rank adjectives and adverbs based on their relative semantic similarity was created, using some carefully selected adjective and adverb lists. 15 groups of 5 adjectives or adverbs belonging to the same category (e.g., adjectives describing temperature) were generated by hand, taking care to create groups with variability in the following criteria:

• Granularity: for example, in [never, often, sometimes, regularly, always] all words are quite far apart in meaning, whereas in [gigantic, massive, big, very big, huge] all words are very close in meaning.
• Categories: different categories of adjectives and adverbs were selected.

Semantic Similarity Between Adjectives and Adverbs ...


Fig. 3 Example question extracted from the survey

For each of the generated groups, we selected at random one of the extremes (e.g., from [never, often, sometimes, regularly, always], never was picked). The respect of the variability in the enumerated criteria and the correctness of the selected extremes were double-checked by three external reviewers, who ensured the correctness of the prepared data.

Once the adjectives and adverbs to be ranked were defined, the survey was created as follows: on the first page, age and level of English were asked as demographic data; after that, for each of the groups of 5 adjectives or adverbs, people were asked to rank (1–4) four words from the closest in meaning to the furthest in meaning from the previously hand-selected extreme of the group. Questions and answer options were both displayed in a randomised order. See Fig. 3 for an example question corresponding to the group [never, often, sometimes, regularly, always].

The same task given to the survey participants is executed by a Python script implementing the second-order semantic similarity measure we propose in this article (Eq. 2 with i = 2). For each group of data, sim2 is computed between the extreme of the group and all of its other elements. The words are then sorted from the one with the highest to the one with the lowest similarity score with the extreme. The similarity scores are also saved, as a representation of how certain the algorithm is of the order of each pair of elements (i.e., the closer the scores, the lower the certainty of the algorithm with respect to their order).
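The ranking step performed by the script can be sketched as follows, using the sim2 scores for the group with extreme never that are reported in Table 1 (the function itself is an illustrative sketch, not the actual script):

```python
# Rank a group's words by their sim2 score with the extreme, keeping the
# scores as a measure of the algorithm's certainty about adjacent pairs.
def rank_by_similarity(extreme, words, sim2):
    scored = sorted(((sim2[(extreme, w)], w) for w in words), reverse=True)
    return [(w, s) for s, w in scored]

# sim2 values taken from Table 1 (group extreme: "never")
sim2 = {("never", "sometimes"): 0.0506, ("never", "regularly"): 0.0020,
        ("never", "often"): 0.0018, ("never", "always"): 0.0016}
ranking = rank_by_similarity("never", ["often", "sometimes", "regularly", "always"], sim2)
print([w for w, _ in ranking])  # → ['sometimes', 'regularly', 'often', 'always']
```

Note how close the scores of regularly and often are: exactly the kind of pair on which human rankings also disagreed.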

5 Results

The created questionnaire was sent to people working or studying at different universities in Europe and America. Of the contacted people, 21 non-native English speakers aged between 18 and 44 (18–24: 3, 25–34: 17, 35–44: 1) participated in the survey. No native English speaker participated in the survey. In the results, one can notice an overall agreement on rankings between all participants; however, in many cases, there were disagreements on the order of couples with a very similar meaning (e.g., roughly half of the people said often is closer


to never than regularly, whereas the other half said the opposite). This shows that similar words can subjectively take slightly different connotations. In this context, we are interested in the accuracy of the proposed similarity measure sim2. Therefore, the rankings obtained using sim2 are compared with the ones collected from the participants in the study.

Standard methods to compare rankings include Kendall's rank correlation coefficient [23], Goodman and Kruskal's gamma [24], Spearman's rank correlation coefficient [25], and Somers' D [26]. These rank correlation measures can be used to determine whether the algorithm- and human-generated rankings are significantly correlated. For our case, these measures have the disadvantage that they take into account only the order of words in the analysed rankings, and do not make use of the information about the similarity between couples of elements. Indeed, in the computation of the accuracy of the proposed similarity measure sim2, it is crucial to make sure that the distance between couples of words is taken into account. This means that, if elements with nearly identical meaning appear in a different order in the human- and algorithm-generated rankings, it should be less penalising than if two elements with a very different meaning are swapped (e.g., if in a survey normally follows regularly and using sim2 one obtains the opposite, the solution from sim2 is not as wrong as if it returns regularly and rarely in a different order than the perception-based answer). To solve this issue, a generalisation of Kendall's tau distance between rankings [23] can be used, adapted in such a way as to include similarities between pairs of elements of the ranking [27].

Let [n] = {1, ..., n} (in our case n = 4) be the set of indices corresponding to the ordered set of words retrieved from an answer submitted by a participant of the survey.
Let Sn be the set of permutations on [n] and, for σ ∈ Sn, let σ(i) be the rank of element i in the ranking automatically generated with sim2. Let τ be such that τ(i) = sim2(we, wi), with we the hand-picked extreme of the group. Then Dij is the distance between wi and wj, defined as:

Dij = |τ(i) − τ(j)|    (4)

An example of these values can be found in Table 1.

Table 1 Values necessary for the computation of the generalised Kendall's tau distance for the rankings with we = never

Survey ranking (participant 10) | Algorithm ranking | [n] | σ | τ
Sometimes | Sometimes | 1 | 1 | 0.0506
Often | Regularly | 2 | 3 | 0.0020
Regularly | Often | 3 | 2 | 0.0018
Always | Always | 4 | 4 | 0.0016

With these values defined, the weighted Kendall's tau distance can be computed as

KD(σ) = Σ_{i<j} Dij [σ(i) > σ(j)]    (5)

where [·] denotes the indicator function.

To transform this weighted distance into an accuracy measure, we can normalise it and subtract it from the perfect accuracy of 1. To normalise the weighted Kendall's tau distance, we divide it by the distance of the worst possible ranking the algorithm could generate, that is, we compute KD,worst = max_{σ′ ∈ Sn} KD(σ′). We can then compute, for each group g of 5 elements (1 extreme and 4 other words), the accuracy of the algorithm ranking with respect to the ordering made by a participant p in the survey as follows:

Accuracy_pg = 1 − KD(σ_pg) / KD,worst,g    (6)

This way, the accuracy of sim2 with respect to the perceptions of each participant in the study for each group of words can be obtained. To get an estimation of the overall accuracy of the semantic similarity sim2 with respect to human perception, the average and standard deviation of the accuracies can be computed over all available data:

Accuracy = 0.876 (SD = 0.192)    (7)

This means that sim2 is a fairly accurate spectral similarity measure for adjectives and adverbs.
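Using the values from Table 1 (group extreme never, participant 10), the distance and accuracy of Eqs. 4–6 can be reproduced numerically. The sketch below is illustrative, not the authors' evaluation code.

```python
# Weighted Kendall's tau distance (Eq. 5) and accuracy (Eq. 6) for the
# Table 1 example: sigma holds the algorithm rank of each word in survey
# order, tau its sim2 score with the extreme "never".
from itertools import permutations

tau = [0.0506, 0.0018, 0.0020, 0.0016]   # Sometimes, Often, Regularly, Always
sigma = [1, 3, 2, 4]                      # algorithm ranks, in survey order

def k_distance(sigma, tau):
    # Eq. 5: sum D_ij = |tau(i) - tau(j)| (Eq. 4) over discordant pairs
    n = len(sigma)
    return sum(abs(tau[i] - tau[j])
               for i in range(n) for j in range(i + 1, n)
               if sigma[i] > sigma[j])

k_d = k_distance(sigma, tau)
# worst possible ranking: maximise the weighted distance over permutations
k_worst = max(k_distance(list(p), tau) for p in permutations(range(1, len(tau) + 1)))
accuracy = 1 - k_d / k_worst              # Eq. 6
print(accuracy)
```

For this participant, the only discordant pair (often/regularly) has nearly identical sim2 scores, so the weighted distance, and hence the accuracy penalty, is tiny.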

6 Discussion

By analysing the accuracy measurements in more detail, it is noticeable that the lowest accuracy (0.532) was reached in question 11, where the elements to be ordered according to their similarity to always were often, regularly, normally, usually, which are words with very close meanings to one another. Indeed, the participants in the study could not agree among themselves at all on the order of these words (e.g., the closest word to always was: normally for 7 people, regularly for 7, usually for 4, and often for 3). This shows that natural language, which is being analysed by means of sim2, is fuzzy, and between words which are very close in meaning the differences are completely subjective. In general, the subtle subjective differences that exist between very similar words do not matter and have no impact on language, so there are objectively no differences between very close words. But as the distance between two words increases, the more important it becomes to understand their differences correctly. We can include this observation in the computation of the average accuracy of sim2 by recomputing it weighted by KD,worst,g, which ensures that having words with low


similarity in the correct order is the most important thing for a semantic similarity measure. By doing this, we obtain Accuracy = 0.961 (SD = 0.062), which means that a major part of the errors in the non-weighted accuracy measure from Sect. 5 comes from discrepancies in the order of words with very close meanings, whereas words with more distant meanings are almost always in the same order in the algorithm and survey results. To improve the measure of accuracy of sim2, it would be interesting to compare it with the accuracy other similarity measures would obtain on the same test, compared with the same baseline.

One thing to be considered in the analysis of the results is also the quality and completeness of the thesaurus one is using. Overall, the one used for the experiments illustrated in this article seems of good quality and quite complete, but it could clearly be improved. Indeed, it happens quite often that w2 appears in the synonyms of w1 while w1 is not in the list of synonyms of w2, which is an indicator of missing synonyms. A measure of the quality of thesauri could thus be a useful tool to let people choose the best synonym database for sim2.

To further improve the quality of the proposed semantic similarity measure, future steps include the possibility of extending sim2 with the use of antonyms, and the use of techniques that allow the use of higher orders of similarity without running into problems related to the criterion of completeness. Considering word combinations could also be important for improving the descriptive power of sim2, for example making it able to handle expressions such as extremely hot. Moreover, finding a way of unifying the measures of semantic similarity on nouns, adjectives, and adverbs would simplify their practical use, for example in the context of analogical reasoning (see [4, 28]).
Another future extension of sim2 could be related to the precise description of its degree of fuzziness, as a function, for example, of incompleteness and incongruences in the used synonym collection.

We believe that a good-quality measure of similarity between words is a fundamental step towards the automatic expansion of knowledge in computer systems. Indeed, by understanding how different words are interconnected, a system could be able to automatically precisiate the meaning of new words it is faced with, based on some previous knowledge of other words. Because of this, we believe that in the years to come, measures similar to the one presented in this article will become more and more important and widely used in the context of self-learning machines. This will not only include natural language, as in this article, but we argue it will be extended to any field where the artificial understanding of humans is important, from the interpretation of non-verbal communication to artificial cognition.

7 Conclusion We presented a novel similarity measure based on synonyms able to estimate the spectral closeness between adjectives and adverbs belonging to the same category.


This measure has been evaluated on a ranking-based, human-generated ground truth with very promising results. Although possible improvements can be made following the law of parsimony, this newly proposed measure is an important step towards new developments in the field of self-learning machines able to understand humans and to process their perceptions. For example, it is a fundamental building block of the automatic extension of machine knowledge, such as the precisiation of words' meanings and analogical reasoning, processes that can simplify the application of the theory of computing with words [5] to real-world problems.

Acknowledgements We thank Jhonny Pincay Nieves and Minh Tue Nguyen for their precious contribution to the review of the data used for the creation of the questionnaire as well as the refinement of the survey itself, and Sara D'Onofrio for her valuable revision of this article. We moreover express our gratitude to all the participants in the survey for their fundamental contribution to the evaluation of our algorithm.

References

1. Ruder, S.: Neural Transfer Learning for Natural Language Processing (2019)
2. Navigli, R.: Word sense disambiguation: a survey. ACM Comput. Surv. 41 (2009). https://doi.org/10.1145/1459352.1459355
3. Budanitsky, A., Hirst, G.: Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In: Workshop on WordNet and Other Lexical Resources 2 (2001)
4. D'Onofrio, S., Müller, S.M., Papageorgiou, E.I., Portmann, E.: Fuzzy reasoning in cognitive cities: an exploratory work on fuzzy analogical reasoning using fuzzy cognitive maps. In: 2018 IEEE International Conference on Fuzzy Systems, pp. 1–8 (2018). https://doi.org/10.1109/FUZZ-IEEE.2018.8491474
5. Zadeh, L.A.: Fuzzy logic = computing with words. IEEE Trans. Fuzzy Syst. 4, 103–111 (1996). https://doi.org/10.1109/91.493904
6. Gupta, C., Jain, A., Joshi, N.: Fuzzy logic in natural language processing - a closer view. Procedia Comput. Sci. 132, 1375–1384 (2018). https://doi.org/10.1016/j.procs.2018.05.052
7. Zhao, L., Ichise, R., Mita, S., Sasaki, Y.: An ontology-based intelligent speed adaptation system for autonomous cars. In: The 4th Joint International Semantic Technology Conference (2014). https://doi.org/10.1007/978-3-319-15615-6_30
8. Turney, P.D.: Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In: 12th European Conference on Machine Learning (2001). https://doi.org/10.1007/3-540-44795-4_42
9. Bollegala, D., Matsuo, Y., Ishizuka, M.: Measuring semantic similarity between words using web search engines. In: Proceedings of the 16th International Conference on World Wide Web, pp. 757–766 (2007)
10. Sahami, M., Heilman, T.D.: A web-based kernel function for measuring the similarity of short text snippets. In: Proceedings of the 15th International Conference on World Wide Web, pp. 377–386 (2006)
11. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
12. Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
13. Speer, R., Chin, J., Havasi, C.: ConceptNet 5.5: an open multilingual graph of general knowledge. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (2017)
14. Hirst, G., St-Onge, D.: Lexical chains as representations of context for the detection and correction of malapropisms. In: WordNet: An Electronic Lexical Database, pp. 305–332 (1998)
15. Leacock, C., Chodorow, M.: Combining local context and WordNet similarity for word sense identification. In: WordNet: An Electronic Lexical Database (1998)
16. Pilehvar, M.T., Jurgens, D., Navigli, R.: Align, disambiguate and walk: a unified approach for measuring semantic similarity. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 1341–1351 (2013)
17. Pilehvar, M.T., Navigli, R.: From senses to texts: an all-in-one graph-based approach for measuring semantic similarity. Artif. Intell. 228, 95–128 (2015)
18. Banerjee, S., Pedersen, T.: An adapted Lesk algorithm for word sense disambiguation using WordNet. In: International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136–145 (2002)
19. Lesk, M.: Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In: Proceedings of the 5th Annual International Conference on Systems Documentation, pp. 24–26 (1986)
20. Finkelstein, L., et al.: Placing search in context: the concept revisited. ACM Trans. Inf. Syst. 20, 116–131 (2002)
21. Rubenstein, H., Goodenough, J.B.: Contextual correlates of synonymy. Commun. ACM 8, 627–633 (1965)
22. Marková, V.: Synonyme unter dem Mikroskop. Eine korpuslinguistische Studie. Korpuslinguistik und interdisziplinäre Perspektiven auf Sprache 2 (2012)
23. Kendall, M.: A new measure of rank correlation. Biometrika 30, 81–89 (1938). https://doi.org/10.1093/biomet/30.1-2.81
24. Goodman, L.A., Kruskal, W.H.: Measures of association for cross classifications. J. Am. Stat. Assoc. 49, 732–764 (1954). https://doi.org/10.2307/2281536
25. Spearman, C.: Proof and measurement of association between two things. Am. J. Psychol. 15, 72–101 (1904)
26. Somers, R.H.: A new asymmetric measure of association for ordinal variables. Am. Sociol. Rev. 27 (1962). https://doi.org/10.2307/2090408
27. Kumar, R., Vassilvitskii, S.: Generalized distances between rankings. In: Proceedings of the 19th International Conference on World Wide Web, pp. 571–580 (2010). https://doi.org/10.1145/1772690.1772749
28. Müller, S., D'Onofrio, S., Portmann, E.: Fuzzy analogical reasoning in cognitive cities - a conceptual framework for urban dialogue systems. In: Proceedings of the 20th International Conference on Enterprise Information Systems, vol. 1, pp. 353–360 (2018)

Evaluation of Robotic Navigation Model Considering Group Norms of Personal Space in Human–Robot Communities Yotaro Fuse, Hiroshi Takenouchi, and Masataka Tokumaru

Abstract We propose a robotic model that helps determine a robot's position when there are changes in the humans' personal space in a human–robot community. Recently, there have been a number of efforts to develop personal robots suitable for human communities. Determining a robot's position is important not only to avoid collisions with humans but also to maintain a socially acceptable distance from humans. The interpersonal space maintained by persons in a community depends on the closeness of the persons. Therefore, robots need to determine the positions of persons and evaluate the changes made in their personal space. In this paper, we propose a robotic model and examine whether the experimental participants could distinguish the robot's trajectory from a human's trajectory in simulation. Our results showed that none of the participants could completely distinguish between the robot's and humans' trajectories.

1 Introduction

Recently, there has been a spurt in research on developing communication robots that can communicate naturally with people; this is because it has been predicted that most people will live with robots in the future [1]. Therefore, robots must become more human-like in their interactions with people. Many studies have demonstrated that robots can communicate with persons using social behaviors, such as expressing emotions and gestures [2]. In addition, many researchers have investigated how navigation robots tend to position themselves too close to humans, which can create

Y. Fuse (B)
Graduate School of Kansai University, 3-3-35 Yamate-cho, Suita-shi, Osaka 564-8680, Japan
e-mail: [email protected]

H. Takenouchi
Fukuoka Institute of Technology, 3-30-1 Wajiro-higashi, Higashi-ku, Fukuoka 811-0295, Japan
e-mail: [email protected]

M. Tokumaru
Kansai University, 3-3-35 Yamate-cho, Suita-shi, Osaka 564-8680, Japan
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_11



Y. Fuse et al.

discomfort [3–7]. It is important that robots in human communities conform to human social norms. In social groups, people conform to expectations and common group behaviors. Although individuals have different personalities and decision-making criteria, all criteria converge into one common criterion when a group is formed. To exhibit sociality in a human society, robots need to adapt to group norms that are formed by group members. Group norms are the informal rules that groups adopt to regulate the behavior of group members [8]. Fuse et al. proposed a robotic model that adjusted to group norms and enabled the robot to behave as a member of a group by observing the group members [9].

An example of a group norm in human communities is the physical distance that people maintain when they stand with one another. In human communities, a person usually has a region surrounding him/her called the personal space, which is psychologically their own space; if another person intrudes into this space, it causes discomfort [10, 11]. Although [12] reported that personal space is dynamic and situation dependent, people in a group usually maintain some appropriate distance from one another when the group members stand together. Autonomous mobile robots in a human–robot group also need to move based on the changes in the humans' personal spaces. However, previous methods aim to avoid colliding with humans and encroaching on humans' personal space only while the robot moves from an initial point to a target point [4–7]. These methods do not consider a situation where a robot moves within a human–robot group as a group member. Therefore, a robot using the previous methods cannot maintain the appropriate distance within the group. In this paper, we study a robotic navigation model by considering the changes in the personal spaces in human–robot groups.
In our study, the robot using our proposed method is a member of a human–robot group, whereas robots in previous studies are not group members and have no social relation with the humans. This model aims to enable a mobile robot to learn the physical distances that human group members maintain and to determine its physical position in the human–robot group on the basis of the locations of other group members. In this study, we investigate, in a simulation in which the human members' personal spaces change, whether or not a robot that is a member of a human group can determine an appropriate location that does not encroach upon the personal space of another group member and can move in a human-like way.

2 Proposed Model

The proposed model aims to enable a mobile robot to move in a human-like way by considering the changes in the personal spaces of group members. Figure 1 shows the diagram of a mobile robot that uses the proposed model. The robot moves to a location appropriate for the estimated physical distance that it should maintain, within the area where it can move. By repeating this at every step, the robot moves in a human group as a group member. Figure 2 shows

Robotic Navigation Model Considering Group Norms ...


the locations of the humans (H) and a robot (R) standing in the x–y plane. In the graph on the left (Fig. 2(a)), the humans and the robot are located at some distance from one another; in the graph on the right (Fig. 2(b)), they are positioned more closely. In these figures, the positions of the humans (H) indicate the change in the physical distances that the humans want to maintain, whereas R shows the robot's ideal location reached by using the proposed navigation model. A robot in such a situation must therefore continually find a location that is neither too close to nor too distant from the humans, even as they keep moving and changing their physical distances.

2.1 Environment and Robots' Actions

In this paper, the robot in the human group observes the x–y plane shown in Fig. 2 as the environment. In this environment, the state of a human or a robot indicates that it is located at (x, y) at step t (state s_t). Each group member transits from state s_t to s_{t+1} by moving from its present location to the location at which it aims to arrive. Moreover, the action a_{nm} that the robot executes moves it from (x, y) to (x + n, y + m). Therefore, to avoid invading the humans' personal spaces, the robot determines the action that should be executed in state s at every step.
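As a sketch, the state transition and the action set a_{nm} could be represented as follows; the 5 × 5 neighborhood and the 200 × 200 grid are taken from Sect. 3.1, and the function names are illustrative, not from the original implementation:

```python
from itertools import product

# A state is a grid location (x, y); an action a_{nm} moves the agent
# from (x, y) to (x + n, y + m). The 5x5 neighborhood of Sect. 3.1
# corresponds to n, m in {-2, ..., 2}, giving 25 possible actions.
ACTIONS = [(n, m) for n, m in product(range(-2, 3), repeat=2)]

def apply_action(state, action, size=200):
    """Transition from s_t to s_{t+1}, clamped to the 200 x 200 environment."""
    x, y = state
    n, m = action
    return (min(max(x + n, 1), size), min(max(y + m, 1), size))
```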

Fig. 1 Navigation model based on the changes in the group members’ personal spaces

Fig. 2 Humans’ and robot’s locations in the x–y plane. The figure on the left is numbered as a and that on the right is numbered as b


Y. Fuse et al.

2.2 Learning to Maintain Physical Distance in the Group

The proposed navigation model aims to enable a robot to find its own location in a group of humans. To learn the physical distances maintained in the human group, the robot derives the pairwise distances among the humans from the locations of the other group members and then estimates the distance maintained in the group. For this estimate the robot has a value function V(d) and a reward function R(d), where d denotes the physical distance between any two group members. The value function V(d) outputs the value of maintaining the distance d in the group, whereas the reward function R(d) gives the reward for maintaining it. To track the changes in the humans' personal spaces over time, the value function V(d) is updated using Eqs. (1), (2), and (3):

V(d) ← (1 − α)V(d) + α[R(d) + γ max_{d′} V(d′)]   (1)

R(d) = Σ_i u(d − d_i) · exp(−(d_i − d)² / (2σ²))   (2)

u(d − d_i) = 0 (d < d_i), 1 (d ≥ d_i)   (3)

In Eq. (1), α and γ are the learning rate and the discount factor, the usual parameters of reinforcement learning. When n human group members belong to a group, there are n(n − 1)/2 pairwise distances between two humans. For example, Fig. 3 shows three Hs and the distances d_1, d_2, and d_3. Each distance is thus denoted by {d_i | i = 1, 2, …, n(n − 1)/2}.
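Equations (1)–(3) can be sketched in Python as follows, with α, γ, and σ² as in Table 1; the dictionary-based value table and the function names are our own illustrative choices:

```python
import math

def reward(d, group_distances, sigma2=100.0):
    """Eqs. (2)-(3): reward for keeping distance d, given the pairwise
    distances d_i currently observed among the human group members.
    The step function u(d - d_i) is 1 only when d >= d_i."""
    return sum(
        (1.0 if d >= di else 0.0) * math.exp(-((di - d) ** 2) / (2 * sigma2))
        for di in group_distances
    )

def update_value(V, d, group_distances, alpha=0.1, gamma=0.9):
    """Eq. (1): one value-function update for distance d.
    V is a dict over a discretized set of candidate distances."""
    best_next = max(V.values())  # max_{d'} V(d')
    V[d] = (1 - alpha) * V[d] + alpha * (reward(d, group_distances) + gamma * best_next)
    return V
```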

2.3 Criterion for Decision-Making

When the robot in a certain state s is too close to the other group members or too far away from them, it needs to adjust its location to maintain the distance that the other group members maintain with one another. The robot uses the proposed model to select an action a on the basis of the value Q(s, a), which corresponds to moving to a certain location in its neighborhood. The Q value indicates the value of executing the action a in the state s, as follows:

Q(s, a) = Σ_i exp(−(d_i − argmax_{d} V(d))² / (2σ²))   (4)

The robot moves in the environment by executing the action a_{nm} with the highest Q value in a given state.
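A minimal sketch of Eq. (4) and the resulting greedy action selection, under the same illustrative assumptions as before (V stored as a dictionary over candidate distances; names are our own):

```python
import math

def q_value(candidate, others, V, sigma2=100.0):
    """Eq. (4): value of moving to a candidate location, given the learned
    preferred distance argmax_d V(d) and the other members' positions."""
    d_star = max(V, key=V.get)  # argmax_d V(d)
    q = 0.0
    for (ox, oy) in others:
        di = math.hypot(candidate[0] - ox, candidate[1] - oy)
        q += math.exp(-((di - d_star) ** 2) / (2 * sigma2))
    return q

def choose_action(state, others, V):
    """Greedy selection of a_{nm} over the 5x5 neighborhood."""
    moves = [(n, m) for n in range(-2, 3) for m in range(-2, 3)]
    return max(moves,
               key=lambda a: q_value((state[0] + a[0], state[1] + a[1]), others, V))
```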


3 Simulation and Evaluation

We investigated whether a robot could move to a location where it did not invade the personal space of any group member during group interactions. In addition, research participants evaluated the robot's trajectory by watching video recordings of the trajectories of the humans and the robot in the simulation. In this simulation scenario, each of four agents, starting from a certain location in the environment, gathered in front of one human agent (HA); the robot was defined as the robot agent (RA). We investigated how frequently the RA invaded the personal spaces of the HAs. In addition, we prepared questionnaires to evaluate the RA's trajectory: research participants observed the four agents' trajectories in the simulation and judged which agent was the RA. Although the participants knew how the HAs and the RA moved in the environment, they did not know which agent was the robot.

3.1 Environment and Conditions

Table 1 shows the conditions for this simulation. The HAs and the RA were located in the environment {(x, y) | 0 < x ≤ 200, 0 < y ≤ 200, x ∈ Z, y ∈ Z}, which was equivalent to a 20 × 20 m plane. Each agent was initially separated from the others by more than 7.5 m and could move, in one step, within the neighborhood shown in Fig. 3, a range of 5 × 5 squares. Using 20 patterns of initial agent locations, we

Fig. 3 Physical distance di maintained by group members and the range of the robot’s neighborhood

Table 1 Conditions for simulation

Environment: 200 × 200 squares (1 square = 100 cm²)
Number of agents: four (HA1, HA2, HA3, RA)
Range of neighborhood: 5 × 5
Number of steps: 300
Initial location of agents: 20 patterns
σ²: 100
α: 0.1
γ: 0.9


Fig. 4 Transition of the agents’ movements for the questionnaire responses

investigated whether the RA's movements respected the HAs' personal spaces from different initial locations.

3.2 HA's Way of Moving

To express the shrinking of the HAs' personal spaces, we defined a parameter called Closeness. The HAs shared this parameter: at every step, a random number in [0.0, 1.0] was added to it, and its range was limited to [0, 100]. As Closeness increased, the physical distance that each HA wanted to maintain decreased from 7.5 m to 1.0 m according to the following equation:

d_HA = 7.5 + (1.0 − 7.5) × Closeness / 100   (5)

Considering the closeness and the personal spaces, each HA moves to the grid square in its neighborhood closest to its destination without intruding into any other member's personal space.
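Assuming the denominator in Eq. (5) is 100, so that the preferred distance runs from 7.5 m at Closeness 0 down to 1.0 m at Closeness 100 (consistent with the stated range), the HA distance update can be sketched as:

```python
import random

def ha_preferred_distance(closeness):
    """Eq. (5), assuming denominator 100: the preferred distance shrinks
    linearly from 7.5 m (closeness 0) to 1.0 m (closeness 100)."""
    return 7.5 + (1.0 - 7.5) * closeness / 100.0

def step_closeness(closeness):
    """At each step a shared random increment in [0.0, 1.0] is added,
    capped at 100."""
    return min(closeness + random.uniform(0.0, 1.0), 100.0)
```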

3.3 Way of Evaluating the RA's Trajectory

Figure 4 shows an example of two frames from the video clips used in the questionnaire. Each agent was drawn as a red, green, blue, or black drop. The red, green, and blue drops gathered around the black drop, which did not move. One of the red, green, and blue drops was the RA, and the RA's color differed depending on the agents' initial locations. Thirty-one participants watched 20 videos and answered a questionnaire judging which drop was the RA using the proposed model. The questionnaire was based on the hypothesis that the RA's trajectories would be considered human-like if the participants could not correctly identify the RA.


Fig. 5 Average distances between the RA and HAs and the rate of encroaching

Fig. 6 Degrees of encroaching on HAs' personal spaces

Fig. 7 The questionnaire results and the percentage of correct answers


3.4 Results

Figure 5 shows the rate of intrusion into each HA's personal space. The horizontal axis shows the step number; the left vertical axis shows the average distance from the RA to the HAs, and the right vertical axis indicates the rate of encroaching upon the HAs' personal spaces. While the physical distance that each HA wants to maintain decreases, the RA moves so as to track the average distance. However, the encroachment rate shows that the RA sometimes intruded into an HA's personal space. Figure 6 indicates the degree of these intrusions: in each case the intrusion was less than 20 cm, so even when the RA encroached upon another member's personal space, the intrusion remained small. Moreover, Fig. 7 shows the results of the questionnaire that investigated whether the research participants could identify which drop was the RA. In Fig. 7, "responses" on the horizontal axis denotes the percentage of correct answers per participant, whereas "questionnaire" denotes the percentage of correct answers per initial-location pattern. Both were approximately 40%, close to the chance level of one in three; we therefore concluded that the RA moved in a human-like way.

3.5 Discussion

Our results show that the RA using the proposed model moved in a human-like way while the HAs were getting closer to one another and the physical distances among them were decreasing. In addition, the RA's trajectories were similar to those of the HAs, which moved through the environment according to their personal spaces. Therefore, the RA, as a group member, selected its pathway without encroaching upon the personal spaces of the HAs in the human group.

4 Conclusion

In this study, we proposed a model that enables a robot to move in a human group without encroaching upon the changeable personal spaces of the humans. Humans in a group maintain physical distances from one another in accordance with their closeness or their contexts. Therefore, to find an appropriate location, personal robots in human communities need to learn the physical distances that humans maintain in groups. In the simulation, we evaluated the proposed model by having the robot move in a human group without encroaching upon the humans' personal spaces; the robot's trajectories resembled those of the other, human members. We therefore concluded that the robot, as a group member, selected its pathway without encroaching upon the personal spaces of the human agents.


In future research, we plan to improve the proposed model and carry out an experiment with a real robot and human participants to investigate whether the physical distances are maintained in human–robot groups.

References

1. Dell Technologies: The Next Era of Human–Machine Partnerships (2017). http://www.iftf.org/fileadmin/user_upload/downloads/th/SR1940_IFTFforDellTechnologies_Human-Machine_070717_readerhigh-res.pdf
2. Hegel, F., et al.: Understanding social robots. In: 2009 Second International Conferences on Advances in Computer-Human Interactions, pp. 169–174 (2009)
3. Kruse, T., et al.: Human-aware robot navigation: a survey. Robot. Auton. Syst. 61(12), 1726–1743 (2013)
4. Chatterjee, I., Steinfeld, A.: Performance of a low-cost, human-inspired perception approach for dense moving crowd navigation. In: 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 578–585 (2016)
5. Chi, W., et al.: A human-friendly robot navigation algorithm using the risk-RRT approach. In: IEEE International Conference on Real-time Computing and Robotics (RCAR), pp. 227–232 (2016)
6. Truong, X.T., et al.: Approaching humans in crowded and dynamic environments. In: IEEE International Conference on Advanced Intelligent Mechatronics (AIM), pp. 476–481 (2016)
7. Lindner, F.: A conceptual model of personal space for human-aware robot activity placement. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5770–5775 (2015)
8. Feldman, D.C.: The development and enforcement of group norms. Acad. Manag. Rev. 9(1), 47–53 (1984)
9. Fuse, Y., et al.: A robot model that obeys a norm of a human group by participating in the group and interacting with its members. IEICE Trans. Inf. Syst. E102-D(1), 185–194 (2019)
10. Sundstrom, E., et al.: Interpersonal relationships and personal space: research review and theoretical model. Hum. Ecol. 4(1), 47–67 (1976)
11. Cristani, M., et al.: Towards computational proxemics: inferring social relations from interpersonal distances. In: IEEE Third International Conference on Privacy, Security, Risk and Trust and IEEE Third International Conference on Social Computing, pp. 290–297 (2011)
12. Hayduk, L.A.: Personal space: understanding the simplex model. J. Nonverbal Behav. 18(3), 245–260 (1994)

ESYNDIAG: A Fuzzy Expert System for Eight Syndrome Diagnosis in Traditional Vietnamese Medicine Hoang Phuong Nguyen, Lam Tung Vu, Thuy Hong Truong, and Kaoru Hirota

Abstract A fuzzy rule-based expert system, ESYNDIAG, is presented for eight syndrome diagnosis in Traditional Vietnamese Medicine, combining positive and negative rules. After designing and building a suitable inference engine for this system, efforts were devoted to creating an effective knowledge base consisting of more than 800 positive rules for the confirmation of conclusions and more than 100 negative rules for the exclusion of the same conclusions. We focus on how the rule base is constructed, managed, and used for the diagnosis of the eight syndromes of Traditional Vietnamese Medicine: the Yin, Yang, Superficial, Deep, Cold, Hot, Deficiency, and Excess syndromes. The inference engine shows how positive and negative rules are combined. A first evaluation of ESYNDIAG by a group of traditional medicine experts in Vietnam confirmed that ESYNDIAG diagnoses with high accuracy.

Keywords Fuzzy expert systems · Traditional Vietnamese Medicine · Syndrome diagnosis

H. P. Nguyen (B) Thang Long University, Hanoi, Vietnam e-mail: [email protected] L. T. Vu Nam Dinh University of Technical Pedagogy, Nam Định, Vietnam e-mail: [email protected] T. H. Truong Thai Nguyen University of Medicine and Pharmacy, Thái Nguyên, Vietnam e-mail: [email protected] K. Hirota Beijing Institute of Technology, Beijing, China e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_12


1 Introduction

Vietnamese medicine consists of traditional medicine and western medicine. Like traditional medicine in India, China, Korea, and Japan, traditional Vietnamese medicine builds on a profound reasoning basis grounded in Yin–Yang and five element theory [7, 18]. Over thousands of years of development, traditional Vietnamese medicine has made much progress in diagnosis and treatment, and, as a complementary medicine, it helps to treat patients with many disease syndromes. The eight syndrome diagnosis of traditional Vietnamese medicine comprises the Yang, Yin, Superficial, Deep, Cold, Hot, Deficiency, and Excess syndromes. Yin and Yang are opposed to each other and indicate the different types of disease; Superficial and Deep are opposed and indicate the region of a disease; Cold and Hot are opposed and indicate the nature of a disease; and Deficiency and Excess are opposed and indicate the conditions of a disease. The Superficial, Hot, and Excess syndromes are all Yang syndromes; the Deep, Cold, and Deficiency syndromes are all Yin syndromes. In the syndrome differentiation of traditional Vietnamese medicine, the eight syndromes in these four oppositions are differentiated first. Knowing the eight syndrome diagnosis well, a traditional medicine doctor can understand whether the tendency of a disease belongs in nature to the Yang or the Yin syndromes, and based on this orientation the correct treatment can be applied. Mistakes in eight syndrome diagnosis can be very harmful. If a traditional medicine doctor misses the eight syndrome diagnosis or diagnoses inaccurately, then no correct treatment is applied and the patient's health worsens. On the other hand, if a doctor makes a mistaken eight syndrome diagnosis, the patient goes through a useless treatment, meaning that treatment methods such as herbal remedies are applied incorrectly while the disease worsens the patient's health. It is therefore important to perform eight syndrome diagnosis correctly. ESYNDIAG is a computer program that takes the patient's symptoms as input and uses its inference engine and the knowledge base provided by traditional medicine doctors to determine the patient's eight syndrome diagnosis. In developing this system, we faced the following problem: symptoms such as high fever, pale tongue, fatigue, etc. are not precisely defined, while a computer requires the input data to be exact and digitized. Therefore, we must model the uncertainty of symptoms so that the computer is able to handle traditional medical input data. For such modeling, ESYNDIAG uses fuzzy logic and approximate reasoning methods, two mathematical tools specialized in handling and processing the fuzziness of real traditional Vietnamese medicine [2–6, 8–17, 21–24]. The paper is organized as follows: Sect. 2 presents the general structure and knowledge base of the system. Section 3 gives an example of the system's diagnosis process and of the way the knowledge base is used in the process. The evaluation of the system's ability by a group of traditional medicine experts is given in Sect. 4, and the


system’s application scope in Sect. 5. Conclusion and future plans are discussed in Sect. 6.

2 Structure of the System

ESYNDIAG was developed using the Visual C#.NET programming language for Windows 7. From the user's viewpoint, ESYNDIAG is equipped with a friendly graphical interface. The most important components of the system are the knowledge base, the reasoning engine, the knowledge acquisition module, and the explanation module. Let us describe these subsystems.

2.1 Knowledge Base

The knowledge base of ESYNDIAG contains rules provided by the traditional medicine doctors of the National Hospital of Traditional Medicine. These rules come in three different forms:

Positive form: E_k → D_j (μ^c_RSD(E_k, D_j))
Negative form: E_k → ¬D_j (μ^e_RSD(E_k, D_j))

where E_k is an elementary conjunction of symptoms S_i of the form E_k = S_1 & … & S_m, and D_j is one of the eight syndromes: the Yin, Yang, Superficial, Deep, Cold, Hot, Deficiency, or Excess syndrome. The weights μ^c_RSD(E_k, D_j) and μ^e_RSD(E_k, D_j) are fuzzy weights in [0, 1], and we assume that μ^c_RSD(E_k, D_j) = 0 or μ^e_RSD(E_k, D_j) = 0: it is impossible for E_k to both confirm and exclude D_j. More precisely:

(i) μ^c_RSD(E_k, D_j) = 0 means the elementary conjunction E_k of symptoms S_i definitely does not confirm the conclusion D_j.
(ii) μ^c_RSD(E_k, D_j) = 1 means E_k definitely confirms the conclusion D_j.
(iii) 0 < μ^c_RSD(E_k, D_j) < 1 means E_k confirms the conclusion D_j with some fuzzy degree.

The case of μ^e_RSD(E_k, D_j) is similar.

Intermediary form: E_k → S_h (μ^c_RSS(E_k, S_h))

(i) μ^c_RSS(E_k, S_h) = 0 means the elementary conjunction E_k of symptoms S_i definitely does not confirm the symptom S_h.
(ii) μ^c_RSS(E_k, S_h) = 1 means E_k definitely confirms the symptom S_h.
(iii) 0 < μ^c_RSS(E_k, S_h) < 1 means E_k confirms the symptom S_h with some fuzzy degree.

In rules of all three types, the premise E_k is a set of the patient's symptoms. Rules in the positive form are used to make a positive diagnosis, i.e. a diagnosis confirming one of the eight syndromes. Rules in the negative form are used to make a negative diagnosis, i.e. a diagnosis excluding one of the eight syndromes. Rules of the third type formalize the way doctors reason: "If a patient is suffering from these symptoms, then he should also be suffering from this other symptom, because it is usually observed together with the first group of symptoms." ESYNDIAG's knowledge base contains more than 900 rules: 127 positive and 20 negative rules for the Yang syndrome, 127 positive and 20 negative rules for the Yin syndrome, 63 positive and 10 negative rules for the Superficial syndrome, 110 positive and 12 negative rules for the Deep syndrome, 120 positive and 15 negative rules for the Cold syndrome, 120 positive and 15 negative rules for the Hot syndrome, 125 positive and 17 negative rules for the Deficiency syndrome, and 60 positive and 10 negative rules for the Excess syndrome. The number of rules is growing quickly due to continuing knowledge acquisition from traditional medicine doctors.
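The three rule forms could be stored as weighted premise-conclusion pairs, as in this illustrative sketch; the first two weights come from the examples given later in Sect. 3, while the intermediary rule and its weight are hypothetical, not from the actual rule base:

```python
# Illustrative storage of the three rule forms: each rule maps an
# elementary conjunction of symptoms E_k to a conclusion with a
# fuzzy weight in [0, 1].
positive_rules = [
    # E_k (set of symptoms) -> (syndrome D_j, mu_c): see Sect. 3 example
    (frozenset({"fever with afraid of wind", "headache", "body pain"}),
     "Superficial syndrome", 0.85),
]
negative_rules = [
    # E_k -> (excluded syndrome D_j, mu_e): see Sect. 3 example
    (frozenset({"cold hand and leg"}), "Excess syndrome", 0.05),
]
intermediary_rules = [
    # E_k -> (implied symptom S_h, mu_c); weight is hypothetical
    (frozenset({"high fever and thirsty"}), "red tongue", 0.7),
]
```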

2.2 Inference Engine

In our system, we follow the traditional medicine doctors' way of making a diagnosis. There are four methods of diagnostics in Traditional Vietnamese Medicine, often referred to as the Four Pillars of Diagnosis:

1. Inspection: "looking" focuses on visual analysis of the face, skin features, and particularly the tongue.
2. Auscultation: the analysis of particular sounds such as shouting, laughing, singing, weeping, and groaning.
3. Inquiry: analysis by asking questions about the person's past health and habits.
4. Palpation: analysis by feeling, particularly the wrist pulse, abdomen, and meridian points.

After recognizing a particular pattern of disharmony, a doctor prescribes treatment based on the diagnosis. The treatment can include the more common practices of Traditional Vietnamese Medicine, such as herbal medicine, acupuncture, etc. In general, the system accepts fuzzy descriptions of the patient's symptoms S_i (i = 1, …, m), where μ_RPS(P_q, S_i) is a fuzzy degree taking a value in [0, 1]:

– μ_RPS(P_q, S_i) = 1 means symptom S_i is surely present for patient P_q.
– μ_RPS(P_q, S_i) = 0 means symptom S_i is surely absent for patient P_q.


– 0 < μ_RPS(P_q, S_i) < 1 means symptom S_i is present for patient P_q with some degree.

We use the above-mentioned rules to determine the possibility of each of the eight syndromes, i.e. the Yin, Yang, Superficial, Deep, Cold, Hot, Deficiency, and Excess syndromes. This is done in three steps.

In the first step, we use all positive rules and compute the degree μ^c_RPD(P_q, D_j) of confirming D_j by the formula

μ^c_RPD(P_q, D_j) = max_{E_k} min[μ_RPS(P_q, E_k); μ^c_RSD(E_k, D_j)],

where E_k varies over all elementary conjunctions of symptoms for which μ^c_RSD(E_k, D_j) is positive, μ_RPS(P_q, E_k) is the degree to which patient P_q satisfies the premise E_k of the rule, and μ^c_RSD(E_k, D_j) is the degree with which the conclusion of this rule confirms D_j.

In the second step, we determine the degree μ^e_RPD(P_q, D_j) of excluding D_j by the formula

μ^e_RPD(P_q, D_j) = max_{E_k} min[μ_RPS(P_q, E_k); μ^e_RSD(E_k, D_j)].

In the third step, we combine these two degrees into a single degree

μ^tot_RPD(P_q, D_j) = μ^c_RPD(P_q, D_j) ⊖ μ^e_RPD(P_q, D_j)

in [−1, 1]. The operation ⊖ is defined by x ⊖ y = x ⊕ (−y), where ⊕ is an ordered Abelian group operation on [−1, 1]. We can use the operation from the medical expert system MYCIN [19], in which the group operation ⊕ on [−1, 1] is defined as follows:

x ⊕ y = x + y − x·y for x, y ≥ 0,
x ⊕ y = x + y + x·y for x, y ≤ 0,
x ⊕ y = (x + y) / (1 − min(|x|, |y|)) for all other x, y.
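The three inference steps can be sketched as follows; the patient is represented as a dictionary of symptom degrees and each rule as a (symptom set, weight) pair, which are our own illustrative choices, and the MYCIN combination uses the standard certainty-factor signs, under which the result stays in [−1, 1]:

```python
def degree(patient, rules):
    """Max-min inference: mu(P, D) = max_k min(mu_RPS(P, E_k), mu(E_k, D)).
    The degree to which P satisfies E_k is the min over its symptoms."""
    best = 0.0
    for symptoms, weight in rules:
        match = min((patient.get(s, 0.0) for s in symptoms), default=0.0)
        best = max(best, min(match, weight))
    return best

def mycin_combine(x, y):
    """MYCIN-style group operation on [-1, 1] (standard certainty factors)."""
    if x >= 0 and y >= 0:
        return x + y - x * y
    if x <= 0 and y <= 0:
        return x + y + x * y
    return (x + y) / (1 - min(abs(x), abs(y)))

def total_degree(patient, pos_rules, neg_rules):
    """mu_tot = mu_c combined with the negation of mu_e."""
    return mycin_combine(degree(patient, pos_rules), -degree(patient, neg_rules))
```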

Finally, the consultation results are the following:

1. μ^tot_RPD(P_q, D_j) = 1 means Absolute Confirmation of the conclusion D_j.
2. 0.6 ≤ μ^tot_RPD(P_q, D_j) < 1 means Almost Confirmation of the conclusion D_j.
3. ε ≤ μ^tot_RPD(P_q, D_j) < 0.6 means Possible Confirmation of the conclusion D_j.
4. −ε < μ^tot_RPD(P_q, D_j) < ε means "unknown" about the confirmation of D_j.
5. −0.6 ≤ μ^tot_RPD(P_q, D_j) ≤ −ε means Possible Exclusion of the conclusion D_j.
6. −1 < μ^tot_RPD(P_q, D_j) ≤ −0.6 means Almost Exclusion of the conclusion D_j.
7. μ^tot_RPD(P_q, D_j) = −1 means Absolute Exclusion of the conclusion D_j.

Here ε is a heuristic value; in our case ε = 0.1. Recall that D_j ranges over the Yin, Yang, Superficial, Deep, Cold, Hot, Deficiency, and Excess syndromes.
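The mapping from the total degree to the seven linguistic conclusions can be sketched as follows, with ε = 0.1 as in the text; the function name is illustrative:

```python
def conclusion(mu_tot, eps=0.1):
    """Map the total degree in [-1, 1] to one of the seven linguistic
    conclusions listed above, using the heuristic threshold eps."""
    if mu_tot == 1:
        return "Absolute Confirmation"
    if mu_tot >= 0.6:
        return "Almost Confirmation"
    if mu_tot >= eps:
        return "Possible Confirmation"
    if mu_tot > -eps:
        return "unknown"
    if mu_tot >= -0.6:
        return "Possible Exclusion"
    if mu_tot > -1:
        return "Almost Exclusion"
    return "Absolute Exclusion"
```

With these thresholds the degrees of the worked example later in the paper (0.79, 0.68, 0.09, −0.05) map to "Almost Confirmation" and "unknown", matching the table of translated results.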

2.3 Knowledge Acquisition

2.3.1 Rules from Experts

Most rules were formulated by traditional medicine doctors from the National Hospital of Traditional Medicine. To form these rules, we listed all the symptoms seen in eight syndrome diagnosis (there are more than 60 such symptoms; each of the Yin, Yang, Superficial, Deep, Cold, Hot, Deficiency, and Excess syndromes involves about 10 symptoms). We sorted these symptoms by the frequency of their occurrence in patients. Then we formed all possible combinations of the most frequent symptoms, together with some combinations involving less frequent symptoms, and asked the traditional medicine doctors to estimate the degree to which each combination of symptoms confirms or excludes each syndrome.

2.3.2 Statistical Approach

In this approach, for each combination of symptoms, instead of asking a traditional medicine doctor, we look into the database of already diagnosed patients, find all the patients who had these symptoms, and estimate the possibility degree as, e.g., the proportion of those who had the corresponding syndrome [1]. This approach is efficient and fast, but to get statistically justified estimates, we need a large database of patient records with correct diagnoses, and the existing database is sometimes not large enough.

2.3.3 Verifying the Rule Base

Traditional medicine doctors are not perfect and, as a result, their rules may not be exactly correct. Similarly, statistical rules are gathered from limited data, and some of them may therefore be wrong. It is desirable to maintain the correctness of the rule base and to avoid conflicts between the rules. The accuracy of expert rules depends on the traditional medicine doctor's skills. At present, we simply combine all the doctors' rules together; in the future, we plan to test the doctors' diagnostic abilities and to weight the rules proposed by different doctors according to those abilities [20]. At present, the rule base is maintained by a group of leading traditional medicine doctors. These doctors evaluate all the rules, including the statistical rules, and eliminate rules that they believe to be false. After that, the system is applied to different patients, and the results are shown to the doctors' group. If the group sees a wrong diagnosis, it proposes a way to correct the rules. To avoid conflicts between rules, every time a new rule is added to the rule base, ESYNDIAG checks whether it conflicts with any existing rule; if there is a conflict, the doctors' group decides which of the two conflicting rules to keep.

2.4 Explanation

To make the reasoning process and the diagnostic result easier to understand, ESYNDIAG must be able to explain how and why it arrives at a certain conclusion about the possibility of each of the eight syndromes for a patient. During the diagnosis, the reasoning engine browses the rule base and marks all the matched rules. When the diagnosis is completed, the explanation is formed by collecting all the matched rules and every reasoning step that used a matched rule. In this explanation, ESYNDIAG shows its final conclusion, all the sets of patient symptoms that were used in reasoning, and the rules that matched each set. As a result, users can see the intermediate diagnostic conclusions from all three steps of the diagnosing process and the way the rules affect the final conclusion.

3 An Example of the Performance of the System

The domain experts use the knowledge acquisition module to enter the positive and negative rules for the eight syndrome diagnosis of Vietnamese traditional medicine. Examples of the acquisition of a positive rule and a negative rule are shown in Figs. 1 and 2.


Fig. 1 An example of a positive rule for confirmation of Excess syndrome diagnosis

Fig. 2 An example of a negative rule for exclusion of Excess syndrome diagnosis

For example: IF Fever with afraid of wind, Afraid of cold, Headache, Body pain, Stuffles and cough THEN confirm Superficial syndrome with degree 0.85. For example: IF Cold hand and leg THEN exclude Excess syndrome with degree 0.05. The system ESYNDIAG is oriented toward use in areas where there is a lack of traditional medicine experts. It communicates with users through a menu interface in which users select the appropriate answers; these answers are used as input for the diagnosis process. To start a diagnosis, ESYNDIAG lists all possible symptoms of the eight syndromes (the Yin, Yang, Superficial, Deep, Cold, Hot, Deficiency, and Excess syndromes) with possible degrees of presence for each symptom. A user chooses the symptoms that are present in a given patient, together with their degrees of presence. This information forms the input for the diagnosis process. Figure 3 illustrates the symptoms observed from the examined patient. The list of symptoms with the degrees of belief observed from the examined patient in Fig. 3 is the following:


Fig. 3 An example of symptoms observed from the examined patient (in Vietnamese)

1. Fever with afraid of wind: 1
2. Afraid of wind: 0.9
3. Headache: 0.9
4. Body pain: 0.8
5. Stuffles and cough: 0.8
6. Fever but love of cold dishes: 0.9
7. Red face: 0.9
8. Warm hands and legs: 0.7
9. Short stream of reddish urine: 0.9
10. Red tongue: 0.8
11. Dry and yellow coating: 0.8
12. High fever and thirst: 0.9
13. Low spirits and fatigue: 1
14. Loud and rough breathing: 0.9
15. Abdominal fullness: 0.8
16. Constipation: 0.8
17. Yellow coating: 0.9
18. Excess and forceful pulse: 0.9
19. Nervousness: 0.7

After the input is completed, the system processes it and returns the diagnostic conclusion together with its degree of belief. An example of the result of eight syndrome diagnosis is given in Fig. 4; the result is translated into English in Table 1.


H. P. Nguyen et al.

Fig. 4 Result of eight syndrome diagnosis (in Vietnamese)

Table 1 Result of eight syndrome diagnosis translated into English

Num. | Syndrome | Diagnosis degree | Conclusion
1. | Excess syndrome | 0.79 | Almost confirmation of Excess syndrome
2. | Superficial syndrome | 0.79 | Almost confirmation of Superficial syndrome
3. | Hot syndrome | 0.68 | Almost confirmation of Hot syndrome
4. | Yang syndrome | 0.68 | Almost confirmation of Yang syndrome
5. | Deep syndrome | 0.09 | "Unknown" about Deep syndrome
6. | Deficiency syndrome | 0.09 | "Unknown" about Deficiency syndrome
7. | Yin syndrome | −0.05 | "Unknown" about Yin syndrome
8. | Cold syndrome | −0.05 | "Unknown" about Cold syndrome

4 Evaluation

To test ESYNDIAG, we applied it to several hundred patients and compared the system's diagnoses with those of a group of leading traditional medicine doctors. We also used hundreds of archived patient records from the hospital of traditional medicine as input data for ESYNDIAG and compared the system's conclusions with the recorded diagnoses. In the vast majority of cases, ESYNDIAG's


diagnosis was the same as that recorded by the traditional medicine doctors' group. Differences arose only in the few cases in which patients had rare symptoms that we had not originally taken into consideration in the design of the system. This drawback was corrected by updating the list of symptoms and the list of symptom sets, by adding new rules, and by slightly adjusting the possibility values in several existing rules. After this correction, the system worked well on all recorded patient data. Our general conclusion is that ESYNDIAG diagnoses with high accuracy. Its only drawback seems to be that it requires more than 900 rules; the results would be more intuitively clear, and probably computationally faster, if we could somehow compress these rules into fewer, more general ones.

5 ESYNDIAG's Application Scope

The purpose of the ESYNDIAG project is to solve the problem of eight rules diagnosis as a first, basic diagnosis that can serve as an orientation for treatment in traditional Vietnamese medicine: "If a patient has a Yang nature, comprising Yang syndrome, Hot syndrome, Superficial syndrome and Excess syndrome, then in treatment one should use herbal plants with a Yin nature to restore the balance between Yang and Yin". Conversely, "if a patient has a Yin nature, comprising Yin syndrome, Deep syndrome, Cold syndrome and Deficiency syndrome, then in treatment one should use herbal plants with a Yang nature". Once the rule base of ESYNDIAG is completed, it should be able to diagnose as well as an experienced traditional medicine doctor. We expect it to diagnose even better than an experienced traditional medicine doctor, because the system contains knowledge provided not by a single doctor but by many leading experienced doctors, and it also uses a large body of reliable records of previously correctly diagnosed patients. ESYNDIAG can be used as a consultation tool for eight rules diagnosis not only in Vietnam but anywhere in the world, especially in oriental countries such as China, Korea, Japan and India. In Vietnam, the number of traditional medicine practitioners is decreasing as modern medicine develops year by year, which is one reason for the shortage of experienced traditional medicine doctors. ESYNDIAG is therefore a very useful consultation tool for eight rules diagnosis, especially in the provinces and the countryside of Vietnam. Another application of ESYNDIAG is in education: students and inexperienced doctors can use it as a good teacher, available anywhere at any time.


6 Conclusions

This paper has presented an overview of the ESYNDIAG project. We described the key facts about eight syndrome diagnosis, and the reasons why a medical expert system is a good way to solve the problem of diagnosing the eight syndromes. We described the general structure of the system, its knowledge base, and the diagnosis process that uses this knowledge base. To improve ESYNDIAG, we are currently working in two research directions. First, we continue to maintain and update ESYNDIAG's rule base. Second, we are trying to improve the form of the positive and negative rules by including the importance of symptoms, and to improve the reasoning engine, so that the system can diagnose with higher accuracy and speed.

References

1. Phuong, N.H.: Towards Intelligent Systems for Integrated Western and Eastern Medicine. The Gioi Publishers, Hanoi (1997)
2. Shortliffe, E.H.: Computer Based Medical Consultation: MYCIN. Elsevier, New York (1976)
3. Adlassnig, K.-P.: CADIAG-2: Computer-assisted medical diagnosis using fuzzy subsets. In: Gupta, M.M., Sanchez, E. (eds.) Approximate Reasoning in Decision Analysis, pp. 219–247. North-Holland Publishing Company, Amsterdam (1982)
4. Daniel, M., Hájek, P., Nguyen, P.H.: CADIAG-2 and MYCIN-like systems. Artif. Intell. Med. 9(3), 241–259 (1997)
5. Kandel, A.: Fuzzy Expert Systems. CRC Press, Boca Raton (2000)
6. Shortliffe, E.H., Buchanan, B.G., Feigenbaum, E.A.: Knowledge engineering for medical decision making: a review of computer-based clinical decision aids. Proc. IEEE 67, 1207–1224 (1979)
7. Van Ky, T.: Handbook of Diagnosis and Treatment of Oriental Medicine. Danang Publisher, Da Nang (2015)
8. Giarratano, J., Riley, G.: Expert Systems: Principles and Programming. PWS Publishing Company, Boston (1994)
9. Bennett, J.S., Engelmore, R.S.: Experience using EMYCIN. In: Buchanan, B., Shortliffe, E. (eds.) Rule-Based Systems, p. 314. Addison-Wesley, Reading (1984)
10. Buckley, J.J., Siler, W., Tucker, D.: A fuzzy expert system. Fuzzy Sets Syst. 20(1), 1 (1986)
11. Miller, R.A., Pople, H.E., Myers, J.D.: INTERNIST-1, an experimental computer-based diagnostic consultant for general internal medicine. New Engl. J. Med. 307(8), 468–476 (1982)
12. Zadeh, L.A.: Fuzzy sets. Inf. Control 8, 338–353 (1965)
13. Zadeh, L.A.: The role of fuzzy logic in the management of uncertainty in expert systems. Fuzzy Sets Syst. 11, 199 (1983)
14. Phuong, N.H.: Fuzzy set theory and medical expert systems: survey and model. In: Proceedings of SOFSEM 1995: Theory and Practice in Informatics. Lecture Notes in Computer Science, vol. 1012, pp. 431–436. Springer (1995)
15. Phuong, N.H., Kreinovich, V.: Fuzzy logic and its applications in medicine. Int. J. Med. Inform. 62, 165–173 (2001)
16. Lu, H.C.: Chinese Natural Cures: Traditional Methods for Remedies and Preventions. Black Dog & Leventhal Publishers, New York (1986)


17. Phuong, N.H.: FuzzRESS: a fuzzy rule-based expert system shell for medical consultation combining positive and negative knowledge. In: Proceedings of NAFIPS 2016, Annual Conference of the North American Fuzzy Information Processing Society, El Paso, Texas, USA, 31 October–4 November 2016, pp. 271–277 (2016)
18. Thuy, T., Nhac, P.H., Chau, H.B. (eds.): Traditional Medicine. Medical Publisher, Hanoi (2005)
19. Hajek, P., Havranek, T., Jirousek, R.: Uncertain Information Processing in Expert Systems. CRC Press, Boca Raton (1992)
20. Phuong, N.H., Cuong, T.N., Ha, M.: Approach to integrating expert knowledge for medical expert systems. In: Proceedings of VJFUZZY 1998: Vietnam-Japan Bilateral Symposium on Fuzzy Systems and Applications, Halong Bay, Vietnam, 30 September–2 October 1998, pp. 126–132 (1998)
21. Phuong, N.H., Santiprabhob, P., Hirota, K.: A fuzzy modelling for modifying standard prescriptions of oriental traditional medicine. J. Adv. Comput. Intell. Intell. Inform. 7(3), 339–346 (2003)
22. Thang, C., Cooper, E.W., Hoshino, Y., Kamei, K., Phuong, N.H.: A proposed model of diagnosis and prescription in oriental medicine using RBF neural networks. J. Adv. Comput. Intell. Intell. Inform. 10(4), 444–450 (2006)
23. Phuong, N.H., Nguyen, A., Thuy, T.H.: An approach to pulse symbols based fuzzy reasoning in diagnosis of traditional Vietnamese medicine including the importance of symptoms. In: Kearfott, R.B., et al. (eds.) Fuzzy Techniques: Theory and Applications, pp. 406–416. Springer, Cham (2019). (2019 IFSA World Congress and NAFIPS Annual Conference, 18–22 June 2019, Lafayette, Louisiana, USA)
24. Nu, M.T., Phuong, N.H., Dung, H.T.: STRESSDIAG: a fuzzy expert system for diagnosis of stress types including positive and negative rules. In: Kearfott, R.B., et al. (eds.) Fuzzy Techniques: Theory and Applications, pp. 371–381. Springer, Cham (2019). (2019 IFSA World Congress and NAFIPS Annual Conference, 18–22 June 2019, Lafayette, Louisiana, USA)

Manifold Based Data Refinement for Biological Analysis

Dao Nam Anh

Abstract This work presents a study of a new manifold method for dimension reduction in digital biological analysis. Extracting features from experiments for a multiclass classification task using machine learning is challenging due to different resource populations and various biological subdomains. When training on data with a large number of features and samples, classification errors can occur if an efficient feature selection method is not pursued. The aim of the paper is to make clear why some subsets of training samples and features are more appropriate than others. We use Bayesian reasoning under a multivariate analysis of the learning process to validate, and then decrease, the number of features used in both training and testing. During training, the number of samples is also reduced by a suitability assessment. The method has been designed for rapid and scalable learning by combining feature selection with the filtering of training samples. The article further includes experiments with an SVM classification model and a performance evaluation for digital biological analysis.

1 Introduction

Inspection of biological data with the assistance of data analysis approaches is now widely recognized as efficient [1]. However, dealing with large volumes of biological data, such as genes and diseases, often raises concerns among biologists themselves about time-consuming and incomplete studies. The concept of big data, covering the aspects of volume, velocity, value, veracity and variety, has been used to manage these aspects effectively [2]. Very often, the big data analysis process involves an essential step of reducing the volume of the dataset by determining the consistency of features in classification. There remain, however, a number of significant difficulties with the approach.

D. N. Anh (B) Faculty of Information Technology, Electric Power University, Hanoi, Vietnam. e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_13


• While one group of features has proven successful in depicting the specialties of biological data, other features are not well-defined enough for similar distinctness. Moreover, a large number of features may increase the computing time of the training and testing tasks.
• In a multi-source dataset, differences in acquisition conditions are observed, which may make the dataset inconsistent for training. Contrariety among training samples of the same class may reduce classification accuracy.

We address these questions in the context of high-dimensional data: the study of finding significant low-dimensional patterns obscured in high-dimensional observations [3]. In our approach, Manifold Data Refinement (MDR) for biological analysis, the manifold solver method is extended and applied to feature dimension reduction based on Bayesian reasoning about the learning process. The proposed MDR method evaluates each biological feature in accordance with the annotated classes and selects features for further learning tasks. We will show that a reasonable dimension reduction over samples and features can provide constructive data refinement and an effective improvement in performance, both in classification accuracy and in computing time. The proposed method is applied to biological analysis on a high-dimensional database. The paper is structured as follows: Sect. 2 outlines prior work, while Sect. 3 describes the MDR method applied to biological analysis. Experimental results and discussion are presented in Sect. 4, and Sect. 5 concludes the paper.

2 Prior Works

It is necessary to study the specific contributions of the relevant methods and their implementation in biological analysis. In wide databases, a large number of features and a small number of objects are observed; in particular, selecting a subset of features from high-dimensional data is broadly studied [4]. The authors of [5] use cross-validation for pre-selection before applying a cross-validation protocol to test a classifier on the selected features. Here we use a two-sided approach, selecting both features and training samples; this helps to remove inaccurate subsets of the data, and the proper preparation can improve classification performance. In the multiple kernel learning for feature selection of [6], each feature of the samples is characterized by an indefinite kernel. The resulting global optimization problem can be non-convex, and a primal framework of indefinite kernel support vector machines was addressed to reformulate the non-convex optimization problem as a difference of convex functions. Within classification methods, deep learning, developed from neural networks, describes a class of machine learning algorithms that has lately shown remarkable results across a diversity of domains, including biological analysis [7]. In learning parameters for feature selection by deep learning, one approach uses two different aspects of Kestrel bird behavior to adjust the learning rate until the optimal value of the


parameter is achieved [8]. Moreover, a recurrent neural network with a long short-term memory network was applied to select features and improve classification accuracy. To manage sample and feature selection simultaneously, an unsupervised learning approach was addressed in [9]. Using a framework that jointly conducts active learning and feature selection based on the CUR matrix decomposition [10], both the selected samples and the selected features can approximate the original dataset. It has been noted that certain learning methods, such as k-nearest neighbor, fuzzy k-nearest neighbor and SVM, perform dissimilarly on the same image dataset when different methods are applied to extract features from biological fields. In such cases the problem of feature consistency is complicated [11]. Thus, feature selection is important, especially for dimension reduction in the case of high-dimensional databases. In this paper, the dimension of the data is considered for reduction on both sides: training samples and features. The objective is to improve the quality of data training, which can be implemented for most supervised learning methods.

3 The Method of Manifold Refinement

The biological analysis consists of two phases: (1) training: kernel estimation for a machine learning model; and (2) classification: the analysis involving class detection by checking the likelihood of a new observation against the estimated kernels. Details are given in the next subsections.

3.1 Training

As can be observed, all the features f are layered over the learning samples s, which represent biological instances. This presumes that the number of samples is large enough to provide the diversity of the training data and the variety of real-life data, and it clearly requires that each training sample is annotated with a class c for supervised machine learning [12]. The interpretation of the learning process, to inspect validity and computing performance, must consider at least four distinct sources of uncertainty: samples s, class c, feature f and classification error rate e (Fig. 1). The chain graph of the learning process can initially be interpreted as follows:

p(s, c, f, e) = p(e|s, c, f) p(s, c, f)    (1)

p(s, c, f, e) = p(e|f) p(f|c) p(c|s) p(s)    (2)

144

D. N. Anh

Fig. 1 Probabilistic view of the analysis process: s - sample, c - class, f - feature, e - prediction error rate

We typically use the distribution associated with the chain graph in Fig. 1, as suggested in [13]. This graph identifies the chain components covering samples s_i, i = 1..m, features f_j, j = 1..n, and classes c_l, l = 1..k, by (3):

p(s, c, f, e) = \prod_{i=1}^{m} \prod_{j=1}^{n} \prod_{l=1}^{k} p(e|f_j) p(f_j|c_l) p(c_l|s_i) p(s_i)    (3)

We analyze the distribution to compute the error rate e by (4), and set the practical objective of minimizing it (5):

p(e) = \sum_{s,c,f} p(s, c, f, e)    (4)

p(e) \to \min    (5)

One must therefore look at the components of (3) in detail. Given a class c, and selecting the subset of samples labeled by c, the components p(f_j|c_l) p(c_l|s_i) from (3) express how distinct the feature f_j and the sample s_i are for the prediction of class c_l. Hence the average over the samples of this subset, \frac{1}{|c|} \sum_s p(f|c) p(c|s), shows the distinction of feature f in its ability to predict class c, where |c| is the number of samples labeled by c. To encourage classification, we also estimate the response of each feature to the samples that are not labeled by c. This response is computed analogously over the complementary subset, labeled \bar{c}, as \frac{1}{|\bar{c}|} \sum_s p(f|\bar{c}) p(\bar{c}|s). We employ formula (6) to measure the actual capacity of feature f to detect samples annotated by class c; to state the learning distinction of a feature f over all classes, one can use (7):

d_c(f) = \frac{1}{|c|} \sum_s p(f|c) p(c|s) - \frac{1}{|\bar{c}|} \sum_s p(f|\bar{c}) p(\bar{c}|s)    (6)

d(f) = \sum_c d_c(f)    (7)

Similarly, we perform a classification capacity analysis for the training samples instead of the features. The formula \frac{1}{|c|} \sum_f p(f|c) p(c|s) has been found fruitful for interpreting the actual efficiency of a sample s in training to detect class c, given all features f. In the following, a detailed feature description of the samples is not needed; what matters is only that the strength of a sample s in training for the prediction of class c is measured by d_c(s) (8). This allows us to estimate the general training capacity of a sample s over all classes (9):

d_c(s) = \frac{1}{|c|} \sum_f p(f|c) p(c|s) - \frac{1}{|\bar{c}|} \sum_f p(f|\bar{c}) p(\bar{c}|s)    (8)

d(s) = \sum_c d_c(s)    (9)

The original distribution of labeled samples is thus used to calculate the learning effectiveness of each feature by (10) and of each training sample by (11). This serves the demand of reducing the dimension of the training data in both samples and features, regardless of which learning method is applied:

d(f) = [d(f_j)], j = 1..n    (10)

d(s) = [d(s_i)], i = 1..m    (11)

Note that, to minimize the learning error in (5), the features f and training samples s are selected by their classification capacity as delivered by (10) and (11). In practice, this dimension reduction improves the prediction capability while lowering both the classification error rate and the computational effort.

3.2 Manifold Based Data Refinement for Classification

Massive biological data sets are regularly collected from various sources. The essential value of manifold techniques is their flexibility in dealing with a large diversity of laboratory conditions, with fewer parameters than many other methods. The aim of the manifold approach is to manage computational scalability and so make the learning process computationally productive [14]. Ideally, the manifold refinement for biological analysis should take place when features are collected in high dimension, and reduction should be considered for the training samples and/or the features. In principle, the task of sample and feature selection is performed on the data before training. The features f from (10) have their values d(f) sorted, and the features associated with large values can be selected. The case is similar for training sample selection, where the vector d(s) from (11) is sorted to choose the training samples associated with the highest values of d(s). By selecting features and/or training samples according to their classification effectiveness, the training task can produce an adequate knowledge base. Further, we can validate test data after dimension reduction with the selected features (Fig. 2).
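The sort-and-select step can be sketched in a few lines. The random scores below merely stand in for d(f) and d(s) from (10)-(11), and the function name is illustrative:

```python
import numpy as np

def select_top(scores, ratio):
    """Indices of the highest-scoring items; ratio is in (0, 1]."""
    k = max(1, int(round(ratio * len(scores))))
    return np.argsort(scores)[::-1][:k]

# Refine a training set: keep 60% of the features and 30% of the samples,
# mirroring the best ALLAML setting reported in Sect. 4.
rng = np.random.default_rng(0)
X = rng.random((100, 50))        # 100 training samples x 50 features
d_f = rng.random(50)             # stand-in for d(f), Eq. (10)
d_s = rng.random(100)            # stand-in for d(s), Eq. (11)
rows = select_top(d_s, 0.3)
cols = select_top(d_f, 0.6)
X_refined = X[np.ix_(rows, cols)]
print(X_refined.shape)           # (30, 30)
```

At test time only the column selection is reused, since test samples are never filtered.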


Fig. 2 Procedure of Manifold Refinement for biological analysis

4 Experimental Results

To verify the performance of the algorithm, the benchmark biological databases ALLAML, Carcinom and CLL_SUB_111 [15] were chosen for the experiments. The ALLAML database consists of 72 samples of gene expression data in 2 classes; each sample contains 7,129 gene features. The Carcinom dataset contains 174 samples in 11 classes with 9,182 gene features. CLL_SUB_111 is also a gene dataset, containing 111 instances, 3 classes and 11,340 features. The feature dimensions of ALLAML, Carcinom and CLL_SUB_111 are large enough for the application of the manifold method. For each test case we run 10 trials, each with a random split of the database into two sets, one for training and the other for testing; the performance report is based on the average over all trials. Each sample is annotated, and the number of classes differs for each biological dataset. Given the predefined classes T of the test data and the classification P produced by our algorithm, correctness is measured by the accuracy (12) [16], estimating prediction accuracy instead of error rate:

accuracy(P, T) = \frac{|P \cap T| + |\bar{P} \cap \bar{T}|}{2 (|T| + |\bar{T}|)}    (12)

For the reduction variation, we choose variable numbers of features (5%, 10%, ..., 100%), creating a range of increasing feature counts based on (10). The number of training samples is likewise assigned from a range of 10%, 20%, ..., 100% of the total number of training samples in each trial, based on (11). For these data sets, the SVM classification model was applied in the experiments. Our first results for the evaluated accuracy (12) with SVM are reported in Table 1, where columns show the range of the feature dimension and rows display the variable reduction ratio of the training sample dimension.
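A rough sketch of this protocol (repeated random splits over a grid of feature/sample ratios) follows. Synthetic data, scikit-learn, and the variance-based stand-in ranking are assumptions for illustration only; in the paper, the rankings come from (10) and (11):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((120, 40))            # synthetic stand-in dataset
y = (X[:, 0] > 0.5).astype(int)      # synthetic binary labels

def trial_accuracy(feat_ratio, samp_ratio, seed):
    """One trial: random 50/50 split, then an SVM trained on a reduced
    subset of features and training samples."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, random_state=seed, stratify=y)
    n_f = max(1, int(feat_ratio * X.shape[1]))
    n_s = max(2, int(samp_ratio * len(X_tr)))
    cols = np.argsort(X_tr.var(axis=0))[::-1][:n_f]   # stand-in for d(f)
    # stratified stand-in for d(s)-based sample selection
    keep = np.concatenate([np.where(y_tr == c)[0][:n_s // 2] for c in (0, 1)])
    clf = SVC().fit(X_tr[np.ix_(keep, cols)], y_tr[keep])
    return clf.score(X_te[:, cols], y_te)

# one grid cell (60% features, 30% training samples), averaged over 10 trials
acc = np.mean([trial_accuracy(0.6, 0.3, s) for s in range(10)])
print(round(float(acc), 2))
```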


Table 1 Accuracy by dimension of features and training samples for ALLAML

Sample \ Feature | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100%
10% | 0.74 | 0.83 | 0.88 | 0.86 | 0.93 | 0.92 | 0.87 | 0.86 | 0.88 | 0.89
20% | 0.86 | 0.87 | 0.91 | 0.89 | 0.91 | 0.93 | 0.93 | 0.92 | 0.90 | 0.93
30% | 0.84 | 0.93 | 0.83 | 0.91 | 0.93 | 0.9722 | 0.92 | 0.95 | 0.93 | 0.93
40% | 0.85 | 0.87 | 0.92 | 0.94 | 0.95 | 0.93 | 0.91 | 0.93 | 0.89 | 0.87
50% | 0.88 | 0.89 | 0.87 | 0.91 | 0.90 | 0.95 | 0.93 | 0.93 | 0.91 | 0.94
60% | 0.91 | 0.91 | 0.92 | 0.91 | 0.97 | 0.94 | 0.94 | 0.92 | 0.96 | 0.89
70% | 0.87 | 0.92 | 0.96 | 0.95 | 0.88 | 0.91 | 0.94 | 0.89 | 0.94 | 0.95
80% | 0.91 | 0.89 | 0.91 | 0.88 | 0.94 | 0.94 | 0.91 | 0.90 | 0.92 | 0.96
90% | 0.90 | 0.92 | 0.90 | 0.93 | 0.96 | 0.94 | 0.91 | 0.95 | 0.94 | 0.94
100% | 0.88 | 0.92 | 0.94 | 0.91 | 0.94 | 0.91 | 0.93 | 0.94 | 0.94 | 0.89

Fig. 3 Accuracy by dimension of (a) features and (b) training samples for ALLAML

As this is a bivariate sample-feature series, Fig. 3a demonstrates how the accuracy changes for each feature dimension while the sample dimension is fixed. The effectiveness of the features for training, estimated by function (7), is correctly demonstrated, as the best accuracy (0.9722) is achieved with 60% of the features and 30% of the training samples for SVM. Fig. 3b displays the tendency of the accuracy change for each sample dimension while the feature dimension is kept unchanged. The decline in accuracy as the sample dimension decreases in Fig. 3b is practical evidence for the measurement in (9). The data analysis by the manifold technique can thus be applied to biological databases. We also conducted the manifold analysis for Carcinom and CLL_SUB_111; their reports are presented in Tables 2 and 3. In terms of accuracy under a changing sample dimension, Fig. 4 displays the advantage of dimension reduction for Carcinom, while Fig. 5 demonstrates the benefit for CLL_SUB_111. The best accuracy for Carcinom (0.8625) is achieved when 20% of the features and 60% of the training samples are selected. The best accuracy for CLL_SUB_111 is 0.8437, obtained by selecting 60% of the features and 40% of the training samples.


Table 2 Accuracy by dimension of features and training samples for CARCINOM

Sample \ Feature | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100%
10% | 0.75 | 0.74 | 0.77 | 0.75 | 0.79 | 0.82 | 0.81 | 0.79 | 0.81 | 0.77
20% | 0.79 | 0.83 | 0.83 | 0.81 | 0.79 | 0.83 | 0.81 | 0.80 | 0.82 | 0.78
30% | 0.77 | 0.85 | 0.85 | 0.82 | 0.83 | 0.82 | 0.81 | 0.81 | 0.85 | 0.85
40% | 0.79 | 0.80 | 0.81 | 0.76 | 0.82 | 0.83 | 0.84 | 0.82 | 0.78 | 0.85
50% | 0.84 | 0.80 | 0.84 | 0.84 | 0.84 | 0.84 | 0.83 | 0.84 | 0.80 | 0.84
60% | 0.80 | 0.8625 | 0.80 | 0.79 | 0.82 | 0.88 | 0.83 | 0.86 | 0.85 | 0.85
70% | 0.85 | 0.84 | 0.85 | 0.81 | 0.85 | 0.84 | 0.83 | 0.83 | 0.85 | 0.83
80% | 0.82 | 0.79 | 0.84 | 0.84 | 0.86 | 0.78 | 0.84 | 0.80 | 0.85 | 0.86
90% | 0.83 | 0.77 | 0.82 | 0.90 | 0.85 | 0.83 | 0.77 | 0.81 | 0.78 | 0.82
100% | 0.81 | 0.81 | 0.81 | 0.85 | 0.83 | 0.85 | 0.83 | 0.85 | 0.77 | 0.79

Table 3 Accuracy by dimension of features and training samples for CLL_SUB_111

Sample \ Feature | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100%
10% | 0.67 | 0.71 | 0.70 | 0.79 | 0.77 | 0.83 | 0.80 | 0.81 | 0.82 | 0.84
20% | 0.71 | 0.77 | 0.77 | 0.83 | 0.79 | 0.83 | 0.82 | 0.81 | 0.83 | 0.83
30% | 0.77 | 0.82 | 0.83 | 0.84 | 0.81 | 0.80 | 0.83 | 0.81 | 0.85 | 0.79
40% | 0.74 | 0.81 | 0.81 | 0.79 | 0.82 | 0.8437 | 0.81 | 0.82 | 0.83 | 0.84
50% | 0.82 | 0.77 | 0.78 | 0.81 | 0.83 | 0.84 | 0.78 | 0.80 | 0.81 | 0.78
60% | 0.81 | 0.82 | 0.80 | 0.82 | 0.80 | 0.79 | 0.86 | 0.80 | 0.84 | 0.86
70% | 0.81 | 0.81 | 0.82 | 0.78 | 0.83 | 0.80 | 0.80 | 0.83 | 0.83 | 0.83
80% | 0.82 | 0.82 | 0.84 | 0.76 | 0.80 | 0.87 | 0.83 | 0.85 | 0.84 | 0.85
90% | 0.82 | 0.85 | 0.82 | 0.84 | 0.82 | 0.79 | 0.83 | 0.83 | 0.83 | 0.84
100% | 0.82 | 0.82 | 0.83 | 0.82 | 0.83 | 0.80 | 0.82 | 0.78 | 0.85 | 0.84

Fig. 4 Accuracy by dimension of (a) features and (b) samples for CARCINOM


Fig. 5 Accuracy by dimension of (a) features and (b) samples for CLL_SUB_111

Table 4 Accuracy by dimension of features in SVM (average over numbers of training samples)

Dataset \ Feature | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100%
ALLAML | 0.86 | 0.89 | 0.90 | 0.91 | 0.93 | 0.93 | 0.92 | 0.92 | 0.92 | 0.92
CARCINOM | 0.81 | 0.81 | 0.82 | 0.82 | 0.83 | 0.83 | 0.82 | 0.82 | 0.82 | 0.82
CLL_SUB_111 | 0.78 | 0.80 | 0.80 | 0.81 | 0.81 | 0.82 | 0.82 | 0.81 | 0.83 | 0.83

Table 5 Accuracy by dimension of samples in SVM (average over numbers of features)

Dataset \ Sample | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100%
ALLAML | 0.86 | 0.90 | 0.91 | 0.91 | 0.91 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92
CARCINOM | 0.78 | 0.80 | 0.82 | 0.82 | 0.83 | 0.83 | 0.83 | 0.83 | 0.82 | 0.83
CLL_SUB_111 | 0.77 | 0.80 | 0.81 | 0.81 | 0.80 | 0.82 | 0.82 | 0.83 | 0.82 | 0.82

Fig. 6 Accuracy by dimension of (a) features and (b) samples in SVM

Averaging the test results over the number of training samples, Table 4 and Fig. 6a highlight the effect of reducing the number of features on the accuracy. ALLAML, Carcinom and CLL_SUB_111 keep their original accuracy when the number of features is decreased to 50%, 50% and 90%, respectively. Averaging the test results over the number of features, Table 5 and Fig. 6b show the influence of reducing the number of training samples on the accuracy. ALLAML, Carcinom and CLL_SUB_111 keep their original accuracy when the number of training samples is reduced to 60%, 50% and 60%, respectively.


Table 6 Total learning and testing time by dimension of features in SVM

Dataset \ Feature | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100%
ALLAML | 0.58 | 0.66 | 0.77 | 0.74 | 0.83 | 0.90 | 0.88 | 0.93 | 0.97 | 1.00
CARCINOM | 0.74 | 0.78 | 0.80 | 0.89 | 0.92 | 1.00 | 1.02 | 1.22 | 1.26 | 1.31
CLL_SUB_111 | 0.82 | 1.06 | 1.06 | 1.20 | 1.37 | 1.39 | 1.55 | 1.67 | 1.83 | 2.05

Fig. 7 Total learning and testing time by dimension of features

Clearly, reducing the data size can improve the computing performance. Table 6 and Fig. 7 report the computing times for each dataset under the reduction parameters; the time decreases as the dimension is reduced, for all datasets. Beyond the performance of dimension reduction for features and samples, the report provides additional information for improving the quality of the learning process. First, the measurement functions (6)-(11) can be applied with specific dimension reduction ratios to save computing resources when dealing with high-dimensional data. Second, samples with negative performance can draw attention to a solution (e.g., recapture or removal) that improves learning efficacy. Finally, the assessment of samples and features allows us to make the manifold data scalable in an inexpensive way. Incorporating the proposed manifold method for features and training samples reduces the data dimension while enhancing the classification accuracy. The computing time for training and testing is significantly shortened by reducing the number of features and the number of training samples (Fig. 7). We may therefore report state-of-the-art classification accuracy for digital analysis of biological data in Table 7. The methods of cross-validation (CV) [15], multiple kernel learning for feature selection (MKLFS) [6], Kestrel-based search (KS) [8] and CUR [9] performed experiments on the same biological databases. As our method is based on reducing the feature and training sample dimensions, the results depend on the reduction ratios.


Table 7 Classification accuracy

Dataset | Samples | Features | Classes | Ours | CV [15] | MKLFS [6] | KS [8] | CUR [9]
ALLAML | 72 | 7,129 | 2 | 97.40 | 92.30 | 97.14 | 78.11 | 71.40
Carcinom | 174 | 9,182 | 11 | 91.70 | | | 78.47 | 88.70
CLL_SUB_111 | 111 | 11,340 | 3 | 84.37 | | | |

5 Conclusions

We have presented a novel method for data analysis based on a manifold technique and demonstrated its effectiveness for biological analysis. The data refinement approach is based on the manifold and is computationally attractive, with no data alignment task to implement. Through Bayesian analysis, a method for measuring the effectiveness of both training samples and features in the learning process is derived. A selection method with computing complexity of linear order, often more reasonable than other computationally extensive processes, is our proposed frame for the manifold problem when coping with high-dimensional databases. The dimension reduction acts as a filter that prevents classification conflicts between training samples and between features. There are many similarities among the results extracted from the experiments on the biological datasets ALLAML, Carcinom and CLL_SUB_111, and our experiments demonstrate the accuracy improvement obtained by selecting training samples and features. This shows that the inadequacy of large, multi-source data can be analyzed and resolved with the manifold technique, which can be applied to different learning models. For biological analysis, one might further develop the technique to manipulate the reduction ratios, which acts as a data preparation task for learning.

References

1. Krohs, U.: Convenience experimentation. Stud. Hist. Philos. Sci. Part C: Stud. Hist. Philos. Biol. Biomed. Sci. 43, 52–57 (2011)
2. Breur, T.: Statistical power analysis and the contemporary "crisis" in social sciences. J. Mark. Anal. 4(2–3), 61–6 (2016)
3. Ma, Y., Fu, Y.: Manifold Learning Theory and Applications. CRC Press, Taylor & Francis Group, Boca Raton (2012)
4. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
5. Kuncheva, L.I., Rodriguez, J.J.: On feature selection protocols for very low-sample-size data. Pattern Recogn. 81, 660–673 (2018)
6. Xue, H., Song, Y., Xu, H.M.: Multiple indefinite kernel learning for feature selection. In: Sierra, C. (ed.) Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI 2017), pp. 3210–3216. AAAI Press (2017)
7. Ching, T., Himmelstein, D.S., Beaulieu-Jones, B.K., et al.: Opportunities and obstacles for deep learning in biology and medicine. J. Roy. Soc. Interface 15(141), 20170387 (2018)


D. N. Anh


Prediction of Wart Treatment Using Deep Learning with Implicit Feature Engineering

Khang Nguyen, Nga Mai, An Nguyen, and Binh P. Nguyen

Abstract Warts are skin growths caused by the human papillomavirus (HPV) through direct or indirect contact with an object used by a person carrying the virus. There are different types of warts and correspondingly different treatment methods, among which cryotherapy is often preferred to enhance the existing conventional treatments. For such an enhanced method, tracking the treatment-decision process is important to support similar cases in the future. In this study, a decision support system using machine learning is proposed to predict whether a selected wart treatment method will be successful, using actual samples from a public dataset. Some machine learning research in this field uses artificial neural networks (ANNs) to solve the problem, even though the structured dataset is more suitable for other methods such as k-nearest neighbors (kNN) or Random Forest (RF). This study uses a deep neural network (DNN) approach with an implicit feature engineering method that handles categorical features and learns the implicit interactions among them. Furthermore, k-fold cross-validation is used to evaluate the proposed algorithm, and the proposed model achieves up to 97.78% Accuracy, 99.94% Area Under the Receiver Operating Characteristic Curve (AUC), 98.00% Sensitivity and 98.00% Specificity in predicting the wart treatment outcome on the public cryotherapy dataset from the UCI Machine Learning Repository. This confirms that the proposed framework outperforms other methods using the same dataset.

K. Nguyen (B) IBM Vietnam, Hanoi, Vietnam e-mail: [email protected] N. Mai Thang Long University, Hanoi, Vietnam e-mail: [email protected] A. Nguyen R&D Department, PetroVietnam Exploration Production Corporation, Hanoi, Vietnam e-mail: [email protected] B. P. Nguyen School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_14

K. Nguyen et al.

1 Introduction
Machine learning is nowadays applied in almost every domain and has started impacting medical applications such as understanding disease development, diagnosing, and choosing a treatment method. Recently, a rule-based system [7, 8] using fuzzy logic was developed to predict whether warts should be treated by a specific wart treatment method. Warts are skin growths caused by the human papillomavirus (HPV) through direct or indirect contact with an object used by a person carrying the virus. There are different types of warts and correspondingly different treatment methods, including surgical treatment, cryotherapy, laser, and surgical excision. Cryotherapy is the most commonly used of these methods. Cryotherapy, popularly known as the "freezing" treatment, is based on the principle of freezing and destroying abnormal tissues and lesions. Many studies have used machine learning to determine the effects of these wart treatment methods, with significant results. This paper proposes to use a deep neural network (DNN) [5] to solve this problem, with enhancements in feature engineering techniques that identify and capture relevant relationships among the inputs in order to classify the decision, and can thereby achieve better performance than the methods in these related works. For a fair comparison with related work, this research uses the same dataset of cryotherapy treatment results from ninety patients published by the dermatology clinic of Ghaem Hospital, but applies a few other modern feature engineering methods and a multi-layer perceptron to improve the performance metrics.

2 Dataset and Proposed Method
2.1 Dataset Overview
The Cryotherapy dataset was obtained publicly from the UCI repository; it was acquired from patients who suffered from plantar and common warts and was tracked by the clinic of Ghaem Hospital, Mashhad, from 2013 to 2015. Only ninety samples are available in the dataset, containing seven features from patients treated with the cryotherapy method. The feature "Result of Treatment" is the expected binary output, positive or negative, for the decision support system.

Prediction of Wart Treatment Using Deep Learning ...

2.2 Data Exploration
The dataset contains six input features and one target field. "Sex" is a flag feature and "Type" is categorical; the others are continuous, with the highlighted data profile in Table 1. "Age of patients" ranges from 15 to 67; "Time before treatment" is between 0.25 and 12 months, with a mean of 7.7; "Number of warts" is from 1 to 12; "Area of wart" and "Type of wart" are also included. The scatter plot in Fig. 1 shows the relationships among these key features, colored by "Result of Treatment".

Table 1 Raw features of cryotherapy dataset

#  Feature              Description                        Ranges      Mean ± SD
1  Sex                  Male or Female                     1, 2        N/A (Flag)
2  Age                  Age in years                       15...67     28.600 ± 13.361
3  Time                 Months before treatment            0.25...12   7.667 ± 3.407
4  #Warts               Number of warts                    1...12      5.511 ± 3.567
5  Type                 Common, Plantar, both              1, 2, 3     N/A (Category)
6  Area (mm²)           Surface area of the biggest wart   4...750     85.833 ± 131.733
-  Result of treatment  Yes, No                            0, 1        N/A (Flag)

Fig. 1 Plot diagram for key features

Fig. 2 Data distribution for continuous features

The red bubbles are positive, meaning treatment is recommended; the blue bubbles are negative, meaning treatment is not recommended. The diagram shows that most warts occur between the ages of 15 and 30 and that treatment is recommended within the first 10 months after onset. The distribution of these features is visualized as the boxplot in Fig. 2; because the inputs lie in different ranges, the data is normalized into the same range [0, 1]. We observe a few outliers in the Age and Area inputs; as the dataset is tiny, we do not apply any outlier method to exclude these samples.
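The [0, 1] normalization used for the boxplot in Fig. 2 can be sketched as plain min-max scaling; the `min_max_scale` helper and the sample ages below are illustrative, not taken from the paper's code.

```python
def min_max_scale(values):
    """Scale a list of numbers into [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative 'Age' values inside the dataset's reported 15..67 range
ages = [15, 30, 67]
scaled = min_max_scale(ages)
```

After scaling, every continuous feature occupies the same [0, 1] range, so their boxplots become directly comparable.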

2.3 Related Works
There is related research on determining the effects of these wart treatment methods, but only a few studies have focused on predicting whether a method will succeed before starting the treatment. In a recent study, the authors showed with statistical evidence that the cryotherapy method can be applied in wart treatment. Khozeimeh et al. [7, 8] created fuzzy-logic-based rule tables for the wart treatment method and achieved a prediction accuracy of 80.7%. In another study, Uzun et al. [14] investigated the performance of two wart treatment methods, cryotherapy and immunotherapy. They found similar classifier performance using a 7-nearest-neighbor classifier and obtained a higher accuracy of 85.46% using a Support Vector Machine (SVM). Akben [1] proposed a decision tree with fuzzy informative images, achieving a prediction accuracy of 90.44%.

Another study, by Cüvitoğlu and Işık [3], performed the prediction with various AI models, including an ensemble model, on the same cryotherapy dataset. Their results passed 95% accuracy, with Random Forest (RF) showing the highest performance; NB and OC might also be considered successful, with 86% and 87% accuracy, respectively. A good point of that paper is that the authors evaluated many AI models to select the best one and showed significant improvements over the results of Khozeimeh et al., Uzun et al. and Akben, as in Fig. 3. One surprise in this performance table is that the artificial neural network (ANN) scored very low compared with the other prediction models. The paper mentions that the ANN failed to exceed 60% accuracy even when tested with different parameters. However, it does not provide details of the ANN model, such as the number of neurons and layers or other hyper-parameters, so we could not reproduce the results to investigate why the ANN was worse than the other AI models. With newer techniques, Putra et al. [9] proposed a boosting algorithm that uses AdaBoost as the strong learner with Random Forest (RF) as the weak learner, developing classification models with an accuracy of 96.6%. Notably, Rahman et al. [11] tried several algorithms, including k-NN, NB, Regression Tree and Binary Logistic Regression, and achieved their best performance with k-NN using a Radial Basis Function (RBF) kernel, at 96.78% accuracy. They proposed to examine the importance factors of the features and then only

Fig. 3 The results of different AI models including ANN [3]

applied the top important features to the model to optimize performance. One challenge in applying these models is that they require domain knowledge about the dataset, or intensive data mining, to select potential features for the tree and improve performance. As analyzed above, little research uses deep learning as the prediction model. This study proposes a DNN model with different feature engineering and model tuning techniques to identify and capture feature interactions and patterns implicitly, in order to optimize the prediction performance.

2.4 Proposed Method
This study is motivated by research in the field of deep learning, which uses multiple layers to progressively and automatically capture higher-degree features and their interactions from raw inputs. The dataset contains all-numeric data with few raw inputs, so we propose the simplest type of deep learning algorithm, a Feedforward Neural Network (FNN), using the following formula for each deep layer:

h_{l+1} = f(W_l h_l + b_l)    (1)

where
• h_l ∈ R^{n_l} is the l-th hidden layer and h_{l+1} ∈ R^{n_{l+1}} is the (l+1)-th hidden layer.
• W_l ∈ R^{n_{l+1} × n_l} is the weight matrix, randomly initialized from a Gaussian distribution with mean 0 and standard deviation 1, and optimized in the training stage.
• b_l ∈ R^{n_{l+1}} are the parameters (bias) of the l-th deep layer.
• f(·) is the ReLU function, f(x) = max(0, x).

The final output layer uses the sigmoid function to calculate the probability for each class:

ŷ = 1 / (1 + e^{−x})    (2)

After training at the baseline level with different numbers of hidden layers and neurons, without any feature engineering steps, we achieve the best results of 81.48% Accuracy, 87.9% AUC, 60% Specificity and 94.12% Sensitivity. The baseline model found after many attempts is visualized in Fig. 4:
• One input layer with 6 features and 1 bias input.
• Two hidden layers, each with 8 neurons and 1 bias input. We tried adding more neurons and hidden layers, but the dataset is small, so a larger hidden structure is unnecessary.
• One output layer with a single result, a negative or positive value.
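A minimal NumPy sketch of the baseline forward pass, Eqs. (1) and (2), with the 6-8-8-1 layout described above. The Gaussian weight initialization follows the text; the zero bias initialization and all function names are our assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Layer sizes: 6 inputs, two hidden layers of 8 neurons, 1 output
sizes = [6, 8, 8, 1]
# W_l ~ N(0, 1) as stated in the text; zero bias init is our assumption
weights = [rng.normal(0, 1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

def forward(x):
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)          # Eq. (1): h_{l+1} = f(W_l h_l + b_l)
    W, b = weights[-1], biases[-1]
    return sigmoid(W @ h + b)        # Eq. (2): output probability

y_hat = forward(np.zeros(6))         # probability of a successful treatment
```

Training (gradient descent on these weights) is omitted; the sketch only illustrates the layer equations.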

Fig. 4 The FNN baseline model to predict the Wart Treatment method

Even though the baseline does not yet perform better than published results using kNN or RF [3], it is still better than the ANN results in Cüvitoğlu and Işık's research [3], which failed to exceed 60%. We also observe that the DNN can automatically learn the importance of each feature through its corresponding weights, as in Fig. 5. According to this diagram, the prediction is most strongly influenced by "Time before treatment" and "Age of patients", followed by "Area of wart", and less so by the sex of the patient and the type of wart. This weighted-predictor diagram is important for the feature selection procedure used to fine-tune the model with implicit feature engineering and improve on the baseline results. We analyze this problem more deeply in the following section.
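As an illustration of reading feature importance off the network weights, one simple proxy (our assumption, not the authors' exact procedure) is the mean absolute first-layer weight per input:

```python
import numpy as np

def input_importance(W1):
    """Proxy importance per input feature: mean absolute outgoing weight
    from that input into the first hidden layer (illustrative heuristic,
    not the paper's method)."""
    scores = np.abs(W1).mean(axis=0)      # W1 shape: (hidden, inputs)
    return scores / scores.sum()          # normalize to proportions

# Toy 8x6 first-layer weight matrix (8 hidden neurons, 6 input features)
W1 = np.arange(48, dtype=float).reshape(8, 6) - 24
importance = input_importance(W1)
```

With a trained model, the largest proportions would correspond to the most influential predictors, as in Fig. 5.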

2.5 Implicit Feature Engineering
Feature engineering plays a crucial role in the success of many prediction models. Some models require non-trivial work, combining strong business-domain and data science skills, through explicit feature engineering or an intensive trial-and-error method. Due to increasing data volume and velocity and the higher dimensionality of

Fig. 5 The importance of inputs as predictors

data, there are methods to bypass these extensive feature engineering efforts, mostly automatic feature engineering using embedding techniques, cross features, and neural networks with a deep learning approach. There is no doubt about the importance of highly skilled data science engineering; instead, the feature engineering in this study focuses on how to detect and learn feature interactions automatically and implicitly, enhancing model performance using deep learning with less reliance on business data science [13]. The dataset used in this paper is simple, with few features and no missing data, so we do not perform any extensive explicit feature engineering to create new complex features. Instead, the model is enhanced by implicit feature engineering such as encoding, embeddings and feature crosses, which capture the non-linear relationships internally among a feature's categories and the interactions externally among features, from low to high degree; the results are concatenated into an embedding layer before being fed to the deep learning prediction model. Deep neural networks promise to capture hidden high-degree feature interactions using embeddings, high-dimensional input vectors and non-linear activation functions.

Features Scaling (Normalization). As the input features have various ranges and scales, feature scaling is applied to normalize the independent features onto the same projected scale, ensuring that each feature contributes approximately proportionately to the overall calculation. The scaling function is the standardization

x' = (x − mean(x)) / σ    (3)

where σ is the standard deviation of x.

Features Embedding. The most common input is structured data with two data types: continuous and discrete attributes. Continuous inputs such as Age, Area and Time are represented by real numbers, while discrete inputs such as Type and Sex are represented by labels or labeled integers (called label encoding). Label encoding is used merely for convenience to label the different vocabulary entries of a feature, such as 1 for Male and 2 for Female, and carries no information in itself. Another example is Type in this dataset, labeled 1, 2, 3 for the three different types of warts. Such an input is called a categorical feature [16], and it can hold ordered or non-ordered values [4]. Unlike the Random Forest (RF) model, which is optimized to work with categorical features, an artificial neural network (ANN) is not well suited to handling them [15], because their vocabularies have no semantic continuity, or if they do, it is not explicit. That is probably why Cüvitoğlu and Işık [3] achieved the best performance for this prediction with RF rather than ANN when dealing with the labeled-integer data types in this dataset. We therefore look for efficient methods [6, 10, 12] to capture the continuity of categorical features [15], so that we can enhance the capability of the DNN to learn the data and capture the relationships among a feature's categories and the interactions among the features. Such categorical features are often encoded as a one-hot vector whose length equals the number of categories in the feature's vocabulary:

V_j = 1 if x is in category j, 0 otherwise    (4)
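Equations (3) and (4) can be sketched directly; the helper names and sample values below are illustrative only.

```python
def standardize(values):
    """Eq. (3): x' = (x - mean(x)) / sigma, using the population std dev."""
    n = len(values)
    mean = sum(values) / n
    sigma = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sigma for v in values]

def one_hot(category, vocabulary):
    """Eq. (4): V_j = 1 if x is in category j, else 0."""
    return [1 if c == category else 0 for c in vocabulary]

z = standardize([1.0, 2.0, 3.0])
v = one_hot(2, [1, 2, 3])   # wart Type labeled 1, 2, 3
```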

The output vector takes the sparse form [0, ..., 1, ..., 0] when there are many categories. This basic encoding technique therefore leads to very high-dimensional spaces for high-cardinality features and often demands unrealistic computing resources. Another shortfall is that it still treats each value of a categorical feature independently of the rest and cannot capture the relationships between them. Feature embedding is a technique that automatically learns representations of categorical features in multi-dimensional spaces [13]. It places a particular feature value close to others with similar effect, thereby implicitly identifying the continuity of the data in the embedding space, capturing the relationships between discrete categories and enabling the ANN to solve the problem (Fig. 6). The embedding formula transforming the one-hot vector (sparse vector) into the embedding vector (dense vector) is

X_d = W_{E×V} × X_v    (5)

Fig. 6 Embedding method for categorical features

where
• E is the target embedding size and V is the number of unique categories in the feature's vocabulary.
• X_v is the sparse vector (normally the one-hot vector) with V dimensions representing category v.
• X_d is the embedding vector with E dimensions.
• W ∈ R^{E×V} is the weight matrix.

With this method, the sparse vector is transformed into a lower-dimensional, ordinary vector whose elements can be real numbers rather than only 0 or 1.

Cross Featuring. Using only the raw input features is often not enough for a deep learning model to achieve the best performance; we normally need to combine input features to generate new, combined features. This step requires deep knowledge of the business domain the dataset represents, as well as the data science skills to mine the data in various ways with various techniques. One way to do it automatically is to cross individual features, obtained from categorical or continuous data, to create a new feature, also called a synthetic feature, which helps the model learn non-linear relationships by capturing high-order feature interactions during the crossing task. A cross feature is a mathematical operation that returns an output feature from two or more input features, simulated by

X × Y = {(x_i, y_j) | x_i ∈ X and y_j ∈ Y}    (6)

An example of a cross feature is shown in Fig. 7.

Fig. 7 The example of cross feature output

It is recommended that deep learning models cross one-hot feature vectors rather than continuous features. We can then view the feature cross of two one-hot vectors as a logical conjunction between them, identifying the meaningful interaction at the crossing point. The cross feature is also a one-hot vector, with dimension equal to the product of the two input features' dimensions; its single hot value (1) identifies a particular conjunction of the two input features. Linear models scale well to massive data, and using feature crosses on massive datasets is an efficient strategy for identifying the hidden feature interactions in highly complex prediction models.
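The embedding lookup of Eq. (5) and the one-hot cross of Eq. (6) can be sketched together; the random embedding matrix, the helper names, and the toy categories are our illustrations (16 matches the embedding size reported in the experiments):

```python
import numpy as np

rng = np.random.default_rng(1)

def embed(one_hot_vec, W):
    """Eq. (5): X_d = W_{E x V} @ X_v -- for a one-hot X_v this simply
    selects one column of W (the learned dense representation)."""
    return W @ np.asarray(one_hot_vec)

def cross_one_hot(u, v):
    """Eq. (6) for one-hot vectors: the flattened outer product; exactly
    one element is hot, marking the conjunction of the two categories."""
    return np.outer(u, v).ravel()

V = 3                      # e.g. the three wart types
E = 16                     # embedding size used in the experiments
W = rng.normal(size=(E, V))   # would be learned during training

x_type = [0, 1, 0]         # Type = 2
x_sex = [1, 0]             # Sex = 1
dense = embed(x_type, W)
crossed = cross_one_hot(x_type, x_sex)
```

In a real model, W is a trainable parameter and the crossed vector would itself be fed through an embedding before the deep layers.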

2.6 Evaluation and Performance Metrics
The model proposed in this paper is validated by cross-validation (CV), which is known to have a significant impact on the evaluation of a deep learning model. To compare results with other research, this paper uses the same dataset and 10-fold CV to evaluate test performance. The dataset is partitioned into 10 groups; during evaluation, one group is kept aside for testing and evaluating performance while the other 9 groups are used to train the classifier. This process is repeated until each group has been used for testing, and the final performance result is the average over all 10 folds. The performance metrics are based on the confusion matrix, from which Accuracy, Precision, Recall (Sensitivity) and Specificity are calculated as described below. We define the confusion matrix:
• TP (True Positive): targets that are truly positive and classified as a selected output.
• TN (True Negative): targets that are truly negative and classified as a non-selected output.
• FN (False Negative): targets whose predicted output is negative but that are actually positive.
• FP (False Positive): targets whose predicted output is positive but that are actually negative.
Using the confusion matrix, the following performance metrics are calculated with the corresponding formulas.

• Accuracy gives the percentage of correctly predicted results:

Accuracy = (TN + TP) / (TN + TP + FN + FP)    (7)

• Sensitivity measures the ratio of actual positives that are correctly classified, also called the True Positive Rate (TPR) or Recall:

Sensitivity (TPR/Recall) = TP / (TP + FN)    (8)

• Specificity measures the ratio of actual negatives that are correctly classified, also called the True Negative Rate:

Specificity = TN / (TN + FP)    (9)

• Precision measures certainty or quality, the rate of true positives among all positive predictions, while Recall measures completeness or quantity, the rate of true positives among all actual positive samples:

Precision = TP / (TP + FP)    (10)

• F1-score is the harmonic mean of precision and recall:

F1 = 2 × (precision × recall) / (precision + recall)    (11)
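Equations (7)-(11) can be computed directly from the confusion-matrix counts; the counts in the example below are made up for illustration.

```python
def metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics from Eqs. (7)-(11)."""
    accuracy = (tn + tp) / (tn + tp + fn + fp)     # Eq. (7)
    sensitivity = tp / (tp + fn)                   # TPR / Recall, Eq. (8)
    specificity = tn / (tn + fp)                   # TNR, Eq. (9)
    precision = tp / (tp + fp)                     # Eq. (10)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (11)
    return accuracy, sensitivity, specificity, precision, f1

# Illustrative counts only (not results from the paper)
acc, sens, spec, prec, f1 = metrics(tp=45, tn=40, fp=3, fn=2)
```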

In addition, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are computed. The ROC-AUC is a performance evaluation for classification problems across different thresholds: ROC is a probability curve, while AUC represents the degree or measure of separability.
• ROC visualizes the performance of a classifier at different thresholds using TPR and FPR; every (FPR, TPR) pair plots a point, and the ROC is the curve connecting all points over the thresholds.
• AUC gives a single number summarizing the ROC. A good model has an AUC close to 100%, meaning it separates the classes efficiently.
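The 10-fold cross-validation procedure described at the start of this section can be sketched without any ML library; the strided fold assignment and the function name are our choices, and 90 matches the dataset size.

```python
def k_fold_indices(n_samples, k=10):
    """Partition sample indices into k folds (strided assignment).
    Each fold serves once as the test set while the remaining k-1
    folds are used to train the classifier."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for test_fold in folds:
        train_idx = [j for f in folds if f is not test_fold for j in f]
        splits.append((sorted(train_idx), sorted(test_fold)))
    return splits

splits = k_fold_indices(90, k=10)   # 10 (train, test) index pairs
```

The final reported score would be the average of the metric over the 10 test folds.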

3 Experimental Results and Discussion
To demonstrate the efficiency of implicit feature engineering for the deep learning model, we keep the baseline FNN model proposed in Sect. 2 and track its performance. We then conduct many experimental runs, applying feature embedding to the categorical features and feature crosses to the top three weighted features found earlier: Age, Time and Area. We also tried embeddings with more than 16 dimensions, such as 32 or 64, but found that 16 embedding dimensions for all categorical and crossed features achieve the best results. Table 2 reports the performance metrics of some successful runs compared with the baseline.

Table 2 Performance metrics using implicit feature engineering

#  Run       Layers       Embedding                      Cross Features                  Performance Metrics
1  Baseline  8×8          No                             No                              ACC: 81.48%, AUC: 87.9%, Sensitivity: 94.12%, Specificity: 60%
2  Trial     8×8          All categorical features ×16   Age×Time, Age×Area, Time×Area   ACC: 93.33%, AUC: 98.12%, Sensitivity: 96.67%, Specificity: 90.22%
3  Trial     8×8          Exclude Age and Area           Age×Area                        ACC: 94.44%, AUC: 98.64%, Sensitivity: 94.67%, Specificity: 94.44%
4  Trial     8×8          All categorical features ×16   Age×Area                        ACC: 95.55%, AUC: 98.89%, Sensitivity: 94.67%, Specificity: 96.67%
5  Best      8×8×8×8×8×8  All categorical features ×16   Age×Area                        ACC: 97.78%, AUC: 99.94%, Sensitivity: 98.00%, Specificity: 98.00%

The experimental results show significant improvements in all performance metrics after the implicit feature engineering methods are applied to the baseline model. The final DNN model, optimized with implicit feature engineering, is shown in Fig. 8. In comparison with previous studies, this study successfully proposes a suitable method of implicit feature engineering in a deep learning model, achieving the superior results in Table 3 below.

Table 3 Comparison with other methods

Method                                                      ACC (%)  AUC (%)  Sensitivity (%)  Specificity (%)
Fuzzy Logic-based Rule (Khozeimeh et al. [7])               80.70    -        -                -
Support Vector Machine (Uzun et al. [14])                   85.46    -        -                -
Decision Tree w/ fuzzy images (Akben [1])                   94.44    -        89.58            100
AdaBoost with Random Forest (Putra et al. [9])              96.60    -        93.70            100
Random Forest (Cüvitoğlu and Işık [3])                      95.00    95.00    88.00            98.00
kNN with RBF kernel (Rahman [11])                           96.78    98.80    95.63            98.10
Deep Learning w/ Implicit Feature Engineering (this study)  97.78    99.94    98.00            98.00

Fig. 8 The DNN model with implicit feature engineering

4 Conclusion and Future Work
The achieved results show that the deep neural network (DNN) has great potential in the medical care domain. It can decrease the cost of treatment by drastically reducing the time needed to decide on a treatment method for patients. A DNN cannot show its power if the model is simply run on raw features at the input layer; it is good at capturing hidden interactions among features when fed a suitable input layer, even with less emphasis on business data science engineering. This study proposes a deep-learning-based approach with techniques such as embeddings and feature crosses to identify and capture high-degree feature interactions, and we observe that it improves performance well beyond the raw FNN itself, as well as beyond the RF and k-NN results in other research. The study applies the prediction model to the wart treatment dataset and outperforms previous results in predicting the success of the wart treatment method. For future work, we expect larger datasets, which will allow more powerful deep learning algorithms to be applied to decision support for treatment methods. Cross featuring and embedding are also promising for the Wide and Deep Learning model [2, 13], recently proposed to identify and capture high-order feature interactions and enhance prediction performance.

References
1. Akben, S.B.: Predicting the success of wart treatment methods using decision tree based fuzzy informative images. Biocybern. Biomed. Eng. 38(4), 819–827 (2018)
2. Cheng, H.-T., et al.: Wide & deep learning for recommender systems. In: Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, DLRS 2016, pp. 7–10. ACM, New York (2016)
3. Cüvitoğlu, A., Işik, Z.: Evaluation machine learning approaches for classification of cryotherapy and immunotherapy datasets. Int. J. Mach. Learn. Comput. 4(4), 331–335 (2018)
4. Frank, E., Hall, M.: A simple approach to ordinal classification. In: De Raedt, L., Flach, P. (eds.) Machine Learning: ECML 2001, pp. 145–156. Springer, Heidelberg (2001)
5. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
6. Juan, Y., Zhuang, Y., Chin, W.-S., Lin, C.-J.: Field-aware factorization machines for CTR prediction. In: Proceedings of the 10th ACM Conference on Recommender Systems, RecSys 2016, pp. 43–50. ACM, New York (2016)
7. Khozeimeh, F., Alizadehsani, R., Roshanzamir, M., Khosravi, A., Layegh, P., Nahavandi, S.: An expert system for selecting wart treatment method. Comput. Biol. Med. 81, 167–175 (2017)
8. Khozeimeh, F., et al.: Intralesional immunotherapy compared to cryotherapy in the treatment of warts. Int. J. Dermatol. 56(4), 474–478 (2017)
9. Putra, M.A., Setiawan, N.A., Wibirama, S.: Wart treatment method selection using AdaBoost with random forests as a weak learner. Commun. Sci. Technol. 3(2), 52–56 (2018)
10. Qu, Y., et al.: Product-based neural networks for user response prediction over multi-field categorical data. ACM Trans. Inf. Syst. 37(1), 5:1–5:35 (2018)
11. Rahman, M.M., Wang, S., Zhou, Y., Rogers, J.: Predicting the performance of cryotherapy for wart treatment using machine learning algorithms. In: Proceedings of the 2019 IISE Annual Conference, Orlando, FL, USA, pp. 1–6, May 2019
12. Rendle, S.: Factorization machines. In: Proceedings of the 10th IEEE International Conference on Data Mining, Sydney, NSW, Australia, pp. 995–1000, December 2010
13. Shan, Y., Hoens, T.R., Jiao, J., Wang, H., Yu, D., Mao, J.C.: Deep crossing: web-scale modeling without manually crafted combinatorial features. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016, pp. 255–262. ACM, New York (2016)
14. Uzun, R., Isler, Y., Toksan, M.: Use of support vector machines to predict the success of wart treatment methods. In: 2018 Innovations in Intelligent Systems and Applications Conference (ASYU), pp. 1–4, October 2018
15. Wang, R., Fu, B., Fu, G., Wang, M.: Deep & cross network for Ad click predictions. In: Proceedings of ADKDD 2017, pp. 12:1–12:7. ACM, New York (2017)
16. Zhang, W., Du, T., Wang, J.: Deep learning over multi-field categorical data: a case study on user response prediction. In: Ferro, N., et al. (eds.) Advances in Information Retrieval, Proceedings of the 38th European Conference on IR Research, Lecture Notes in Computer Science, vol. 9626, pp. 45–57. Springer (2016)

Visualization of Musical Emotions by Colors of Images

Dao Nam Anh

Abstract Visualizing musical sound for content expression is an efficient application that allows music to be presented in all its various facets. This article explores the significance of musical emotion in anticipating image emotion features. This is a novel representation of music data, showing how emotion features can add value to a set of existing sound aspects. The musical emotions are represented by a filter, supported by the Gaussian distribution, used as a color balance filter diversified in terms of musical features. With this filter, a color adjustment model can use the RGB color system to modify color channels, producing a color transform for the image regions associated with the original musical emotions. As the transform filter is based on the music emotions, the image's colors change adaptively with the emotions. The visualization solution is then evaluated in experiments with a music database and an image dataset. The experiments show the productive visual effect of emotion across a music database with a wide range of instruments and styles, and should be of interest for applications mapping music to visual data.

1 Introduction Visualizing music data has long been a leading problem, driving a range of tasks in multimedia engineering and information science. Musical imaging can now play an essential role in comprehensive multimedia systems, as it contributes to a deeper grasp and more proficient processing of all kinds of musical material. Further study of musical imaging is needed to advance our knowledge of music patterns that would go unrecognized without visualization, whose effects may depend on the type and number of participating instruments and on the musical form.

D. N. Anh (B) Faculty of Information Technology, Electric Power University, Hanoi, Vietnam e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_15


Specific monitoring of musical aspects is fundamental to these efforts, because musical forms, from simple folk songs to full orchestras, differ in their sensitivity to particular music features. Imaging-based solutions yield an intuitive visualization of the rhythmic flow of music without redundant detail. The aim of this work was to see whether our initial findings on visual components such as graphical objects, color palettes, and image scenes were potent enough to serve as an empirical approach to the challenging problem of music data visualization. Because questions involving sound data and image analysis are cross-domain by nature, we may draw on techniques from both domains to produce a unified solution. In particular, the essence of musical features covering temporal and structural patterns can be especially significant in enabling the system to track the movement of emotion with considerable anticipation across segments. Hence, the first contribution of this study is a Bayesian analysis that aligns music emotion features with texture colors. We choose image regions related to the musical emotions and customize their colors with a color filter matched as closely as possible to those emotions. The emotion features thus migrate from the music data into image features that select image regions, which in turn enables further essential operations on images. This sound-imaging process does not disturb the structure and color harmony of the selected image during the transform, because the major image data are detected and used to control each operation. As a result, our second contribution is the capacity to perform color modification on selected images, with the colors driven by the musical emotions. Experiments with the solution on a public music database and a benchmark image database show that the visual effects of sound let listeners grasp the sense of the music appropriately and delightfully.

2 Related Work We first review methods that visualize music data, grouped by the visual components used: rendered graphics, colors, and image scenes. The first composer to accompany music with color was Alexander Scriabin; his famous 1911 color symphony Prometheus explored the relationship between keys and colors in his system of synesthesia [1]. Today, graphical objects are more commonly generated from the music data itself, thanks to the availability of sound features extracted in the time-frequency domain. In the rendered-graphics group, Windows Media Player visualizes tracks with animated art, creating 2D graphical objects from the track data [2]. iTunes, the software associated with the Apple iPod, also uses animated art, but with 3D objects. The real-time model of [3] provides a visualization environment allowing a musician to construct a multimedia piece called Deep Surrender. In an exercise in performance art, an audio visualizer called the Audicle provides multiple views of the core elements of a virtual machine, including timing, processes, and scheduling [4].


The Audicle performs drawing, synthesis, and animation. Since direct estimation of sound from raw schlieren video is hard because the audible sound field is very small [5], noise removal with spatio-temporal filters can increase the visibility of audible sound fields, which are then displayed as 2D graphical objects. A number of works use colors to attract visual attention. By relating colors to the sound frequency spectrum, sphere particles are generated; their size corresponds to volume and their appearance time to the forward spatial dimension [6]. One popular technique uses individual notes from a MIDI keyboard or a standard MIDI file to drive a music visualization scheme with colored graphical objects [7]. In addition, a semantic characteristic of music called tension can be visualized with such objects [8]. That method performs semantic music analysis by computing tonal tension from the score and timbral tension from the performance; the tensions are scored along a timeline and then represented by simple colored objects. To represent the structure of music and harmony effectively with color, the authors of [9] found it necessary to connect analogous aspects of music and visual perception. Also worth acknowledging is Hyperscore, which simplifies the approach to traditional forms of composition to facilitate the illustration of complex structures [10]. If motives are specified and associated with colors by the end user, a score can be "painted" with the predefined colors, arranged in a way analogous to a color palette. A study of audio signals [11] transformed acoustic features extracted from the signals into color thumbnail images, helping users apprehend the musical content. In WaveTerrain Synthesis [12], an image is presented as the superposition of three planes, one for each of red, green, and blue; the planes map 2D monochrome terrains of sound to the intensity of each color. Turning to the group of methods that use image scenes to visualize music data, we find an emotion-based music player that matches images with music tracks by commonly evoked emotions [13]; low-level features, including harmony and temporal visual coherence, are also considered in the matching task. In the present work we have not attempted to select a single visual component. Based on this survey of prior work, our approach addresses two visual components: colors and image regions. The next section gives a detailed description of the concept, accompanied by experimental reports showing the efficacy of the solution.

3 The Method The relationship between musical sound and perceived sense is not always easy to describe precisely. Commonly, timbre or tone color is used to characterize different modes of sound production, such as musical instruments and voice types, including countertenor, tenor, baritone, and bass for men's voices. However, the sense of musical sound evoked by hearing the movement of pitch and tempo, together with the imaginable effects of the images loaded for illustration, is what this approach seeks to arrange. Energy-based features, sensitive to even the smallest changes of sound energy, were chosen to realize this arrangement.


3.1 The Music Emotions and Color It would be of great benefit to multimedia if music emotion could be visualized by the color of an image. The flow of tempo and pitch in musical sound is important to its emotional effect. The relation between the sound s and the emotion e is represented in the left part of Fig. 1a. An image texture t covered by a color c also produces an emotional effect. However, these resulting emotions differ. Given the emotion of a sound and a colored texture, the color of the texture can be changed adaptively to obtain the same emotion as the sound. A common application of this process is the illustration of music via emotion estimation and color modification of texture. Figure 1b displays the expected synchronized emotion of musical sound and texture color, where sound and image in a multimedia flow express the same emotion. It is helpful to conduct a Bayesian analysis across the dependencies of musical sound, texture, color, and emotion. A belief network structure [14] for the synchronized emotion of musical sound and texture color encodes these conditional dependencies through the factorization (1):

p(s, e, c, t) = \underbrace{p(e \mid s)\, p(s)}_{p(e, s)} \; \underbrace{p(e \mid c)\, p(c \mid t)\, p(t)}_{p(e, c, t)} \quad (1)

In fact, our specific music-imaging problem is that of obtaining regions of an image from the relation between the music sound s and the texture feature t of a region, Eq. (2). The belief network (1) yields a more feasible expression for evaluating the correspondence of image regions to the musical sound, Eq. (3):

p(t \mid s) = p(t, s) / p(s) = \sum_{e} \sum_{c} p(s, e, c, t) / p(s) \quad (2)

p(t \mid s) = \sum_{e} \sum_{c} p(e \mid s)\, p(e \mid c)\, p(c \mid t)\, p(t) \quad (3)
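To make the marginalization in Eq. (3) concrete, the following sketch (a toy illustration with made-up distribution sizes and random values, not the authors' implementation) evaluates p(t|s) by summing out the emotions e and colors c:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(a, axis=None):
    """Rescale non-negative entries so they sum to one along `axis`."""
    return a / a.sum(axis=axis, keepdims=axis is not None)

# Toy sizes: 5 emotions, 8 candidate colors, 20 texture regions.
p_e_given_s = normalize(rng.random(5))               # p(e|s) for one sound s
p_e_given_c = normalize(rng.random((5, 8)), axis=0)  # p(e|c), one column per color
p_c_given_t = normalize(rng.random((8, 20)), axis=0) # p(c|t), one column per texture
p_t = normalize(rng.random(20))                      # prior p(t)

# Eq. (3): p(t|s) is proportional to sum_e sum_c p(e|s) p(e|c) p(c|t) p(t)
score = np.einsum('e,ec,ct,t->t', p_e_given_s, p_e_given_c, p_c_given_t, p_t)
p_t_given_s = score / score.sum()  # renormalize over texture regions
```

The `einsum` call performs exactly the double sum over e and c for every texture region t in one step.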

Music features f can be obtained by analyzing musical sounds. The main features for musical sound are peak, spread, key, mode, harmonic change, novelty, relative entropy, root-mean-square energy, and attack time [15].

Fig. 1 a Different emotion effects are caused by music sound and color texture. b Belief network structure for synchronized emotion of music sound and texture color


Fig. 2 Belief network structure for emotion of music features and color features

To relate musical sound to emotion, the music features are observed in an emotion analysis using machine learning techniques (Fig. 2). For a given musical sound we then estimate the value of each feature and the psychological emotion by (4). With this procedure we obtain an evaluation of the psychological emotions e = {e_i}, namely happy, sad, tender, anger, and fear, given the sound features f = {f_j}, by (5):

p(e \mid s) = \sum_{f} p(e, f \mid s) = \sum_{f} \underbrace{p(e \mid f)\, p(f \mid s)\, p(s)}_{p(e, f, s)} / p(s) = \sum_{f} p(e \mid f)\, p(f \mid s) \quad (4)

p(e_i \mid s) = \sum_{j} p(e_i \mid f_j)\, p(f_j \mid s) \quad (5)
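Equation (5) is simply a matrix-vector product between a learned feature-to-emotion table and the per-segment feature distribution. A minimal sketch follows; the feature list is the one named above, while the probability values are random placeholders, not trained MIRtoolbox output:

```python
import numpy as np

EMOTIONS = ['happy', 'sad', 'tender', 'anger', 'fear']
FEATURES = ['peak', 'spread', 'key', 'mode', 'harmonic_change',
            'novelty', 'relative_entropy', 'rms_energy', 'attack_time']

def emotion_from_features(p_e_given_f, p_f_given_s):
    """Eq. (5): p(e_i|s) = sum_j p(e_i|f_j) p(f_j|s)."""
    return np.asarray(p_e_given_f) @ np.asarray(p_f_given_s)

rng = np.random.default_rng(1)
p_e_given_f = rng.random((len(EMOTIONS), len(FEATURES)))
p_e_given_f /= p_e_given_f.sum(axis=0)   # each feature column is a distribution
p_f_given_s = rng.random(len(FEATURES))
p_f_given_s /= p_f_given_s.sum()         # feature weights of one sound segment

p_e = emotion_from_features(p_e_given_f, p_f_given_s)
strongest = EMOTIONS[int(p_e.argmax())]  # dominant emotion of the segment
```

Because each column of p(e|f) and the feature vector p(f|s) are normalized, the resulting p(e|s) is itself a valid distribution over the five emotions.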

Having defined the relationship between musical sound and emotion, we now consider the relationship between color and emotion. It has been observed that people commonly link colors with diverse emotions, and the assumed ability of color to influence a variety of human behaviors is quite evident in some situations [16]. Formula (6) gives the conditional probability of an emotion given a particular color; to find the best-matching color for each emotion, the argument of the maximum of this conditional probability is used:

p(e_i \mid c_j) = p(e_i, c_j)\, p(c_j)^{-1} \quad (6)

Hence, two terms on the right-hand side of Eq. (3) are defined, namely p(e|s) and p(e|c). The remaining term is the conditional probability (7), which can be estimated in practice by a Gaussian kernel (8). The filter (9) then selects a set of representative colors for each image region:

p(c_j \mid t) = p(c_j, t)\, p(t)^{-1} \quad (7)

p(c_j \mid t) = \exp\left( -\frac{(t - c_j)^2}{\sigma^2} \right) \quad (8)

p(e_j \mid t) > \lambda \quad (9)
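A per-pixel reading of Eqs. (8) and (9) can be sketched as follows: a Gaussian kernel on the distance between each pixel's color and the emotion color, thresholded at λ. The σ and λ values here are illustrative defaults, not the ones used in the experiments:

```python
import numpy as np

def emotion_region_mask(image_rgb, emotion_color, sigma=40.0, lam=0.5):
    """Eq. (8): weight = exp(-(t - c)^2 / sigma^2) per pixel;
    Eq. (9): keep the pixels whose weight exceeds lambda."""
    diff = image_rgb.astype(float) - np.asarray(emotion_color, dtype=float)
    dist2 = (diff ** 2).sum(axis=-1)          # squared RGB distance per pixel
    weight = np.exp(-dist2 / sigma ** 2)      # Gaussian similarity to the color
    return weight, weight > lam               # the selected region t

# Toy 4x4 image: top half matches the emotion color, bottom half is black.
img = np.zeros((4, 4, 3))
img[:2] = [200, 150, 30]
weight, mask = emotion_region_mask(img, emotion_color=[200, 150, 30])
```

Pixels identical to the emotion color get weight 1 and fall inside the region; pixels far from it get a weight near 0 and are excluded.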

We believe that the presented approach helps detect, for a given musical sound, the regions of an image whose color emotion best matches the emotion of the sound. Clearly, once an emotion has been determined for each musical phrase, a set of regions in the considered image can be allocated by the color associated with that emotion.


3.2 Emotion Expression Enhancement Our initial approach to aligning music emotion with texture color has been Bayesian analysis. We can further strengthen the emotion expression by adaptively changing the color of the image. The regions detected by formula (9) depend on the parameter λ, which controls the required degree of similarity between the music emotion and the color emotion: the texture of these regions has higher emotion similarity than the rest of the image. However, the level of similarity inside the regions is not constant and ranges over an interval from zero to one. To improve the emotion similarity, the Von Kries chromatic adaptation model [17] is applied to the identified regions as follows. The estimated color c from (7) has RGB components [c] = (R_c, G_c, B_c)^T, and each pixel of the image region t from (9) is represented in the RGB scheme as [t] = (R_t, G_t, B_t)^T. The initial task is to transform the color c and the image region t from RGB into a cone response domain by (10), where [M] is the matrix of Von Kries coefficients (11) [17]:

(\rho_c, \gamma_c, \beta_c)^T = [M] [c], \qquad (\rho_t, \gamma_t, \beta_t)^T = [M] [t] \quad (10)

M = \begin{pmatrix} 0.40024 & 0.70760 & -0.08081 \\ 0.22630 & 1.16532 & 0.04570 \\ 0 & 0 & 0.91822 \end{pmatrix} \quad (11)

The output color of the chromatic adaptation model is then calculated by (12):

[t^*] = [M]^{-1} \begin{pmatrix} \rho_c/\rho_t & 0 & 0 \\ 0 & \gamma_c/\gamma_t & 0 \\ 0 & 0 & \beta_c/\beta_t \end{pmatrix} [M]\, [t] \quad (12)
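Equations (10) to (12) can be sketched with NumPy. In this sketch the gains ρ_c/ρ_t are computed from the region's mean cone response; the source does not state which reference is used for ρ_t, so treating it as the region mean is an assumption made here:

```python
import numpy as np

# Eq. (11): Von Kries coefficient matrix mapping RGB to cone responses.
M = np.array([[0.40024, 0.70760, -0.08081],
              [0.22630, 1.16532,  0.04570],
              [0.00000, 0.00000,  0.91822]])

def von_kries_adapt(region_rgb, target_rgb):
    """Shift the colors of `region_rgb` (N x 3) toward `target_rgb`, Eq. (12)."""
    rho_c = M @ np.asarray(target_rgb, dtype=float)  # Eq. (10): cone response of c
    rho_t = M @ region_rgb.mean(axis=0)              # Eq. (10): cone response of t
    D = np.diag(rho_c / rho_t)                       # per-channel adaptation gains
    A = np.linalg.inv(M) @ D @ M                     # Eq. (12): RGB -> RGB transform
    return region_rgb @ A.T                          # apply to every pixel

region = np.array([[120., 80., 60.], [130., 90., 70.]])
adapted = von_kries_adapt(region, target_rgb=[125., 85., 65.])
```

When the region's mean already equals the target color, the diagonal gains are all one and the transform leaves the region unchanged, which matches the intent of the adaptation.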

Thus, chromatic adaptation of the image region t by the reference color c makes the region strongly similar to that color. This means that the emotional effect of the image region is adapted toward the predefined emotion of the musical sound. Figure 3 shows the primary tasks of the described method for color modification based on music emotion. Central to robustly presenting music emotion through color modification [18] of images is a metric that captures music emotions from sound features. The variation of music emotion over time represents the shape of change quite instinctively, with rich information that supports a diversity of emotion colors. Once the emotion colors are defined, the image regions are determined; this is accomplished with the Gaussian kernel, which sets the region of influence for color modification.


Fig. 3 Features of music sound are the basis for emotion estimation

An example of color modification driven by the emotion of a musical sound is given in Fig. 4. With the musical sound represented in wave form (Fig. 4, panel 1), the detected music features allow segmentation (panel 2). Using the features of each segment we estimate the emotions; panel 3 shows the emotion values of the first segment on a scale from zero to one. These values serve as weights for locating the image regions (panel 6) of the original image (panel 5) having similar emotion colors (panel 4). Panel 6 shows where the patterns of the image relate to the music emotions. Color modification of the regions with the tender emotion gives the result in panel 7, while color modification for the fear and happy emotions produces the changes in panels 8 and 9. The other emotions have limited room for color change. The final output of the process is displayed in panel 10.

Fig. 4 An example of music sound visualization


3.3 The Algorithm Based on the analysis given above for visualizing music emotion through color modification, we describe Algorithm 1 in pseudocode, with references to the presented formulas. The inputs are a musical sound and a set of images.

ALGORITHM 1. Visualization of Musical Emotions by Colors of Images (VMECI)
Input: music sound x, color images u, threshold λ, deviation σ;
Output: the images v in new colors.
1:  g := MusicSoundSegmentation(x)
2:  for each segment g do
3:    f := MusicSoundFeatures(g);
4:    e := MusicEmotion(f);               // by (4), (5)
5:    for each emotion e do
6:      c := ColorOfEmotion(e);           // by (6)
7:      for each color c do
8:        t := RegionOfImage(u, c);       // by (7), (8), (9)
9:        t* := ChromaticAdaptation(t, c); // by (10), (11), (12)
10:       v := ExportImage(t*, u);
11:     end for
12:   end for
13: end for

To estimate music emotions, the musical sound is analyzed to detect music features. This is especially suitable for musical sound, where the features can drive both segmentation and the evaluation of an emotion for each segment. With the emotion values, regions of the image are allocated for chromatic adaptation by the color associated with the emotions. We use MIRtoolbox 1.7.2 [15] to extract music emotions from the music features f of each segment, providing the emotion estimate e for the initial music data. For the color of an emotion, a set of colors c is selected according to the emotions via the argument of the maximum of (7) [16]. The image regions t related to a music emotion e are allocated using the Gaussian kernel (8), and color modification is performed by color balance using Von Kries chromatic adaptation with the emotion color. After modifying the color for each emotion, the per-region color modifications are combined to create the final output image v. The rhythmic features f of a music track include descriptors such as event density, beat length, pulse clarity, and average note duration. Relying on these features, the music track can be segmented using optimal time lags. After sound segmentation, the emotion p(e_j|s) of each segment is estimated by (5); the emotions e_j are classified into five classes: happy, sad, tender, anger, and fear [15]. We collected information on the relation between color and emotion and found that people's color preferences in different emotional states can be learned from additional information such as age, ethnicity, and profession. Once the color-emotion data are trained, a color can be predicted for each emotion. In this work we assume the probability p(c_i|e_j) in (6) is predefined, allowing color to be estimated directly from emotion.
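The loop structure of Algorithm 1 can be sketched in Python. The segmentation, feature, and emotion functions below are placeholder stubs standing in for the MIRtoolbox analysis, and the final blend is a simplified substitute for the Von Kries adaptation; all names and numeric choices are illustrative only:

```python
import numpy as np

EMOTIONS = ['happy', 'sad', 'tender', 'anger', 'fear']

def segment_sound(x, n_segments=3):
    """Stub segmentation: split the waveform into equal parts."""
    return np.array_split(np.asarray(x), n_segments)

def estimate_emotions(segment):
    """Stub for Eqs. (4)-(5): derive a 5-way emotion distribution."""
    e = np.abs(np.fft.rfft(segment, n=10))[:5] + 1e-9
    return e / e.sum()

def color_of_emotion(e):
    """Stub for Eq. (6): a fixed palette indexed by the dominant emotion."""
    palette = np.array([[255, 210, 80], [60, 80, 160], [120, 200, 140],
                        [200, 50, 40], [90, 60, 110]], dtype=float)
    return palette[int(np.argmax(e))]

def vmeci(x, image, lam=0.5, sigma=40.0):
    """Skeleton of Algorithm 1: segment, estimate emotion, recolor regions."""
    out = image.astype(float).copy()
    for seg in segment_sound(x):
        e = estimate_emotions(seg)
        c = color_of_emotion(e)
        dist2 = ((out - c) ** 2).sum(axis=-1)
        mask = np.exp(-dist2 / sigma ** 2) > lam   # Eqs. (8)-(9)
        out[mask] = 0.5 * out[mask] + 0.5 * c      # simplified color adaptation
    return out

image = np.random.default_rng(2).integers(0, 256, (16, 16, 3))
sound = np.sin(np.linspace(0, 20, 3000))
result = vmeci(sound, image)
```

The skeleton shows the control flow only; a faithful implementation would replace the stubs with the trained feature-emotion and color-emotion models and the full chromatic adaptation of Sect. 3.2.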


Extending the color-emotion relation to a texture-emotion relation permits evaluation of p(c_i|t) by (6): given the color composition of a texture region t, the distance between the texture color and the emotion color in (7) lets us estimate p(c_i|t) for each c_i. Selecting the largest value then yields the emotion most suitable for the texture region. The resulting texture emotion can be compared with the emotion of the current musical segment to detect the specific regions having similar emotion. Color modification is then applied to these regions, with the aim of amplifying the color of the classified emotion in and around them. Implementing the Gaussian kernel makes the color modification appear natural and smooth. Note that the algorithm can produce a different effect when the input covers a sequence of images and a musical flow, as in a video. Our method can be seen as building on these techniques, combining the discriminative benefits of music feature extraction with the novel use of emotion evaluation for music visualization.

4 Experimental Evaluation This section reports the experimental results of applying the approach described above to a database of music sounds and a set of color images. The collection of 1000 varied music tracks used in this project comprises the audio examples from GTZAN+ [19], which covers 10 music genres; each genre is represented by 100 ".wav" files of 30 s each. The ECSSD image database [20], containing 1000 color images, was selected to illustrate the music sounds. To measure the robustness of the color modification across varied instruments, a range of metrics is included in our experiment: the sum of absolute differences (SAD) [21], structural similarity (SSIM) [22], mean squared error (MSE) [23], and peak signal-to-noise ratio (PSNR) [24] were used as tools for similarity measurement. Computing the resemblance of an original image and its modified version is a matter of understanding and measuring the relevance of the two given images. Distinct colors are indeed obtained by applying VMECI to an image with blues, classical, and reggae, as demonstrated in Fig. 5. Once the sound wave changes, the emotion estimation (4)-(5) is revised to give the images new colors while keeping the image structure unchanged. A good example of the effect of music emotion from three genres under the VMECI algorithm is shown in the second row of Fig. 5, which highlights a starfish with light white under the sound of blues, green under classical, and a little orange under reggae. The signals of the different genres are shown in wave form in the first row. The dove image in the third row likewise obtains diversified colors. Since the experiment was performed on each pair of music sound and illustration image for the emotion options, statistics for the metrics were compiled for each emotion of the blues genre; they are shown in Table 1.
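The similarity metrics are standard; minimal reference implementations of three of them might look like the sketch below (SSIM is omitted for brevity, see [22]). MSE and SAD are reported as per-pixel averages here, which is one plausible normalization; the exact normalization behind the 1-MSE and 1-SAD scores in Table 1 is not stated in the source:

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images of equal shape."""
    return float(((a.astype(float) - b.astype(float)) ** 2).mean())

def sad(a, b):
    """Sum of absolute differences, averaged per pixel."""
    return float(np.abs(a.astype(float) - b.astype(float)).mean())

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    m = mse(a, b)
    return float('inf') if m == 0 else 10.0 * np.log10(peak ** 2 / m)

orig = np.full((8, 8), 100, dtype=np.uint8)
mild = orig.copy()
mild[0, 0] = 110  # a small local color change
assert psnr(orig, mild) > psnr(orig, np.zeros_like(orig))  # less change, higher PSNR
```

As the assertion illustrates, a higher PSNR means a smaller modification, which is how Tables 1 and 2 should be read.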


Fig. 5 Color by genres. The first column shows the original image; the other columns demonstrate how color is changed by different music sounds

Table 1 Color modification by emotions of blues

Blues    1-MSE     1-SAD     SSIM      PSNR/100
Happy    0.9993    0.9847    0.9999    0.8405
Sad      0.9995    0.9880    0.99998   0.87184
Tender   0.9993    0.9903    0.9999    0.8740
Anger    0.99885   0.9822    0.9998    0.8134
Fear     0.9989    0.98456   0.9998    0.8324

The emotions covered in this report are happy, sad, tender, anger, and fear. Figure 6 shows the obtained metric values as color bars, including 1-MSE, 1-SAD, SSIM, and PSNR/100. The tender emotion achieved a PSNR/100 of 0.8740 (87.40%), while anger reached only 0.8134 (81.34%). Thus, in our experiment the anger emotion of blues promotes the greatest change of the image in comparison with the other emotions.

Fig. 6 Color modification by emotions of blues

Visualization of Musical Emotions by Colors of Images Table 2 Color modification by emotion of music genres PSNR Happy Sad Tender Blues Classical Country Disco Hip-hop Jazz Metal Pop Reggae Rock

84.1 86.1 90.0 90.7 89.8 89.2 90.2 89.0 91.2 90.4

87.2 88.7 91.5 91.7 92.3 90.8 91.4 91.4 93.4 91.5

87.4 89.5 92.9 93.0 91.0 92.2 92.9 91.6 91.9 93.1

179

Fear

Anger

83.2 87.4 89.3 90.2 91.0 88.9 89.6 88.9 90.8 89.6

81.3

91.2

It is useful when color modification by sounds in different, namely blues, classical, country, disco, hip-hop, jazz, metal, pop, reggae and rock. The PSNR of a music sound based color modification is showed in Table 2 and is accompanied with a graph in Fig. 7. The most change associated with the lowest PSNR is seen at blues music, while the reggaes produce the less image modification. It is possible to keep the column “anger” of table not empty, but our experiments on the music database do not detect the emotion appeared for some genres like classical and disco. Note that due to lack of availability of other work on the music visualization by emotion, comparison with other methods is not derivable for the moment. By choosing the large dataset of images of illustration and a database of music genres, we’ve conducted experiment of visualization for music emotion by colors of images. We’ve applied the algorithm for each pair of music track and illustration

Fig. 7 Color modification by emotion of music genres


image from the database described above. Distinct colors are indeed obtained by applying VMECI to an image across the music genres. Once the sound wave changes, the emotion is revised to give the images new colors while keeping the image structure unchanged. The level of color change depends on the nature of the music emotion and the original image. Owing to the diversity of music data and emotions applied to the image to illustrate the musical movement, the modification results are quite distinctive. This can be useful as an alternative method of imagination by enlarging the region of influence, since the level of color change is manageable by setting the parameter λ in (9). For visualization of music data, Table 3 lists published methods along with whether they use sound emotion, synthesize graphical objects, or use existing images. Deep Surrender [3], the Audicle [4], Virtual Reality [6], Music Visualizer [7], Tension [8], Visualization of harmony [9], and WaveTerrain Synthesis [12] increase the visibility of music by creating different graphical objects. The methods "Paint a score" [11] and "Emotion detection" [13] prepare a set of images for illustrating music, with emotion addressed in [13]. Our method covers music emotion detection and uses the color-emotion relation to change the color of the image slightly. This allows free selection of images for music visualization and keeps the major structure of the images by considering their color regions.

Table 3 Music visualization methods

Method                        Sound emotion  Create object  Use images  Type of images
Deep Surrender [3]            No             Yes            No
Audicle [4]                   No             Yes            No
Virtual Reality [6]           No             Yes            No
Music Visualizer [7]          No             Yes            No
Tension [8]                   No             Yes            No
Visualization of harmony [9]  No             Yes            No
WaveTerrain Synthesis [12]    No             Yes            No
Paint a score [11]            No             No             Yes         Thumbnail images
Emotion detection [13]        Yes            No             Yes         398 photos with annotated emotions
Our method                    Yes            No             Yes         1000 images without annotated emotions


5 Conclusions The experimental results presented here validate the efficiency of the emotion feature for measuring the emotion of a music signal and of the associated method for visualizing that emotion. The VMECI algorithm, originally based on sound analysis for emotion detection, has been developed for visualizing music emotion through the colors of images. The proposed method is productive and reliable under the stated circumstances. Flexibility and reliability in visualizing a music track are achieved by a visual transform of the track's sound features in the form of emotion detection through feature analysis. Using chromatic adaptation, the structure and color harmony of the original image are well maintained during the visual transform. These results suggest that the VMECI method can identify divergence in the emotion of musical sound and could therefore assist media artists in their illustration of music. The proposed algorithm robustly makes musical sound visible in the dynamic color world of images, and it effectively manages the level of color change during the illustration process.

References

1. Peacock, K.: Instruments to perform color-music: two centuries of technological experimentation. Leonardo 21(4), 397–406 (1988)
2. Langer, T.: Music information retrieval and visualization. Trends in Information Visualization (2010)
3. Robyn, R.T., Torres, D.: Real-time music visualization using responsive imagery (2007)
4. Wang, G., Cook, P.R.: Audicle: a context-sensitive, on-the-fly audio programming environ/mentality. In: Proceedings of the International Computer Music Conference, pp. 256–263 (2004)
5. Chitanont, N., Yaginuma, K., Yatabe, K., Oikawa, Y.: Visualization of sound field by means of schlieren method with spatio-temporal filtering. In: ICASSP (2015)
6. Outram, B.I.: Synesthesia audio visual interactive sound and music visualization in virtual reality with orbital observation and navigation. In: IEEE International Workshop on Mixed Reality Art (MRA) (2016)
7. Nanayakkara, S.C., Taylor, E., Wyse, L., Ong, S.H.: Towards building an experiential music visualizer. In: IEEE ICICS (2007)
8. Herremans, D., Chuan, C.H.: A multi-modal platform for semantic music analysis: visualizing audio and score-based tension. In: IEEE ICSC (2017)
9. Klemenc, B., Ciuha, P., Subelj, L., Bajec, M.: Visual and aural: visualization of harmony in music with colour. IPSI BgD Trans. Internet Res. 7(1), 48–53 (2011)
10. Farbood, M., Pasztor, E., Jennings, K.: Hyperscore: a graphical sketchpad for novice composers. Comput. Graph. Appl. 50–54 (2004)
11. Yoshii, K., Goto, M.: Music Thumbnailer: visualizing musical pieces in thumbnail images based on acoustic features. In: Proceedings of ISMIR (2008)
12. Sedes, A., Courribet, B., Thiebaut, J.-B.: From the visualization of sound to real-time sonification: different prototypes in the Max/MSP/Jitter environment. In: Proceedings of ICMC (2004)
13. Chen, C.H., Weng, M.F., Jeng, S.K., Chuang, Y.Y.: Emotion-based music visualization using photos. In: MMM 2008, pp. 358–368 (2008)


14. Barber, D.: Bayesian Reasoning and Machine Learning. Cambridge University Press, New York (2012)
15. Laurier, C., Lartillot, O., Eerola, T., Toiviainen, P.: Exploring relationships between audio features and emotion in music. In: Proceedings of the Conference of ESCOM (2009)
16. Sutton, T.M., Altarriba, J.: Color associations to emotion and emotion-laden words: a collection of norms for stimulus construction and selection. J. Behav. Res. 48, 686 (2016)
17. Guy, K.K.: Colour constancy using von Kries transformations: colour constancy "goes to the Lab". Res. Lett. Inf. Math. Sci. 13, 19–33 (2009)
18. Dao, A.N.: Partial ellipse filter for maximizing region similarity for noise removal and color regulation. In: Multi-disciplinary Trends in Artificial Intelligence, MIWAI, Lecture Notes in Computer Science, vol. 11248. Springer (2018)
19. Iloga, S., Romain, O., Tchuente, M.: An accurate HMM-based similarity measure between finite sets of histograms. In: Pattern Analysis and Applications, pp. 1–26. Springer, London (2018)
20. Yan, Q., Xu, L., Shi, J., Jia, J.: Hierarchical saliency detection. In: CVPR (2013)
21. Richardson, I.E.G.: H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia. Wiley, Chichester (2003)
22. Sogaard, J., Krasula, L., Shahid, M., Temel, D., Brunnstrom, K., Razaak, M.: Applicability of existing objective metrics of perceptual quality for adaptive video streaming. Electron. Imaging (13), 1–7 (2016)
23. Bermejo, S., Cabestany, J.: Oriented principal component analysis for large margin classifiers. Neural Networks 14(10), 1447–1461 (2001)
24. Ponomarenko, N., Ieremeiev, O., Lukin, V., Egiazarian, K., Carli, M.: Modified image visual quality metrics for contrast change and mean shift accounting. In: Proceedings of the 11th International Conference on the Experience of Designing and Application of CAD Systems in Microelectronics (CADSM), pp. 305–311 (2011)

Saliency Guided Image Watermarking for Anti-forgery

Pham Quang Huy and Dao Nam Anh

Abstract When hiding information in a host image, different types of image features can be used as a secret key. The cryptanalytic watermarking approach is recognized as a way of improving robustness for authentication systems and anti-forgery against attack. This work contributes a novel learning technique that uses the saliency features of sub-regions to establish a secret key, which is then applied as a parameter for both watermark embedding and watermark extraction. However, inserting information during watermark embedding modifies the image features. We therefore propose to use learning methods together with saliency models to ensure robust watermark extraction. Here, the image watermarking method is described with SVM learning and the assistance of a number of saliency models. Our results show that the cryptanalytic watermarking method suffices to achieve invisibility and stability of the watermark. Experimental results on a benchmark indicate the advantage of the saliency-feature-based method for anti-forgery applications.

1 Introduction Watermarking is a fundamental problem in multimedia technologies, with many applications such as authentication systems and anti-forgery. Digital watermarking proceeds in four steps: embedding, distribution, attack, and extraction, where an attack is an attempt at content modification [1]. Hence it is necessary to maintain robustness, meaning the similarity between the original watermark and the extracted watermark. Based on human perception, a watermark is designed to be either visible or invisible. Imperceptible watermarking is motivated by ownership protection, where awareness of embedding an invisible

P. Q. Huy · D. N. Anh (B) Faculty of Information Technology, Electric Power University, Hanoi, Vietnam e-mail: [email protected] P. Q. Huy e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_16

183

184

P. Q. Huy and D. N. Anh

watermark in image assists to prevent piracy. On the other hand, visible marks are more suitable for textual copyright attention. There have been significant consideration to watermarked image intensifying the watermark in the image regions where it is well hidden. Image saliency [2] is in many ways similar to extraction for image features; although they put more priority on visual interest and on region interpretation which region involves the most visual interest. Since subregions in high attention of visibility are seen as major parts of images, then they are usually not cropped out in image modification. Hence, the salient subregions are reasonable for embedding watermark visibly to raise textual copyright attention. Furthermore, the image regions in low attention of visibility are highly potential for embedding watermark which is expected to be imperceptible. The second approach belongs to steganography with support for concealed communication. Assuming that the encryption method in invisible saliency based watermarking does not cause degradation of image quality and consequently, region of embedded mark remained non salient, then watermark should be undetectable as expected. However, the encryption process changes the image and non-salient regions of encrypted image surely are not the same as of the original image. Big variation of saliency models in allocating salient region means that every steganalytic saliency based method must be incorporated as comparatively with saliency model to ensure robustness of decrypted watermark. The main contribution of the present work is thus to adapt saliency features of sub regions to establish a secret key and then apply the key as parameter for both watermark embedding and watermark extracting. We further propose saliency feature learning, which realizes the way of creating secret key from saliency feature of sub regions and assures robustness of watermark extracting, improving significantly its performance.

2 Related Work

As mentioned above, saliency features address visual attention by segmenting regions of visual interest. This is a useful notion when asking whether a region of the host image is suitable for embedding a watermark without affecting the perceived visual quality. We first discuss some principles of estimating visual attention. To specify saliency detection in terms of the local data structure of an image [3], the similarity of the feature at a pixel of interest to its neighbors has to be measured. In another paper [4], the location of a sparse foreground is approximated on a sparse background with the Discrete Cosine Transform (DCT). Research has also been done on local contrast and a global rarity measure [5] for feature extraction. Describing contrasts against randomly selected neighboring and peripheral regions, thereby separating targets and distractors, is the method for saliency evaluation in [6].

Next we discuss some digital watermarking works based on visual attention, as they are related to our approach. Since the saliency map represents the saliency level, visually salient and non-salient regions can be detected.

Saliency Guided Image Watermarking for Anti-forgery


To avoid affecting the perceived visual quality when encoding a watermark into a host image, lower- and higher-strength watermarks are embedded in visually salient and non-salient regions, respectively [7]; wavelets were proposed as the watermarking tool there. In contrast, in this work we introduce saliency learning instead of wavelets to select a proper region for embedding the watermark without affecting the perceived visual quality.

There is also interest in the watermarking community in steganographic techniques for embedding message bits into a host image, such as least significant bit (LSB) matching. In this method, the choice of whether to embed or extract a watermark bit at a given host image pixel is random. In the modification of LSB by [8], the change to the host image is reduced by using this choice to set a binary function of two cover pixels to the desired value. A further modification by [9] uses the calculation of a score matrix to find a near-optimal solution among all permutation orders. As an implementation of LSB, watermark bits are implanted at the least significant locations of the image by [10]. In our proposed method, the concept of LSB is likewise applied to modify the host image with the watermark, assisted by subregion extraction based on saliency features. Here, the choice of whether to embed or extract the watermark at a host image pixel is based on machine learning of the saliency features of subregions.

From a visual saliency perspective, regions of interest (ROIs) can carry the essential information of an image. In [11], a robust watermark is embedded into the DCT coefficients of ROIs, while a fragile watermark is embedded into the low sub-band of the watermarked image. Another watermarking framework is based on visual-saliency-based just noticeable distortion [12]: the human visual system (HVS) detects salient structures in the different orientations of the image, yielding a compound orientation feature map. A linear fusion of different visual features then produces the final map, which is employed in quantization watermarking. However, an orientation feature map is not used in our method.

A saliency-based method for subregion watermarking is proposed in this work, allowing different saliency models to be applied for estimating saliency features of subregions. The saliency features of subregions are used in a learning process to select a subregion in which to hide another image, namely the subregion associated with the highest confidence of low change under watermark embedding.

3 The Method

The Saliency Guided Image Watermarking (SGW) algorithm specified here is aimed at anti-forgery [13]. The watermark is therefore usually fragile, in order to protect the data. In a similar manner to [14], we use mathematical notation to express the watermarking process. The task of embedding information is comparable to encoding from a cryptographic point of view and, in particular, is described by function (1), which maps the host image u, watermark w and key k to the embedded/encoded image v; the key k is optional and may be omitted. The task of information extraction, which is equivalent to decoding in the cryptographic domain, is formulated by function (2).


v = encode(u, w, k)    (1)

w = decode(v, u, k)    (2)

Many saliency models exist that could serve as feature detectors for the host image, and the key k can then be generated from the detected feature. The encode function can thus keep the key hidden (3), and the decode function can estimate the key from its input (4); only the encoded image v is needed for decoding. The final blind anti-forgery check is expressed by function (5), which returns whether the watermark w exists in image v.

v = encode(u, w)    (3)

w = decode(v)    (4)

b = detect(v, w),  b ∈ {0, 1}    (5)
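Read as an interface, functions (3)–(5) say that the key never crosses the channel: both sides re-derive it from the image itself. The toy Python sketch below illustrates this blind scheme under simplified assumptions that are not the authors' implementation: quadrants stand in for the border subregions, quadrant variance stands in for the saliency feature, and `derive_key`, `encode`, `decode`, and `detect` are hypothetical names.

```python
import numpy as np

def _quad(shape, k):
    # Slices for the four quadrants, a stand-in for the paper's subregions r_i.
    h, w = shape
    return [(slice(0, h // 2), slice(0, w // 2)), (slice(0, h // 2), slice(w // 2, w)),
            (slice(h // 2, h), slice(0, w // 2)), (slice(h // 2, h), slice(w // 2, w))][k]

def derive_key(img):
    # Hypothetical stand-in for the saliency-derived secret key: the index of
    # the lowest-variance quadrant plays the role of i*. LSB embedding barely
    # changes this feature, which is what makes blind decoding possible.
    return int(np.argmin([img[_quad(img.shape, k)].astype(float).var()
                          for k in range(4)]))

def encode(u, bits):                       # function (3): the key stays hidden
    k = derive_key(u)
    v = u.copy()
    region = v[_quad(v.shape, k)].ravel()
    region[:bits.size] = (region[:bits.size] & 0xFE) | bits  # write LSBs
    v[_quad(v.shape, k)] = region.reshape(v[_quad(v.shape, k)].shape)
    return v

def decode(v, n_bits):                     # function (4): blind extraction
    k = derive_key(v)                      # key re-derived from v alone
    return v[_quad(v.shape, k)].ravel()[:n_bits] & 1

def detect(v, bits):                       # function (5): b in {0, 1}
    return int(np.array_equal(decode(v, bits.size), bits))
```

The sketch works because writing least significant bits changes pixel values by at most one level, so the low-variance quadrant stays the low-variance quadrant after embedding.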

In this method, a multi-subregion approach is applied. This is because the domain space of the host image is a 2D Cartesian plane, and there are several ways to split the space into subregions r_i in which to embed a watermark. Thus, for each image j in the training set L, a saliency feature can be calculated for each of its subregions u_j(r_i). This allows us to determine a saliency feature matrix f by (6).

f_i^j = saliency(u_j(r_i)),  i = 1..n,  j ∈ L    (6)

When splitting the image space into subregions for hiding a watermark, it is important to choose the location of the subregions. In the case of watermarking for anti-forgery, the subregions should be set near the image borders, where non-saliency is usually observed. A simple arrangement consists of four small rectangles (n = 4) with their long edges next to the edges of the image. Given the different versions obtained by encoding the watermark in each subregion r_i, we want to identify the subregion whose saliency feature is changed the least by inserting the watermark. After applying a saliency model to each region, before and after encoding, the variation can be estimated by (7). The subregion associated with the smallest variation is then singled out by its index (8).

d_i^j = | saliency(u_j(r_i)) − saliency(v_j(r_i)) |,  i = 1..n    (7)

i*_j = arg min_i d_i^j    (8)

Having specified the region index (8) as a class label, we now train on the data set from (6) to obtain an SVM kernel K, based on the saliency features f_i of each subregion i in every training image j and its index i*_j, which is used for classification. Several learning methods could be applied for this purpose. In our approach, the Support Vector Machine (SVM) [15] is chosen because of its capacity to efficiently perform non-linear classification, implicitly mapping inputs into high-dimensional feature spaces (9).


K = training(f_{i=1..n}^j, i*_j),  j ∈ L    (9)

At the testing stage we have a data matrix f (10), each row of which carries the saliency features of the subregions of a test image from T. The learned kernel K is used to estimate the position of the subregion (index i) whose saliency features f_i would be changed the least by embedding the watermark (11). The index of this subregion plays the role of the secret key for watermarking in both the encoding and decoding stages.

f_i^k = saliency(u_k(r_i)),  i = 1..n,  k ∈ T    (10)

i*_k = test(f_{i=1..n}^k, K),  k ∈ T    (11)
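The training and testing pipeline (6)–(11) can be sketched as follows. This is not the implementation used in the experiments: quadrants stand in for the border subregions, local variance stands in for the saliency models (12)–(17), an LSB flip stands in for the embedding (19), and scikit-learn's `SVC` plays the role of the SVM kernel (9); all helper names are ours.

```python
import numpy as np
from sklearn.svm import SVC

def subregions(img):
    # Four quadrants as a stand-in for the four border rectangles (n = 4).
    h, w = img.shape
    return [img[:h // 2, :w // 2], img[:h // 2, w // 2:],
            img[h // 2:, :w // 2], img[h // 2:, w // 2:]]

def saliency(region):
    # Placeholder saliency feature; any of the models (12)-(17) would go here.
    return region.astype(float).var()

def embed_stub(region):
    # Stand-in for LSB embedding (19): overwrite the least significant bits.
    return region ^ 1

def features_and_label(u):
    regs = subregions(u)
    f = [saliency(r) for r in regs]                                   # (6)
    d = [abs(saliency(r) - saliency(embed_stub(r))) for r in regs]    # (7)
    return f, int(np.argmin(d))                                       # (8)

def make_image(seed, flat_quad=None):
    img = np.random.default_rng(seed).integers(0, 256, (8, 8), dtype=np.uint8)
    if flat_quad is not None:       # force a clearly least-affected subregion
        q = [(slice(0, 4), slice(0, 4)), (slice(0, 4), slice(4, 8)),
             (slice(4, 8), slice(0, 4)), (slice(4, 8), slice(4, 8))][flat_quad]
        img[q] = 51
    return img

# Training (9): one feature row and one label per image of the training set L.
train = [make_image(s) for s in range(12)] + [make_image(100 + k, k) for k in range(4)]
rows = [features_and_label(u) for u in train]
X = np.array([r[0] for r in rows])
y = np.array([r[1] for r in rows])
K = SVC(kernel="rbf").fit(X, y)

# Testing (10)-(11): predict the embedding subregion index i* for a new image.
i_star = int(K.predict([features_and_label(make_image(999))[0]])[0])
```

With real saliency models substituted for the variance stub, K learns which border subregion of a given image can absorb a watermark with the least saliency change.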

The feature matrix is based on a saliency model estimating visual attention over the host image space (6). Here we apply four saliency detection methods. The first model uses the likeness of a pixel x to its surroundings y, estimating saliency by the Self-Resemblance (SR) measure (12) with the matrix cosine similarity d(u(x), u(y)) [3].

saliency_SR(u(x)) = 1 / Σ_y exp( (−1 + d(u(x), u(y))) / σ² )    (12)
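A compact numpy sketch of the self-resemblance idea in (12): raw 3×3 patches stand in for the local steering-kernel features of the full SR model [3], neighborhoods wrap at the borders, and the function name is ours, so treat it as an illustration of the formula rather than a faithful reimplementation.

```python
import numpy as np

def self_resemblance(img, sigma=0.5):
    # Sketch of (12): a pixel whose local patch resembles its neighbours'
    # patches receives a large denominator, hence low saliency.
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    # Stack the 3x3 patch of every pixel as a 9-dimensional feature vector.
    patches = np.stack([padded[i:i + h, j:j + w]
                        for i in range(3) for j in range(3)], axis=-1)
    norm = np.linalg.norm(patches, axis=-1) + 1e-9
    denom = np.zeros((h, w))
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(patches, (dy, dx), axis=(0, 1))
            # Cosine similarity d(u(x), u(y)) between patch features.
            d = (patches * shifted).sum(-1) / (norm * np.roll(norm, (dy, dx), axis=(0, 1)))
            denom += np.exp((-1.0 + d) / sigma ** 2)   # one term of the sum over y
    return 1.0 / denom
```

On a smooth gradient, neighboring patches are nearly parallel (d close to 1), so saliency is low; an isolated outlier pixel breaks the resemblance and stands out.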

In the case of Sparse Signal Mixing (SSM), saliency is estimated from the spatial location of a sparse foreground hidden in a spectrally sparse background [4] by (13)–(14), assuming the image background is sparsely supported in the basis of the Discrete Cosine Transform (DCT) and its inverse [16], with a Gaussian kernel g; ∗ denotes convolution and ◦ the entrywise product.

ü = iDCT( sign( DCT(u) ) )    (13)

saliency_SSM(x) = g ∗ (ü ◦ ü)    (14)
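The image-signature steps (13)–(14) can be reproduced with an orthonormal DCT built from numpy alone (so the inverse transform is just the transpose); the Gaussian smoothing g is a simple separable convolution, and all function names here are ours.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix; orthonormality makes the inverse its transpose.
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def gaussian_blur(m, sigma):
    # Separable Gaussian kernel g used in (14).
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    out = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 0, m)
    return np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 1, out)

def saliency_ssm(u, sigma=1.0):
    Cr, Cc = dct_matrix(u.shape[0]), dct_matrix(u.shape[1])
    spectrum = Cr @ u.astype(float) @ Cc.T          # DCT(u)
    recon = Cr.T @ np.sign(spectrum) @ Cc           # (13): iDCT(sign(DCT(u)))
    return gaussian_blur(recon * recon, sigma)      # (14): g * (recon entrywise squared)
```

Keeping only the sign of the spectrum discards the spectrally sparse background, so the squared reconstruction concentrates energy on the spatially sparse foreground.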

A bottom-up analysis showed that comparing low-level luminance and chrominance features yields medium-level features such as image orientations [5]. Cross-scale rarity quantization is then performed on the occurrence probability of pixels to estimate saliency, assuming that locally contrasted and globally rare features are salient (15). Here, c is the number of scales and o_k is the occurrence value of the current pixel u(x) within the k-th scale of the resolution level.

saliency_RARE(x) = −log( (1/c) Σ_{k=1}^{c} o_k(u(x)) )    (15)

Following the success of principal component analysis (PCA) [17], visual saliency estimation based on PCA can distinguish visual targets from distractors [6]. Projecting images onto various subspaces allows the evaluation of average contrasts between randomly selected neighboring regions for an image patch. Learning for subspace selection and combination then leads to an optimized weight q (16) and saliency


estimation (17), where φ is Random Contrasts (RC) for a patch, K is the training set of images, T and D are the sets of targets and distractors for images u [6]. q ∗ = min q

 K

T



ex p q T φ(D) − q T φ(T ) , 0  q  1

(16)

D

saliency RC (x) = q ∗ φ(u)

(17)

To manage the steganographic complexity, we adopt the Arnold transformation algorithm [18] for initial encoding of the watermark by (18). Then, adopting the LSB concept, the watermark is hidden in the least significant bits of the blue channel of the selected region, with the guidance of the saliency feature f, by (19).

w_A = Arnold(w)    (18)

v = mixing_LSB(w_A, u(r_{i*}), f(r_{i*}))    (19)

The decoding process starts by detecting the watermark w_A from the encrypted region in the blue channel of image v by (20), using LSB and the guidance of saliency. Finally, the inverse Arnold transform extracts the watermark w from w_A (21).

w_A = detect_LSB(v(r_{i*}), f(r_{i*}))    (20)

w = iArnold(w_A)    (21)
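Steps (18)–(21) can be prototyped directly: the Arnold cat map scrambles a square watermark via (x, y) → (x + y, x + 2y) mod N, its inverse map unscrambles, and LSB mixing/detection write and read the lowest bits of the blue channel. The helper names below are ours, chosen only to mirror (18)–(21).

```python
import numpy as np

def arnold(img, iterations=1):
    # Arnold cat map scrambling (18): (x, y) -> (x + y, x + 2y) mod N.
    n = img.shape[0]
    out = img
    for _ in range(iterations):
        x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        nx, ny = (x + y) % n, (x + 2 * y) % n
        scrambled = np.empty_like(out)
        scrambled[nx, ny] = out[x, y]
        out = scrambled
    return out

def arnold_inverse(img, iterations=1):
    # Inverse map: (x, y) -> (2x - y, y - x) mod N, undoing (18) for (21).
    n = img.shape[0]
    out = img
    for _ in range(iterations):
        x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        nx, ny = (2 * x - y) % n, (y - x) % n
        scrambled = np.empty_like(out)
        scrambled[nx, ny] = out[x, y]
        out = scrambled
    return out

def embed_lsb_blue(region_blue, w_bits):
    # (19): write the scrambled watermark bits into the LSBs of the blue
    # channel of the selected subregion.
    flat = region_blue.ravel().copy()
    flat[:w_bits.size] = (flat[:w_bits.size] & 0xFE) | w_bits.ravel()
    return flat.reshape(region_blue.shape)

def extract_lsb_blue(region_blue, shape):
    # (20): read the bits back; (21) then applies the inverse Arnold map.
    n = int(np.prod(shape))
    return (region_blue.ravel()[:n] & 1).reshape(shape)
```

Because the cat map matrix [[1, 1], [1, 2]] has determinant 1, it is a bijection on the N×N grid, so the same number of inverse iterations recovers the watermark exactly.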

Figure 1 displays the Saliency Guided Image Watermarking (SGW) algorithm as a block schema. We explored how image features from different saliency models (12–17), irrespective of the estimation approach (SR [3], SSM [4], RARE [5] or RC [6]), can be used for the watermark encoding and decoding tasks (3–11).

ALGORITHM 1. Saliency Guided Image Watermarking (SGW)
Input: Training host images u, watermarks w; test images u*, watermarks w*;
Output: The encoded image v* and extracted watermark t*;
1:  for u do // Training stage
2:    f := Extract Saliency Feature(u) // by (6, 12-17)
3:    i_j := Estimate Sub Region(f); // by (7, 8)
4:    v := Embed Watermark to Image Sub Region(u, w, i_j); // by (18-19)
5:    t := Detect Watermark from Image Sub Region(v); // by (20-21)
6:  end for
7:  K := Training(f, i_j, w, t); // by (9)
8:  for u* do // Test stage
9:    f* := Extract Saliency Feature(u*) // by (6, 12-17)
10:   i_j := Estimate Sub Region(f*, K); // by (10, 11)
11:   v* := Embed Watermark to Image Sub Region(u*, w*, i_j); // by (18-19)
12:   t* := Detect Watermark from Image Sub Region(v*, K); // by (20-21)
13:   a := Estimate Accuracy(w*, t*);
14: end for


Fig. 1 Process of watermarking with the saliency features by SGW algorithm

Figure 1 illustrates the watermarking process for an example host image u. After estimating saliency for the image, the saliency feature f of each subregion is determined. Here, the SVM kernel helps to select a subregion r located at the top of the image. Once the watermark w is embedded, the new version v of the host image is slightly different from the original. Additionally, it can be attacked, for example by cropping. Once the saliency feature f* of the image v* is found, the SVM kernel again indicates the subregion r* where the watermark is embedded, so the watermark w* can be decoded.

4 Experimental Evaluation

The robustness of the described method, accompanied by several saliency models, is addressed in this section. We tested the method on the MSRA10K Salient Object Database [19], consisting of 10,000 real images. The database was split randomly into a training set and a test set. By conducting experiments with four saliency models, we compare the performance using kernels generated by the SVM. Furthermore, the attack test contains different attack types, including scaling, rotation, additive Gaussian noise, salt and pepper noise, histogram equalization, median filtering, and cropping.

Figure 2 demonstrates an example of embedding a black-and-white logo into an image from MSRA10K, applying the four saliency models SR [3], SSM [4], RARE [5] and RC [6] (Fig. 2C). As expected, the attack by additive Gaussian noise makes the encoded host image noisy (Fig. 2E). The decoded watermarks from the four cases are degraded to different degrees, depending on the noise attack and the guiding saliency features (Fig. 2F).

An example of attacks is shown in Fig. 3. Given a host image (A) and watermark (B), the encoded image (D) is created with the assistance of the saliency feature (C). The image is then attacked by rotation of 90°, −90° (E), additive Gaussian noise (F), salt and pepper noise (G), median filtering (H) and cropping to 90% (I). Watermarks extracted


Fig. 2 Embedding a black-and-white logo into a sample image from MSRA10K, applying four saliency models: SR [3] in the 1st row, SSM [4] in the 2nd row, RARE [5] in the 3rd row and RC [6] in the 4th row

Fig. 3 After embedding a black-and-white logo into an image from MSRA10K, different attacks are applied to the watermarked image

from attacked images show different levels of degradation (J). One can see that the degradation from rotation by 90°, −90° (E) is negligible, while strong degradation is caused by the median filter. The other attacks also produce severe degradation but keep the major part of the watermark.

So far we have looked at pairwise matches between host image u and encoded image v, checking the invisibility of the watermark with the assistance of precision, recall, F-measure [20], sum of absolute differences (SAD) [21], structural similarity (SSIM) [22], mean squared error (MSE) [23] and PSNR [24]. Since the decoded watermarks t are degraded by attacks, they are matched against the initial version w to highlight the fragility of the watermark.

For each saliency model, Table 1 reports average metrics for the invisibility of the watermark; the best scores are printed in bold. In fact, high invisibility is observed for all four saliency models (Fig. 4). This is an advantage of the proposed saliency-based method, due to the selection of a non-salient region for hiding the watermark and the fact that embedding information does not strongly change the saliency features. Note that we observed watermark degradation caused by the embedding itself: the saliency feature is changed after


Table 1 Imperceptibility of watermark

Saliency    Precision  Recall  Fmeasure  MSE     SAD     SSIM  PSNR
SGW-SR      1.0        0.99    97.57     0.0002  0.0002  0.99  99.99
SGW-SSM     1.0        0.99    97.56     0.0001  0.0001  0.99  99.99
SGW-RARE    1.0        0.99    97.56     0.0001  0.0001  0.99  99.99
SGW-RC      1.0        0.99    97.34     0.0002  0.0002  0.99  99.99

Fig. 4 Imperceptibility of watermark by comparing initial image u and encoded image v

encoding, and the blind decoding process then uses the saliency feature of the encoded image to extract the watermark.

To measure the stability of the watermark, the initial watermark is compared with the extracted watermark using quality metrics including precision, recall, F-measure, SAD, SSIM, MSE, PSNR and the normalized cross-correlation (NC) [1]. Table 2 demonstrates the quality of the encoding process for the four saliency models. The method with saliency models SR [3], SSM [4] and RARE [5] obtained the best precision, PSNR and NC scores, while the method with RC [6] achieved the best MSE and SSIM (Fig. 5). The PSNR scores of methods [9] and [8] are included in Table 2 for reference only, as their test data consist of four images, which differ from our test data covering 5000 images.

During the test of watermark stability against attacks, we addressed several attack types, which are detailed in the first column of Table 3. Each saliency model produces an average PSNR for the attack tests in columns 2–5. Figure 6 shows that our method is strong against rotation by 90°, −90°, and weak against histogram equalization and rotation by 45°, −45°. Results of other methods in PSNR under similar attack types are given in columns 6 and 7. However, these results were produced on different datasets: the data for testing the ROI method [11] contain three host images and one watermark, and four other images were used for the test in LSB [10]. Hence the two rightmost columns of Table 3 are provided for reference only.

In the experiments, kernels were constructed for each saliency model and for each single subregion. The robustness of the saliency guidance can be explained by the constructed kernel machines.
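The fidelity and stability scores in Tables 1–3 rest on standard metrics that reduce to a few lines each; a minimal numpy sketch of MSE, PSNR, SAD and NC for 8-bit images (precision, recall, F-measure and SSIM are standard and omitted for brevity):

```python
import numpy as np

def mse(a, b):
    # Mean squared error between two images of equal shape.
    return float(np.mean((a.astype(float) - b.astype(float)) ** 2))

def psnr(a, b, peak=255.0):
    # Peak signal-to-noise ratio in dB; infinite for identical images.
    e = mse(a, b)
    return float("inf") if e == 0 else 10.0 * np.log10(peak ** 2 / e)

def sad(a, b):
    # Sum of absolute differences.
    return float(np.abs(a.astype(float) - b.astype(float)).sum())

def nc(w, t):
    # Normalized cross-correlation between original and extracted watermarks.
    w = w.astype(float).ravel()
    t = t.astype(float).ravel()
    return float(w @ t / (np.linalg.norm(w) * np.linalg.norm(t) + 1e-12))
```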


Table 2 Stability of watermark

Method      Precision  Recall  Fmeasure  MSE   SAD   SSIM  PSNR   NC
SGW-SR      0.99       0.96    97.90     0.02  0.02  0.99  68.91  0.95
SGW-SSM     0.99       0.96    97.89     0.02  0.02  0.99  68.91  0.95
SGW-RARE    0.99       0.96    97.88     0.02  0.02  0.99  68.91  0.95
SGW-RC      0.95       0.81    86.26     0.17  0.17  0.98  63.37  0.75
LSB [9]     –          –       –         –     –     –     54.32  –
LSB [8]     –          –       –         –     –     –     52.43  –

Fig. 5 Stability of watermark by comparing initial watermark w and decoded watermark t

Table 3 Stability of watermark against attacks, in PSNR

Attack                    SR     SSM    RARE   RC     ROI [11]  LSB [10]
Scale 0.8                 51.65  51.66  51.65  51.41  –         –
Scale 1.2                 52.85  52.87  52.85  52.05  –         –
Rotate 30°, −30°          51.85  51.84  51.85  51.55  –         –
Rotate 45°, −45°          51.71  51.70  51.71  51.43  –         –
Rotate 90°, −90°          64.37  64.49  64.37  59.85  –         –
Additive Gaussian noise   51.98  51.97  51.96  51.76  25.05     –
Salt and pepper noise     54.58  54.58  54.58  54.62  32.43     –
Histogram equalization    51.47  51.47  51.47  51.45  –         –
Median filter             53.80  53.80  53.80  52.89  –         –
Crop 0.9                  53.38  53.38  53.38  52.57  –         28.63

As saliency detection and message embedding are both employed in our method, their computational performance plays a central role in estimating the computational effort O = O_s + O_w, where O_s and O_w are the computational complexities of the saliency model and the message embedding method. Let m stand for the size of the image and n for the size of a subregion.


Fig. 6 Stability of watermark against attacks, comparing the initial watermark w and the decoded watermark t after attacks

Note that saliency detection is performed over the whole image, while message embedding is applied to a subregion only. Based on the descriptions of self-resemblance [3], sparse saliency [4], rare saliency [5] and random contrasts [6], as well as of the image embedding method, all of these have complexity linear in the size of their input: O_s = O(m) and O_w = O(n), with m ≥ n. Consequently, the computational complexity of the proposed algorithm, for any of the selected saliency and image embedding methods, is O = O(m) + O(n) = O(m).

5 Conclusions

In this paper a new blind image watermarking method has been investigated, based on two components: robust saliency features, which undergo only slight modification after watermark embedding, and a confident learning kernel, obtained from an available large saliency database and the robustness of the learning method. While each of these approaches has been analyzed before, this paper has shown that their integration provides an efficient solution. Saliency features of subregions are created and learned to allow embedding and extracting a watermark from subregions of the host image. The method demonstrates promising imperceptibility and stability of the watermark in experimental results on a benchmark database. Its simplicity and low computational complexity are well suited to promoting the use of saliency features. Furthermore, the effective saliency models available contribute the necessary schemes for many diverse ways of implementing blind image watermarking for host images, particularly with regard to authentication systems and anti-forgery.


References

1. Cayre, F., Fontaine, C., Furon, T.: Watermarking security: theory and practice. IEEE Trans. Signal Process. 53(10), 3976–3987 (2005)
2. Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 20(11), 1254–1259 (1998)
3. Seo, H.J., Milanfar, P.: Nonparametric bottom-up saliency detection by self-resemblance. In: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 45–52 (2009)
4. Hou, X., Harel, J., Koch, C.: Image signature: highlighting sparse salient regions. IEEE Trans. Pattern Anal. Mach. Intell. 34(1), 194–201 (2012)
5. Riche, N., Mancas, M., Gosselin, B., Dutoit, T.: Rare: a new bottom-up saliency model. In: 2012 19th IEEE International Conference on Image Processing, pp. 641–644 (2012)
6. Fang, S., Li, J., Tian, Y., Huang, T., Chen, X.: Learning discriminative subspaces on random contrasts for image saliency analysis. IEEE Trans. Neural Netw. Learn. Syst. 28(5), 1095–1108 (2017)
7. Oakes, M., Bhowmik, D., Abhayaratne, C.: Visual attention-based watermarking. In: 2011 IEEE International Symposium of Circuits and Systems (ISCAS), Rio de Janeiro, pp. 2653–2656 (2011)
8. Mielikainen, J.: LSB matching revisited. IEEE Signal Process. Lett. 13(5), 285 (2006)
9. Xu, H., Wang, J.J., Kim, H.J.: Near-optimal solution to pair-wise LSB matching via an immune programming strategy. Inf. Sci. 180(8), 1201–1217 (2010)
10. Basu, A., Talukdar, S., Sengupta, N., Kar, A., Chakraborty, S.L., Sarkar, S.K.: On the implementation of a saliency based digital watermarking. In: Mandal, J., Satapathy, S., Kumar Sanyal, M., Sarkar, P., Mukhopadhyay, A. (eds.) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol. 339. Springer (2015)
11. Tian, L.H., Zheng, N.N., Xue, J.R., Li, C., Wang, X.F.: An integrated visual saliency-based watermarking approach for synchronous image authentication and copyright protection. Sig. Process. Image Commun. 26(8–9), 427–437 (2011)
12. Li, J., Zhang, H., Wang, J., Xiao, Y., Wan, W.: Orientation-aware saliency guided JND model for robust image watermarking. IEEE Access 7, 41261–41272 (2019)
13. De Vleeschouwer, C., Delaigle, J.F., Macq, B.: Invisibility and application functionalities in perceptual watermarking—an overview. Proc. IEEE 90, 64–77 (2002)
14. Pfitzmann, B.: Information hiding terminology. In: 1st Workshop on Information Hiding, pp. 347–350 (1996)
15. Ben-Hur, A., Horn, D., Siegelmann, H., Vapnik, V.N.: Support vector clustering. J. Mach. Learn. Res. 2, 125–137 (2001)
16. Ahmed, N.: How I came up with the discrete cosine transform. Digit. Signal Proc. 1(1), 4–5 (1991)
17. Jolliffe, I.T.: Principal Component Analysis. Springer Series in Statistics, 2nd edn. Springer (2002)
18. Wu, L., Zhang, J., Deng, W., He, D.: Arnold transformation algorithm and anti-arnold transformation algorithm. In: Proceedings 2009 First International Conference on Information Science and Engineering, pp. 1164–1167 (2009)
19. Hou, Q., Cheng, M.M., Hu, X., Borji, A., Tu, Z., Torr, P.: Deeply supervised salient object detection with short connections. In: IEEE TPAMI (2018)
20. Wackerly, D., Mendenhall, W., Scheaffer, R.L.: Mathematical Statistics with Applications, 7th edn. Thomson Higher Education, Belmont (2008)
21. Richardson, I.E.G.: H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia. Wiley, Chichester (2003)
22. Brunet, D., Vass, J., Vrscay, E.R., Wang, Z.: On the mathematical properties of the structural similarity index. IEEE Trans. Image Process. 21(4), 2324–2328 (2012)


23. Winkler, W.E.: Overview of record linkage and current research directions. Research Report Series (2006)
24. Ponomarenko, N., Ieremeiev, O., Lukin, V., Egiazarian, K., Carli, M.: Modified image visual quality metrics for contrast change and mean shift accounting. In: Proceedings 2011 11th International Conference the Experience of Designing and Application of CAD Systems in Microelectronics (CADSM), pp. 305–311 (2011)

Big Data for Public Health Surveillance in Vietnam: Challenges and Opportunities

Long Bui, Thai Son Ha, Ngoc Ha Nguyen, Khandmaa Sukhbaatar, and My Hanh Bui

Abstract Several research and review articles have addressed the potential of big data in public health surveillance. Advances in the big data revolution will greatly improve the quality and timeliness of health-related information, and provide better prospects for accurate integration, aggregation, analysis, and visualization of heterogeneous data sources for public health surveillance. However, these opportunities come with numerous challenges in terms of technology, ethics, and strategy. This paper discusses the opportunities and challenges of big data that integrates multiple sources for public health surveillance in Vietnam.

1 Introduction

Public health surveillance has been defined as “the ongoing, systematic collection, analysis, and interpretation of health-related data with the a priori purpose of preventing or controlling disease or injury and identifying unusual events of public health importance, followed by the dissemination and use of such information for public health action” [1]. By this definition, public health surveillance is inherently data-driven. An effective surveillance system has several functions: (1) detection and notification of health events; (2) collection and consolidation of pertinent data; (3) investigation and confirmation (epidemiological, clinical, and/or laboratory) of cases or outbreaks; (4) routine analysis and report-making; (5) information feedback to data providers; (6) reporting and feed-forwarding data to the next administrative level [2].

Big data consists of extensive datasets, fundamentally characterized by volume, velocity, variety, and veracity: the quantity of data, the speed of its collection and use, and the various types and forms in which it arrives, requiring a scalable architecture for efficient storage, validation, manipulation, and analysis. Other characteristics of big data are variability, indicating variations in the data flow rates, and complexity, reflecting the fact that big data is often produced through multiple sources [3].

Numerous authors have described and made predictions about the role of big data in health care surveillance. Leveraging big data promises to improve surveillance through faster detection and forecasting of disease outbreaks, as well as a reduction in administrative impediments [4–6]. With the rapid development and implementation of health information systems in Vietnam, a greater quantity of health-related data is available from more sources than ever before, creating tremendous opportunities for a big data platform to enable massive integration, aggregation, analysis, and visualization of heterogeneous data sources, which could help achieve the goals of public health surveillance in Vietnam. However, many challenges in terms of technology, ethics, and strategy persist.

L. Bui
Center for Research, Consulting and Support of Community Health, Hanoi, Vietnam
e-mail: [email protected]; [email protected]
T. S. Ha
Administration of Medical Services, Hanoi, Vietnam
e-mail: [email protected]
N. H. Nguyen
People’s Police Academy, Hanoi, Vietnam
e-mail: [email protected]
K. Sukhbaatar
School of Public Health, Mongolian National University of Medical Sciences, Ulaanbaatar, Mongolia
e-mail: [email protected]
M. H. Bui (B)
Hanoi University Hospital, Hanoi Medical University, Hanoi, Vietnam
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_17
In the past few years, rich sources of medical data have been generated through medical records and regulatory requirements in Vietnam. Administrative and health insurance claims data, currently covering nearly 90% of the Vietnamese population, are the most promising data sources. According to Decision 4210/2017/QĐ-BYT, health facilities are responsible for submitting electronic data to the Health Data Portal of the Ministry of Health and the Health Insurance portal of the Social Insurance in Extensible Markup Language (XML) format.

2 Variety of Health-Related Data Sources

Hospital Information Systems (HIS) comprise structured and unstructured data from, but not limited to, Electronic Medical Records (EMR), Laboratory Information Systems (LIS), Picture Archiving and Communication Systems (PACS), Radiological Information Systems (RIS), and physicians’ notes. According to the MoH, by 2018 almost 100% of the 1500 public hospitals had implemented a HIS. The full implementation of EMR has been facilitated by several pilot projects, and the MoH expects EMR to be implemented nationwide by 2025. Several data sources come from national health


Fig. 1 The integration of available health-related data sources into a big data platform for public health surveillance in Vietnam

programs, such as the Infectious Diseases Surveillance System [7], the Vietnam TB Information Management Electronic System [8], the Immunization registry [9], the Antimicrobial resistance (AMR) surveillance system [10], and the Pharmacovigilance system [11]. Furthermore, health data can be combined with demographic and socio-economic data, geographical locations and landscape, and environmental and climatic data, which are mostly in a structured format. Demographic and socio-economic data are useful for estimating disease morbidity and mortality and for comparing the burden of disease between populations and time periods. Socio-economic and environmental data may serve to investigate the factors associated with health outcomes. Unstructured and text data, including social media (e.g., Facebook, Twitter, etc.) and news, can complement the structured data for surveillance purposes [12]. Geographical Information Systems and spatial analysis techniques are also important tools for the analysis and visualization of trends and the geographic distribution of diseases [5]. Figure 1 shows the available health-related data sources and their potential integration into a big data platform for public health surveillance in Vietnam.

3 Challenges

Volume: The variety of health-related data sources increases the volume of datasets when data are integrated. The growing amount of data in the health-care industry necessitates a decent storage infrastructure, technical capacity, and the adoption of big data technologies [13].


L. Bui et al.

Veracity: The quality of the data sources affects the veracity of big data for public health surveillance. Assessing data quality is a prerequisite step to ensure the performance of a surveillance system, and maintaining data quality requires considerable effort to apply current national standards and practices. Currently, there is a lack of literature on the performance of screening and diagnostic tests for diseases in Vietnam, and heterogeneities are observed in screening and diagnosis by physicians and health staff across health facilities and levels of the health system. In addition, the electronic data entry forms in HIS or EMR are purely digitalized versions of existing paper-based forms, which are obsolete and, in most circumstances, not designed for surveillance purposes. A study conducted in 2018 showed that about 80–90% of dengue and measles outbreaks had a laboratory confirmation date, while only 10% of diphtheria, H5N1, and H1N1 outbreaks had laboratory date records [14]. The absence of a widely adopted and consistently implemented medical terminology system is another problem. For example, since 2000 the use of the International Statistical Classification of Diseases (ICD-9, and more recently ICD-10) has been mandated by the Ministry of Health for all hospitals. However, only ICD-10 volumes 1 and 2 have been translated, and due to the complexity of the Vietnamese language there are variations in the coding of clinical terms beyond diagnosis. Notably, it has been suggested that a national unique identification number could facilitate linkage and reduce duplicates of health data generated by information systems [15]. However, the nationwide implementation of national unique identification (which is under the administration of the Ministry of Public Security) is in progress and would remain incomplete until 2020.

Velocity: Timely notification and reporting are essential for quick responses to health events.
Although big data is expected to provide real-time or near real-time information, fast signal detection, and forecasting of disease outbreaks, these goals are affected by the speed of data flow and integration, namely the velocity of the big data. The velocity of big data for public health surveillance largely depends on current medical practices. For example, a study showed an overall mean time of 7.5 weeks from the onset of cough to TB treatment for new smear-positive pulmonary TB patients in Vietnam [16]. With a large amount of data to process, time should not be wasted converting between formats or manually offloading data into a centralized platform. Therefore, technical solutions as well as common data exchange standards, such as Health Level Seven (HL7), are essential.

Ethical considerations: Big data approaches to human health are subject to a broad spectrum of ethical, legal, and social challenges, including but not limited to privacy concerns, personal autonomy, the trade-off between population health and personal rights, and effects on public demand for equity, transparency, clarity, and trust [17, 18]. They require a sound legal framework and new standards of ethical review. Unfortunately, there is no specific law or guidance for these issues in Vietnam. Although the privacy of private information is regulated in the Civil Code No. 91/2015/QH13, the Law of Medical Examination and Treatment No. 40/2009/QH12, and the Cyber Security Law No. 24/2018/QH14, it certainly needs a


dedicated legal mechanism. The Decree on personal data sharing is under review and far from official enactment.

A data-driven strategy: Despite showing great commitment, Vietnam lacks a well-defined data-driven strategy that incorporates public health surveillance principles, such as prioritizing diseases, setting objectives, and defining methods for conducting surveillance, together with detailed plans, policies, and programs that control, protect, and enhance the value of data. Moreover, the establishment of a governing body for big data public health surveillance, with experts in public health as well as other disciplines (e.g., data scientists, security experts, bioinformaticians, legal experts), plays a vital role in the appropriate planning, monitoring, and evaluation of a big data public health surveillance system. In parallel, public health officials should be well equipped with sufficient skills in computer programming, data management, analysis, visualization, and interpretation, as well as effective communication. This might require specific educational courses or other training activities on data science, bioinformatics, and cyber-security. In Vietnam, several ongoing projects combine data from multiple sources for surveillance purposes. For instance, the Infectious Diseases Surveillance System links data from hospitalized patients and community surveillance of 42 infectious diseases. The Administration of Medical Services and the Vietnam Post and Telecommunications Group are cooperating to develop an online system for real-time visualization, geographic pattern recognition, and forecasting (using machine learning algorithms) of several diseases, with databases from more than 7,000 health facilities, including health stations and district, provincial, and central hospitals. The system also applies Natural Language Processing to detect abnormal results from physicians’ comments.
The Vietnam Red Cross and the German Red Cross are currently implementing a Forecast-based Financing project, collecting and combining data on admissions related to respiratory diseases in several hospitals in Hanoi with climate forecasts, for early prediction of and response to more frequent and severe heat waves, reducing the suffering of vulnerable populations. The National TB Program aims to link administrative data from VITIMES with clinical data, including about 30,000 radiology images, and to apply various machine learning algorithms for rapid diagnosis and detection of TB and other lung diseases, following the guidelines of the World Health Organization (WHO) [20] (Fig. 2). Although several projects in Vietnam aim to adopt big data for surveillance purposes, final results are not yet available.

4 Opportunities

A hybrid public health surveillance system: The big data platform will be an excellent environment for automated analyses and interactive visualization to rapidly detect disease trends with respect to population groups, locations, and time; report early


Fig. 2 AI-assisted patient-initiated TB screening pathway. Adapted from Systematic screening for active tuberculosis: an operational guide, WHO, 2015 [20]

detection; and communicate and respond to disease outbreaks in real time or near real time. However, it is worth noting that the success of a public health surveillance system cannot be achieved with big data alone, without human judgment, especially in the investigation and confirmation of disease outbreaks. The platform is also expected to break new ground for Artificial Intelligence (AI) and its branches, such as data mining (e.g., classification and clustering) and machine learning (ML; e.g., artificial neural networks and support vector machines) [19].

A platform for medical research and bioinformatics training: Big data offers useful data sources for epidemiological studies and mathematical modeling, for predicting the effects of proposed interventions, and for estimating the impact of a given disease. Big data analytics helps health researchers generate hypotheses and infer causation, supporting a deeper understanding of disease patterns and associated factors. It has been suggested that using large samples with adequate instrumental variables and covariates enhances statistical power [18]. For instance, a team from Hanoi Medical University is investigating the associated factors and the economic burden of surgical adverse events (including patients’ clinical conditions and distance to health facilities) using a dataset of 2.7 million patients provided by the Health Insurance Agency. Data marts for specific diseases can also be extracted from the platform for biostatistics and bioinformatics training curricula.


5 Conclusion

This paper discusses the opportunities and challenges of big data for public health surveillance in Vietnam. We argue that big data in public health in Vietnam is in its infancy. The rapid emergence of health-related data sources brings unprecedented opportunities to collect data and build a big data platform for public health surveillance. However, public health differs from other disciplines, with considerable challenges in data quality, privacy, and regulatory requirements. Moreover, the centralization of health-related data sources may take a long time, during which there will be unpredictable shifts in the epistemology and paradigms of the big data revolution. Therefore, to unlock the full potential of big data public health surveillance to improve human health, we need collaborative efforts from the scientific and regulatory communities. Further research, as well as pilot projects, is needed to translate the potential of big data into solutions for better public health surveillance.

References

1. Richards, C.L., et al.: Advances in public health surveillance and information dissemination at the Centers for Disease Control and Prevention. Public Health Rep. 132(4), 403–410 (2017)
2. Groseclose, S.L., Buckeridge, D.L.: Public health surveillance systems: recent advances in their use and evaluation. Annu. Rev. Public Health 38, 57–79 (2017)
3. National Institute of Standards and Technology: NIST Big Data Interoperability Framework: Volume 1, Big Data Definitions (2018)
4. Kostkova, P.: A roadmap to integrated digital public health surveillance: the vision and the challenges. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 687–694. ACM, Rio de Janeiro, Brazil (2013)
5. Ola, O., Sedig, K.: The challenge of big data in public health: an opportunity for visual analytics. Online J. Public Health Informat. 5(3), 223 (2014)
6. Bansal, S., Chowell, G., Simonsen, L., Vespignani, A., Viboud, C.: Big data for infectious disease surveillance and modeling. J. Infect. Dis. 214(Suppl 4), S375–S379 (2016)
7. Nguyen, T.N., Ha, S.T., Braa, J.: Assembling a national health information infrastructure: the case of Vietnam. Electron. J. Inf. Syst. Dev. Ctries. 66(1), 1–18 (2015)
8. Thai, L.H., Nhat, L.M., Shah, N., Lyss, S., Ackers, M.: Sensitivity, completeness and agreement of the tuberculosis electronic system in Ho Chi Minh City, Viet Nam. Public Health Action 7(4), 294–298 (2017)
9. Nguyen, N.T., Vu, H.M., Dao, S.D., Tran, H.T., Nguyen, T.X.C.: Digital immunization registry: evidence for the impact of mHealth on enhancing the immunization system and improving immunization coverage for children under one year old in Vietnam. mHealth 3, 26 (2017)
10. Bordier, M., et al.: Antibiotic resistance in Vietnam: moving towards a One Health surveillance system. BMC Public Health 18(1), 1136 (2018)
11. Nguyen, K.-D., et al.: Overview of pharmacovigilance system in Vietnam: lessons learned in a resource-restricted country. Drug Saf. 41(2), 151–159 (2018)
12. Bui, D., Mekaru, S., Freifeld, C., Brownstein, J.S.: Development of a Vietnamese language outbreak mapping and surveillance system. Online J. Public Health Informat. 5(1) (2013)


13. El Aboudi, N., Benhlima, L.: Big data management for healthcare systems: architecture, requirements, and implementation. Adv. Bioinformat. (2018)
14. Lawpoolsri, S., et al.: Data quality and timeliness of outbreak reporting system among countries in Greater Mekong Subregion: challenges for international data sharing. PLoS Negl. Trop. Dis. 12(4) (2018)
15. Son, V.H., Abdul-Quader, A., Suthar, A.B.: Using strategic information for action: lessons from the HIV/AIDS response in Vietnam. BMJ Glob. Health 3(5), e000793 (2018)
16. Huong, N.T., Vree, M., Duong, B.D., Khanh, V., Loan, V., Co, N.V., Borgdorff, M.W., Cobelens, F.G.: Delays in the diagnosis and treatment of tuberculosis patients in Vietnam: a cross-sectional study. BMC Public Health 7(1), 110 (2007)
17. Kostkova, P.: Disease surveillance data sharing for public health: the next ethical frontiers. Life Sci. Soc. Policy 14(1), 16 (2018)
18. Mooney, S.J., Pejaver, V.: Big data in public health: terminology, machine learning, and privacy. Annu. Rev. Public Health 39, 95–112 (2018)
19. Benke, K., Benke, G.: Artificial intelligence and big data in public health. Int. J. Environ. Res. Public Health 15(12), 2796 (2018)
20. Systematic Screening for Active Tuberculosis: Principles and Recommendations. WHO Guidelines Approved by the Guidelines Review Committee. World Health Organization, Geneva (2015)

A New Computational Method Based on Heterogeneous Network for Predicting MicroRNA-Disease Associations Thanh Van Thai, Duong Hung Bui, Xuan Tho Dang, Thanh-Phuong Nguyen, Dang Hung Tran, and The Dung Luong

T. Van Thai, T. D. Luong: Academy of Cryptography Techniques, Hanoi, Vietnam. D. H. Bui: Hanoi Trade Union University, Hanoi, Vietnam. X. T. Dang, D. H. Tran (B): Hanoi National University of Education, Hanoi, Vietnam. T.-P. Nguyen: Life Science Research Unit - Systems Biology Group, University of Luxembourg, Megeno, Luxembourg.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_18

Abstract MicroRNAs (miRNAs) are a class of small non-coding RNAs that are involved in the development of various complex human diseases. A great deal of effort has been spent over decades to uncover the relations between miRNAs and diseases. Although most known miRNA-disease associations are discovered by experimental methods, such methods are in general expensive and time-consuming. Another approach, using computational methods to predict potential miRNA-disease associations, has attracted many computer scientists in recent years. However, computational methods suffer from various limitations that affect their prediction accuracy and applicability. In this paper, we propose a new computational method able to predict reliable miRNA-disease associations. We integrate different biological data sources, such as known miRNA-disease associations, miRNA-miRNA functional similarity, and disease-disease semantic similarity, into a miRNA-disease heterogeneous network. The structural characteristics of this network are represented as a feature vector dataset via meta-paths, and a binary classification problem is formulated. However, because the number of known miRNA-disease associations is very


small, we face an imbalanced data classification problem. To solve this issue, a clustering-based under-sampling algorithm has been proposed. Training classification models using SVMs, we obtained AUC values 2–5% higher than those of previous methods. These results imply that our proposed model could be used to discover reliable miRNA-disease associations in the human genome.

1 Introduction

MicroRNA (miRNA) is a small RNA, about 22–26 nucleotides long, belonging to the non-coding RNA class [1]. miRNAs do not encode proteins, but are important for inhibiting the protein coding of some messenger RNAs (mRNAs). Since their first discovery in 1993, the roles of miRNAs in cell development have become clearer. The associations of miRNAs with some common diseases, such as liver cancer, breast cancer, and diabetes, have been observed in wet-lab experiments [1–3]. However, experimental methods are time-consuming and costly, and experiments are conducted on miRNAs and diseases only separately [5, 9, 12]. With the development of intelligent computational methods, one approach uses machine learning algorithms to build models that can predict associations between miRNAs and diseases [5, 10, 11, 18]. These models are trained on datasets generated by experimental methods and are then used to predict associations between miRNAs and diseases. The goodness of a predictive model is evaluated in two respects: the accuracy of the model, and the verification of the predicted results against knowledge published in the literature or in biological databases. Although the predicted results may not be completely accurate, they provide another way to help biologists narrow down the candidates that need to be tested, saving time and cost in studying the roles of miRNAs. In recent years, the research community has devoted a great deal of effort to developing computational models for discovering hidden knowledge in biological data. Among these, the problem of predicting associations between microRNAs and diseases has attracted much attention in the field of computer science. There are two main approaches for building a model: network-based methods and machine learning-based methods.
Both of these approaches are based on the assumption that if a miRNA and a disease have more interactions with the other elements in the biological network, they are likely to have a relationship [7, 10, 12, 14, 17, 22]. Of the two, the machine learning-based approach can help improve predictive performance, and recently many machine learning-based models have been proposed to predict potential miRNA-disease associations. For example, Jiang et al. [10] extracted feature sets based on known and unknown associations to train support vector machine (SVM) classifiers to predict potential miRNA-disease associations. Xu et al. [20] built a network of interactions between miRNAs and target genes based on target gene prediction software and then trained SVMs to identify disease-associated miRNAs. However, the target gene prediction software used by such methods had high rates of false positives


and false negatives. Zeng et al. [22] constructed a heterogeneous network and integrated neighborhood information in a neural network to predict potential associations between miRNAs and diseases. A challenge with supervised learning methods is the need for negative data, and in this case it is difficult to identify reliable negative data. Therefore, some other studies have used semi-supervised learning models instead of supervised ones. Chen et al. [6] proposed a model of regularized least squares for miRNA-disease associations (named RLSMDA), a semi-supervised method that does not need negative samples. However, how to obtain appropriate parameter values for RLSMDA was not made clear. Recently, Zhao et al. [24] developed an adaptive boosting method to build a strong classifier for detecting miRNA-disease relationships. This method has quite high predictive performance, but the interpretability of the model is still low. The methods mentioned above have different strengths and weaknesses, and each may be suitable for a particular kind of biological data. In this paper, we develop a new computational method to uncover potential miRNA-disease associations. We first integrate the known miRNA-disease associations obtained from the HMDD database [13], disease semantic similarity, and miRNA functional similarity to build a heterogeneous biological network. We then represent the topological and structural characteristics of this network as a feature vector dataset using the meta-path concept and formulate a binary classification problem. Like previous methods, we also face the problem of imbalanced data, since only a small number of known miRNA-disease associations have been obtained. To address this issue, we propose a new under-sampling method for constructing negative examples, based on a clustering-based under-sampling algorithm. Finally, the SVM algorithm is used for training and testing the predictive models.
Experimental results show that our model achieved higher accuracy than both the same model without any data-balancing strategy and previous methods.

2 Materials and Methods

2.1 Method Overview

We propose a framework for predicting miRNA-disease associations that contains four steps. First, we build a heterogeneous network from three available biological databases: miRNA functional similarity, disease semantic similarity, and known miRNA-disease associations. Second, a feature vector dataset is constructed based on meta-paths through the above network, and a binary classification problem is formulated; note that each sample is a pair of one miRNA and one disease. Third, we propose a new clustering-based under-sampling method to solve the imbalanced data problem. Finally, a supervised model is trained to predict potential miRNA-disease associations. The framework is illustrated in Fig. 1.


2.2 Biological Datasets Construction

In this section, we describe how the biological data are constructed, including known miRNA-disease associations, miRNA functional similarity, and disease semantic similarity; details of each dataset are as follows.

Known miRNA-Disease Associations. The known miRNA-disease association dataset used in our paper was downloaded from HMDD v2.0 [13]. The set of diseases is denoted by D = {d_i; i = 1, ..., m} and the set of miRNAs by M = {m_j; j = 1, ..., n}. The association matrix is represented by MD ∈ R^{m×n}: the element MD_{i,j} is set to 1 if disease d_i and miRNA m_j have a known association according to HMDD v2.0, and to 0 otherwise.

MiRNA Functional Similarity (MFS). Based on the assumption that functionally similar miRNAs are more likely to be associated with similar diseases and vice versa, we constructed the miRNA functional similarity matrix MFS ∈ R^{n×n} according to the MISIM method proposed by Wang et al. [19], where MFS is a symmetric square matrix and n is the number of miRNAs. This method uses the disease semantic similarity and the known associations between miRNAs and diseases to construct the miRNA functional similarity matrix. Here, MFS_{i,j} is the functional similarity score between miRNAs m_i and m_j.

Disease Semantic Similarity (DSS). In order to describe the relationships among diseases, a Directed Acyclic Graph (DAG) is constructed according to the MeSH database [16]. We use a subgraph DAG(d_i) = (d_i, T(d_i), E(d_i)) to denote disease d_i, where T(d_i) is the set of nodes consisting of all ancestor nodes of d_i and d_i itself, and E(d_i) is the set of all direct edges from parent nodes to child nodes.
Given a disease t in DAG(d_i), its contribution to the semantic score of disease d_i can be calculated by the following equation:

\[
DS_{d_i}(t) =
\begin{cases}
1 & \text{if } t = d_i \\
\max\{\Delta \cdot DS_{d_i}(t') \mid t' \in \text{children of } t\} & \text{if } t \neq d_i
\end{cases}
\tag{1}
\]

where t ∈ T(d_i) and Δ is the semantic contribution factor, set to 0.5 according to previous studies. The contribution of disease d_i to its own semantic value is 1.0, while the contribution of an ancestor disease to the semantic value of d_i decreases as its distance from d_i increases. Based on the contributions of the ancestor diseases and of d_i itself, the semantic value of disease d_i can be calculated as:

\[
SS(d_i) = \sum_{t \in T(d_i)} DS_{d_i}(t)
\tag{2}
\]

Then, based on the assumption that the more two diseases share of their DAGs, the more similar they are, we define the semantic similarity between disease d_i and disease d_j as follows:


Fig. 1 General workflow containing four main steps


\[
DSS(d_i, d_j) = \frac{\sum_{t \in T(d_i) \cap T(d_j)} \left( DS_{d_i}(t) + DS_{d_j}(t) \right)}{SS(d_i) + SS(d_j)}
\tag{3}
\]

Finally, the symmetric square matrix DSS is defined as the disease similarity matrix, where each element DSS_{i,j} denotes the disease semantic similarity between d_i and d_j.
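As a concrete illustration, Eqs. (1)–(3) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the `parents` mapping (each MeSH term to its direct parent terms) and all function names are illustrative assumptions.

```python
DELTA = 0.5  # semantic contribution factor, per Eq. (1)

def contributions(d, parents):
    """DS_d(t) for every t in T(d), Eq. (1): 1 for d itself; DELTA times the
    child's score propagated upward, keeping the maximum over children."""
    ds = {d: 1.0}
    frontier = [d]
    while frontier:
        t = frontier.pop()
        for p in parents.get(t, []):
            cand = DELTA * ds[t]
            if cand > ds.get(p, 0.0):
                ds[p] = cand
                frontier.append(p)
    return ds

def semantic_similarity(d1, d2, parents):
    """DSS(d1, d2), Eq. (3): contributions on the shared ancestors, divided
    by the summed semantic values SS(d1) + SS(d2) of Eq. (2)."""
    ds1 = contributions(d1, parents)
    ds2 = contributions(d2, parents)
    shared = set(ds1) & set(ds2)
    num = sum(ds1[t] + ds2[t] for t in shared)
    return num / (sum(ds1.values()) + sum(ds2.values()))
```

For two sibling terms under a common root, the only shared ancestor is the root, so the score reduces to (0.5 + 0.5) / (1.5 + 1.5) = 1/3.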

2.3 Construction of the MiRNA-Disease Heterogeneous Network and Prediction of MiRNA-Disease Associations

For convenience, we first introduce the heterogeneous network and meta-path concepts and describe how to integrate the three biological datasets above into a miRNA-disease heterogeneous network. Then, we show how to build a feature vector dataset by representing miRNA-disease pairs in that network.

Heterogeneous Network. Let G = (V, E) be a network, where V denotes the set of nodes and E the set of links. Its network schema T_G = (A, R) is a meta-template of G, where A and R represent the node type set and the link type set, respectively. Define a mapping function from the node set to the node type set, φ: V → A, and a mapping function from the link set to the link type set, ϕ: E → R, such that every node v ∈ V belongs to one particular node type φ(v) ∈ A and every link e ∈ E belongs to one particular link type ϕ(e) ∈ R. If a network contains more than one node type (|A| > 1) or more than one link type (|R| > 1), it is called a heterogeneous network. Fig. 2a shows a heterogeneous network consisting of two node types and two edge types.

Meta-path. In a heterogeneous network, a meta-path P is defined as a path on the schema T_G = (A, R) representing a series of connections between object types:

\[
P : T_1 \xrightarrow{R_1} T_2 \xrightarrow{R_2} \cdots \xrightarrow{R_k} T_{k+1},
\]

where T_i ∈ A, i ∈ {1, 2, ..., k + 1} and R_i ∈ R, i ∈ {1, 2, ..., k}. For example, Fig. 2b shows two meta-paths extracted from the miRNA-disease heterogeneous network in Fig. 2a.

miRNA-Disease Heterogeneous Network. To construct the miRNA-disease heterogeneous network, we first convert the MFS and DSS matrices to binary matrices: MFS_{i,j} is set to 1 if MFS_{i,j} > t and to 0 otherwise, and likewise for the DSS matrix, where t is a threshold value; we tried various values of t in our experiments. Then, we define the MDH matrix as follows:

\[
MDH = \begin{pmatrix} MFS & MD^{T} \\ MD & DSS \end{pmatrix}
\]

We can see that MDH is a square binary matrix of size (n + m) × (n + m), and it is a representation of the miRNA-disease heterogeneous network (see Fig. 2a). This network has two node types (miRNA and disease) and two edge types (similarity and association).
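The binarization and block-matrix construction above can be sketched with NumPy. This is a sketch under the assumption that MD is stored disease-by-miRNA, i.e. with shape (m, n); the function and variable names are ours, not the paper's.

```python
import numpy as np

def build_mdh(mfs, dss, md, t=0.5):
    """Build the adjacency matrix of the miRNA-disease heterogeneous network.
    mfs: (n, n) miRNA functional similarity; dss: (m, m) disease semantic
    similarity; md: (m, n) known-association matrix (0/1). Similarities are
    binarized at threshold t, then stacked into the (n+m) x (n+m) block
    matrix with miRNA rows first and disease rows second."""
    mfs_b = (mfs > t).astype(int)
    dss_b = (dss > t).astype(int)
    top = np.hstack([mfs_b, md.T])   # miRNA rows: miRNA-miRNA | miRNA-disease
    bot = np.hstack([md, dss_b])     # disease rows: disease-miRNA | disease-disease
    return np.vstack([top, bot])
```

With n = 2 miRNAs and m = 1 disease, the result is a 3 × 3 symmetric 0/1 matrix whose off-diagonal blocks are the association matrix and its transpose.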


Fig. 2 An example of heterogeneous network and extracted meta-paths

Prediction of miRNA-Disease Associations. We measure the association of a disease and a miRNA in the miRNA-disease heterogeneous network via meta-paths. Given a meta-path P_h, we define a similarity score between a disease d_i and a miRNA m_j as follows:

\[
D(d_i, m_j \mid P_h) = \frac{2 \times P\_count(d_i, m_j \mid P_h)}{\sum_{k \in M} P\_count(d_i, m_k \mid P_h) + \sum_{k \in D} P\_count(d_k, m_j \mid P_h)}
\tag{4}
\]

where P_count(d_i, m_j | P_h) is the number of instances of meta-path P_h between d_i and m_j; Σ_{k∈M} P_count(d_i, m_k | P_h) is the total number of paths following P_h that start from disease d_i; and Σ_{k∈D} P_count(d_k, m_j | P_h) is the total number of paths following P_h that end at m_j. This similarity measure is based on the assumption that two objects are highly related to each other when both the exclusivity and the strength of their relationship are high, i.e., when the strength of their relationship is high and the number of their relationships with other objects is low. In general, it is difficult to enumerate all the different meta-paths between two vertices in a heterogeneous network, because this is an NP-hard problem. Following previous research [4], we only select meta-paths of length 1 to 4. The list of all meta-paths used in our paper is illustrated in Fig. 3; there are fifteen different meta-paths in total. Notably, in the context of this paper, each meta-path starts with a disease and ends with a miRNA. Subsequently, we convert the topological characteristics into a feature vector dataset, where each pair (d_i, m_j) is represented by a fifteen-dimensional vector corresponding to the fifteen meta-paths; the value of each dimension is calculated according to formula (4) above.
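For a single meta-path, the path counts between every disease and miRNA can be obtained by multiplying the adjacency blocks along the meta-path (e.g., MD times the binarized MFS for disease→miRNA→miRNA), after which Eq. (4) is a simple normalization over row and column sums. A hedged NumPy sketch (the function name is ours):

```python
import numpy as np

def metapath_scores(C):
    """Eq. (4): given C[i, j] = number of paths following a fixed meta-path
    P_h from disease d_i to miRNA m_j, return the matrix of similarity
    scores. Row sums count paths leaving d_i (over all miRNAs k in M);
    column sums count paths ending at m_j (over all diseases k in D)."""
    C = np.asarray(C, dtype=float)
    row = C.sum(axis=1, keepdims=True)
    col = C.sum(axis=0, keepdims=True)
    denom = row + col
    # guard against pairs whose denominator is zero (no paths at all)
    return np.divide(2.0 * C, denom, out=np.zeros_like(C), where=denom > 0)
```

Computing this once per meta-path yields the fifteen score matrices whose entries become the fifteen features of each disease-miRNA pair.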


Fig. 3 Fifteen meta-paths with length from 1 to 4

With this data representation, we formulate the problem of identifying associations between miRNAs and diseases as a binary classification problem, in which we train an SVM classifier with miRNA-disease pairs as input data; the label of each pair is 1 or 0. We denote miRNA-disease pairs verified by experiments as positive samples (label = 1) and unverified pairs as unlabeled samples (label = 0). Because a miRNA associated with a disease can only be verified through biological experiments, while verifying that a miRNA is not associated with any disease is very difficult, we have no reliable negative samples. Moreover, the number of verified miRNA-disease pairs is very small compared to the number of unverified pairs. To solve this problem, several previous studies randomly selected samples from the unlabeled samples as negative samples [10, 22, 24]. However, random selection does not guarantee the quality of the negative sample set. Unlike previous studies, we build the negative example set using a clustering-based under-sampling algorithm, presented in the next section.
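Assembling the resulting dataset is then straightforward; the sketch below (our naming, not the paper's) stacks the fifteen per-meta-path score matrices into one sample per disease-miRNA pair, labeled from the known-association matrix.

```python
import numpy as np

def build_dataset(score_mats, md):
    """score_mats: list of (m, n) matrices, one per meta-path, holding the
    Eq. (4) scores; md: (m, n) known-association matrix. Returns X with one
    row per (disease, miRNA) pair and one column per meta-path, and labels
    y (1 = experimentally verified positive, 0 = unlabeled)."""
    X = np.stack([np.asarray(S).ravel() for S in score_mats], axis=1)
    y = np.asarray(md).ravel().astype(int)
    return X, y
```

With the paper's fifteen meta-paths, X has fifteen columns; the rows labeled 0 are the unlabeled pool from which negative examples are later drawn.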

2.4 A New Clustering-Based Under-Sampling Method

As mentioned above, the number of known miRNA-disease associations is very small compared to the number of remaining elements; this is an imbalanced class distribution problem, one of the most difficult problems in data mining. In 2017,


Fig. 4 The idea of the clustering-based under-sampling method

Lin et al. [15] proposed a clustering-based under-sampling method to reduce the number of majority class samples using a clustering algorithm. Their method has three steps. Firstly, they set the number of clusters equal to the number of samples in the minority class. Secondly, the cluster centers are found by the k-means clustering algorithm on the majority class samples, and all majority class samples are substituted by these cluster centers. As a result, the majority and minority classes have the same number of samples. This method works well in some cases; however, it is limited by the way k-means selects cluster centers. Fig. 4a illustrates this drawback: noise elements affect the cluster center selection. To solve this problem, we propose a new clustering-based under-sampling method. We add a weight to the samples in each cluster; the weight of each sample is the number of its nearest neighbors within radius r. For example, as shown in Fig. 4b, samples A, B, and C have 4, 3, and 3 neighbors, respectively, while sample D has no neighbors within radius r. The cluster center is then updated according to the following formula:

\[
O_i = \frac{\sum_{x_j \in C_i} x_j \cdot w_j}{\sum_{x_j \in C_i} w_j}
\tag{5}
\]

where x_j is a data sample, w_j is the number of nearest neighbors of x_j within radius r, C_i is a cluster (a set of data samples), and O_i is its center. Fig. 4c illustrates that the new formula yields a better cluster center.
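A minimal sketch of the weighted center update of Eq. (5) follows; the code is ours, and the fallback when no point has any neighbor within r is an illustrative choice, not specified in the text.

```python
import numpy as np

def weighted_center(cluster, r):
    """Eq. (5): cluster center as a weighted mean, where the weight w_j of
    each sample is its number of neighbors within radius r. Dense samples
    thus pull the center more than isolated noise points, unlike the plain
    k-means mean."""
    X = np.asarray(cluster, dtype=float)
    # pairwise Euclidean distances between all samples in the cluster
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    w = (d <= r).sum(axis=1) - 1   # neighbors within r, excluding the point itself
    if w.sum() == 0:               # every point isolated: fall back to the plain mean
        return X.mean(axis=0)
    return (X * w[:, None]).sum(axis=0) / w.sum()
```

For a cluster of three nearby points plus one distant outlier, the outlier gets weight 0 and the center stays among the dense points, whereas the plain mean would be dragged toward the outlier.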

3 Experiments and Results

As described in Sects. 2.2 and 2.3, in order to construct the miRNA-disease heterogeneous network, we downloaded experimentally verified miRNA-disease associations from HMDD v2.0. This dataset consists of 383 diseases, 495 miRNAs, and


T. Van Thai et al.

5430 miRNA-disease pairs. These miRNA-disease pairs are considered positive examples, while negative examples are constructed using the clustering-based under-sampling algorithm presented in Sect. 2.4. To balance the data distribution between the two classes, we set the number of negative examples equal to the number of positive examples. The miRNA-miRNA functional similarity and disease-disease semantic similarity matrices are calculated as in Sect. 2.2. Then, the MFS and DSS matrices are converted to binary matrices using a threshold value τ: $MFS_{ij}$ is set to 1 if $MFS_{ij} \ge \tau$ and to 0 otherwise, and likewise $DSS_{ij}$ is set to 1 if $DSS_{ij} \ge \tau$ and to 0 otherwise. To systematically evaluate the performance of our method, the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) is used. The ROC curve is established by plotting the true positive rate (TPR) against the false positive rate (FPR), which are calculated as follows:

$TPR = \frac{TP}{TP + FN}$ and $FPR = \frac{FP}{TN + FP}$  (6)

where TP and TN are the numbers of correctly predicted positive and negative samples, respectively, and FP and FN are the numbers of incorrectly predicted positive and negative samples, respectively. The AUC was calculated for a numerical evaluation of model performance: AUC = 0.5 denotes a purely random prediction, while AUC = 1 denotes a perfect prediction.
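Eq. (6) amounts to counting the four confusion-matrix cells; a minimal sketch (the helper function is hypothetical, not part of the original pipeline):

```python
def tpr_fpr(y_true, y_pred):
    """Compute Eq. (6) from binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

tpr, fpr = tpr_fpr([1, 1, 0, 0], [1, 0, 1, 0])  # tpr = 0.5, fpr = 0.5
```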

3.1 Prediction Performance and Comparison

To verify the prediction performance of our method, we carried out experiments with the SVM classification algorithm (using the kernlab package with default parameters in R [8]). We conducted 5-fold cross-validation experiments with values of τ ranging from 0.40 to 0.70. Fig. 5 shows the ROC curves of the classifiers. In the case of τ = 0.5, the AUC value is the best (0.961); therefore, we used τ = 0.5 in further experiments to compare with other methods. We compared our proposed method, which chooses negative examples using the clustering-based under-sampling algorithm, with a method that selects negative examples at random and a method that uses all unverified miRNA-disease pairs as negative examples (i.e., without any sampling method). In 5-fold cross-validation experiments, our method (with τ = 0.5) achieved AUC = 0.961, while the random-selection method achieved AUC = 0.949 and the method without sampling achieved AUC = 0.933. A comparison of the ROC curves of the three methods is shown in Fig. 6. We further compared the prediction performance of our model with six up-to-date computational prediction models (i.e., Jiang et al. [10], RLSMDA [6], PBMDA [21], Katz-ML [23], ABMDA [24], and NNMDA [22]). RLSMDA is a semi-supervised learning method based on the framework of Regularized Least Squares (RLS) and an early attempt at applying machine learning to this problem. ABMDA is an adaptive boosting model for detecting potential miRNA-disease associations. NNMDA is a newly
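The original experiments ran an SVM from the R kernlab package under 5-fold cross-validation; a comparable sketch in Python with scikit-learn and a synthetic stand-in dataset (both assumptions, since the authors' data pipeline is not published) might look like:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the balanced miRNA-disease feature dataset of Sect. 2
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# 5-fold cross-validated AUC with an RBF-kernel SVM at default parameters,
# mirroring the paper's use of kernlab defaults in R
aucs = cross_val_score(SVC(), X, y, cv=5, scoring="roc_auc")
mean_auc = aucs.mean()
```

In the paper's setting, this cross-validation loop would be repeated for each binarization threshold τ to produce the curves in Fig. 5.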


Fig. 5 Prediction performance of classifiers with τ values ranging from 0.40 to 0.70

Table 1 Prediction performance of our method and several previous methods

Paper                    Method's name   AUC
Jiang et al. 2013 [10]   –               0.888
Chen et al. 2014 [6]     RLSMDA          0.842
You et al. 2017 [21]     PBMDA           0.916
Zhang et al. 2019 [23]   Katz-ML         0.919
Zhao et al. 2019 [24]    ABMDA           0.917
Zeng et al. 2019 [22]    NNMDA           0.937
Our method               –               0.961
published method representing the current level of computational prediction models in this domain. The performance comparisons in the 5-fold cross-validation experiments are shown in Table 1: PBMDA, Katz-ML, ABMDA, NNMDA, and our method achieved AUCs of 0.916, 0.919, 0.917, 0.937, and 0.961, respectively. These experiments demonstrate that our proposed method reliably and robustly improves on the prediction performance of previous computational models.


3.2 Evaluating Performance on New Diseases

We further evaluated the performance of our method when predicting potential miRNA-disease associations for new diseases (diseases with no known associated miRNAs). To this end, we conducted a case study by randomly selecting 12 diseases as a testing dataset (see Table 2). The testing dataset therefore includes 12 × 495 miRNA-disease pairs, of which 322 are known miRNA-disease associations. The remaining training set consists of 183,645 miRNA-disease pairs after removing all links associated with the 12 diseases in the testing dataset; of these, 5108 are known miRNA-disease associations. Experimental results shown in Fig. 7 indicate

Table 2 Name of 12 diseases for testing

No.  Disease name              No.  Disease name
1    Thyroid Neoplasms         7    Pre-Eclampsia
2    Tongue Neoplasms          8    Pregnancy, Ectopic
3    Tourette Syndrome         9    Neoplasms, Squamous Cell
4    Toxoplasma                10   Neoplasms
5    Schizophrenia             11   Muscular Disorders, Atrophic
6    Scleroderma, Localized    12   Muscular Dystrophies

Fig. 6 Comparison of prediction performance between our method, random under-sampling, and without using any sampling methods


Fig. 7 Comparison of AUC between random under-sampling and clustering-based under-sampling on the testing dataset

that the clustering-based under-sampling method still achieves a high AUC of 0.958, clearly higher than random under-sampling (AUC = 0.946). These results allow us to conclude that our method achieves reliable prediction performance and could be applied to predict associated miRNAs for a new disease.

4 Conclusions

In this study, we proposed a new computational model to predict underlying miRNA-disease associations based on a heterogeneous network and a clustering-based under-sampling method. We integrated different biological information sources into a miRNA-disease heterogeneous network and represented the structural characteristics of this network as a feature vector dataset, treating the prediction of miRNA-disease associations as a binary classification problem. We also proposed a new clustering-based under-sampling algorithm that helps us deal with the imbalanced classification problem. To demonstrate the effectiveness of the proposed method, we used AUC values and ROC curves to measure prediction performance. We then compared our model with six up-to-date methods, and the comparison results confirmed that our model outperforms them. These results imply that our model could be used to discover reliable miRNA-disease associations in the human genome.


Acknowledgements This research was supported by the Vietnam Ministry of Education and Training, project B2018-SPH-52.

References

1. Ambros, V.: The functions of animal microRNAs. Nature 431, 350–355 (2004)
2. Akao, Y., Nakagawa, Y., Naoe, T.: MicroRNA-143 and -145 in colon cancer. DNA Cell Biol. 26, 311–320 (2007)
3. Bandyopadhyay, S., Mitra, R., Maulik, U., Zhang, M.Q.: Development of the human cancer microRNA network. Silence 1(1), 6 (2010). https://doi.org/10.1186/1758-907X-1-6
4. Cao, X., Zheng, Y., Shi, C., Li, J., Wu, B.: Meta-path-based link prediction in schema-rich heterogeneous information network. Int. J. Data Sci. Anal. 3(4), 285–296 (2017)
5. Chen, H., Zhang, Z.: Similarity-based methods for potential human microRNA-disease association prediction. BMC Med. Genom. 6, 12 (2013)
6. Chen, X., Yan, G.-Y.: Semi-supervised learning for potential human microRNA-disease associations inference. Sci. Rep. 4, 5501 (2014)
7. Chen, X., Liu, M.-X., Yan, G.-Y.: RWRMDA: predicting novel human microRNA-disease associations. Mol. BioSyst. 8, 2792–2798 (2012)
8. Karatzoglou, A., Smola, A., Hornik, K., Zeileis, A.: kernlab – an S4 package for kernel methods in R. J. Stat. Softw. 11(9), 1–20 (2004)
9. Iorio, M.V., Ferracin, M., Liu, C.-G., Veronese, A., Spizzo, R., Sabbioni, S., et al.: MicroRNA gene expression deregulation in human breast cancer. Cancer Res. 65, 7065–7070 (2005)
10. Jiang, Q., Wang, G., Jin, S., Li, Y., Wang, Y.: Predicting human microRNA-disease associations based on support vector machine. Int. J. Data Min. Bioinform. 8, 282–293 (2013)
11. Kozomara, A., Griffiths-Jones, S.: miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 39, D152–D157 (2011)
12. Lan, W., Wang, J., Li, M., Liu, J., Wu, F.X., Pan, Y.: Predicting microRNA-disease associations based on improved microRNA and disease similarities. IEEE/ACM Trans. Comput. Biol. Bioinform. 15, 1774–1782 (2016)
13. Li, Y., et al.: HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 42, D1070–D1074 (2014)
14. Liu, Y., Zeng, X., He, Z., Zou, Q.: Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans. Comput. Biol. Bioinform. 14, 905–915 (2017)
15. Lin, W.C., Tsai, C.F., Hu, Y.H., Jhang, J.S.: Clustering-based undersampling in class-imbalanced data. Inf. Sci. 409–410, 17–26 (2017)
16. Lowe, H.J., Barnett, G.O.: Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches. JAMA 271, 1103–1108 (1994)
17. Luo, J., Xiao, Q.: A novel approach for predicting microRNA-disease associations by unbalanced bi-random walk on heterogeneous network. J. Biomed. Inform. 66, 194–203 (2017)
18. Mugunga, I., Ju, Y., Liu, X., Huang, X.: Computational prediction of human disease-related microRNAs by path-based random walk. Oncotarget 8(35), 58526–58535 (2017)
19. Wang, D., Wang, J., Lu, M., Song, F., Cui, Q.: Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 26, 1644–1650 (2010)
20. Xu, C., Ping, Y., Li, X., Zhao, H., Wang, L., Fan, H., et al.: Prioritizing candidate disease miRNAs by integrating phenotype associations of multiple diseases with matched miRNA and mRNA expression profiles. Mol. BioSyst. 10, 2800–2809 (2014)
21. You, Z.H., Huang, Z.A., Zhu, Z., Yan, G.Y., Li, Z.W., Wen, Z., Chen, X.: PBMDA: a novel and effective path-based computational model for miRNA-disease association prediction. PLoS Comput. Biol. 13(3), e1005455 (2017)


22. Zeng, X., Xuan, Z., Liao, Y., Pan, L.: Prediction and validation of association between microRNAs and diseases by multipath methods. Biochim. Biophys. Acta 1860, 2735–2739 (2016)
23. Zhang, X., Zou, Q., Rodriguez-Paton, A., Zeng, X.: Meta-path methods for prioritizing candidate disease miRNAs. IEEE/ACM Trans. Comput. Biol. Bioinform. 16(1), 283–291 (2019)
24. Zhao, Y., Chen, X., Yin, J.: Adaptive boosting-based computational model for predicting potential miRNA-disease associations. Bioinformatics 35, 4730–4738 (2019). https://doi.org/10.1093/bioinformatics/btz297

An Approach for Building Effective Real Estate Chatbots in Vietnamese

Tuan-Dung Cao and Quang H. Nguyen

Abstract This paper presents a method for automatically building a real estate chatbot to support customers in Vietnamese. The chatbot is trained with a data set collected from Facebook groups and from a leading real estate website in Vietnam. Using Logistic Regression, the user intent recognition task achieves precision = 0.93, recall = 0.87, and F1-score = 0.89, while the automatic entity labeling achieves 83% accuracy thanks to the development of a real estate knowledge base. Besides, we report our experience with the design of the dialog management module.

1 Introduction

According to 2018 statistics of the Vietnam Real Estate Association (VNREA), the demand for real estate in big cities like Hanoi and Ho Chi Minh City is huge and has grown relatively steadily over the years. This has produced a huge demand for real estate transactions and exchanges. However, hundreds or even thousands of new listings appear every day on real estate sites and Facebook groups, so reading them one by one to find results that match a user's requirements is hard and time-consuming. Moreover, for real estate sites with a large number of customers, a small support staff cannot meet the demand. Therefore, the trend of using chatbots to help customers is receiving more and more attention. A real estate chatbot system can capture the user's intention in a sentence and extract important information from the user's request, thereby connecting the seller/lessor to the right buyer/tenant.

T.-D. Cao and Q. H. Nguyen—Contributed equally to the work.

T.-D. Cao · Q. H. Nguyen (B)
School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam
e-mail: [email protected]
T.-D. Cao
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_19


A chatbot is software built on the basis of artificial intelligence that can hold a discussion/conversation with a human in natural language [6]. For questions posed by users, chatbots help them find the best answer instantly. Chatbots work independently and can automatically respond to questions or perform an appropriate activity. The scope and capabilities of a chatbot depend on the application domain and the features that the developer defines [1]. Although "chatbot" is still a relatively new term, the foundation and basic concepts behind chatbots have existed for many years and continue to mature [2]. In 1966, Joseph Weizenbaum developed a program called ELIZA [17]. By recognizing key words and phrases in the user's input and giving corresponding answers from pre-written scripts, ELIZA has been regarded as the first chatbot. In 1984, William Chamberlain and Thomas Etter developed RACTER, an artificial intelligence program that randomly generates English text [18]. The following decade was a booming time for virtual personal assistants: Siri, developed by Apple in 2010; Alexa, developed by Amazon in 2014; Cortana, by Microsoft in 2015; Google Assistant, by Google in 2016; and Bixby, by Samsung Electronics in 2017. With natural language analysis and processing capabilities, these virtual assistants connect to Web services to answer questions and meet user requirements. There are two approaches to chatbot development: rule-based chatbots and AI-based chatbots. A rule-based chatbot responds to questions based on a number of pre-defined rules. These rules can range from simple to very complex, and this approach is fairly easy to understand and follow.
However, this approach does not work well with questions whose structure is not pre-defined in the rule set [3]. In the second approach, chatbots are trained with a labeled data set (the sample data set) using artificial intelligence methods such as support vector machines [7], naive Bayes classifiers [8], neural networks [9], and finite state machines [15]. Afterwards, in conversation, the chatbot uses the knowledge gained to grasp the context and intentions in the user's sentence and to take an action or provide an appropriate answer. One advantage of this approach is the short construction time, because no time is spent defining rules: since the classifier is automatically generated by learning from a training data set, rules are no longer necessary. The coverage of question content is higher and can be expanded by collecting additional training patterns. The rest of the paper presents a method to automatically build a chatbot specializing in customer support in the real estate domain. Section 2 introduces the general architecture of the proposed chatbot. Section 3 discusses the proposed method for user intent identification. Sections 4 and 5 present how we solve the problems of entity labeling and dialog management. The final section gives a conclusion and directions for future development.


2 Architecture and Components

In this section, we discuss the overall architecture of the proposed real estate chatbot system. The process of building a chatbot based on a machine learning or AI-based approach has been introduced in several studies [14]. However, when deploying to a specific field such as real estate, we propose suitable methods for each step of the process. We have also added the processing steps necessary for the Vietnamese language. The architecture of the AI-based chatbot is described in Fig. 1 and includes the following components:

• Natural Language Understanding is the component that handles natural language [4–6, 12]. It analyzes the user's sentences in the conversation, performing user intent classification and entity identification [11].
• The Dialog Manager is the component that decides the next action. It combines the results obtained from Natural Language Understanding, such as the intent of the sentence and the list of entities, with the current state of the conversation. In some cases, this component uses a knowledge base to retrieve the data necessary to maintain the conversation and satisfy user requests.
• The Knowledge Base acts as the chatbot's database. It stores knowledge as well as data in the predefined application domain.
• Natural Language Generation is the component that generates answers in natural language based on existing data and returns the results to the user [16].

Fig. 1 Overview architecture of the AI chatbot (Human Input → Natural Language Understanding, with Extract Intent and Extract Entities backed by the Intent model and Entities model → Dialog Manager, which consults the Knowledge Base → Natural Language Generation → Chatbot Output)

3 User Intent Identification Using Logistic Regression

In a conversation, each of the user's sentences contains an implication or intention. Correctly identifying the user's intent is important in chatbot development because this information determines the structure and script of the rest of the conversation. The chatbot is built with the goal of automatically supporting customers in the real estate sector, so the intents have been pre-selected appropriately (Fig. 2). Data is collected from the batdongsan.com.vn website, the leading real estate information website in Vietnam. The data is collected and labeled manually. For example, the sentence "Which street does this house belong to?" was labeled "Ask for address information". The dataset includes 1720 sentences, divided into a train set and a test set in the ratio of 50:50. The features of a question are extracted as follows:

• Vocabulary set construction: this vocabulary set consists of 300 hand-selected words that are important for classifying intent in the real estate sector.
• Removal of stop-words: stop words are words that have no meaning for classification; removing them does not change the intent type of the sentence.
• Word segmentation: the input sentence is tokenized into single meaningful words. Unlike many Latin-script languages, Vietnamese does not use blanks to separate words, so common tokenizers that treat blanks as word boundaries do not work. We reused the ViTokenizer library for this task. For example, given a Vietnamese sentence meaning "I need to rent an apartment in Dong Da district", the tokenizer returns six meaningful word tokens.
• Sentence representation using a Bag-of-Words Vector (BoWV): BoW ignores word order and word semantics; it models a sentence by counting the number of occurrences of each vocabulary word in that sentence. An example is illustrated in Fig. 3.
A Logistic Regression model was then trained for the user intent identification module. The experimental results obtained on the test set are: precision = 0.93, recall = 0.87, and F1-score = 0.89.
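The BoW-plus-Logistic-Regression pipeline can be sketched with scikit-learn; the toy English sentences below stand in for ViTokenizer-segmented Vietnamese data, and none of this is the authors' code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy English stand-ins for ViTokenizer-segmented Vietnamese training sentences
train_sents = ["need to rent a house", "rent apartment downtown",
               "house for sale cheap", "selling my house now",
               "what is the address", "which street is it on"]
train_intents = ["rent", "rent", "sale", "sale", "address", "address"]

# Bag-of-Words features + Logistic Regression, as in Sect. 3
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(train_sents, train_intents)
pred = clf.predict(["I want to rent a house"])[0]
```

In the real system, the CountVectorizer step would be restricted to the 300-word curated vocabulary after stop-word removal.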

Fig. 2 List of intents for chatbot: Greeting; Need to buy a house; House for sale; House for rent; Need to rent a house; Ask for address information; Ask for house price information; Request a higher price; Request lower prices; Requires a larger area; Ask the area; Ask home direction; Ask for the room number; See illustrations; Additional information; Negative


Fig. 3 Example illustrating how Bag of Words encodes text sentences. With the vocabulary {the, cat, sat, on, hat, dog, ate, and}, "The cat sat on the hat" becomes {2, 1, 1, 1, 1, 0, 0, 0} and "The dog ate the cat and the hat" becomes {3, 1, 0, 0, 1, 1, 1, 1}.
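The Fig. 3 encoding can be reproduced in a few lines (a sketch; the real system builds the vector over the 300-word Vietnamese vocabulary):

```python
# Reproducing the Fig. 3 example: a sentence is encoded as the count of each
# vocabulary word, ignoring word order and word semantics.
vocab = ["the", "cat", "sat", "on", "hat", "dog", "ate", "and"]

def bow_vector(sentence):
    words = sentence.lower().split()
    return [words.count(v) for v in vocab]

bow_vector("The cat sat on the hat")           # [2, 1, 1, 1, 1, 0, 0, 0]
bow_vector("The dog ate the cat and the hat")  # [3, 1, 0, 0, 1, 1, 1, 1]
```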

In case the intent of the sentence is not defined or does not belong to the real estate sector, we propose the following method:

• First, we build a vocabulary set containing important words in real estate. The frequency of the words is then listed and sorted in descending order, and the importance of each word is evaluated manually. If a word does not mean much in the field of real estate, it is removed. The final set includes 130 words.
• Next, we set up a process to assess whether a sentence is in the field of real estate, as follows. The user's sentence is segmented using the ViTokenizer tool to get a list of meaningful words. Each of these words is then checked against the real estate vocabulary above. If none of them belongs to the important vocabulary set, the sentence is judged not related to the real estate sector. Such sentences are stored in the database for later review and use.
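The two-step check above can be sketched as follows; the vocabulary shown is a tiny English stand-in for the 130-word Vietnamese set.

```python
# Out-of-domain filter: a sentence is kept only if at least one of its
# segmented words appears in the curated real estate vocabulary.
REAL_ESTATE_VOCAB = {"house", "apartment", "rent", "sell", "price", "area", "bedroom"}

def in_real_estate_domain(words):
    """True if any segmented word belongs to the domain vocabulary."""
    return any(w.lower() in REAL_ESTATE_VOCAB for w in words)

in_real_estate_domain(["I", "need", "an", "apartment"])  # True: "apartment" is in the set
in_real_estate_domain(["what", "time", "is", "it"])      # False: off-topic
```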

4 Automatic Entity Recognition and Labeling

Extracting entities in a user's sentence helps the chatbot identify the specific objects that the user is referring to, from which the appropriate answer can be found. In the problem of developing a real estate chatbot, the entities of concern are those related to the address, price, area, etc. of a property. The types of entities are defined and described in Fig. 4. We carried out the following steps to create the dataset used for the entity recognition task:

• Data collection: text data containing real estate information from sources such as Facebook groups about real estate and posts on the batdongsan.com.vn website is crawled and stored in a corpus.


Fig. 4 List of entities for chatbot: Real estate codes; Price; Type; Transaction Type; Area; District; Street; Number of bedrooms; Ward/Commune; Direction; Address; Name; Phone Number; Email

• Word segmentation: as presented in the previous section, the input sentence is separated into meaningful words using the ViTokenizer library.
• Part-of-Speech (POS) analysis: POS explains how a word is used in a sentence, for example whether it is a verb, noun, or adjective. The result of this task is a POS label for each word.
• Entity labeling: entities are manually assigned to specific types. For example, the word meaning "apartment" in English, with the POS label "N", is labeled REAL_ESTATE_TYPE, while a named entity with the POS label "Np" is assigned LOCATION.

As a result, the collected dataset includes 2935 samples for the training set and 860 samples for the test set. One of the most appropriate machine learning methods for the entity identification problem is the Conditional Random Field (CRF) [10, 13]. In the feature selection step, in addition to the word w itself, we also use features such as the prefix/suffix of w, the words surrounding w, whether w is in uppercase or lowercase, whether w is a number or contains digits, the POS tag of w and those of the surrounding words, and whether w is or contains a special character [19]. The results obtained on the test set give an accuracy of 83%. The entity recognition and labeling system that we developed predicts some types of entities, such as address, area, and person name, quite effectively. However, for some typical entities such as email, phone number, and price, the use of regular expressions brings much better accuracy. The designed regular expressions are: req_email to identify the email, req_phone to determine the phone number, and area to determine the area. In addition, the price_range regular expression can identify a price range expressed in different forms: 10–15 million, 10 to 15 million, 10 million–15 million, 10.5 million–11.5 million. Correspondingly, the generated query will contain "price between 10 and 15".
Similarly, area_range is used to identify the area in various forms such as 100–150 m2, 100 to 150 m2, 100 m2–150 m2, and so on, and the query that the chatbot uses to ask the database will contain "area between 100 and 150".
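One plausible shape of the price_range pattern (the authors' actual regular expressions are not published, so this regex and the helper function are assumptions):

```python
import re

# Two numbers joined by "-", "–" or "to", with "million" optionally after the
# first number and required after the second, e.g. "10-15 million",
# "10 to 15 million", "10.5 million-11.5 million".
PRICE_RANGE = re.compile(
    r"(\d+(?:[.,]\d+)?)\s*(?:million)?\s*(?:-|–|to)\s*(\d+(?:[.,]\d+)?)\s*million")

def parse_price_range(text):
    m = PRICE_RANGE.search(text)
    if not m:
        return None
    low, high = (float(g.replace(",", ".")) for g in m.groups())
    return f"price between {low:g} and {high:g}"

parse_price_range("for rent 10-15 million")           # "price between 10 and 15"
parse_price_range("about 10.5 million-11.5 million")  # "price between 10.5 and 11.5"
```

An area_range pattern would be analogous, matching "m2" instead of "million" and emitting "area between ... and ...".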


One problem we faced is how to distinguish administrative units in address entities. It is quite difficult to tell whether a piece of address information is the name of a province, a district, a commune/ward, a street, or a project. Therefore, the system identifies every address entity with the same label, LOCATION. After extraction, the address information is checked against datasets of districts, communes/wards, streets, and real estate projects in the Hanoi area.
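The lookup step can be sketched with gazetteer sets; the names below are small examples, not the full Hanoi datasets.

```python
# Gazetteer lookup to refine a LOCATION entity into an administrative unit.
DISTRICTS = {"Dong Da", "Hoan Kiem", "Ba Dinh"}
STREETS = {"Nguyen Trai", "Lang Ha"}
PROJECTS = {"Times City"}

def classify_location(name):
    """Map an extracted LOCATION string to a finer-grained unit, if known."""
    if name in DISTRICTS:
        return "district"
    if name in STREETS:
        return "street"
    if name in PROJECTS:
        return "project"
    return "unknown"
```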

5 Designing Dialog Management

Dialog management aims to control the conversation so that it takes place naturally, and to select the chatbot's next action in response to customer requests. To accomplish this mission, it must be able to determine the actual context of the dialogue. The inputs of this sub-system are the intent of the user's sentence, the extracted entities, and the state and history of the conversation; based on these pieces of information, the chatbot decides the next action to perform. The action-execution step may need to manipulate the database or check the status of the dialog to generate additional questions and pose them to the user. Our method for dialog management is mainly based on the script and the current state of the conversation:

• The chatbot scenario is the sequence of tasks that the chatbot needs to perform during its interaction with customers. It describes the conversation stages for each type of customer and the action plan for each customer intent in each stage. It also indicates when to change the state and the stage of the conversation. Relying on such detailed descriptions, the chatbot can answer customer questions and lead the conversation smoothly.
• In the real estate domain, the chatbot was designed with two roles: collecting information from the owner of a property to be sold or rented, and finding relevant products for a customer who wants to buy or rent real estate. Accordingly, we designed two kinds of scenarios, collecting information about real estate, and searching for and recommending real estate, as illustrated in the next section.
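Script-plus-state dialog management can be sketched as a lookup from (stage, intent) to (action, next stage); all stage and action names below are illustrative, since the authors' scenario files are not published.

```python
# Script-driven dialog management: the scenario maps (stage, intent) to the
# chatbot's next action and the next conversation stage.
SCRIPT = {
    ("start", "house_for_rent"): ("ask_address", "collect_info"),
    ("collect_info", "give_address"): ("ask_details", "collect_info"),
    ("collect_info", "negative"): ("ask_contact", "closing"),
}

def next_action(stage, intent):
    """Pick the next action from the scenario; unknown pairs fall back
    without changing the stage."""
    return SCRIPT.get((stage, intent), ("fallback", stage))

action, stage = next_action("start", "house_for_rent")  # ("ask_address", "collect_info")
```

This table-driven form makes it easy to express the two scenario families (collecting information and recommending properties) as separate scripts sharing the same lookup mechanism.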

6 Prototype

The system has been implemented in Python and was designed to be easily integrated into other platforms such as Facebook, Web, Android, and iOS. Figure 5 depicts some conversations between a user and the chatbot. The script of a conversation is illustrated below:

• Chatbot: Hi, I'm Rech, a chatbot that can support you in the field of real estate. Are you in need of buying, selling, renting or leasing real estate?


Fig. 5 An example of conversation with chatbot

• User: I have a house to rent.
• Chatbot: Can you tell the address of the house or the project name it belongs to?
• User: 5 Nguyen Trai Street, Dong Da District
• Chatbot: Do you want to add detailed information such as price, area, number of bedrooms, main directions or photos of the house?
• User: This house is for rent for 6.5 million. It's 80 m2 with 2 bedrooms.
• Chatbot: Do you want to provide more information on the main direction, photos of this house?
• User: No
• Chatbot: Please leave your name, phone number or email so we can contact you.
• User: My name is Hai, my phone number is 0353765791
• Chatbot: Thank you, allow me to confirm the information: Mr Hai, contact phone: 0353765791; Transaction: House for rent at 5 Nguyen Trai - Dong Da, area 80m2, price 6.5 million with 2 bedrooms. Is the above information correct?
• User: Update for me the rental price as 7.5 million.
• Chatbot: Information has been updated.

7 Conclusions and Perspectives

This article presents an artificial intelligence-based method for building real estate chatbots that can communicate with customers in natural language. The chatbot is trained with a data set collected from Facebook groups and from a leading real estate website in Vietnam. By applying Logistic Regression, the user intent identification component achieves precision = 0.93, recall = 0.87, and F1-score = 0.89, while the automatic entity recognition and labeling task achieves an accuracy of 83%. Besides, the article describes an effective method for designing dialog management. The chatbot developed using the proposed method can understand quite well the intention in the user's statement and the entity information that the user mentions. Once it understands the intent of the sentence, in combination with the state of the conversation, the


chatbot will understand what to do next, and when it catches the right entities, it can access the right data that the user wants. These two functions are achieved thanks to the application of machine learning methods and a variety of training data sets, which provide relatively high predictive accuracy. In future research, we will apply deep learning techniques to improve the performance of the system.

Acknowledgements We wish to thank Hai Nguyen for his valuable assistance in the technical implementation of the system. We would like to thank the reviewers for their insightful comments, which have improved our manuscript substantially.

References

1. Chatbot's definition on Wikipedia. https://en.wikipedia.org/wiki/Chatbot
2. MacTear, M., Callejas, Z.: The History of Chatbots. https://onlim.com/en/the-history-of-chatbots/
3. Goebel, T.: Machine Learning or Linguistic Rules: Two Approaches to Building a Chatbot. https://www.cmswire.com/digital-experience/machine-learning-or-linguistic-rules-two-approaches-to-building-a-chatbot/
4. Surmenok, P.: Chatbot Architecture (2016). https://medium.com/@surmenok/chatbot-architecture-496f5bf820ed
5. Chowdhury, G.G.: Natural language processing. Ann. Rev. Inf. Sci. Technol. 37(1), 51–89 (2003)
6. Nhat, M.P.Q.: Overview of natural language processing problems in chatbot system development (2017). https://techinsight.com.vn/khai-quat-cac-bai-toan-xu-ly-ngon-ngu-tu-nhien-trong-phat-trien-thong-chatbot/
7. Evgeniou, T., Pontil, M.: Support vector machines: theory and applications. In: Advanced Course on Artificial Intelligence, vol. 5, pp. 249–257. Springer, Heidelberg (1999)
8. Rish, I.: An empirical study of the naive Bayes classifier. In: IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, vol. 3, no. 22, pp. 41–46 (2001)
9. McGreal, S., Adair, A., McBurney, D., Patterson, D.: Neural networks: the prediction of residential values. J. Property Valuation Investment 16(1), 57–70 (1998)
10. Li, S.: Named Entity Recognition and Classification with Scikit-Learn (2018). https://towardsdatascience.com/named-entity-recognition-and-classification-with-scikit-learn-f05372f07ba2
11. Rabiner, L.R., Juang, B.H.: An introduction to hidden Markov models. IEEE ASSP Mag. 3(1), 4–16 (1986)
12. Berger, A.L., Pietra, V.J., Pietra, S.A.: A maximum entropy approach to natural language processing. Comput. Linguist. 22(1), 39–71 (1996)
13. Lafferty, J., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data (2001)
14. Jurafsky, D., Martin, J.H.: Dialog systems and chatbots. Speech Lang. Process. 3 (2017)
15. Robino, G.: Dialogs modeled as finite state machines (2016). https://medium.com/@solyarisoftware/dialoghi-come-macchine-a-stati-41bb748fd5b0
16. Reiter, E., Dale, R.: Building Natural Language Generation Systems. Cambridge University Press, Cambridge (2000)
17. Weizenbaum, J.: ELIZA, a computer program for the study of natural language communication between man and machine. Commun. ACM 9(1), 36–45 (1966)
18. Chamberlain, W.: The Policeman's Beard is Half Constructed: Computer Prose and Poetry. Warner Books, New York (1984)
19. Au Yeung, A.: Performing Sequence Labelling using CRF in Python (2017). http://www.albertauyeung.com/post/python-sequence-labelling-with-crf/

Ensemble of Convolutional Neural Networks for the Detection of Prostate Cancer in Multi-parametric MRI Scans

Quang H. Nguyen, Mengnan Gong, Tao Liu, Ou Yang Youheng, Binh P. Nguyen, and Matthew Chin Heng Chua

Abstract Prostate MP-MRI scanning is a non-invasive method of detecting early-stage prostate cancer that is increasing in popularity. However, this imaging modality requires highly skilled radiologists to interpret the images, which incurs significant time and cost. Convolutional neural networks may alleviate the workload of radiologists by discriminating between prostate-tumor-positive scans and negative ones, allowing radiologists to focus their attention on the subset of scans that are neither clearly positive nor negative. The major challenges for such a system are speed and accuracy. To address these two challenges, this paper proposes a new approach using ensemble learning of convolutional neural networks (CNNs), which leverages different imaging modalities, including T2-weighted, B-value, ADC, and Ktrans, in a multi-parametric MRI clinical dataset of 330 samples from 204 patients for training and evaluation. Tumor-identification results (benign or malignant) are produced from the features extracted by the individual CNN models in seconds. The ensemble of the four individual CNN models for the different image types improves the prediction accuracy to 92%, with a sensitivity of 94.28% and a specificity of 86.67% on the 50 test samples. The proposed framework potentially provides rapid classification of high-volume quantitative prostate tumor samples.

Q. H. Nguyen (B): School of Information and Communication Technology, Hanoi University of Science and Technology, Dai Co Viet Road, Hanoi 100000, Vietnam
M. Gong · T. Liu · M. C. H. Chua: Institute of Systems Science, National University of Singapore, 29 Heng Mui Keng Terrace, Singapore 119620, Singapore
O. Y. Youheng: Department of Orthopaedics, Singapore General Hospital, Outram Road, Singapore 169608, Singapore
B. P. Nguyen: School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_20

1 Introduction

According to the American Cancer Society (ACS) report for 2019, the estimated numbers of deaths from prostate cancer and of new prostate cancer cases are approximately 31,620 and 174,650, respectively. In the U.S., prostate cancer is the second leading cause of cancer death in men, behind lung cancer [1]. This high disease burden represents a tremendous cost associated with the diagnosis of prostate cancer. Early-stage detection is amenable to a greater range of treatment options and could lead to better outcomes [2]. MP-MRI scanning is a noninvasive way to detect early-stage prostate cancer. Its popularity has recently increased dramatically, and it has become a common investigation in health screening for males aged 40 and above. At present, such scans are interpreted manually by a radiologist, and the workload for image review has grown enormously. To address this issue, we developed a classification model that supports reading prostate MP-MRI scans using deep learning (DL) approaches. Our DL system is built from multilayer neural networks to perform general quantitative classification [3]. When the outcome is positive, our system automatically passes the images to the doctor for further analysis to guard against false positive cases. When the outcome is negative, the images are excluded from extra pathological testing and enclosed with the checkup report marked with a negative label, subject to the model's specificity performance [4]. With advances in deep learning algorithms and more training data, it is hoped that the specificity will exceed that of a human radiologist. Most recent studies have focused on prostate tumor detection and segmentation at the tissue interface to identify the Gleason grade group of prostate tumor scans [5]. In this paper, we focus on improving the specificity of negatively diagnosed cases.
The proposed method addresses both the speed and accuracy challenges, as described in the following sections. To build a fast CNN model for prostate tumor classification, tumor-candidate patches of 8 × 8 pixels were extracted from the ADC, B-value, and Ktrans images. Since T2w is currently the best-performing modality, with an accuracy of 90% in our prototyping and in other studies [6], we extracted 32 × 32-pixel T2w patches for the most heavily weighted model. Furthermore, in a comparison between classification models using support vector machines (SVMs) and ones applying neural networks (NNs) [7], a system employing NNs provided better flexibility for improving accuracy through model ensembling. The ensemble framework relies on the T2w image CNN as the major weighted model, while the ADC, B-value, and Ktrans models assist it: when two or more of these three models disagree with the T2w model [8], the framework adjusts the result, yielding better accuracy than the single T2w CNN model.
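The assisting rule just described (T2w leads; it is overruled only when at least two of the three assisting models disagree with it) can be sketched as follows. The function name and the use of 0/1 labels are ours for illustration, not taken from the paper's code.

```python
def ensemble_predict(t2w: int, adc: int, bval: int, ktrans: int) -> int:
    """Combine per-model predictions (1 = malignant, 0 = benign).

    The T2w model is the major weighted model; its prediction stands
    unless two or more of the three assisting models disagree with it.
    """
    disagreements = sum(1 for p in (adc, bval, ktrans) if p != t2w)
    return (1 - t2w) if disagreements >= 2 else t2w

# T2w alone says malignant, but all three assistants disagree -> overruled.
print(ensemble_predict(t2w=1, adc=0, bval=0, ktrans=0))  # prints 0
```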


2 Data Preparation

The prostate tumor dataset used in this study was provided by the SPIE-AAPM-NCI PROSTATEx Challenges [8]. It includes 204 research individuals with raw images of types T2w, Bval, ADC, and Ktrans, together with lesion information describing the prostate tumor position, IJK coordinates, and voxel spacing. Across the 204 individuals there are 330 clinically significant samples containing lesions in different anatomic zones of the prostate, including the anterior fibromuscular stroma, peripheral zone, seminal vesicle, and transition zone [9]. The main lesion information comprises the tumor position (scanner coordinates of the findings) and the voxel spacing (a vector of x, y, and z spacing scalars for locating the tumor in the images) [8]. In the lesion information CSV spreadsheet, IJK gives the image column, row, and slice coordinates of the findings. Data cleansing consists of image resampling, histogram equalization, and class balancing between benign and malignant samples [10]. Before extracting tumor patches, the raw images were converted into NIFTI format and resampled to fit the NN model. Histogram equalization was used to improve the global contrast of each image format; the sample images were processed with reference to the cumulative density function, as shown in Fig. 1. The comparison of equalized images shows that the contrast intensity range is dramatically improved. In the next step of data cleansing, the dataset was balanced between the benign and malignant classes. Initially, the ratio of benign to malignant samples was approximately 3.3 to 1; hence, the cancer samples were oversampled by adding four rotated copies (45°, 90°, 180°, and 270°) to the final dataset. An example of a rotated T2w image is displayed in Fig. 2. Primary data cleansing thus comprises the three steps described above: the four image formats were cleaned and tumor patches extracted, each patch set was fed into its own NN model, and the four models were ensembled into a single framework for the final prostate tumor prediction.
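The global histogram equalization step can be sketched in NumPy as a CDF-based intensity remapping. This is a minimal sketch: the function name and the 256-bin assumption are ours, and the actual pipeline operated on resampled NIFTI volumes.

```python
import numpy as np

def equalize_histogram(img, n_bins=256):
    """Global histogram equalization via the cumulative density function.

    `img` is a 2-D array of intensities in [0, 255]; returns an array of
    the same shape whose intensities are remapped so that the CDF is
    roughly linear, stretching the global contrast.
    """
    hist, bin_edges = np.histogram(img.flatten(), bins=n_bins, range=(0, 256))
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize to [0, 1]
    # Map each pixel through the CDF and rescale back to [0, 255].
    equalized = np.interp(img.flatten(), bin_edges[:-1], cdf * 255.0)
    return equalized.reshape(img.shape)

# A low-contrast patch (values squeezed into [100, 120]) ...
patch = np.clip(np.arange(64).reshape(8, 8) % 21 + 100, 0, 255)
out = equalize_histogram(patch)
# ... is stretched toward the full [0, 255] range.
```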

Fig. 1 Histogram equalization comparison of original and equalized images

(Figure 2 panels: T2 original image and copies rotated by 45°, 90°, 180°, and 270°)

Fig. 2 T2 original image rotation with cancer cases

The NIFTI files were converted from the raw MP-MRI images and, after data cleansing, resampled and saved as new NIFTI files. The final dataset was therefore created through image size normalization, histogram equalization, and rotation of the malignant tumor samples. Tumor patches were then extracted from the resampled dataset, as displayed in Fig. 3. The patch size for tumor extraction was based on the performance evaluation of a prototype NN model at the original image sizes. Specifically, the original T2w image size is 384 × 384, while the original sizes of ADC, Bval, and Ktrans are 128 × 84, 128 × 84, and 128 × 128, respectively. Accordingly, the patch sizes for T2w, ADC, Bval, and Ktrans were set to 32 × 32, 8 × 8, 8 × 8, and 8 × 8, respectively, as input to the individual NN models. Figure 4 gives an overview of the whole dataset preprocessing pipeline from raw data to modeling and evaluation. Initially, the raw data included images of types T2w, ADC, Bval, and Ktrans from the 204 research individuals described above. DICOM is the format of the T2, ADC, and Bval images, which are converted into NIFTI format via a DICOM converter [11]. Additionally, the MHD Ktrans format also needs to be converted into NIFTI format

Fig. 3 T2w, ADC, Bval and Ktrans tumor patch extracting


Fig. 4 Dataset preprocessing pipeline

via an MHD converter. During data cleansing, the NIFTI files from the original images are resampled, with image normalization, histogram equalization, and class balancing, and exported as new NIFTI files for tumor patch extraction. During the tumor patch extraction phase, the data are stored as numpy arrays together with a data frame containing the necessary information, including ProxID, DCMSerDesc, and the path to the resampled NIFTI file, for linking prostate cases. The numpy data are then merged with the lesion information from a CSV file of image descriptions and a CSV file of tumor findings; the two CSV files can be joined into tabular information for further analysis. These data are eventually used for training the CNN model, validation, and evaluation of the model's accuracy and loss.
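Given the lesion's IJK (column, row, slice) indices from the findings CSV, cropping a fixed-size patch around the lesion reduces to array slicing. This is a simplified sketch with our own function and variable names: the real pipeline works on resampled NIFTI volumes and also uses the voxel spacing, which is omitted here.

```python
import numpy as np

def extract_tumor_patch(volume, i, j, k, patch_size):
    """Crop a (patch_size x patch_size) patch from slice k of a 3-D
    image volume, centered on the lesion's (i, j) image coordinates."""
    half = patch_size // 2
    axial_slice = volume[:, :, k]          # rows x columns at slice k
    return axial_slice[j - half:j + half, i - half:i + half]

# A dummy 384 x 384 x 20 "T2w volume" with a lesion at column 150, row 200, slice 9.
volume = np.zeros((384, 384, 20))
patch = extract_tumor_patch(volume, i=150, j=200, k=9, patch_size=32)
```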

3 Deep Learning Approach

The proposed framework for negative prostate cancer classification is presented in this section. An overview of the proposed method is summarized in Fig. 5.

Fig. 5 Overview of quantitative prostate cancer classification approach (image preprocessing → tumor patch extraction for T2w, ADC, B-values, and Ktrans → one CNN model per image type → ensemble CNN model identifying negative prostate cancer)


The MP-MRI images consist of T2-weighted (T2w) images, B-value images, apparent diffusion coefficient (ADC) maps, and Ktrans (MHD/zraw) files. This imaging modality has been widely used in medicine, for example in brain tumor diagnosis and surgery [12–14]. The images are marked with the prostate tumor positions and voxel spacing in the description files. They were preprocessed and converted into the required numeric matrix format for training and prediction of a benign or malignant result using CNNs. Since the proposed method is constructed as an ensemble of NNs over the different modalities T2w, B-value, ADC, and Ktrans, we expect it to improve the accuracy of the classification result. At the same time, because the design focuses on excluding negative prostate cancer cases, the predicted outcome of the ensemble is evaluated through binary classification (True/False and Positive/Negative), with sensitivity and specificity as the primary performance measures for negative diagnosis [9]. The proposed method uses four NN models, one for each of T2w, ADC, Bval, and Ktrans. Since the T2w patches are 32 × 32, the T2w model needs more layers than those for the other image formats. The architecture of each NN model is described in Fig. 6. Overall, the outcomes of the four models serve as the basis for the final result through weighted voting over the models; the final result is evaluated as a binary conclusion (True/False or Positive/Negative) to calculate the specificity and sensitivity. Furthermore, to determine a suitable number of training epochs, each model is validated with the Nadam optimizer and an early stopping process based on the validation loss. The Nesterov-accelerated Adaptive Moment Estimation (Nadam) optimizer is the combination of Adam and NAG.
To integrate NAG with Adam [15, 16], the momentum term is given a Nesterov look-ahead. Here m̂_t and v̂_t are the bias-corrected estimators of the first moment (the mean) and the second moment (the uncentered variance) of the gradients, and the update is

θ_{t+1} = θ_t − η / (√v̂_t + ε) · ( β₁ m̂_t + (1 − β₁) g_t / (1 − β₁^t) )   (1)

With the Nadam optimizer, the number of training epochs for each NN model was decided by the validation loss instead of a manual setting. In the NN prototype, the ADC model was improved significantly by the Nadam optimizer, as shown in the comparative result of Fig. 7. The gradient of the ADC model starts vanishing around epoch 20, and subsequent epochs do not contribute to improving accuracy. With the Nadam optimizer, the decrease in the loss function is monitored at each epoch to ensure that every additional training epoch is valid and contributes significantly to the training result.
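The Nadam update of Eq. (1) can be exercised directly in NumPy. The sketch below is a hand-rolled illustration of the rule with the common default hyperparameters (β₁ = 0.9, β₂ = 0.999), not the Keras optimizer the paper actually used.

```python
import numpy as np

def nadam_step(theta, grad, m, v, t, lr=0.002, b1=0.9, b2=0.999, eps=1e-8):
    """One Nadam update: Adam moments plus a Nesterov look-ahead (Eq. 1)."""
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)               # bias corrections
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr / (np.sqrt(v_hat) + eps) * (
        b1 * m_hat + (1 - b1) * grad / (1 - b1 ** t))
    return theta, m, v

# Minimize f(x) = x^2 from x = 5; the iterate is driven toward 0.
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    theta, m, v = nadam_step(theta, 2.0 * theta, m, v, t, lr=0.05)
```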

Fig. 6 CNN models for a ADC, Bval, Ktrans and b T2w (from the figure labels: both networks stack 3 × 3 Conv2D(32) blocks with ReLU activation, 2 × 2 max pooling, and dropout, followed by dense layers of 512, 256, and 128 units interleaved with dropout 0.5, ending in a Dense(1) sigmoid output; the T2w network adds an extra convolutional block for its larger input patch)

4 Experimental Results

The experiment was designed to evaluate the processing speed and accuracy of the proposed method for quantitative prostate cancer recognition. It was implemented in Python 3.7 on a PC with an Intel(R) Core(TM) i5-7500 3.40 GHz CPU and 32 GB RAM. The average processing time for predicting the 50 test samples is 0.436 s for T2 images and 0.306 s for each of the other image types. The extracted tumor patch was 32 × 32 pixels for T2w and 8 × 8 pixels for ADC, Bval, and Ktrans. The experimental dataset contains 614 samples in total, comprising 254 negative samples and 360 positive samples (after rotation of the original positive samples). The dataset was divided into three parts: 451 training samples, 113 validation samples, and 50 testing samples, as displayed in Table 1. The CNN models were implemented with Keras on a TensorFlow back-end. Four NNs were built for the different image types, and the performance of each single model is shown in Table 2.


Fig. 7 Comparison of epoch setting without/with Nadam optimizer

Table 1 Experimental dataset

Image type            T2w        ADC      B-value  Ktrans
Total params          1,409,505  321,899  321,899  321,899
Trainable params      1,409,505  321,899  321,899  321,899
Non-trainable params  0          0        0        0
Train samples         451        451      451      451
Validate samples      113        113      113      113
Test samples          50         50       50       50

Table 2 shows that the T2w model has the best performance among the four models, while the other models' testing accuracies are all below 90%. Finally, the returned outcomes were combined into a binary label (True/False or Positive/Negative) by the weighted-voting ensemble model, in which any negative prediction from an individual model also contributes to the final result. Such consensus in decision making is one of the main advantages of an ensemble model. The conceptual voting mechanism is displayed in Fig. 8.

Table 2 Performance of models

Image type      Validation accuracy (%)  Test accuracy (%)
T2w             92                       90
ADC             86                       86
Bval            82                       86
Ktrans          78                       74
Ensemble model  N/A                      92

Fig. 8 Ensemble model

According to the results obtained from the ensemble NN model, the 50-sample testing dataset was used in a final test across the four models with True/False or Positive/Negative labels. The validation of the individual models' and the ensemble model's performance is displayed in Table 3.

Table 3 Comparison of T2, Bval, Ktrans and ADC model performances

T2 model (acc 90%)
Pre/Truth  Positive  Negative
Positive   32        3
Negative   2         13

Bval model (acc 86%)
Pre/Truth  Positive  Negative
Positive   31        4
Negative   3         12

Ktrans model (acc 74%)
Pre/Truth  Positive  Negative
Positive   33        11
Negative   2         4

ADC model (acc 86%)
Pre/Truth  Positive  Negative
Positive   29        1
Negative   6         14
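The accuracies reported in Tables 2 and 3 can be cross-checked directly from the confusion matrices, where rows are the prediction and columns are the truth:

```python
def accuracy(cm):
    """Accuracy from a 2x2 confusion matrix [[TP, FP], [FN, TN]],
    with rows = prediction (Positive, Negative) and columns = truth."""
    (tp, fp), (fn, tn) = cm
    return (tp + tn) / (tp + fp + fn + tn)

# Table 3 matrices: T2w 90%, Bval 86%, Ktrans 74%, ADC 86% on 50 test samples.
t2w, bval = [[32, 3], [2, 13]], [[31, 4], [3, 12]]
ktrans, adc = [[33, 11], [2, 4]], [[29, 1], [6, 14]]
print([accuracy(m) for m in (t2w, bval, ktrans, adc)])  # [0.9, 0.86, 0.74, 0.86]
```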


Table 4 The performance of the weighted-voting ensemble model

Pre/Truth         Condition positive  Condition negative
Outcome positive  33                  2
Outcome negative  2                   13

Sensitivity 94.28%, Specificity 86.67%, PPV = 94.28%, NPV = 86.67%

The accuracy of the ensemble model is 92%, higher than that of any of the four individual models. The model also attains a sensitivity of 94.28% and a specificity of 86.67% on the testing data, as shown in Table 4.
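The figures in Table 4 follow directly from the ensemble's confusion matrix (TP = 33, FN = 2, FP = 2, TN = 13); a quick check:

```python
def sens_spec(tp, fn, fp, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = sens_spec(tp=33, fn=2, fp=2, tn=13)
# sens = 33/35, about 94.28 %; spec = 13/15, about 86.67 %, matching Table 4.
```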

5 Discussion and Conclusion

The methods proposed in this paper demonstrate that the diagnostic procedure for prostate cancer could be remarkably improved by taking advantage of machine learning approaches. The method can be studied further for wide application in hospitals and healthcare centers; its effectiveness can facilitate reading scanned images more effectively and accurately by rejecting the negative scans. Our model was built from four NN models: T2w, ADC, Bval, and Ktrans. As an ensemble model, it combines the results of the four models and computes the final outcome based on each model's weight. Notably, the model not only accepts four different formats from the MRI but also harnesses collective intelligence through the model ensemble, which makes comprehensive use of the MRI possible instead of analyzing the different MRI formats separately with possible loss of features. Additionally, the ensemble model performed better than every single model on the same dataset. Therefore, this approach may provide a rapid and quantitative advance in MP-MRI prostate cancer detection. However, there are some limitations that we need to address in further studies:
• The specificity of 86.67% is not yet sufficient for real medical applications, because the model may still miss prostate cancer patients through false negative cases.
• Since the results are confined to the dataset from the SPIE-AAPM-NCI PROSTATEx challenges, the performance needs to be verified on more extensive datasets.
In conclusion, we have conducted a study on a rapid and quantitative model for prostate cancer recognition and pathology exclusion using an ensemble NN applied to multi-parametric MRI scans. The experimental evidence shows that our model achieves satisfactory testing specificity, which matters more here than raw accuracy.


References

1. Cancer.Net Editorial Board: Prostate cancer: Statistics (2019). https://www.cancer.net/cancer-types/prostate-cancer/statistics
2. Chung, A.G., Khalvati, F., Shafiee, M.J., Haider, M.A., Wong, A.: Prostate cancer detection via a quantitative radiomics-driven conditional random field framework. IEEE Access 3, 2531–2541 (2015)
3. Sumathipala, Y., Lay, N., Turkbey, B., Smith, C., Choyke, P.L., Summers, R.M.: Prostate cancer detection from multi-institution multiparametric MRIs using deep convolutional neural networks. J. Med. Imaging 5(4), 044507 (2018)
4. Lemaître, G., Martí, R., Freixenet, J., Vilanova, J.C., Walker, P.M., Meriaudeau, F.: Computer-aided detection and diagnosis for prostate cancer based on mono and multi-parametric MRI: a review. Comput. Biol. Med. 60, 8–31 (2015)
5. Tian, Z., Liu, L., Fei, B.: Deep convolutional neural network for prostate MR segmentation. In: Medical Imaging 2017: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 10135, p. 101351L. International Society for Optics and Photonics (2017)
6. Du, W., Wang, S., Oto, A., Peng, Y.: Graph-based prostate extraction in T2-weighted images for prostate cancer detection. In: Proceedings of the 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2015), pp. 1225–1229. IEEE (2015)
7. Chang, C., Hu, H., Tsai, Y.: Prostate cancer detection in dynamic MRIs. In: Proceedings of the IEEE International Conference on Digital Signal Processing (DSP 2015), pp. 1279–1282, July 2015
8. Cancer Imaging Archive Wiki: SPIE-AAPM-NCI PROSTATEx challenges (2019). https://wiki.cancerimagingarchive.net/display/Public/SPIE-AAPM-NCI+PROSTATEx+Challenges
9. Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc. Ser. B (Methodol.) 57(1), 289–300 (1995)
10. Panca, V., Rustam, Z.: Application of machine learning on brain cancer multiclass classification. In: AIP Conference Proceedings, vol. 1862, p. 030133. AIP Publishing (2017)
11. DiffusionKit: Data processing pipeline (2019). https://diffusionkit.readthedocs.io/en/latest/userguide.html
12. Chen, X., Nguyen, B.P., Chui, C.-K., Ong, S.-H.: Automated brain tumor segmentation using kernel dictionary learning and superpixel-level features. In: Proceedings of the International Conference on Systems, Man, and Cybernetics (SMC 2016), Budapest, Hungary, 9–12 Oct 2016, pp. 2547–2552. IEEE (2016)
13. Chen, X., Nguyen, B.P., Chui, C.-K., Ong, S.-H.: Reworking multilabel brain tumor segmentation - an automated framework using structured kernel sparse representation. IEEE Syst. Man Cybern. Mag. 3(2), 18–22 (2017)
14. Chen, X., Nguyen, B.P., Chui, C.-K., Ong, S.-H.: An automatic framework for multi-label brain tumor segmentation based on kernel sparse representation. Acta Polytech. Hung. 14(1), 25–43 (2017)
15. Ruder, S.: An overview of gradient descent optimization algorithms. CoRR, abs/1609.04747 (2016)
16. Yousefian, F., Nedić, A., Shanbhag, U.V.: A smoothing stochastic quasi-Newton method for non-Lipschitzian stochastic optimization problems. In: Proceedings of the 2017 Winter Simulation Conference, p. 183. IEEE Press (2017)

Adaptive Fuzzy Backstepping Sliding Mode Control for Omni Mobile Robot Over Network Control System

Minh Ngoc Pham, Vinh Quang Thai, Duyen Kim Thi Ha, Tien Manh Ngo, Cuong Manh Nguyen, Manh Van Tran, and Hiep Quang Do

Abstract The performance of a control system operated over a network can be significantly affected by network delays, and these effects are exacerbated when data loss occurs during network communication. In this paper, network effects are addressed by using a gain scheduler middleware (GSM) structure for robot path-tracking. This teleoperation control approach is combined with an adaptive backstepping sliding mode controller whose control parameters are adapted by self-tuning fuzzy logic to stabilize the control system. Simulation results of the proposed controller operating over a data network show that the control quality is better than that of normal backstepping sliding mode control without the GSM methodology, even in a disturbance environment.

M. N. Pham (B) · V. Q. Thai: Institute of Information Technology, Vietnam Academy of Science and Technology, Hanoi, Vietnam
D. K. T. Ha: Hanoi University of Industry, Hanoi, Vietnam
T. M. Ngo · C. M. Nguyen · M. V. Tran: Institute of Physics, Vietnam Academy of Science and Technology, Hanoi, Vietnam
H. Q. Do: University of Economics-Technology for Industries, Hanoi, Vietnam

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_21


1 Introduction

The omnidirectional mobile robot is a holonomic platform that is widely used in many fields, especially in industry, because of its flexible and precise movement. A great number of articles on the Omni robot's trajectory control have been published, ranging from simple to more complex control, including PID, fuzzy control, the backstepping technique, and adaptive control, as in [1, 3–8]. A novel control method, the backstepping technique aggregated with sliding mode control [1], has attracted increasing attention. The advantages of this controller are its ability to eliminate disturbances and its short settling time, which make it suitable for real-time application. Nevertheless, choosing reasonable parameters plays a crucial role in driving the state variables of the Omni robot toward equilibrium. Therefore, based on fuzzy set theory, appropriate control parameters can be found by fuzzy logic control [4, 9–12] and used to optimize system quality over the network connection [12]. In general, control systems with a traditional wired communication architecture have several drawbacks, such as space limitations, high wiring cost, and time-consuming maintenance. Hence, the networked control system (NCS), a distributed control system, is an advanced approach that overcomes the disadvantages of traditional connections while offering modularity, integrated diagnostics, and low cost. In an NCS, the controller is not situated in the same place as the sensors or actuators; the components of the system operate through a communication network and form a closed control loop. Due to these advantages, NCSs have spread through industrial applications, including remote process control, data acquisition systems, remote monitoring systems, and automobiles [13]; this class of system is referred to as teleoperation control.
A central controller and a controlled plant are installed in two different locations, called the "local site" and the "remote site", respectively. There are two general methods to design an NCS. The first is to have several subsystems form a hierarchical structure, in which each subsystem contains a sensor, an actuator, and a controller of its own, as depicted in Fig. 1. In this case, a subsystem controller receives a set point from the control center and then tries to satisfy this set point by itself; the data or status signal is transmitted back to the center via the network. The second method is the direct structure, which is applied to control the Omni robot in this paper, as illustrated in Fig. 2. Components including the sensors and actuators are connected directly to the network and form the control loop; with this approach, control signals and feedback signals are transmitted through the network between the controller and the plant. In this article, we choose the direct data-transfer method with a gain scheduler middleware (GSM) structure to control the Omni robot online along the desired trajectory, as shown in Fig. 3. The control signal for the desired trajectory is calculated on the Omni robot control system and transmitted to the

Fig. 1 Data-transfers of hierarchical structure

Fig. 2 Data transfers of direct structure

Fig. 3 Model of Omni robot control system via a network


actuators via the communication network. A challenging problem in the control of network-based systems is the effect of network delays. The time to read a sensor measurement and to send a control signal to an actuator through the network depends on network characteristics such as topology and routing schemes, and the overall performance of an NCS can be affected significantly by these delays. Some control architectures have been presented to address the influence of stochastic, random delays [12, 13]. Nevertheless, these methodologies were designed for specific situations, and the stability of the network-based system was sometimes overlooked. Compared with earlier control structures, the GSM structure is better suited to dealing with stability, safety, and network delay effects, because it predicts and approximates the time delays between the controller and the plant. To compensate for network delay, the GSM gain-scheduling middleware technology described in [13] mitigates the impact of network delay on control systems operating over communication networks. The rest of the paper is organized as follows: Sect. 2 describes the model of the Omni robot control system over a communication network; Sect. 3 gives the mathematical model of the Omni robot; Sect. 4 presents an adaptive backstepping sliding mode controller whose parameters are adjusted by a fuzzy logic system; Sect. 5 discusses stability and delay compensation in Omni robot control; finally, Sect. 6 presents the simulation results and comparisons with other control methods, from which conclusions are drawn.
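The GSM algorithm itself is specified in [13]; purely to illustrate the gain-scheduling idea, a middleware layer can shrink the loop gain as the measured round-trip delay grows beyond its nominal value, down to a safety floor. Everything below (the function name, the nominal delay, the floor) is a hypothetical sketch, not the method of [13].

```python
def scheduled_gain(nominal_gain, measured_delay, nominal_delay=0.01, floor=0.2):
    """Scale the controller gain down as the network delay exceeds its
    nominal value, never dropping below `floor` * nominal_gain."""
    scale = min(1.0, nominal_delay / max(measured_delay, 1e-9))
    return nominal_gain * max(scale, floor)

# At the nominal 10 ms delay the full gain is used; at 100 ms the gain is
# clamped to the 20 % floor.
print(scheduled_gain(10.0, 0.01), scheduled_gain(10.0, 0.1))  # 10.0 2.0
```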

2 Model of Omni Robot

Figure 4 illustrates the Omni robot, which has four wheels spaced 90° apart. Oxy represents the global coordinate frame, and d denotes the distance between each wheel and the robot center. The robot's velocity comprises vx, vy, and ω, the robot's angular velocity.

Fig. 4 Omni robot


2.1 Kinematic Model Kinetic equation of Omni Robot is referenced from documents [1, 2]. The robot’s  T coordinate vector is defined as q = x y θ and the velocity vector on the world’s −

axis is the derivative of q . Through the following equation, the aforementioned −

velocity vector can be turned into the velocity on the robot’s axis ⎛

⎞ cos θ − sin θ 0 = ⎝ sin θ cos θ 0 ⎠v− = H v−  − 0 0 1

(1)

 T where:q = x y θ indicates the position and angular vector, v = [ vx v y ω ]T is −

the velocity of the robot’s axis.

2.2 Dynamic Model

Based on [1], the robot's dynamic model is

M v̇ + C v + G sign(v) = B τ    (2)

where:
• M = diag(m, m, J);
• C = diag(Bx, By, Bω);
• G = diag(Cx, Cy, Cω);
• B = [[−√2/(2r), −√2/(2r), √2/(2r), √2/(2r)], [√2/(2r), −√2/(2r), −√2/(2r), √2/(2r)], [d/r, d/r, d/r, d/r]];
• τ = [τ1 τ2 τ3 τ4]ᵀ.
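Solving Eq. (2) for v̇ gives v̇ = M⁻¹(Bτ − Cv − G sign(v)); because M, C and G are diagonal, this can be evaluated axis by axis. A sketch follows, using the mass and geometry values from the simulation section; the friction coefficients Bv and Cf are illustrative assumptions, since the text does not state them:

```python
import math

def omni_dynamics(v, tau, m=20.0, J=10.0, d=0.3, r=0.06,
                  Bv=(1.0, 1.0, 1.0), Cf=(0.5, 0.5, 0.5)):
    """v_dot = M^-1 (B tau - C v - G sign(v)) for Eq. (2).
    m, J, d, r follow the simulation section; Bv (viscous) and
    Cf (Coulomb) friction values are illustrative assumptions."""
    s2 = math.sqrt(2.0) / (2.0 * r)
    B = [[-s2, -s2,  s2,  s2],
         [ s2, -s2, -s2,  s2],
         [d / r] * 4]
    M = (m, m, J)
    sign = lambda x: (x > 0) - (x < 0)
    v_dot = []
    for i in range(3):
        f = sum(B[i][j] * tau[j] for j in range(4))  # i-th row of B tau
        v_dot.append((f - Bv[i] * v[i] - Cf[i] * sign(v[i])) / M[i])
    return v_dot
```

With equal torques on all four wheels, the translational rows of B cancel and the model predicts pure angular acceleration, which is a handy sanity check of the B matrix.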

3 Control Law Design

3.1 Backstepping–Sliding Mode Control Law

Following the proposal of [1], we define the state variables as:

x1 = q,  x2 = v    (3)


M. N. Pham et al.

Combining (1) and (2), we have:

ẋ1 = H x2
M ẋ2 + C x2 + G sign(x2) = B τ    (4)

Step 1: Define z1 = x1 − x1d as the tracking error vector, where x1d is the desired trajectory. The virtual control signal is chosen as

x2d = −H⁻¹(c1 z1 − ẋ1d)    (5)

Define z2 = x2 − x2d as the backstepping error, with x2d the virtual control signal:

z2 = x2 − x2d = H⁻¹ ẋ1 + H⁻¹(c1 z1 − ẋ1d) = H⁻¹ c1 z1 + H⁻¹ ż1    (6)

The first Lyapunov candidate function is:

V1 = ½ z1ᵀ z1    (7)

Its derivative is

V̇1 = −z1ᵀ c1 z1    (8)

where c1 is a diagonal positive definite matrix, so V̇1 ≤ 0.

Step 2: Choose the sliding surface

s = λ z1 + H z2 = (λ + c1) z1 + ż1    (9)
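With diagonal λ and c1, Eq. (9) reduces to a per-axis expression; a minimal sketch (our own naming):

```python
def sliding_surface(z1, dz1, lam, c1):
    """s = (lambda + c1) z1 + z1_dot, Eq. (9), evaluated per axis
    for diagonal gain matrices given as per-axis tuples/lists."""
    return [(lam[i] + c1[i]) * z1[i] + dz1[i] for i in range(len(z1))]
```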

The sliding mode control is designed to guarantee that the sliding surface approaches zero, so that the tracking error z1 and its derivative ż1 tend to zero; this in turn drives the backstepping error z2 to zero. Here λ = diag(λ1, λ2, λ3) is a positive definite matrix. The control signal comprises two elements:

τ = τeq + τsw    (10)

where:
- τeq is the control signal element that keeps the system state on the sliding surface;
- τsw is the control signal element that drives the system state toward the sliding surface.


According to the above conditions, the general control signal is:

τeq = −Bᵀ(B Bᵀ)⁻¹ (M (H⁻¹(λ ż1 + Ḣ z2) − ẋ2d) − C x2 − G sign(x2))
τsw = −Bᵀ(B Bᵀ)⁻¹ M H⁻¹ (c2 sign(s) + c3 s)    (11)

Choose the second Lyapunov candidate function

V2 = ½ sᵀ s    (12)

With the computed control signal, the derivative of V2 is

V̇2 = −sᵀ c2 sign(s) − sᵀ c3 s    (13)

where c2 and c3 are chosen as diagonal positive definite matrices, which leads to V̇2 ≤ 0. In practice, no device realizes the ideal sign() function, and using it causes the chattering phenomenon, which can damage the system. We therefore replace this function with the linear saturation function satlins() to resolve this problem.
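The satlins() substitution can be sketched as follows; the boundary-layer width is an illustrative assumption, as the text does not report the value used:

```python
def satlins(x, width=0.05):
    """Linear saturation used in place of sign() to soften chattering:
    -1 below -width, +1 above +width, linear in between."""
    return max(-1.0, min(1.0, x / width))

def switching_term(s, c2, c3, width=0.05):
    """Per-axis switching control c2*satlins(s) + c3*s (the bracketed
    part of tau_sw in Eq. (11), before it is mapped through
    -B^T (B B^T)^-1 M H^-1)."""
    return [c2[i] * satlins(s[i], width) + c3[i] * s[i] for i in range(len(s))]
```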

3.2 Backstepping Sliding Mode Controller with Network Control System

When the sensor device feeds back the robot's current position data and the control signal is sent, the GSM handles all network connection problems between the controller and the remote system; thus, the robot is controlled over the communication network. These include typical network operations, such as sending and receiving packets, and other general middleware operations, such as negotiation and resource reservation. The structure of the GSM is shown in Fig. 5. The GSM method estimates the network traffic and schedules the gain of the entire system with the feedback processor, as shown in Fig. 3.

1) The feedback preprocessor waits for feedback data from the remote system. Once the feedback data arrives, the preprocessor processes the data using the current values of the network variables and passes the preprocessed data to the controller.
2) The controller computes the control signals and sends them to the gain scheduler.
3) The gain scheduler modifies the controller output based on the current values of the network variables and sends the updated control signals to the remote system.
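A minimal sketch of one GSM cycle following the three steps above; the linear delay-dependent gain schedule is our illustrative assumption, not the actual scheduling rule of the middleware:

```python
def gsm_step(feedback, controller, net_delay, max_delay=0.2):
    """One cycle of the gain-scheduling middleware (GSM):
    (1) preprocess feedback with the current network variables,
    (2) compute the control signal, (3) rescale it as the measured
    delay grows. The linear schedule is an illustrative assumption."""
    # 1) feedback preprocessor: tag the data with its network age
    state = {"x": feedback, "age": net_delay}
    # 2) nominal controller output
    u = controller(state["x"])
    # 3) gain scheduler: attenuate the control as delay nears max_delay
    gain = max(0.0, 1.0 - state["age"] / max_delay)
    return [gain * ui for ui in u]
```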


Fig. 5 Data transfers of direct structure

The GSM can also compensate for the delay incurred when transmitting data through the network. The feedback state x1(ti − τRC(i)) received from the network at time ti is used to compute the control signal; the control signal is sent at time ti, but it reaches the robot only after the delay τCR(i), i.e., at t = ti + τCR(i). To deal with this issue, the system state x1(ti + τCR(i)) is estimated from x1(ti − τRC(i)); the estimate is denoted x̂1(ti + τCR(i)). To achieve this, we approximate the robot's movement over [ti − τRC(i), ti + τCR(i)] using:

Δx1(i) = x1(ti + τCR(i)) − x1(ti − τRC(i)) = [Δx(i) Δy(i) Δθ(i)]ᵀ    (14)

where:

Δθ(i) = ω(i − 1) Δτ(i)    (15)

If ω(i − 1) ≠ 0:

Δx(i) = (vx(i − 1)/ω(i − 1)) (sin θ(ti + τCR(i)) − sin θ(ti − τRC(i)))
      + (vy(i − 1)/ω(i − 1)) (cos θ(ti + τCR(i)) − cos θ(ti − τRC(i)))    (16)

Δy(i) = −(vx(i − 1)/ω(i − 1)) (cos θ(ti + τCR(i)) − cos θ(ti − τRC(i)))
      + (vy(i − 1)/ω(i − 1)) (sin θ(ti + τCR(i)) − sin θ(ti − τRC(i)))    (17)

If ω(i − 1) = 0:

Δx(i) = (vx(i − 1) cos θ(ti − τRC(i)) − vy(i − 1) sin θ(ti − τRC(i))) Δτ(i)    (18)
Δy(i) = (vx(i − 1) sin θ(ti − τRC(i)) + vy(i − 1) cos θ(ti − τRC(i))) Δτ(i)    (19)


The delay time Δτ(i) = τCR(i) + τRC(i) is estimated by the network traffic estimator. The estimated system state is then computed as:

x̂1(ti + τCR(i)) = x1(ti − τRC(i)) + Δx1(i)    (20)
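Equations (14)–(20) amount to dead-reckoning the robot forward over the total loop delay. A sketch, assuming the previous-step body velocities are held constant over the interval (function and variable names are ours):

```python
import math

def predict_state(x_fb, v_prev, tau_rc, tau_cr):
    """Predict x1(ti + tau_CR) from the delayed feedback
    x1(ti - tau_RC), following Eqs. (14)-(20).
    x_fb = (x, y, theta) at ti - tau_RC; v_prev = (vx, vy, w)."""
    x, y, th0 = x_fb
    vx, vy, w = v_prev
    dt = tau_rc + tau_cr                       # total delay, delta_tau(i)
    th1 = th0 + w * dt                         # Eq. (15)
    if w != 0.0:
        dx = (vx / w) * (math.sin(th1) - math.sin(th0)) \
           + (vy / w) * (math.cos(th1) - math.cos(th0))    # Eq. (16)
        dy = -(vx / w) * (math.cos(th1) - math.cos(th0)) \
           + (vy / w) * (math.sin(th1) - math.sin(th0))    # Eq. (17)
    else:
        dx = (vx * math.cos(th0) - vy * math.sin(th0)) * dt  # Eq. (18)
        dy = (vx * math.sin(th0) + vy * math.cos(th0)) * dt  # Eq. (19)
    return (x + dx, y + dy, th1)               # Eq. (20)
```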

3.3 Fuzzy Logic for Adjusting Control Parameters of the Omni Robot

Fuzzy control is a method that allows the construction of nonlinear controllers from heuristic information generated by expert knowledge. The fuzzification block is responsible for processing the input signals and assigning them fuzzy values. The set of rules gives a linguistic description of the variables to be controlled and is based on knowledge of the process. The inference mechanism interprets the data, taking into account the rules and their membership functions. In the defuzzification block, the fuzzy information coming from the inference mechanism is converted into crisp information that is useful for the process to be controlled. On this basis, the design of a fuzzy controller for this work is presented. After conducting some simulations, the results point out that the performance of the control system depends considerably on the control parameter c3 = diag(Cx, Cy, Ct). Therefore, in this section, we propose a fuzzy logic law to enhance the tracking quality and the adaptive characteristics of the system. Table 1 describes the fuzzy law, with the input membership functions shown in Fig. 6. The inputs of the fuzzy control law are the tracking error vector and its time derivative. With the adaptive fuzzy control law, the control parameter c3 can be optimized for the movement of the robot at any point on the orbit.

Table 1 Fuzzy logic rule

ż1\z1   NB   NS   Z    PS   PB
NB      M    S    VS   S    M
NS      B    M    S    M    B
Z       VS   B    M    B    VS
PS      B    M    S    M    B
PB      M    S    VS   S    M
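The rule base of Table 1 can be evaluated with a standard singleton (weighted-average) inference. The sketch below uses illustrative triangular membership breakpoints and crisp output levels for c3, since the exact universes of Fig. 6 and the output scaling are not given numerically:

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Illustrative membership breakpoints for z1 and dz1 (assumptions).
Z1_MF = {"NB": (-20, -10, -5), "NS": (-10, -5, 0), "Z": (-5, 0, 5),
         "PS": (0, 5, 10), "PB": (5, 10, 20)}
DZ_MF = {"NB": (-50, -25, -12), "NS": (-25, -12, 0), "Z": (-12, 0, 12),
         "PS": (0, 12, 25), "PB": (12, 25, 50)}

# Table 1: rows indexed by dz1, columns by z1.
RULES = {
    "NB": {"NB": "M", "NS": "S", "Z": "VS", "PS": "S", "PB": "M"},
    "NS": {"NB": "B", "NS": "M", "Z": "S", "PS": "M", "PB": "B"},
    "Z":  {"NB": "VS", "NS": "B", "Z": "M", "PS": "B", "PB": "VS"},
    "PS": {"NB": "B", "NS": "M", "Z": "S", "PS": "M", "PB": "B"},
    "PB": {"NB": "M", "NS": "S", "Z": "VS", "PS": "S", "PB": "M"},
}
# Crisp singleton output levels for the gain c3 (illustrative values).
OUT = {"VS": 1.0, "S": 5.0, "M": 10.0, "B": 20.0, "VB": 40.0}

def fuzzy_c3(z1, dz1):
    """Weighted-average (singleton) defuzzification over Table 1."""
    num = den = 0.0
    for dz_lbl, row in RULES.items():
        w_dz = tri(dz1, *DZ_MF[dz_lbl])
        if w_dz == 0.0:
            continue
        for z_lbl, out_lbl in row.items():
            w = min(w_dz, tri(z1, *Z1_MF[z_lbl]))
            num += w * OUT[out_lbl]
            den += w
    return num / den if den else OUT["M"]
```

At the origin only the (Z, Z) rule fires, returning the medium gain; for a large negative error with zero error rate the (Z, NB) rule fires, returning the very small gain.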


Fig. 6 Fuzzy input membership functions

4 Simulation and Comparisons

In this section, we present simulation results to verify the effectiveness of the proposed controller. The parameters of the robot's dynamic model are m = 20 kg, J = 10 kg·m², d = 0.3 m, and r = 0.06 m. The reference trajectory is described by:

xr = 10 cos(wt)
yr = 10 sin(wt)
θr = α

The system is simulated under the effect of the network delay time. We consider the quality of the tracking controller without GSM, with GSM, and with GSM plus the fuzzy control law. The trajectories of the Omni robot with these controllers are presented in Fig. 7. It is clear that, under the impact of the delay time, the trajectory


Fig. 7 Trajectory of the robot

of the robot oscillates at the initial time, when the robot is not yet on the orbit. Once the robot is on the orbit, the Backstepping Sliding Mode controller keeps it on the tracking task, but with some deviations from the desired trajectory. With GSM, the controller performs more smoothly as the robot approaches the reference trajectory, and the tracking error is smaller than without GSM. Moreover, when using the adaptive controller based on fuzzy logic, the tracking quality of the system is considerably enhanced: the robot moves straight to the orbit and tracks the desired orbit with the values of the tracking error vector close to zero. Figure 8 shows the output of the fuzzy control law, i.e., the optimal values of the control parameter, while Fig. 9 depicts the tracking errors. According to these figures, the control gains are tuned by the fuzzy-based adaptation scheme, which results in faster convergence of the system states to the sliding surface. Therefore, the superiority of this controller is obvious.


Fig. 8 Optimization of a parameter c3

Fig. 9 Tracking errors


5 Conclusion

In this paper, we have proposed an adaptive controller for the Omni robot using the Backstepping technique combined with Sliding mode control based on fuzzy logic. The robot is controlled through the network, and the GSM is designed to compensate for the network delay time. The simulation results show the effectiveness of the proposed control law: the Omni robot tracks the reference trajectory quickly, with high accuracy and stability.

References

1. Thi, K.D.H., Nguyen, M.C., Vo, H.T., Tran, V.M., Nguyen, D.D., Bui, A.D.: Trajectory tracking control for four-wheeled omnidirectional mobile robot using Backstepping technique aggregated with sliding mode control. In: 2019 First International Symposium on Instrumentation, Control, Artificial Intelligence, and Robotics (ICA-SYMP), pp. 131–134 (2019)
2. Gao, Z., Yang, Y., Du, Y., Zhang, Y., Wang, Z.: Kinematic modeling and trajectory tracking control of a wheeled omni-directional mobile logistics platform. DEStech Trans. Eng. Technol. Res. (2017)
3. Li, W., Yang, C., Jiang, Y., Liu, X., Su, C.-Y.: Motion planning for omnidirectional wheeled mobile robot by potential field method. J. Adv. Transp. 2017, 11 (2017)
4. Abiyev, R.H., Günsel, I.S., Akkaya, N., Aytac, E., Çağman, A., Abizada, S.: Fuzzy control of omnidirectional robot. Procedia Comput. Sci. 120, 608–616 (2017)
5. Wang, C., Liu, X., Yang, X., Hu, F., Jiang, A., Yang, C.: Trajectory tracking of an omnidirectional wheeled mobile robot using a model predictive control strategy. Appl. Sci. 8(2), 231 (2018)
6. Santos, J., Conceiçao, A.G.S., Santos, T.L.M.: Trajectory tracking of omni-directional mobile robots via predictive control plus a filtered smith predictor. IFAC-PapersOnLine 50(1), 10250–10255 (2017)
7. Ovalle, L., Ríos, H., Llama, M., Santibáñez, V., Dzul, A.: Omnidirectional mobile robot robust tracking: sliding-mode output-based control approaches. Control Eng. Pract. 85, 50–58 (2019)
8. Vieira, H.S., de Paiva, E.C., Moriguchi, S.K., Carvalho, J.R.H.: Unified backstepping sliding mode framework for airship control design. arXiv preprint arXiv:1909.03143 (2019)
9. Truong, H.V.A., Tran, D.T., To, X.D., Ahn, K.K., Jin, M.: Adaptive fuzzy backstepping sliding mode control for a 3-DOF hydraulic manipulator with nonlinear disturbance observer for large payload variation. Appl. Sci. 9(16), 3290 (2019)
10. Jia, Z., Yu, J., Mei, Y., Chen, Y., Shen, Y., Ai, X.: Integral backstepping sliding mode control for quadrotor helicopter under external uncertain disturbances. Aerosp. Sci. Technol. 68, 299–307 (2017)
11. Liang, X., Wan, L., Blake, J.I.R., Shenoi, R.A., Townsend, N.: Path following of an underactuated AUV based on fuzzy backstepping sliding mode control. Int. J. Adv. Robot. Syst. 13(3), 122 (2016)
12. Mendez-Monroy, P.E., Dominguez, I.S., Bassam, A., Tzuc, O.M.: Control-scheduling codesign for NCS based fuzzy systems. Int. J. Comput. Commun. Control 13(2), 251–267 (2018)
13. Lin, Y., Wang, J., Han, Q.-L., Jarvis, D.: Distributed control of networked large-scale systems based on a scheduling middleware. In: IECON 2017 – 43rd Annual Conference of the IEEE Industrial Electronics Society, pp. 5523–5528 (2017)

Fuzzy Adaptive Dynamic Surface Control for Omnidirectional Robot Duyen Kim Thi Ha, Tien Manh Ngo, Minh Ngoc Pham, Vinh Quang Thai, Minh Xuan Phan, Dung Tien Pham, Dinh Duc Nguyen, and Hiep Quang Do

Abstract This paper proposes a new controller for the four-wheeled Omni robot (FWOR) based on dynamic surface control (DSC). The DSC algorithm is constructed from sliding mode control and the backstepping technique; it inherits the advantages of both mechanisms while avoiding their drawbacks. The coefficients that affect the speed at which the sliding surface converges to zero are adjusted by a fuzzy logic controller, whose inference rules are established from the tracking error signal and its derivative. The effectiveness of the new algorithm is shown through simulations.

1 Introduction

Nowadays, robotic systems are broadly used in human life. In industry especially, robots play a pivotal role in automating recurring manufacturing tasks. New trends towards the Smart Factory and Industry 4.0, however, take a more product-driven approach and demand more flexibility from robotic systems. To achieve this, the conventional wheel is replaced by the Omni wheel, which allows simultaneous and independent rotation and translation on a flat surface. The issues of motion control of the Omni robot have therefore significantly attracted researchers in recent years. Two types of Omni robots are usually used, namely,

D. K. T. Ha Hanoi University of Industry, Hanoi, Vietnam e-mail: [email protected] T. M. Ngo (B) · M. N. Pham · V. Q. Thai Vietnam Academy of Science and Technology, Hanoi, Vietnam e-mail: [email protected] M. X. Phan · D. T. Pham · D. D. Nguyen Hanoi University of Science and Technology, Hanoi, Vietnam H. Q. Do The University of Economics-Technology for Industries, Hanoi, Vietnam © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_22



the three-wheeled omnidirectional robot and the four-wheeled omnidirectional robot. In this paper, the model of the four-wheeled robot is presented. A large number of studies address motion control of the Omni robot. In [1], Huang proposed a simple algorithm and the kinematic model of the four-wheeled omnidirectional robot. In practice, however, due to the effect of numerous disturbances on the wheels, both dynamic and kinematic models should be taken into account to increase accuracy [2, 3], but a combination of these models has been lacking. Many control laws based on particular models have been applied to the kinematics controller. An algorithm for optimal trajectory generation for a four-wheel omnidirectional vehicle was proposed in [4], while in [5–7] researchers applied the classic PI control to the dynamic and kinematic models of the four-wheeled omnidirectional robot. A method based on trajectory optimization to design the dynamics controller of the robot is presented in [8]. Recently, some studies investigate modern feedback control strategies for the nonlinear robot model. In [9], Chen proposed a sliding mode control based on a filter to estimate the model uncertainties and disturbances in the dynamic equations of the Omni robot. In [10], the tracking controller is designed by combining passive nonlinear controls; the proposed controller tracked desired bounded velocities for an Omni robot, but the quality of the system was still limited. To improve the control performance, the backstepping–sliding mode technique has shown high-accuracy tracking control of the Omni robot [11]. However, the backstepping technique has a major disadvantage: the explosion of terms and sensitivity to disturbances, which, especially for complex systems, may reduce performance. To address these problems, D. Swaroop et al. proposed the DSC algorithm in [12]. This method not only inherits the advantages of both mechanisms but also rejects their weaknesses: a low-pass filter added to the DSC design significantly diminishes calculation error and minimizes the amount of computation. Several researchers have applied DSC to control nonlinear systems [13]. In [14], the authors proposed an adaptive controller combining the DSC scheme with a neural network for the Omni robot; however, the constant-gain controller results in overshoot and a large settling time. Fuzzy logic control (FLC) is an effective tool to overcome the limits of conventional techniques by adopting adaptive schemes. In [15], Lee proposed a controller combining FLC with the backstepping technique for MIMO systems. In [16, 17], the control performance is improved by adjusting the parameters of the SMC controller with a fuzzy logic system. Inspired by the aforementioned studies, this paper proposes a new DSC–sliding mode controller aggregated with fuzzy logic for the FWOR. The paper is organized as follows. Section 2 presents the mathematical model of the Omni robot. A fuzzy adaptive dynamic surface control is designed in Sect. 3. Section 4 presents the simulation results, and the conclusion is given in Sect. 5.

Fuzzy Adaptive Dynamic Surface Control ...

259

2 Mathematical Model of Omni Robot

The omnidirectional mobile robot used for the control design is shown in Fig. 1. The robot has four wheels spaced 90° apart. Oxy represents the global coordinate frame, and the distance between the wheels and the robot center is denoted by d. The velocity, the normal velocity, and the angular velocity are vx, vy, and ω, respectively.

2.1 Kinematic Model

To determine the robot's movement, the robot's coordinate vector is defined as q = [x y θ]ᵀ, and the velocity vector in the fixed frame is the derivative q̇. Through the following equation, this velocity vector can be transformed into the velocity in the robot frame:

q̇ = H v    (1)

where v = [vx vy ω]ᵀ is the velocity in the robot frame, and H, which relates q̇ and v, is given by:

H = [[cos θ, −sin θ, 0], [sin θ, cos θ, 0], [0, 0, 1]]

Fig. 1 Four-wheeled Omni robot


2.2 Dynamic Model

The dynamic model of the FWOR can be expressed as follows:

M v̇ + C(q, q̇) v + G(q) sign(v) = B τ    (2)

where:
M(q) = diag(m, m, J) is the inertia matrix,
v = [vx vy ω]ᵀ is the velocity vector,
C(q, q̇) = diag(Bx, By, Bω) is the matrix of Coriolis and centrifugal forces,
G(q) = diag(Cx, Cy, Cω) is the gravitational matrix,
τ is the input control vector containing the torques of the four wheel motors,
B = [[−√2/(2r), −√2/(2r), √2/(2r), √2/(2r)], [√2/(2r), −√2/(2r), −√2/(2r), √2/(2r)], [d/r, d/r, d/r, d/r]] is the parameter matrix for the control signal vector.

3 Control Law Design

3.1 Dynamic Surface Control for FWOR

The controller's purpose is to track the desired trajectory and to reject disturbances. The controller is generated by dynamic surface control (DSC). DSC is constructed from the backstepping technique and multiple sliding mode methods; this algorithm not only inherits the advantages of both mechanisms but also rejects their weaknesses. In addition, a low-pass filter in the DSC design significantly diminishes calculation error and minimizes the amount of computation. This section focuses on designing the DSC controller for the FWOR. From (1) and (2) we obtain the full mathematical model of the FWOR:

q̇ = H v
M v̇ + C v + G sign(v) = B τ    (3)


Firstly, the vector z1 = q − qd is chosen to denote the tracking error, where qd is the desired trajectory. The Lyapunov candidate function is chosen to be positive definite, and the virtual control signal is computed to ensure that its derivative is negative semi-definite:

V1 = ½ z1ᵀ z1

Taking the derivative of V1 and selecting the virtual control as αr = −H⁻¹(c1 z1 − q̇d), we obtain:

V̇1 = −z1ᵀ c1 z1 ≤ 0

where c1 is a diagonal positive definite matrix.

where c1 is a diagonal positive definitive matrix Next, defining error vector as: z− = v− − α− = H

−1

2

q˙ + H

−1



c1 z− − q˙ = H −1 c1 z− + H −1 z−˙ −

1

1

1d

(4) 1

where α− is reference through filter τ α˙ + α = αr with time constant τ of the low-pass

filter and the initial value α(0) = αr (0). Next, the sliding surface is defined as:

s− = λz− + H z− = (λ + c1 )z− + z˙ 1 1

2

(5)

1
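The low-pass filter τ α̇ + α = αr is what lets DSC avoid differentiating the virtual control analytically: α̇ is recovered algebraically as (αr − α)/τ. A forward-Euler sketch (our own naming):

```python
def dsc_filter(alpha_r_seq, tau_f, dt, alpha0=None):
    """First-order low-pass filter  tau_f * alpha_dot + alpha = alpha_r,
    discretized with forward Euler; alpha(0) = alpha_r(0) as in the text.
    Returns the filtered sequence. alpha_dot is then available in closed
    form as (alpha_r - alpha) / tau_f, without differentiating alpha_r."""
    alpha = alpha_r_seq[0] if alpha0 is None else alpha0
    out = [alpha]
    for ar in alpha_r_seq[1:]:
        alpha = alpha + dt * (ar - alpha) / tau_f
        out.append(alpha)
    return out
```

A unit step in αr produces a smooth, monotone response converging to the commanded value, which is exactly the smoothing that suppresses the "explosion of terms" of plain backstepping.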

where λ = diag(λ1, λ2, λ3) is a positive definite matrix. Taking the derivative of s, we have:

ṡ = λ ż1 + H ż2 + Ḣ z2
  = λ ż1 + Ḣ z2 + H (M⁻¹(−C v − G sign(v) + B τ) − α̇)    (6)

The control signal τ includes two components: τeq, which drives the sliding surface to zero, and τsw, which keeps the surface at zero. So the control signal can be rewritten as:

τ = τeq + τsw    (7)

From (6), we choose τeq and τsw as:

τeq = −Bᵀ(B Bᵀ)⁻¹ (M (H⁻¹(λ ż1 + Ḣ z2) − α̇) − C v − G sign(v))    (8)

τsw = −Bᵀ(B Bᵀ)⁻¹ M H⁻¹ (c2 sign(s) + c3 s)    (9)

where c2 and c3 are diagonal positive definite matrices.


From (7), (8) and (9), the control signal is:

τ = −Bᵀ(B Bᵀ)⁻¹ (M (H⁻¹(λ ż1 + Ḣ z2) − α̇) − C v − G sign(v))
    − Bᵀ(B Bᵀ)⁻¹ M H⁻¹ (c2 sign(s) + c3 s)    (10)

To ensure the stability of the system, the following Lyapunov candidate function is considered:

V2 = ½ sᵀ s

Differentiating V2, we obtain:

V̇2 = sᵀ ṡ    (11)

Substituting (6) into (11), V̇2 can be rewritten as:

V̇2 = sᵀ (λ ż1 + Ḣ z2 + H (M⁻¹(−C v − G sign(v) + B τ) − α̇))    (12)

Using the control signal from (10), we obtain:

V̇2 = −sᵀ c2 sign(s) − sᵀ c3 s    (13)

Then V̇2 ≤ 0, which satisfies Lyapunov's stability theory. Hence, the control signal (10) guarantees the stability of the system.

3.2 Fuzzy Adaptive Dynamic Surface Control for FWOR

Robustness to bounded variation of the system's parameters is the outstanding characteristic of the DSC controller, but this effect is only available when the state of the system is on the sliding manifold. After conducting some simulations, the results show that the coefficients of the virtual signal and the control signal (c1, c2, c3) significantly impact the system's performance. In every state, there are appropriate values for these parameters that achieve a better system response. So, in this section, we propose a fuzzy logic law to achieve a suitable variation of these coefficients. The control diagram is shown in Fig. 2. The input signals of the fuzzy logic system (FLS) are the errors between the desired and response values and their derivatives (z1, ż1). The fuzzy membership functions of the input signals are shown in Figs. 3 and 4. The first output of the FLS is the coefficient c1 of the virtual signal, with the fuzzy rule given in Table 1. The remaining outputs of the FLS are the coefficients of the control signal (c2, c3), which are identical to each other. There are 25 rules in every table; the fuzzy rules are composed

Fig. 2 Structure of the controller: the fuzzy block takes z1 and dz1/dt and outputs the gains c1, c2, c3 used by the virtual signal, sliding surface, and control signal blocks driving the Omni robot

Fig. 3 Fuzzy membership function of z1 (labels NB, NS, Z, PS, PB; breakpoints at −10, −5, −0.01, 0, 0.01, 5, 10)

Fig. 4 Fuzzy membership function of ż1 (labels NB, NS, Z, PS, PB; breakpoints at −25, −12, −0.06, 0, 0.06, 12, 25)

Table 1 Fuzzy rule for c1

ż1\z1   NB   NS   Z    PS   PB
NB      M    S    VS   S    M
NS      B    M    S    M    B
Z       VS   B    M    B    VS
PS      B    M    S    M    B
PB      M    S    VS   S    M


Table 2 Fuzzy rule for c2, c3

ż1\z1   NB   NS   Z    PS   PB
NB      M    B    VB   B    M
NS      S    M    B    M    S
Z       VB   S    M    S    VB
PS      S    M    B    M    S
PB      M    B    VB   M    M

as: if z1 = Ai and ż1 = Bj, then c1 (respectively c2, c3) is Cij (respectively Dij), where Ai, Bj, Cij, Dij are the fuzzy subsets corresponding to z1, ż1, c1 and c2 (c3). The fuzzy rules are shown in Table 2.

4 Simulation

In this section, we present a numerical simulation of the Omni robot model in the MATLAB/Simulink environment. To verify the performance of the proposed controller, the model system is affected by the external disturbance shown in Fig. 5. The reference trajectory is chosen as the circular trajectory described by:

Fig. 5 The external disturbance

Table 3 Parameters of the control system

Dynamic model:           m = 20 kg; J = 10 kg·m²; d = 0.3 m; r = 0.06 m
Generated path:          r0 = 10 m; 0 < t < 15
Dynamic surface control: λ = diag(10, 10, 10); b = 25

Xr = r0 cos(wt)
Yr = r0 sin(wt)
θr = α    (14)
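For reference, the circular trajectory of Eq. (14) in code; the angular rate w and the heading α are not fixed numerically in the text, so the defaults here are assumptions:

```python
import math

def reference(t, r0=10.0, w=0.2, alpha=0.0):
    """Circular reference trajectory of Eq. (14): (Xr, Yr, theta_r).
    w and alpha are illustrative defaults, not values from the text."""
    return (r0 * math.cos(w * t), r0 * math.sin(w * t), alpha)
```

By construction the reference point always lies on the circle of radius r0, so the tracking error reported in the simulations is simply the distance from the robot to this moving point.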

in which the parameters of the Omni robot and the controller are chosen as in Table 3.

The simulation results of the FDSC method in comparison with those of the DSC law are shown in Figs. 6, 7 and 8. These figures describe the actual motion of the robot's position along with the reference trajectory (14).

Fig. 6 The motion on X-axis

Fig. 7 The motion on Y-axis

Fig. 8 The angle of motion

According to the simulation results, it is obvious that both control schemes ensure the tracking quality of the Omni robot, and the FDSC controller is more effective at reducing the settling time of the robot motion than the DSC scheme. Specifically, the motion of the robot asymptotically approaches the reference trajectory within approximately 0.2 s, whereas the settling time under the DSC law is nearly 0.6 s. The fuzzy rules are designed to update the parameters of the sliding mode control to minimize the reaching time and diminish the tracking errors. Thus, the proposed adaptive fuzzy logic controller ensures high system quality with a small overshoot and performs much better in decreasing the steady-state oscillation than the DSC controller with fixed parameters.

Figures 9, 10 and 11 represent the outputs of the fuzzy system, which are the control gains of the controller. It can be seen that the control gains are tuned by the fuzzy rule-based adaptation scheme, which results in faster convergence of the system states to the sliding surface.

Fig. 9 Coefficient c1

Fig. 10 Coefficient c2

Fig. 11 Coefficient c3

As can be seen from the above figures, both the DSC and FDSC controllers solve the contradiction between system response and steady-state behavior under the impact of the external disturbance. However, the FDSC controller can adjust the control parameters to reduce the chattering phenomena on the sliding surface, and it combines with the DSC scheme to guarantee that the robot rapidly tracks the desired trajectory. Figure 12 presents the motion of the robot, showing that it tracks the required trajectory with high accuracy.

Fig. 12 The motion of the Omni robot with a circular trajectory

5 Conclusion

This paper proposed an adaptive controller for the Omni robot system based on the DSC control structure. The proposed controller, which uses fuzzy logic to optimize the control parameters, is effective in solving the tracking control problem of the Omni robot and ensures stability of the system under the impact of the external disturbance. Future investigation will focus on implementing the control algorithms in practical applications.

Acknowledgements This research was funded by the project "Research, Design And Manufacturing Smart Human-Form IVASTBot Robot Applied In Communication And Serving Human", coded VAST01.01/20-21, implemented by the Institute of Physics, Vietnam Academy of Science and Technology.


References

1. Huang, L., Lim, Y., Li, D., Teoh, C.E.: Design and analysis of a four-wheel omnidirectional mobile robot. In: 2nd International Conference of Autonomous Robots and Agents, pp. 425–428 (2004)
2. de Oliveira, H.F.P., de Sousa, A.J.M., Moreira, A.P.G.M., da Costa, P.J.C.G.: Precise modeling of a four wheeled omni-directional robot. In: Proceedings of the 8th Conference on Autonomous Robot Systems and Competitions (2008)
3. Zeng, Z., Lu, H., Zheng, Z.: High-speed trajectory tracking based on model predictive control for omni-directional mobile robots. In: 2013 25th Chinese Control and Decision Conference (CCDC), pp. 3179–3184. IEEE (2013)
4. Purwin, O., D'Andrea, R.: Trajectory generation and control for four wheeled omnidirectional vehicles. Robot. Auton. Syst. 54(1), 13–22 (2006)
5. Ahmed, I., Al-Ammri, A.S.: Control of omni-directional mobile robot motion. Al-Khwarizmi Eng. J. 6(4), 1–9 (2010)
6. Wang, T.-Y., Tsai, C.-C., Wang, D.A.: Dynamic control of an omnidirectional mobile platform. J. Nan Kai 7, 9–18 (2010)
7. Wu, X.-B., Chen, Z., Chen, W.-B., Wang, W.-K.: Research on the design of educational robot with four-wheel omni-direction chassis. J. Comput. 29(4), 284–294 (2018)
8. Hashemi, E., Jadidi, M.G., Babarsad, O.B.: Trajectory planning optimization with dynamic modeling of four wheeled omni-directional mobile robots. In: 2009 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA), pp. 272–277. IEEE (2009)
9. Chen, J., Wang, J., Ouyang, S., Yang, Y.: Adaptive sliding mode control based on a filter for four-wheel omni-directional mobile robots. Cybern. Inf. Technol. 14(2), 140–153 (2014)
10. Peñaloza-Mejía, O., Márquez-Martínez, L.A., Alvarez, J., Villarreal-Cervantes, M.G., García-Hernández, R.: Motion control design for an omnidirectional mobile robot subject to velocity constraints. Math. Probl. Eng. (2015)
11. Cui, Q.Z., Li, X., Wang, X.K., Zhang, M.: Backstepping control design on the dynamics of the omni-directional mobile robot. In: Applied Mechanics and Materials, vol. 203, pp. 51–56. Trans Tech Publications (2012)
12. Swaroop, D., Hedrick, J.K., Yip, P.P., Gerdes, J.C.: Dynamic surface control for a class of nonlinear systems. IEEE Trans. Autom. Control 45(10), 1893–1899 (2000)
13. Wang, D., Huang, J.: Neural network-based adaptive dynamic surface control for a class of uncertain nonlinear systems in strict-feedback form. IEEE Trans. Neural Netw. 16(1), 195–202 (2005)
14. Kim, D.-H.T., Manh, T.-N., Van Bach, N.-P., Duc, T.-P.: Trajectory tracking control for omnidirectional mobile robots using direct adaptive neural network dynamic surface controller. In: 2019 First International Symposium on Instrumentation, Control, Artificial Intelligence, and Robotics (ICA-SYMP), pp. 127–130. IEEE (2019)
15. Lee, H.: Robust adaptive fuzzy control by backstepping for a class of MIMO nonlinear systems. IEEE Trans. Fuzzy Syst. 19(2), 265–275 (2010)
16. Petrehuş, P., Lendek, Z., Raica, P.: Fuzzy modeling and design for a 3D crane. IFAC Proc. Vol. 46(20), 479–484 (2013)
17. Zhang, Q.-Y., Li, D., Pei, W.-J., Jia, Y.-M.: A TSK fuzzy model and adaptive sliding-mode controller design for four-Mecanum-wheel omni-directional mobile free-bases. In: 2015 Chinese Automation Congress (CAC), pp. 1862–1867. IEEE (2015)

Traffic Light Control at Isolated Intersections in Case of Heterogeneous Traffic Phan Duy Hung and Do Thai Giang

Abstract Traffic is always a big problem in cities, especially in Asian countries such as Vietnam, the Philippines, and India. Traffic in these places is characterized by vehicles such as motorbikes, bicycles, cars, and buses traveling on roads often without dedicated lanes. They do not follow traffic lanes and occupy any lateral position over the width of the roadway, depending on the availability of road space at a given instant of time. Such a transport system is called heterogeneous traffic. The issue of intelligent traffic light control thus attracts much attention, with solutions such as controlling traffic lights in a grid, along a straight line (green wave), or at individual intersections. This paper uses fuzzy logic to optimize traffic lights at an isolated intersection under heterogeneous traffic conditions. The system is simulated with the SUMO simulation software. The fuzzy control algorithm proves remarkably effective compared to fixed-time traffic lights.

Keywords Fuzzy logic · Traffic light control · Heterogeneous traffic · Isolated intersection

1 Introduction

Heterogeneous traffic is the most complex traffic model but is quite common in many countries such as Vietnam, the Philippines, and India. The vehicles do not follow traffic lanes and occupy any lateral position over the width of the roadway, depending on the availability of road space at a given instant of time. Low investment in the transport sector, poor transport planning, and transport culture are the main causes of this. Overcoming these causes requires a long and complex process. Increasing

P. D. Hung (B) · D. T. Giang FPT University, Hanoi, Vietnam e-mail: [email protected] D. T. Giang e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_23


urbanization leads to a rapid increase in vehicle density. Consequently, the number of traffic accidents also increases, and traffic congestion occurs frequently. With the strong development and application of information and communication technology in the transport sector, many smart solutions have been proposed, such as intelligent traffic light control systems [1–9], self-driving cars [10], traffic accident warning systems [11], traffic congestion warning systems [12], bad and rough road surface warning systems [13], etc. For the problem of intelligent traffic light control, one can mention solutions that control traffic lights over large areas such as cities, for a grid of points, for green waves, or at a single intersection. Controlling traffic lights basically means determining the number of light phases and the durations of the green, red, and yellow lights. The method currently in use is based on traffic statistics and the experience of the managers and operators of the transport system, which are used to calculate light phases and light cycles [14]. These parameters are, of course, fixed. This is unreasonable: for example, during peak hours the traffic volume will be higher than during off-peak hours, while during holidays in big cities the traffic volume will decrease, etc. Therefore, traffic light cycles should be changed flexibly. Smart traffic light control means changing the durations of green and red lights dynamically over time. Control algorithms are based on inputs such as traffic density, queue length, peak time, off-peak time, weather, population distribution, etc. to calculate the light cycles. Typical studies and applications around the world for intelligent traffic light control problems are summarized below. Razavi et al. (2019) present a new method for traffic light control using a combination of IoT and image and video processing techniques [1]. In this model, traffic light scheduling is determined based on the density and the number of passing vehicles.
Moreover, it is implemented on a Raspberry Pi board with the OpenCV tool. The analytical and experimental results indicate the efficiency of the proposed models in intelligent traffic management. Pandey et al. (2018) in [2] provide a method for dynamically switching traffic lights according to the traffic on all sides of the intersection. For this purpose, images of vehicles on the intersection lanes are extracted from traffic videos and then trained using a Haar cascade classifier. The benefit of this is minimizing the average waiting time for all the vehicles. Luo et al. (2019) in [3] explore the application of the cell transmission model (CTM) to develop control strategies for dispersing accident-induced traffic jams and evaluate the efficiency of these strategies. This work focuses on using timed Petri nets (TPNs) to model variable traffic light control systems. Its advantage is that the traffic light behavior is clearly represented in terms of conditions and events, which result in changes in the pre-emption phase. For preventing traffic congestion, Qi et al. (2016) design an emergency traffic light strategy at signalized traffic intersections [4]. Some additional warning lights are used. The emergency strategy is realized by the cooperation of traffic lights and warning lights. It contains a ban signal strategy and a warning signal strategy. A two-way rectangular grid network is modeled via the cell transmission model. The effectiveness of the proposed strategy is evaluated through simulations in the modeled grid network. Simulation results reveal the influence of the route-changing behaviors of


drivers, the time at which to operate the emergency strategy, and the inflow of the traffic network. They can be used to improve the latest systems in preventing urban road traffic congestion caused by incidents. In [5], Natafgi et al. (2018) implement an adaptive traffic light system using reinforcement learning, tested using real data from Lebanese traffic. For training and testing the system, a software simulation tool is used. This tool can simulate the traffic intersection and allows the neural network to interact with it. An adaptive traffic light control system using wireless sensor networks is also described by Rida et al. (2018) in [6]. They propose an adaptive traffic light control method for an isolated intersection that considers a number of traffic factors such as the traffic volume and the waiting time. They give priority to the shortest queue, and they show that this method reduces waiting time. Wu et al. (2018) in [7] introduce an intelligent control system of traffic lights based on fog computing. It calculates and shares the traffic flux situation at the intersection and the surrounding intersections through a fog computing platform. Taking the traffic flow at the intersection and at the surrounding intersections as parameters, the intelligent control algorithm of the traffic lights is designed to achieve mutual coordination and mutual influence between different intersections, so that the traffic efficiency of each intersection is improved and the traffic flow of the entire transport network is alleviated. The simulation results showed that this intelligent control system improves the traffic efficiency of every intersection and relieves the traffic flow of the whole transport network. Traffic controllers using fuzzy inference rules are also used in some studies, such as one conducted by Mir et al. (2018) [8]. Their neural traffic controller can calculate the green light duration of the traffic signal.
The time duration is calculated based on information showing the number of cars on the two lanes where the signal is green, i.e., Arrival, and the number of cars waiting on the remaining lanes with a red signal, i.e., Queue. The system uses a fuzzy inference rule based strategy to train the neural network from the input data of Arrival and Queue and the output data of the green light duration. MATLAB simulations are used to validate the test data results. Also using fuzzy logic but with other inputs, Nurlayli et al. (2018) in [9] determine the levels of congestion, CO, and the green light duration of the traffic light based on a certain classification in order to obtain an optimal decision for the duration of the traffic light. The simulation design has been validated by experts from the Department of Transportation, Malang City; the results show that the simulation design is valid and worthy to be implemented for determining the duration of the traffic light. The above section shows that there are many studies and applications around the world. However, most applications have not moved beyond the test level and are limited to some specific locations. Large-scale solutions often incur very high costs and prove difficult to deploy. This problem involves collecting and analyzing input data from many cameras, from sensor networks, or from mobile devices, etc. When implementing the control system in practice, such a solution often requires coordination with many stakeholders to operate the test system and the old system in parallel while ensuring smooth traffic conditions.


Among the mentioned smart traffic light control solutions, control at an isolated intersection, despite being the smallest in scale, is effective in many places, and it is also the easiest method to deploy. Previous research has also pointed out a gap in studying heterogeneous traffic patterns like those in Vietnam. This paper therefore focuses on studying the control of traffic lights at an intersection, considering the two cases of junctions and crossroads. The paper simulates the problem in a heterogeneous traffic model, applies fuzzy control algorithms, and measures statistical results to prove the effectiveness of the solution. The remainder of the paper is organized as follows. Section 2 describes the simulation design. The algorithm and experiments are presented in Sect. 3. Section 4 presents the statistical results and an analysis of the effectiveness of the proposed solution. Finally, conclusions and perspectives are given in Sect. 5.

2 Simulation Design

Currently, there are several traffic simulation software packages, such as SUMO, MATSim, MITSIMlab, AIMSUN, CORSIM, Paramics, SimTraffic, VISSIM, and TRANSIMS [15]. This study chooses the SUMO open-source software for its potential for long-term development, since the software receives a lot of support from the community and, especially, has the ability to simulate heterogeneous traffic [16]. SUMO (Simulation of Urban Mobility) emulates microscopic traffic situations. The SUMO software allows users to create and simulate traffic objects such as vehicles, intersections, roads, and traffic lights with the full range of behaviors and statuses of the objects. Objects are also linked to form a specific traffic situation. An important feature is that SUMO allows configuration for each type of vehicle, including different types of buses, cars, motorbikes, and bicycles, each with its own behavior. Besides, the lanes are subdivided into sub-lanes, so that a lane may carry multiple vehicles traveling side by side. A vehicle also does not have to move within a single lane but can operate across two lanes. These features support the heterogeneous transport system design. This paper simulates two types of intersections: a junction (Fig. 1) and a crossroad (Fig. 2). Some assumptions are made in the simulated situations as follows: – An isolated intersection means that the vehicles will pass through the intersection in all directions, and the traffic volume here will neither be affected by nor affect the surrounding intersections. – The traffic light system is a 2-phase type. When the east-west direction has a green light, the south-north direction has a red light, and vice versa. – The east-west direction is the main traffic direction, and its traffic volume is larger than that of the south-north direction.
– The fuzzy controller system uses as inputs the number of vehicles in the queues of the east-west and south-north directions (Queue and Arrival).
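The adaptive green-phase logic implied by these assumptions can be sketched as a toy control loop, independent of SUMO. Everything here is illustrative: the MIN_GREEN and MAX_GREEN bounds and the callback interfaces are our assumptions, not values or APIs from the paper.

```python
# Toy sketch of the adaptive phase logic (pure Python, no SUMO): at the end
# of each elapsed green interval the controller reads the current Arrival and
# Queue counts and asks the fuzzy controller for an extension time; a zero
# extension ends the green phase. MIN_GREEN and MAX_GREEN are assumed bounds.
MIN_GREEN, MAX_GREEN = 10, 60  # seconds

def run_green_phase(read_arrival, read_queue, fuzzy_extension):
    """Run one green phase; return its total duration in seconds."""
    green = MIN_GREEN
    while green < MAX_GREEN:
        extension = fuzzy_extension(read_arrival(), read_queue())
        if extension <= 0:  # the fuzzy controller says: switch phase now
            break
        green = min(green + extension, MAX_GREEN)
    return green
```

In a real deployment the two callbacks would read vehicle counts from detectors (e.g. via SUMO's TraCI interface), and `fuzzy_extension` would be the fuzzy controller of Sect. 3.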


Fig. 1 Simulation for junctions

Fig. 2 Simulation for crossroads

Table 1 Configuration for vehicles

|                 | Bus | Car | Motorbike | Bike |
|-----------------|-----|-----|-----------|------|
| Max speed (m/s) | 10  | 12  | 10        | 3    |
| Probability (%) | 2   | 40  | 53        | 5    |

Configuration parameters for the vehicles, including the maximum speed and the probability of emitting a vehicle each second, are given in Table 1. The number of vehicles each hour in the different directions is described in Table 2:

Table 2 Number of vehicles each hour

| East | West | North | South |
|------|------|-------|-------|
| 1857 | 1909 | 448   | 396   |

3 Fuzzy Logic Traffic Light Controller

3.1 Variables and Membership Functions

The control system uses two inputs, Queue and Arrival. These values are the numbers of vehicles in the different directions at the end of a phase and are used to determine whether the light phase should be extended. Each input has 5 values corresponding to 5 membership functions [17]. The output is the extension time of the light phase, based on the binding between the inputs. The output has 4 values corresponding to 4 membership functions. Input and output variables and their values are summarized in Table 3. Membership functions are described in Figs. 3, 4, and 5.
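Assuming triangular membership functions (a common choice; the paper does not state the exact shapes or universe ranges), the 5 membership functions of an input such as Queue can be sketched in plain NumPy as follows:

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    left = (x - a) / (b - a)
    right = (c - x) / (c - b)
    return np.maximum(np.minimum(left, right), 0.0)

# Hypothetical universe of discourse for Queue (the paper does not state the
# exact range); five evenly spaced triangles, one per linguistic value.
universe = np.arange(0, 41)              # vehicles waiting
labels = ['v.low', 'low', 'medium', 'high', 'v.high']
centers = np.linspace(0, 40, 5)
width = centers[1] - centers[0]
queue_mf = {lab: trimf(universe, c - width, c, c + width)
            for lab, c in zip(labels, centers)}
```

Arrival and the Extension time output would be defined the same way over their own universes.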

3.2 Fuzzy Rule Set

Fuzzy logic uses a mechanism to produce results similar to that of human inference. People often put a series of rules to bind assumptions [17].

Table 3 Variables and their values

| Input: Queue | Input: Arrival | Output: Extension time |
|--------------|----------------|------------------------|
| V.low        | V.low          | Zero                   |
| Low          | Low            | Short                  |
| Medium       | Medium         | Medium                 |
| High         | High           | Long                   |
| V.high       | V.high         |                        |

Fig. 3 Membership function of Queue variable


Fig. 4 Membership function of Arrival variable

Fig. 5 Membership function of Extension Time variable

A simple example can be seen in the control method of traffic police. When a traffic policeman sees and estimates that the traffic volume from the east-west direction is much larger than that from the north-south direction, he will extend the green light of the east-west direction to handle that traffic situation. The fuzzy logic controller implements a similar reasoning mechanism through a set of rules. This paper uses a set of 25 rules listed in Fig. 6.
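A rule base of this form can be evaluated with max-min (Mamdani) inference: each rule fires with the minimum of its antecedent grades, and rules sharing a consequent are combined with the maximum. The sketch below uses made-up membership grades and only a four-rule fragment transcribed from Fig. 6:

```python
# Max-min (Mamdani) inference over a small illustrative fragment of the rule
# base. The membership grades below are made-up readings; in the full system
# they would come from the membership functions of Figs. 3 and 4.
arrival_mu = {'v.low': 0.0, 'low': 0.2, 'medium': 0.8}
queue_mu = {'v.low': 0.6, 'low': 0.4}

# (arrival level, queue level) -> extension-time level, taken from Fig. 6
rules = {
    ('low',    'v.low'): 'short',
    ('medium', 'v.low'): 'medium',
    ('low',    'low'):   'zero',
    ('medium', 'low'):   'medium',
}

# A rule fires with strength min(antecedents); rules with the same
# consequent are combined with max.
fired = {}
for (a, q), out_label in rules.items():
    strength = min(arrival_mu[a], queue_mu[q])
    fired[out_label] = max(fired.get(out_label, 0.0), strength)
```

The resulting strengths (here 'medium' dominates) clip the output membership functions before defuzzification, as described in Sect. 3.3.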

3.3 Defuzzification

Defuzzification is the process of producing a quantifiable result in crisp logic, given fuzzy sets and corresponding membership degrees; it maps a fuzzy set to a crisp set [17]. This paper uses the center-of-gravity defuzzification technique. First, the results of the rules are aggregated to form a shape (a union of trapezoids). Then, the centroid of this shape, called the fuzzy centroid, is calculated. The x coordinate of the centroid is the defuzzified value (Fig. 7).
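A minimal numerical version of this center-of-gravity step, with an illustrative aggregate built from two clipped output sets, might look as follows (the universe range and clipping levels are assumptions):

```python
import numpy as np

def centroid_defuzzify(x, mu):
    """Center-of-gravity defuzzification: the x coordinate of the centroid
    of the aggregated output membership function mu over the universe x."""
    total = mu.sum()
    if total == 0:
        return 0.0  # no rule fired: no extension
    return float((x * mu).sum() / total)

# Illustrative aggregate: two clipped triangular output sets combined by max
x = np.linspace(0.0, 20.0, 201)  # extension-time universe, seconds (assumed)
short = np.minimum(np.minimum(x / 5.0, (10.0 - x) / 5.0).clip(0.0), 0.6)
medium = np.minimum(np.minimum((x - 5.0) / 5.0, (15.0 - x) / 5.0).clip(0.0), 0.3)
extension = centroid_defuzzify(x, np.maximum(short, medium))
```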

rule1  = Rule(arrival['v.low']  & queue['v.low'],  extensionTime['zero'])
rule2  = Rule(arrival['low']    & queue['v.low'],  extensionTime['short'])
rule3  = Rule(arrival['medium'] & queue['v.low'],  extensionTime['medium'])
rule4  = Rule(arrival['high']   & queue['v.low'],  extensionTime['long'])
rule5  = Rule(arrival['v.high'] & queue['v.low'],  extensionTime['long'])

rule6  = Rule(arrival['v.low']  & queue['low'],    extensionTime['zero'])
rule7  = Rule(arrival['low']    & queue['low'],    extensionTime['zero'])
rule8  = Rule(arrival['medium'] & queue['low'],    extensionTime['medium'])
rule9  = Rule(arrival['high']   & queue['low'],    extensionTime['medium'])
rule10 = Rule(arrival['v.high'] & queue['low'],    extensionTime['long'])

rule11 = Rule(arrival['v.low']  & queue['medium'], extensionTime['zero'])
rule12 = Rule(arrival['low']    & queue['medium'], extensionTime['zero'])
rule13 = Rule(arrival['medium'] & queue['medium'], extensionTime['short'])
rule14 = Rule(arrival['high']   & queue['medium'], extensionTime['medium'])
rule15 = Rule(arrival['v.high'] & queue['medium'], extensionTime['long'])

rule16 = Rule(arrival['v.low']  & queue['high'],   extensionTime['zero'])
rule17 = Rule(arrival['low']    & queue['high'],   extensionTime['zero'])
rule18 = Rule(arrival['medium'] & queue['high'],   extensionTime['short'])
rule19 = Rule(arrival['high']   & queue['high'],   extensionTime['short'])
rule20 = Rule(arrival['v.high'] & queue['high'],   extensionTime['long'])

rule21 = Rule(arrival['v.low']  & queue['v.high'], extensionTime['zero'])
rule22 = Rule(arrival['low']    & queue['v.high'], extensionTime['zero'])
rule23 = Rule(arrival['medium'] & queue['v.high'], extensionTime['zero'])
rule24 = Rule(arrival['high']   & queue['v.high'], extensionTime['short'])
rule25 = Rule(arrival['v.high'] & queue['v.high'], extensionTime['medium'])

Fig. 6 The set of 25 fuzzy rules

Fig. 7 Interpreting the defuzzification by the centroid

3.4 Results and Analysis

With the simulation configuration designed as in Tables 1 and 2, the data before and after applying the control algorithm, including the number of vehicles in the queue and the total waiting time, are calculated over each minute of simulation.


Fig. 8 Number of vehicles in the queue when simulating with a junction

Fig. 9 Waiting time when simulating with a junction

For the junction model, the data show that when using the fuzzy logic controller, the traffic condition is significantly improved. This result can be seen in Figs. 8 and 9: most of the time, the graphs of the total waiting time and the number of vehicles in the queue when using the fuzzy logic controller are lower than the corresponding graphs when using fixed-cycle lights. The decrease in waiting time and in the number of vehicles in the queue means that applying fuzzy control to the problem is effective. The separation and distance between the graphs in Figs. 10 and 11 show that the benefits of the control algorithm, when applied to a crossroad, are stronger than when it is applied to a junction. The reason is that at a crossroad there are more directions than at a junction, resulting in more conflicts between vehicles and hence more congestion. With the data configured and the simulation run in the intersection model without the controller, after 20 min of simulation, the


Fig. 10 Number of vehicles in the queue when simulating with a crossroad

Fig. 11 Waiting time when simulating with a crossroad

intersection becomes congested. The waiting time and the number of vehicles in the queue then increase dramatically. Under the same simulated conditions, when the fuzzy logic controller is applied, the traffic situation is completely stable and no congestion occurs.

4 Conclusion and Perspectives

The paper proposed the application of fuzzy control for controlling an isolated intersection under heterogeneous traffic conditions. In both cases, a junction and a crossroad, the fuzzy control algorithm gives positive results. The number of inputs of the algorithm is minimized, and they can be obtained from several tools such as cameras, magnetic rings, subscriber density, etc., ensuring the high feasibility of the solution.


The results can be extended to grid or green-wave problems. However, such scale expansion has very high complexity, and there are many open issues to be studied. The fuzzy logic control method can also be applied to related problems such as traffic control and forecasting.

References

1. Razavi, M., Hamidkhani, M., Sadeghi, R.: Smart traffic light scheduling in smart city using image and video processing. In: 3rd International Conference on Internet of Things and Applications (IoT), pp. 1–4. Isfahan, Iran (2019). https://doi.org/10.1109/iicita.2019.8808836
2. Pandey, K., Jalan, P.: An approach for optimizing the average waiting time for vehicles at the traffic intersection. In: Fifth International Conference on Parallel, Distributed and Grid Computing (PDGC), pp. 30–35. Solan Himachal Pradesh, India (2018). https://doi.org/10.1109/pdgc.2018.8745757
3. Luo, J., Huang, Y., Weng, Y.: Design of variable traffic light control systems for preventing two-way grid network traffic jams using timed Petri nets. IEEE Trans. Intell. Transp. Syst. (2019). https://doi.org/10.1109/TITS.2019.2925824
4. Qi, L., Zhou, M., Luan, W.: An emergency traffic light strategy to prevent traffic congestion. In: IEEE 13th International Conference on Networking, Sensing, and Control (ICNSC), pp. 1–6. Mexico City (2016). https://doi.org/10.1109/icnsc.2016.7479013
5. Natafgi, M.B., Osman, M., Haidar, A.S., Hamandi, L.: Smart traffic light system using machine learning. In: IEEE International Multidisciplinary Conference on Engineering Technology (IMCET), pp. 1–6. Beirut (2018). https://doi.org/10.1109/imcet.2018.8603041
6. Rida, N., Ouadoud, M., Hasbi, A., Chebli, S.: Adaptive traffic light control system using wireless sensors networks. In: IEEE 5th International Congress on Information Science and Technology (CiSt), pp. 552–556. Marrakech (2018). https://doi.org/10.1109/cist.2018.8596620
7. Wu, Q., He, F., Fan, X.: The intelligent control system of traffic light based on fog computing. Chin. J. Electron. 27(6), 1265–1270 (2018). https://doi.org/10.1049/cje.2018.09.015
8. Mir, A., Hassan, A.: Fuzzy inference rule based neural traffic light controller. In: IEEE International Conference on Mechatronics and Automation (ICMA), pp. 816–820. Changchun (2018). https://doi.org/10.1109/icma.2018.8484382
9. Nurlayli, A., Alqodri, F., Sakkinah, I.S.: Design of fuzzy simulation for determining the duration of traffic light based on vehicle density level and carbon monoxide level. In: 4th International Conference on Science and Technology (ICST), pp. 1–6. Yogyakarta (2018). https://doi.org/10.1109/icstc.2018.8528671
10. Koilias, A., Mousas, C., Rekabdar, B., Anagnostopoulos, C.: Passenger anxiety when seated in a virtual reality self-driving car. In: IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 1024–1025. Osaka, Japan (2019). https://doi.org/10.1109/vr.2019.8798084
11. Ren, H., Song, Y., Wang, J., Hu, Y., Lei, J.: A deep learning approach to the citywide traffic accident risk prediction. In: 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 3346–3351. Maui, HI (2018). https://doi.org/10.1109/itsc.2018.8569437
12. Deshmukh, S.M., Savant, B.N.: Traffic congestion alerting system. In: International Conference on Computing Communication Control and Automation (ICCUBEA), pp. 1–5. Pune (2016). https://doi.org/10.1109/iccubea.2016.7860110
13. Taniguchi, Y., Hisamatsu, H.: A study on road surface condition monitoring system using bicycle-mounted grid laser light. In: 7th International Conference on Intelligent Systems, Modelling and Simulation (ISMS), pp. 356–359. Bangkok (2016). https://doi.org/10.1109/isms.2016.23


14. Webster, F.V.: Traffic signal settings. Road Research Technical Paper No. 39. Great Britain Road Research Laboratory, London (1958)
15. Saidallah, M., Fergougui, A.E., et al.: A comparative study of urban road traffic simulators. In: MATEC Web of Conferences, vol. 81, p. 05002 (2016). https://doi.org/10.1051/matecconf/20168105002
16. Eclipse SUMO, https://www.dlr.de/ts/en. Accessed 14 Aug 2019
17. Design and simulate fuzzy logic systems, https://ch.mathworks.com/products/fuzzy-logic.html. Accessed 14 Aug 2019

Block Difference of Inverse Probabilities Features for Chromoendoscopy Image Classification

Viet Dung Nguyen, Thanh Hien Truong, Ha Anh Pho, and Le Thu Thao Dao

Abstract Gastric or stomach cancer is one of the most common cancers in the world. It used to be the leading cause of cancer deaths before the 1980s. Endoscopy is a less invasive method to screen gastric cancer than biopsy. In chromoendoscopy, one of the improvements of endoscopy, abnormal regions are made more prominent visually by spraying dyes over the mucosal surface. However, detection and classification of abnormal regions are not easy tasks. Accuracy depends largely on the experience of doctors, the physical status of doctors, and illumination variations. Nowadays, with computer-aided diagnosis (CAD) systems, gastric cancer can be detected and classified into different stages. In this paper, we propose using Block Difference of Inverse Probabilities (BDIP) and Support Vector Machine (SVM) to build an automatic and accurate yet simple classification algorithm for identifying whether a chromoendoscopy (CH) image is abnormal or not. Experimental results show that the proposed method has a classification accuracy of 87.3% and an area under the curve (AUC) value of 0.92 on the CH imageset obtained using an Olympus CV-180 endoscope at the Portuguese Institute of Oncology (IPO) Hospital in Porto, Portugal. Keywords Gastric cancer · Chromoendoscopy · BDIP · Classification

V. D. Nguyen (B) · T. H. Truong · H. A. Pho · L. T. T. Dao Hanoi University of Science and Technology, Hanoi, Vietnam e-mail: [email protected] T. H. Truong e-mail: [email protected] H. A. Pho e-mail: [email protected] L. T. T. Dao e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_24


1 Introduction

The stomach is a muscular organ located on the left side of the upper abdomen. The stomach receives food and secretes acid and enzymes to aid digestion. The global cancer burden has risen to 18.1 million cases and 9.6 million cancer deaths [1]. Stomach cancer ranks 5th in new cancer cases (5.7% of all new cases, 1.8 million) and 3rd in cancer deaths (8.2%, 783,000 deaths). During the last century, many medical imaging technologies for cancer diagnosis have evolved, especially endoscopy. With these images of the tissues, physicians are able to detect cancer in the gastrointestinal tract with great accuracy and to distinguish different stages of the disease. However, several limitations exist, such as the dependence on image features, the time required of physicians, and the experience of physicians in diagnosing. Therefore, designing an assisting tool for the recognition of gastric abnormalities with acceptable accuracy and as-fast-as-possible speed is necessary. In this study, an accurate yet simple algorithm to classify chromoendoscopy (CH) images is proposed. First, Block Difference of Inverse Probabilities (BDIP) features are extracted as descriptors for CH images. A Support Vector Machine (SVM) then classifies the CH images into normal or abnormal based on those BDIP features.

2 Related Works

Different feature extraction methods have been developed for the automatic examination of GI diseases, as shown in Table 1. In [2], different variations in magnified endoscopy images are described by GLDM (Gray Level Difference Matrix) texture descriptors. To achieve shift and rotation invariance, a combination of Gabor filter banks and autocorrelation matrices was used to describe CH images [3]. A semantic analysis in the feature space applied to confocal laser endomicroscopy images was proposed in [4]. In [5], locally constrained SIFT (Scale-Invariant Feature Transform) descriptors were suggested for magnified endoscopy images. In [6], uniform LBP (Local Binary Pattern) features were chosen as descriptors for wireless capsule endoscopy images. In [7], the CH image was represented by a set of homogeneous texture descriptors based on Gabor wavelets. Recently, SURF (Speeded Up Robust Features) descriptors were used to classify CH images as in [8].

Table 1 Survey of related works

| Approaches         | Dataset                           | Accuracy (%) | Sen  | Spec |
|--------------------|-----------------------------------|--------------|------|------|
| GLDM [2]           | Magnified endoscopy images        |              |      |      |
| Gabor [3]          | Chromoendoscopy images            | 78.00        |      |      |
| Semantic space [4] | Colonoscopy images                | 82.00        | 0.95 | 0.48 |
| SIFT [5]           | Magnification endoscopy images    | 81.70        |      |      |
| LBP [6]            | Wireless capsule endoscopy images | 93.16        |      |      |
| Gabor wavelet [7]  | Chromoendoscopy images            | 90.0         |      |      |
| SURF [8]           | Chromoendoscopy images            | 85           |      |      |

3 Proposed Method

The block diagram of the proposed method, which uses BDIP (Block Difference of Inverse Probabilities) features as descriptors for the CH image, is shown in Fig. 1. A brief introduction of each step is given below:

3.1 BDIP Features

BDIP uses the local probabilities in image blocks to measure variation of brightness [9]. It is defined as the difference between the number of pixels in a block and the ratio of the sum of pixel intensities in the block to the maximum intensity in the block:

$$\mathrm{BDIP} = M^2 - \frac{\sum_{(i,j)\in B} I(i,j)}{\max_{(i,j)\in B} I(i,j)} \quad (1)$$

where I(i, j) denotes the intensity of a pixel (i, j) in a block B of size M × M. In BDIP images, the insides of objects and the backgrounds appear bright, while edges and valleys appear dark (Fig. 2). BDIP has been proven to be a good descriptor for medical images [10].
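Equation (1) translates directly into a few lines of NumPy; the sketch below (ours, not the authors' Matlab code) computes the BDIP value of one block and a non-overlapping BDIP feature vector for a grayscale image:

```python
import numpy as np

def bdip(block):
    """BDIP of one block (Eq. 1): number of pixels minus the ratio of the
    sum of intensities to the maximum intensity in the block."""
    mx = float(block.max())
    if mx == 0.0:  # guard: an all-black block has no brightness variation
        return 0.0
    return block.size - float(block.sum()) / mx

def bdip_features(image, m=7):
    """Non-overlapping BDIP feature vector with m x m blocks
    (7 x 7, non-overlapping, is the best setting reported in Sect. 4.3)."""
    h, w = image.shape
    return np.array([bdip(image[i:i + m, j:j + m])
                     for i in range(0, h - m + 1, m)
                     for j in range(0, w - m + 1, m)])
```

Note that a perfectly homogeneous block yields a BDIP value of 0, while a block with a single bright pixel approaches M² − 1.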

Fig. 1 Proposed method’s block diagram


Fig. 2 a Original images and b their corresponding BDIP images [8]

3.2 SVM Classifier

SVM is a tool that fits a binary classification model to the training data by defining a hyperplane in the feature space that best separates the features of the different classes [11]. Given a set of training data, each example marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model represents the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on. SVMs have been extensively used in machine learning applications for image and signal classification.
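As an illustration of this classification step, the sketch below trains an SVM on toy two-dimensional points standing in for BDIP feature vectors; the RBF kernel is an assumption, since the paper does not state which kernel was used:

```python
import numpy as np
from sklearn.svm import SVC

# Toy illustration only: the 2-D points below merely stand in for BDIP
# feature vectors extracted from CH images.
X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]])
y_train = np.array([0, 0, 1, 1])  # 0 = normal, 1 = abnormal

clf = SVC(kernel='rbf', gamma='scale')  # kernel choice is an assumption
clf.fit(X_train, y_train)
pred = clf.predict(np.array([[0.15, 0.15], [0.95, 0.95]]))
```

In the full pipeline, `bdip_features` of each image would replace the toy points, and the trained model would label a new CH image as normal or abnormal.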

4 Results and Discussion

4.1 Dataset

The proposed method is evaluated on the CH images of the Olympus GIFH180 endoscope database at the Portuguese Institute of Oncology (IPO) Porto. The CH images were obtained from multiple patients with normal and abnormal gastric conditions. The dataset contains 176 CH images, including 56 normal and 120 abnormal images. The images were saved as graphics files of type PNG (Portable Network Graphics) with a resolution of 518 × 481 pixels [12].


4.2 Implementation

The original dataset is randomly divided into training and test subsets with different training:test ratios. It is important to keep the normal-to-abnormal image proportion in both subsets as in the original dataset. At each training:test ratio, different schemes of calculating BDIP descriptors (block size, overlapping or not) are assessed. In fact, whenever new training and test subsets are selected randomly, the classification results will vary. Therefore, the average results given in Sect. 4.3 are achieved by iterating the data selection 100 times at each training:test ratio. All the experiments were implemented using Matlab and conducted on an Intel Core i5 2.4 GHz. The only preprocessing step is to convert the original images into grayscale images.
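The stratified random splitting described here can be sketched in Python with scikit-learn (the paper's experiments used Matlab; the loop count and placeholder score are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels mirror the IPO dataset: 56 normal (0) and 120 abnormal (1) images;
# X is a placeholder for the per-image BDIP feature vectors.
y = np.array([0] * 56 + [1] * 120)
X = np.arange(176, dtype=float).reshape(-1, 1)

accuracies = []
for run in range(5):  # the paper iterates the data selection 100 times
    # stratify=y keeps the normal:abnormal proportion in both subsets
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=run)
    accuracies.append(y_te.mean())  # placeholder for the real SVM score
```

Averaging the scores over the runs yields the figures reported in Sect. 4.3.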

4.3 Results

First of all, the effect of block size on the classification result is evaluated. The BDIP block size is chosen as square, in accordance with the almost-square original images. The size of the block ranges from 2 × 2 to 9 × 9. Given in Table 2 are the average accuracies and corresponding average AUCs obtained by the SVM classifier with different BDIP block sizes. The highest accuracy is 87.3%, obtained with a block size of 7 × 7, which outperformed the others. Meanwhile, the two worst accuracies are 81.1% and 82.0%, with block sizes of 2 × 2 and 3 × 3, respectively. This is probably because image blocks with small sizes tend to be homogeneous and their BDIP values approach unity. Hence, the classification accuracy is not as high as with larger block sizes. However, taking the AUC values into consideration, it is obvious that changing the block size does not change the AUC value much. The highest AUC value is 0.923, with block size 7 × 7. Therefore, the BDIP block size is set to 7 × 7. Once the block size is set, the next step is to determine the number of overlapping pixels (overlapping level) between consecutive blocks. Selecting the block size as 7 × 7 means that the number of overlapping pixels can only vary from 0 to 6 pixels.

Table 2 Classification results with different block sizes (training:test ratio of 70:30)

| Size of block | Accuracy (%) | AUC   |
|---------------|--------------|-------|
| 2             | 82.0         | 0.873 |
| 3             | 81.1         | 0.874 |
| 4             | 83.1         | 0.905 |
| 5             | 83.8         | 0.904 |
| 6             | 83.8         | 0.903 |
| 7             | 87.3         | 0.923 |
| 8             | 84.6         | 0.906 |
| 9             | 85.4         | 0.909 |


V. D. Nguyen et al.

Table 3 gives the classification results for the different overlapping levels; the training:test ratio is still 70:30. The highest accuracy, 87.3%, occurs with 0 overlapping pixels (non-overlapping blocks), while the second highest accuracy, 86.1%, occurs with 2 overlapping pixels. In terms of AUC, the non-overlapping scheme provides the highest value of 0.923; however, all AUC values are quite similar. Moreover, the higher the number of overlapping pixels, the larger the number of extracted BDIP features. Consequently, the non-overlapping scheme (0 overlapping pixels) is chosen. Table 4 shows the evaluation results when changing the training:test ratio while keeping the non-overlapping 7 × 7 block size. The best accuracy as well as the best AUC are achieved with a training:test ratio of 70:30, which is why this ratio was selected when assessing the effect of the BDIP block size. A comparison of the proposed method with other approaches using the same IPO dataset is shown in Table 5. It is worth noting that in [6], 10-fold cross-validation was used. The test conditions in Sect. 4.2 are the same as in [7]. The procedure in all methods is repeated 100 times. The obtained AUC of 0.92 is comparable to the 0.93 of [7], one of the state-of-the-art methods in the field. In contrast, the obtained accuracy of 87.3% is lower than the 90% of [7]; the difference is about 3 percentage points. Nevertheless, (90%, 0.93) was achieved in [7] with feature selection by a Genetic Algorithm; without feature selection, its accuracy and AUC were 86.1% and 0.90, respectively. Thus, the proposed method, which applies no feature selection, performs better than [7] in this instance. It is also apparent that extracting BDIP features via Eq. (1) is less complex than calculating the Gabor-wavelet-based features of [7]. All these results provide insight into how well the BDIP features represent the CH images.

Table 3 Classification results when changing the number of overlapping pixels (block size of 7 × 7)

Number of overlapping pixels    Accuracy (%)    AUC
0                               87.3            0.923
1                               82.7            0.914
2                               86.1            0.917
3                               84.9            0.912
4                               85.1            0.911
5                               85.3            0.915
6                               83.9            0.901

Table 4 Classification results when changing the training:test ratio (block size of 7 × 7, non-overlapping)

Training:test ratio    Accuracy (%)    AUC
30:70                  85.2            0.912
40:60                  86.0            0.915
50:50                  86.4            0.920
60:40                  87.1            0.921
70:30                  87.3            0.923

Block Difference of Inverse Probabilities Features ...


Table 5 Comparison with other approaches on the same IPO dataset

Feature + Classifier            Accuracy (%)    AUC     Note
Gabor + SVM [7]                 86.1            0.90    Without feature selection
Gabor + SVM [7]                 90.0            0.93    With feature selection
SURF + SVM [8]                  81.5            0.92
Proposed method (BDIP + SVM)    87.3            0.92

5 Conclusion

In this work, the BDIP texture descriptor exhibits excellent discrimination performance on CH images. However, color, an important visual cue in CH images, was not investigated in the proposed method. It would therefore be attractive to incorporate this visual information into the texture-based representation of CH images. In the near future, the method could also be generalized to other endoscopic images obtained from diverse endoscopic imaging modalities under several unhealthy conditions.

References

1. New Global Cancer Data: GLOBOCAN (2018). https://www.uicc.org/new-global-cancer-dataglobocan-2018
2. Onji, K., Yoshida, S., Tanaka, S., Kawase, R., Takemura, Y., Oka, S., Tamaki, T., Raytchev, B., Kaneda, K., Yoshihara, M., Chayama, K.: Quantitative analysis of colorectal lesions observed on magnified endoscopy images. J. Gastroenterol. 46, 1382–1390 (2011)
3. Riaz, F., Silva, F., Ribeiro, M., Coimbra, M.: Invariant Gabor texture descriptors for classification of gastroenterology images. IEEE Trans. Biomed. Eng. 59(10), 2893–2904 (2012)
4. Kwitt, R., Vasconcelos, N., Rasiwasia, N., Uhl, A., Davis, B., Hafner, M., Wrba, F.: Endoscopic image analysis in semantic space. Med. Image Anal. 16(7), 1415–1422 (2012)
5. Sousa, R., Moura, D.C., Dinis-Ribeiro, M., Coimbra, M.T.: Local self similar descriptors: comparison and application to gastroenterology images. In: 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (2014)
6. Constantinescu, A.F., Ionescu, M., Rogoveanu, I., Ciurea, M.E., Streba, C.T., Iovanescu, V.F., Artene, S.A., Vere, C.C.: Analysis of wireless capsule endoscopy images using local binary patterns. Appl. Med. Inf. 36(2), 31–42 (2015)
7. Ali, H., Sharif, M., Yasmin, M., Rehmani, M.H.: Computer-based classification of chromoendoscopy images using homogeneous texture descriptors. Comput. Biol. Med. 88(1), 84–92 (2017)
8. Nguyen, V.D., Truong, T.H.: Speeded-up robust feature descriptor for endochromoscopy images. In: 34th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC 2019) (2019)
9. So, H.J., Kim, M.H., Kim, N.C.: Texture classification using wavelet domain BDIP and BVLC features. In: 17th European Signal Processing Conference (EUSIPCO 2009) (2009)


10. Nguyen, V.D., Nguyen, D.T., Nguyen, T.D., Truong, Q.D., Le, M.D.: Combination of block difference inverse probability features and support vector machine to reduce false positives in computer-aided detection for massive lesions in mammographic images. In: 6th International Conference on Biomedical Engineering and Informatics (BMEI 2013) (2013)
11. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: 5th Annual Workshop on Computational Learning Theory (1992)
12. Sousa, A., Dinis-Ribeiro, M., Areia, M., Coimbra, M.: Identifying cancer regions in vital-stained magnification endoscopy images using adapted color histograms. In: 16th IEEE International Conference on Image Processing (ICIP) (2009)

Hybrid Convolutional Neural Network Ensemble for Activity Recognition in Mobile Phones Jyh-Huah Chan, Hui-Juin Lim, Ngoc-Son Hoang, Jeong-Hoon Lim, Khang Nguyen, Binh P. Nguyen, Chee-Kong Chui, and Matthew Chin-Heng Chua

J.-H. Chan · H.-J. Lim · N.-S. Hoang · M. C.-H. Chua
Institute of Systems Science, National University of Singapore, Singapore 119620, Singapore
e-mail: [email protected]

J.-H. Lim
Division of Neurology, University Medicine Cluster, National University Hospital, Singapore 119074, Singapore
e-mail: [email protected]

K. Nguyen
IBM Vietnam, Hanoi 100000, Vietnam
e-mail: [email protected]

B. P. Nguyen (B)
School of Mathematics and Statistics, Victoria University of Wellington, Wellington 6140, New Zealand
e-mail: [email protected]

C.-K. Chui
Department of Mechanical Engineering, National University of Singapore, 117575 Singapore, Singapore
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_25

Abstract The increasing importance of human activity recognition in Ambient Assisted Living (AAL) systems has brought tremendous growth in research efforts in the field. Amongst these, the identification of activities of daily living using sensors available in mobile devices has emerged as one of the most interesting goals for AAL systems. With the development of deep learning algorithms in recent years, there has been increasing interest in hybrid models, which are able to perform as well as traditional machine learning models while also entailing the self-learning capabilities of deep

learning at the same time. Using the time series data obtained from the MotionSense dataset, this paper introduces the concept of converting time series signals to images. The paper then proposes a novel approach to activity recognition by fusing a variety of traditional machine learning models with a deep convolutional neural network via a majority voting ensemble. The proposed method produced better results than the traditional machine learning models.

Keywords Activity recognition · Recurrence plots · Ensemble · Convolutional neural networks · Machine learning · Deep learning

1 Introduction

Human Activity Recognition (HAR) has been a pivotal component in the development of Ambient Assisted Living (AAL) systems. To reduce the potential societal and economic problems associated with a rapidly aging population, AAL systems provide an ecosystem of sensors and networks that aims to assist elderly and disabled people in living independently using their preferred communication devices [1]. With the advent of more sophisticated machine learning algorithms, AAL systems are now able to predict and prevent falls, which are the most common cause of severe injuries in the elderly and are closely correlated with individual life expectancy [30].

Traditionally, such systems were based on depth video sequences [37]. However, ethics and privacy concerns regarding vision-based activity recognition have resulted in low adoption of such technology [38]. Inertia-based methods were developed to address these concerns. In such methods, sensors such as accelerometers and gyroscopes record the inertial data generated by body movement, which has proven useful for extracting walking patterns [29]. Various techniques have thus been developed for inertia-based gait recognition [17, 33]. However, most of them require the inertial sensors to be fastened to a specific part of the body, which is inconvenient. The integration of many advanced sensors, including accelerometers and gyroscopes, into smartphones, together with their increasing affordability, has made smartphones an inexpensive and very convenient means of collecting gait data. This has encouraged more efforts to utilize smartphones for activity recognition [11, 20, 31]. With the number of smartphone users expected to reach 6 billion worldwide by the end of 2020, AAL systems can now increase their adoption at a lower cost. With the increasing importance of smartphones, gait recognition approaches have also evolved immensely by adopting various machine learning techniques.

1 Introduction Human Activity Recognition (HAR) has been a pivotal component in the development of Ambient Assisted Living systems (AAL). To reduce the potential societal and economic problems associated with a rapidly aging population, AAL systems provide an ecosystem of sensors and networks which aims to assist the elderly and disabled people in living independently using their preferred choices of communication devices [1]. With the advent of more sophisticated machine learning algorithms, AAL systems are now able to predict and prevent falls, which are the most common cause of severe injuries in the elderly and closely correlated with individual life expectancy [30]. Traditionally, such systems were based on depth video sequences [37]. However, the ethics and privacy concerns regarding vision-based activity recognition have resulted in low adoption of such technology [38]. Inertiabased methods were developed to address such concerns. For such methods, sensors such as accelerometers and gyroscopes are used to record the inertial data generated by body movement and have been useful for extracting walking patterns [29]. Thus, various techniques have been developed for inertia-based gait recognition [17, 33]. However, most of them require the inertia sensors to be fastened to a specific part of the body, resulting in inconvenience. The integration of many advanced sensors, including accelerometers and gyroscopes, into smartphones and the increasing affordability of smartphones has made them inexpensive and very convenient to collect gait data. This has encouraged more efforts to utilize smartphones for activity recognition [11, 20, 31]. With the number of smartphone users expected to reach 6 billion worldwide by the end of year 2020, AAL systems are now able to increase their adoption at a lower cost. With the increasing importance of smartphones, gait recognition approaches have also evolved immensely by adopting various machine learning techniques. 
Many statistical modelling approaches using Hidden Markov Models (HMMs) [22, 32, 34] and Naïve Bayes classifiers [3, 4, 8, 18] have shown favorable results. However, strict independence assumptions in those algorithms limit their capability to capture long-range observations, resulting in low accuracy in practice. Techniques such as k-Nearest Neighbor classifiers [25], Decision Trees [28] and Support Vector Machines [2] have also been widely used. Although these may yield higher accuracy


than the statistical methods, the results are often non-intuitive, and large amounts of labelled data, which are difficult to obtain in practice, are often required [28]. Another shortcoming of these approaches is that feature extraction relies on domain expertise, which is expensive and time-consuming. Human expertise only enables low-level features such as mean, variance and frequency to be learnt [35], resulting in models that lack the ability to make higher-level contextual inferences [36]. The tremendous development of deep learning methods in recent years has effectively combated these shortcomings [9, 12, 19]. Models such as Convolutional Neural Networks (CNNs) can not only use unlabeled data for model training [14] but also generalize better, allowing new tasks to be learned even with few labelled datasets. CNNs also learn features automatically during model training, which removes the need for human intervention in the model. In this paper, we examine common machine learning methods used for activity recognition and propose an innovative way to fuse the automatic learning capabilities of CNNs with the traditional machine learning approaches without any compromise on accuracy.

2 Materials and Methods

The MotionSense dataset [21] used in this study is a time-series dataset generated by gyroscope and accelerometer sensors. It was collected with an iPhone 6S placed in the test participants' front pocket using the SensingKit app, which uses the Core Motion framework on iOS devices. The accelerometer measures the sum of two acceleration vectors: user acceleration and gravity. User acceleration is the acceleration that the user imparts to the device. Core Motion can differentiate between gravity and user acceleration since it tracks a device's attitude using both the gyroscope and the accelerometer. In our experiments, there were 24 participants performing 6 activities of daily life:

• Walking upstairs
• Walking downstairs
• Sitting
• Standing
• Walking on flat terrain
• Jogging on flat terrain

These were done in 15 trials per person. Each trial collected multi-variate time series data with 12 features:

• Attitude roll, pitch, yaw
• Gravity X, Y, Z
• Rotation X, Y, Z
• Acceleration X, Y, Z


2.1 Ensemble of Machine Learning Methods

This section describes the first approach we experimented with to classify the 6 activities. It applies manual feature extraction before feeding the features to an array of traditional machine learning algorithms. First, the time-series data was converted into examples using a sliding-window approach as illustrated in [39]. As prior activity recognition studies have demonstrated good results with a 10-s time interval [16, 20], we divided the data stream into similar 10-s intervals without overlap. The data was then split into groupings of 400 records, with each grouping corresponding to 1 experiment, resulting in 3359 experiments. Then, 6 additional high-level features (mean, median, maximum, minimum, skew and standard deviation) were generated for each of the 12 features, for a total of 72 features. To prevent overfitting, a training:test split ratio of 70:30 was applied. Python's scikit-learn module [15] was used to apply the following algorithms to the dataset:

• K-Nearest Neighbor (KNN)
• Decision Trees
• Random Forest classifier
• Gradient Descent classifier
• Naïve Bayes classifier
• Support Vector Machine (SVM)
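Before any of these classifiers is fitted, the windowing and summary-statistics step described above can be sketched as follows. This is an illustration only, not the authors' code: `window_features` is a hypothetical helper, and skew is computed directly from its moment definition rather than via a library call.

```python
import numpy as np

WINDOW = 400  # records per grouping, as described above

def window_features(stream):
    """stream: (n_samples, n_channels) raw sensor matrix.

    Splits the stream into non-overlapping 400-record windows (trailing
    partial window dropped) and computes 6 summary statistics per channel.
    """
    n = stream.shape[0] // WINDOW
    feats = []
    for w in stream[:n * WINDOW].reshape(n, WINDOW, -1):
        row = []
        for x in w.T:  # one sensor channel at a time
            mu, sd = x.mean(), x.std()
            sk = ((x - mu) ** 3).mean() / sd ** 3 if sd > 0 else 0.0
            row += [mu, np.median(x), x.max(), x.min(), sk, sd]
        feats.append(row)
    return np.array(feats)  # 6 statistics x 12 channels = 72 columns
```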

The KNN model was generated with K = 5, meaning that the classification of each data point was based on the classes of its 5 nearest neighbors. To ensure that the Decision Tree model was not overfitted, a maximum tree depth of 10 was selected. The Random Forest classifier was fitted with 100 estimators, also to reduce the likelihood of overfitting. The Gradient Descent classifier estimates the error gradient for the current model state using examples from the training dataset, and then updates the weights of the model using back-propagation. To prevent the model from converging to a sub-optimal solution too quickly, we selected a small learning rate of 0.01 for the Gradient Descent classifier. A Support Vector Machine model with a Gaussian radial basis function kernel was also used. To further improve on the accuracy of these individual algorithms, a voting strategy was applied to the 6 algorithms [15]. Voting is a technique applied to the classification outcome: each classifier casts a "vote" for a class, and the class that garners the most votes is selected as the final prediction.
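The hard-voting rule just described can be sketched in a few lines. This is an illustration of the rule rather than the authors' code (scikit-learn's `VotingClassifier` implements the same idea over fitted estimators):

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one predicted class label per classifier.

    Each classifier casts one equally weighted vote; the most common
    class wins (ties resolved by first-seen order).
    """
    return Counter(predictions).most_common(1)[0][0]
```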


2.2 Deep Convolutional Neural Network

2.2.1 Image Representation of Time Series Data

As illustrated in [13], time series data can be represented as a 2D image via a Recurrence Plot (RP). The RP [10] is a 2D representation of the recurrences of a time series, aiming to reveal the points at which some trajectories return to a previous state. It can be formulated as follows:

R_{i,j} = θ(ε − ||s_i − s_j||),  s_i ∈ R^m,  i, j = 1, …, K,   (1)

where K is the number of considered states s, ε is the threshold distance, ||·|| is a norm and θ(·) is the Heaviside step function. Applying this to the raw data for each of the 6 actions, the images in Fig. 1 were obtained. From the images, there is a clear similarity between walking and jogging, between climbing upstairs and downstairs, and between sitting and standing. This indicates that the data could be classified according to their images.
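Eq. (1) can be computed directly with NumPy. The sketch below builds a binary RP for a short series (`recurrence_plot` is a hypothetical helper; the threshold ε is a user choice, and points at exactly the threshold distance are counted as recurrences):

```python
import numpy as np

def recurrence_plot(s, eps):
    """Binary recurrence plot R[i, j] = 1 iff ||s_i - s_j|| <= eps."""
    s = np.asarray(s, dtype=float).reshape(len(s), -1)
    # pairwise distances ||s_i - s_j|| via broadcasting
    d = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
    return (d <= eps).astype(int)
```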

2.2.2 CNN Architecture

Figure 2 shows the architecture of our proposed deep convolutional neural network. After conversion of a time-series signal into a 2D feature map with dimensions of 400 × 12, a 2-stage deep CNN model is applied. The proposed network is constructed with 2 convolution layers and 2 max-pooling layers. The 3 × 3 convolution kernel extracts features from the image by taking the dot product of the pixel values and the kernel. Here we applied a ReLU (Rectified Linear Units) activation

Fig. 1 Image representations of the 6 actions using RP

Fig. 2 Architecture of the proposed CNN model


Table 1 Performances of the machine learning models

#   Model                       Training Accuracy (%)   Test Accuracy (%)
1   Random Forest (Baseline)    100.00                  98.70
2   KNN                         97.40                   95.72
3   Decision Trees              99.62                   97.72
4   Gradient Descent            98.08                   97.71
5   Naïve Bayes                 95.45                   95.13
6   SVM                         99.78                   97.91
7   Ensemble model              99.87                   98.91

function because it does not require expensive computation and trains the neural network faster [19]. To ensure that the subsequent layers of the CNN are able to pick up more sophisticated features than edges, a max-pooling layer was applied. After that, a dropout rate of 25% was applied to reduce the likelihood of overfitting. A final fully connected softmax operation was used to produce the classification output, as formulated in the following equation:

a(z)_i = e^{z_i} / (Σ_{j=1}^{K} e^{z_j}),  for i = 1, …, K and z = (z_1, …, z_K) ∈ R^K   (2)
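The operations described in this subsection, namely the 3 × 3 kernel dot product, ReLU, 2 × 2 max pooling and the softmax of Eq. (2), can be illustrated with a minimal NumPy forward pass. This is an illustration of the mechanics only, not the authors' implementation, which would use standard deep learning tooling:

```python
import numpy as np

def conv2d(img, kernel):
    # valid convolution: dot product of the kernel with each image patch
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2x2(x):
    # non-overlapping 2 x 2 max pooling; odd edges are dropped
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def softmax(z):
    # Eq. (2); subtracting the max is for numerical stability only
    e = np.exp(z - z.max())
    return e / e.sum()
```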

2.3 CNN Fusion with Machine Learning Ensemble

Our proposed solution incorporates the high accuracy provided by the ensemble model in Sect. 2.1 with the benefits brought by the CNN in Sect. 2.2, particularly the ability to generate more meaningful features. The voting strategy was applied to the two models.

3 Results

Table 1 shows the results of applying the six individual machine learning algorithms to the dataset. As highlighted in Sect. 2.1, the Random Forest classifier had the highest accuracy at 98.70% and was thus selected as the baseline method. Figure 3 shows the confusion matrix when applying this baseline model to the test dataset. The proposed CNN model yielded an accuracy of 98.21%, which was lower than that of the baseline method.

Fig. 3 The confusion matrix by the Random Forest classifier

Table 2 Performance of the final proposed model

Model                         Test Accuracy (%)
Fusion of CNN and Ensemble    98.81

Fig. 4 The confusion matrix by the CNN-ML ensemble


Table 2 shows the performance of our proposed model described in Sect. 2.3. By fusing the CNN with the ensemble model, the fused model achieved a better performance of 98.81%, compared to the baseline of 98.70%. The confusion matrix corresponding to this model is presented in Fig. 4.

4 Discussion and Future Work

The results in Table 1 show that, of all the traditional machine learning approaches, the Random Forest classifier performed best, the Decision Trees algorithm second best, and the Naïve Bayes classifier worst. The superiority of Random Forest for such classification tasks is in line with previous works [16, 20]. Another noteworthy observation is that an ensemble model created by a majority-voting scheme improves classification accuracy, because classification based on the traditional machine learning methods benefits from the voting procedure, which treats each vote equally. To improve the methodology, we could change the majority voting to a weighted majority voting approach, in which each vote is weighted based on accuracy, as shown in other works [23, 24]. As future work, a CNN architecture with more feature representation layers can be investigated for more difficult activity recognition tasks; however, large datasets are needed to train deeper architectures. Other machine learning methods, such as enhanced k-nearest neighbours [27], linear iterative clustering [26] and kernel dictionary learning [5–7], can also be incorporated.
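The weighted variant suggested above can be sketched as follows (`weighted_majority_vote` is a hypothetical helper; the weights would come from, e.g., each classifier's validation accuracy, as in [23, 24]):

```python
from collections import Counter

def weighted_majority_vote(predictions, weights):
    """predictions: one class label per classifier; weights: per-classifier weights.

    Each classifier's vote counts in proportion to its weight, so a single
    highly accurate model can outvote several weaker ones.
    """
    tally = Counter()
    for label, w in zip(predictions, weights):
        tally[label] += w
    return max(tally, key=tally.get)
```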

5 Conclusion

In this paper, we have proposed a novel solution to the activity recognition task presented by the MotionSense dataset. Six machine learning approaches were utilized: the k-Nearest Neighbors, Decision Trees, Gradient Descent, Naïve Bayes, Support Vector Machine and Random Forest classifiers. The Random Forest classifier was chosen as the benchmark, with an accuracy of 98.70%. A novel way of transforming time-series signals into images using recurrence plots was also introduced, which indicated that a CNN could be used for effective activity recognition. Accordingly, a two-stage deep CNN was trained and then fused with the ensemble of the 6 machine learning models using a majority voting ensemble classifier. The resulting model not only delivered an accuracy of 98.81%, higher than the baseline method, but also demonstrated the benefits of deep learning, such as automatically learning different levels of features in a timely manner.


Acknowledgment. This research is supported by the Singapore Ministry of Health's National Medical Research Council under its Enabling Innovation Grant, Grant No: NMRC/EIG06/2017.

References

1. Belbachir, A.N., Drobics, M., Marschitz, W.: Ambient assisted living for ageing well - an overview. e & i Elektrotechnik und Informationstechnik 127(7), 200–205 (2010)
2. Brdiczka, O., Crowley, J.L., Reignier, P.: Learning situation models in a smart home. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 39(1), 56–63 (2009)
3. Brdiczka, O., Reignier, P., Crowley, J.L.: Detecting individual activities from video in a smart home. In: Apolloni, B., Howlett, R.J., Jain, L. (eds.) Knowl. Based Intell. Inf. Eng. Syst., pp. 363–370. Springer, Heidelberg (2007)
4. Chen, M., Huang, B., Xu, Y.: Human abnormal gait modeling via hidden Markov model. In: Proceedings of the 2007 International Conference on Information Acquisition, pp. 517–522 (2007)
5. Chen, X., Nguyen, B.P., Chui, C.K., Ong, S.H.: Automated brain tumor segmentation using kernel dictionary learning and superpixel-level features. In: Proceedings of the International Conference on Systems, Man, and Cybernetics, SMC 2016, pp. 2547–2552, Budapest, Hungary. IEEE, 9–12 October 2016
6. Chen, X., Nguyen, B.P., Chui, C.-K., Ong, S.-H.: An automatic framework for multi-label brain tumor segmentation based on kernel sparse representation. Acta Polytechnica Hungarica 14(1), 25–43 (2017)
7. Chen, X., Nguyen, B.P., Chui, C.-K., Ong, S.-H.: Reworking multilabel brain tumor segmentation - an automated framework using structured kernel sparse representation. IEEE Syst. Man Cybern. Mag. 3(2), 18–22 (2017)
8. Cook, D.J., Schmitter-Edgecombe, M.: Assessing the quality of activities in a smart environment. Methods Inf. Med. 48, 480–485 (2009)
9. Donahue, J.: Long-term recurrent convolutional networks for visual recognition and description. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 677–691 (2017)
10. Eckmann, J.P., Kamphorst, S.O., Ruelle, D.: Recurrence plots of dynamical systems. Europhys. Lett. 4(9), 973–977 (1987)
11. Fernandez-Lopez, P., Sanchez-Casanova, J., Tirado-Martín, P., Liu-Jimenez, J.: Optimizing resources on smartphone gait recognition. In: Proceedings of the IEEE International Joint Conference on Biometrics, IJCB 2017, pp. 31–36, October 2017
12. Graves, A., Mohamed, A.R., Hinton, G.E.: Speech recognition with deep recurrent neural networks. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, pp. 6645–6649 (2013)
13. Hatami, N., Gavet, Y., Debayle, J.: Classification of time-series images using deep convolutional neural networks. In: Verikas, A., Radeva, P., Nikolaev, D., Zhou, J. (eds.) Proceedings of the 10th International Conference on Machine Vision, ICMV 2017, vol. 10696, pp. 242–249. International Society for Optics and Photonics, SPIE (2018)
14. Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
15. Jain, A.K., Duin, R.P.W., Mao, J.: Statistical pattern recognition: a review. IEEE Trans. Pattern Anal. Mach. Intell. 22(1), 4–37 (2000)
16. Johnston, A.H., Weiss, G.M.: Smartwatch-based biometric gait recognition. In: Proceedings of the 7th IEEE International Conference on Biometrics Theory, Applications and Systems, BTAS 2015, pp. 1–6, September 2015
17. Juefei-Xu, F., Bhagavatula, C., Jaech, A., Prasad, U., Savvides, M.: Gait-ID on the move: pace independent human identification using cell phone accelerometer dynamics. In: Proceedings of the 5th IEEE International Conference on Biometrics: Theory, Applications and Systems, BTAS 2012, pp. 8–15, September 2012
18. van Kasteren, T., Krose, B.: Bayesian activity recognition in residence for elders. In: Proceedings of the 3rd IET International Conference on Intelligent Environments, pp. 209–212, September 2007
19. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
20. Kwapisz, J.R., Weiss, G.M., Moore, S.A.: Cell phone-based biometric identification. In: Proceedings of the 4th IEEE International Conference on Biometrics: Theory, Applications and Systems, BTAS 2010, pp. 1–7, September 2010
21. Malekzadeh, M., Clegg, R.G., Cavallaro, A., Haddadi, H.: Protecting sensory data against sensitive inferences. In: Proceedings of the 1st Workshop on Privacy by Design in Distributed Systems, pp. 2:1–2:6, New York, NY, USA. ACM (2018)
22. Mannini, A., Sabatini, A.M.: Gait phase detection and discrimination between walking-jogging activities using hidden Markov models applied to foot motion data from a gyroscope. Gait Posture 36(4), 657–661 (2012)
23. Mannini, A., Sabatini, A.M.: A smartphone-centered wearable sensor network for fall risk assessment in the elderly. In: Proceedings of the 10th EAI International Conference on Body Area Networks, BodyNets 2015, pp. 167–172 (2015)
24. Mannini, A., Sabatini, A.M., Intille, S.S.: Accelerometry-based recognition of the placement sites of a wearable sensor. Pervasive Mobile Comput. 21, 62–74 (2015)
25. Maurer, U., Rowe, A., Smailagic, A., Siewiorek, D.: Location and activity recognition using eWatch: a wearable sensor platform. In: Cai, Y., Abascal, J. (eds.) Ambient Intelligence in Everyday Life: Foreword by Emile Aarts, pp. 86–102. Springer, Heidelberg (2006)
26. Nguyen, B.P., Heemskerk, H., So, P.T.C., Tucker-Kellogg, L.: Superpixel-based segmentation of muscle fibers in multi-channel microscopy. BMC Syst. Biol. 10(5), 124 (2016)
27. Nguyen, B.P., Tay, W.-L., Chui, C.-K.: Robust biometric recognition from palm depth images for gloved hands. IEEE Trans. Human-Mach. Syst. 45(6), 799–804 (2015)
28. Ravi, N., Dandekar, N., Mysore, P., Littman, M.L.: Activity recognition from accelerometer data. In: Proceedings of the 17th Conference on Innovative Applications of Artificial Intelligence - Volume 3, IAAI 2005, pp. 1541–1546. AAAI Press (2005)
29. Rida, I., Almaadeed, N., Almaadeed, S.: Robust gait recognition: a comprehensive survey. IET Biometrics 8(1), 14–28 (2019)
30. Stolze, H., Klebe, S., Zechlin, C., Baecker, C., Friege, L., Deuschl, G.: Falls in frequent neurological diseases: prevalence, risk factors and aetiology. J. Neurol. 251(1), 79–84 (2004)
31. Sun, B., Wang, Y., Banda, J.: Gait characteristic analysis and identification based on the iPhone's accelerometer and gyrometer. Sensors 14(9), 17037–17054 (2014)
32. Taborri, J., Rossi, S., Palermo, E., Patane, F., Cappa, P.: A novel HMM distributed classifier for the detection of gait phases by means of a wearable inertial sensor network. Sensors 14(9), 16212–16234 (2014)
33. Trung, N.T., Makihara, Y., Nagahara, H., Mukaigawa, Y., Yagi, Y.: Performance evaluation of gait recognition using the largest inertial sensor-based gait database. In: Proceedings of the 5th IAPR International Conference on Biometrics, ICB 2012, pp. 360–366, March 2012
34. Wang, P., Li, W., Li, C., Hou, Y.: Action recognition based on joint trajectory maps with convolutional neural networks. Knowl. Based Syst. 158, 43–53 (2018)
35. Bo Yang, J., Nguyen, M.N., San, P.P., Li, X.L., Krishnaswamy, S.: Deep convolutional neural networks on multichannel time series for human activity recognition. In: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI 2015, pp. 3995–4001. AAAI Press (2015)
36. Yang, Q.: Activity recognition: linking low-level sensors to high-level intelligence. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence, IJCAI 2009, pp. 20–25, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc. (2009)
37. Yang, X., Zhang, C., Tian, Y.: Recognizing actions using depth motion maps-based histograms of oriented gradients. In: Proceedings of the 20th ACM International Conference on Multimedia, MM 2012, pp. 1057–1060, New York, NY, USA. ACM (2012)
38. Yilmaz, A., Javed, O., Shah, M.: Object tracking: a survey. ACM Comput. Surv. 38(4) (2006)
39. Yoneda, K., Weiss, G.M.: Mobile sensor-based biometrics using common daily activities. In: Proceedings of the 8th IEEE Annual Ubiquitous Computing, Electronics and Mobile Communication Conference, UEMCON 2017, pp. 584–590, October 2017

Text Mining Analysis of Comments in Thai Language for Depression from Online Social Networks

Pornpimol Chaiwuttisak

Abstract The objective of this research was to analyze the relationships among phrases and words commonly found in comments under the depression hashtag on Twitter, using association rules. The data used in this study were collected from comments in the Thai language via the depression hashtag on Twitter during 1 January 2019 to 31 January 2019, a total of 1,500 comments. The comments were processed with RapidMiner Studio 9 software to extract words about depression, which were then used to analyze the relationships of words across comments in the form of association rules. Word and phrase frequencies are presented to describe the various opinions about depression expressed on social media. Comparing the model performance of the distance measures considered, Euclidean distance provided the best clustering result, with the smallest average distance over all points in each cluster, equal to 152.504. The association analysis yielded a total of 30 association rules at a minimum support of 0.5% and a minimum confidence of 80%.

Keywords Text mining · Comment · Depression · Social media · Hashtag · Twitter · Association rule

P. Chaiwuttisak (B)
Department of Statistics, Faculty of Science, King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_26

1 Introduction

Social media is a way to share information and communicate effectively through web-based technology. It is employed as a communication tool and as guidance for healthcare providers [1]. News about political, social, economic and public affairs is presented to the public by spreading information through online social media. In fact, social media has become a part of life, the fifth factor of daily life for people in today's era. For Thailand, [2] reported that nearly 40% of the people use various social media services such as Facebook, Twitter and Instagram via the

301

302

P. Chaiwuttisak

internet system that people can present and distribute information through information technology, computer equipment and various types of electronic communication devices. Twitter is a social media that can send short information to a large number of recipients within a short period of time through the internet network [3]. The report on the number of users on social media in Thailand presented that there were 12 million Twitter users, which increased from the previous year by 33% and the number of active users was approximately 5.7 million or increasing by 80%, while Instagram has 13.6 million users, increasing by 24% and Facebook is a large platform with 49 users Millions of people, but increasing by only 4% or 2 million. Twitter is a fastgrowing online social media and people are constantly interested in using it. Twitter can be used to express opinions, thoughts, and life styles. It becomes the popular channel with the use of hashtags [4]. Depression is a common mental disorder and major public health problems which negatively affect people’s daily lives, workplace performance and relationships with colleagues and peers. At worst cases, depression can lead to suicide. In Thailand, Depression disorder was ranked 6th and the incidence rates during 2005–2016 have increased by 13%. Social network is used as the popular channel to communicate with friends and share opinions, pictures, and video. It can reflect to moods, attitudes and feelings, including depression of people. It is a great challenge to analyze data collected through social network for opinions and feelings that take to depression. Therefore, the objective of the research is to study the opinions about depression in Thai language, through hashtags on Twitter and find the relationship patterns of words that appear in various opinions using Text Mining. Moreover, knowledge gained from the study can express symbols of people who will have risk to be depression.

2 Literature Reviews
Kumar et al. [5] classified people with depressive disorder using the classification techniques Multinomial Naïve Bayes, Random Forest, Gradient Boosting, and an Ensemble Vote Classifier; the Ensemble Vote Classifier gave the best accuracy. Eichstaedt et al. [6] studied Facebook messages posted by 683 patients; the language predictors of depression included emotional (sadness), interpersonal (loneliness, hostility), and cognitive (preoccupation with the self, rumination) markers. De Choudhury et al. [7] studied the prediction of major depression through Twitter, showing that individuals with depression display greater negative emotion, high self-focused attention, and expression of religious thoughts.

Text Mining Analysis of Comments in Thai Language …


2.1 Text Mining
Text mining is a process of extracting meaningful patterns from unstructured text using data mining tools based on statistical and machine learning algorithms. The process of text mining consists of the steps shown in Fig. 1.
Word segmentation can be classified into 3 major categories:
(1) Word segmentation using rules checks character rules, methods of character combination, spacing, and deduction to determine the scope and function of words. The limitation of this method is low accuracy at the word level, although accuracy at the syllable level is high; it is, however, quick and uses few resources.
(2) Word segmentation based on dictionaries compares strings to the words in a dictionary, so all words must be stored in the dictionary. This method is more accurate than using rules but takes more time [8].
(3) Word segmentation using statistical methods is based on probability (probabilistic word segmentation) and word characteristics (feature-based word segmentation). Probabilistic word segmentation uses a word n-gram model to find candidate segmentations and possible word orders; it requires a corpus annotated with word boundaries and word classes, and the segmentation with the highest probability is selected [9].
Stop word removal removes insignificant words without changing the meaning of the phrase. In general, stop words appear in every message with high frequency; a stop word is a feature that is irrelevant or useless for classification. Eliminating stop words helps to reduce the size of the index as well as the processing time. Stop words in Thai can be prepositions, conjunctions, pronouns, adverbs, and exclamations.
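As an illustration of this step, the following minimal Python sketch removes stop words from a token list. The stop-word set here is a hypothetical English stand-in for the Thai prepositions, conjunctions, and particles described above.

```python
# Minimal sketch of stop-word removal after word segmentation.
# STOP_WORDS is a hypothetical English stand-in for a Thai stop-word list.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in"}

def remove_stop_words(tokens):
    """Drop tokens that carry little meaning for classification."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = ["the", "patient", "and", "doctor", "in", "hospital"]
print(remove_stop_words(tokens))  # ['patient', 'doctor', 'hospital']
```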
Word stemming is finding the original form of a word without prefixes and suffixes or finding words with similar meanings to be combined into one word, but there is

Fig. 1 Process of Text Mining


no algorithm for finding the root, due to the complexity of Thai grammar. Therefore, a library of Thai root words, which groups words with similar meanings derived from the same root, is evaluated by Thai language experts.

2.2 The LextoPlus Program
LextoPlus is a Thai lexeme tokenization and normalization tool based on the longest-matching technique, which provides high accuracy for Thai tokenization. Users can add new vocabulary that may not yet exist in the dictionaries or that contains specialized terminology. In addition, the system supports normalizing words used in social-media language. The characteristics of LextoPlus are as follows:
(1) Compatible with all operating systems (Windows, Unix, OSX)
(2) Available as a service (RESTful service)
(3) Fast processing, averaging 435,596 words per second
(4) Displays results in JSON format, which makes them easier to use
(5) Supports normalizing words in social-media language
(6) Intelligent handling of words that are not in the dictionary
(7) More than 60,000 vocabulary entries, extensible as needed
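The longest-matching idea that LextoPlus builds on can be sketched in a few lines of Python. The dictionary below is a tiny hypothetical stand-in (romanized words rather than Thai script): at each position, the longest dictionary entry that matches is consumed, and unknown characters fall through as single-character tokens.

```python
# Sketch of longest-matching segmentation, the technique LextoPlus is
# based on. DICTIONARY is a hypothetical toy lexicon for illustration.
DICTIONARY = {"sun", "sunshine", "shine", "rain"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)

def longest_matching(text):
    tokens, i = [], 0
    while i < len(text):
        match = None
        # try the longest candidate first, shrinking until a word matches
        for j in range(min(len(text), i + MAX_WORD_LEN), i, -1):
            if text[i:j] in DICTIONARY:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # unknown character: emit as a single token
        tokens.append(match)
        i += len(match)
    return tokens

print(longest_matching("sunshinerain"))  # ['sunshine', 'rain']
```

Note that greedy longest matching prefers "sunshine" over "sun" + "shine", which is exactly the behaviour the technique's name describes.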

2.3 Clustering
Clustering is a data mining technique based on machine learning. A data set, represented as vectors, is divided into groups or clusters so that items with similar features are arranged in the same group. The algorithms used to divide the groups are based on similarity or proximity, calculated by measuring the distance between the data vectors.
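The clustering procedure can be sketched in plain Python with Euclidean distance as the proximity measure. The study itself runs clustering through RapidMiner operators; this is only an illustrative re-implementation on toy 2-D points.

```python
# Illustrative k-means sketch using the Euclidean distance mentioned in
# the abstract. Not the RapidMiner implementation used in the study.
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[idx].append(p)
        # update each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, clusters = k_means(pts, 2)
print(sorted(len(c) for c in clusters))  # [2, 2]
```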

2.4 Association Rule Analysis
Association rule analysis is a data mining technique for finding interesting patterns in large databases [10]. The database used for association rule analysis is often a transaction database. The results of the analysis can be written as rules over sets of items; for example, customers who buy diapers mostly buy beer as well. The data must be nominal or ordinal. Association rule analysis is also known as market basket analysis: in principle, it finds groups of items that are likely to appear together in one transaction, often a transaction at the point of sale, and indicates the likelihood of various products being bought together [11]. Market basket analysis plays an important role in the retail industry for studying customer behavior.


Support indicates how frequently an itemset appears in the dataset, while confidence indicates how often the rule has been found to be true. The lift of a rule is defined as the ratio of the observed support to the support expected if the left-hand side of the rule (X) and the right-hand side of the rule (Y) were independent. If the lift is equal to 1, the occurrences of the antecedent and the consequent are independent of one another; when two events are independent, no useful rule can be drawn about them. If the lift is greater than 1, the two occurrences depend on each other, which makes such rules useful for predicting the consequent in future data sets. If the lift is less than 1, the items substitute for one another: the presence of one item has a negative effect on the appearance of the other.
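These three measures can be illustrated with a short Python sketch over a toy set of tokenized comments; the words and transactions below are invented for illustration.

```python
# Support, confidence, and lift computed over a toy set of tokenized
# comments (transactions of words). Data is invented for illustration.
transactions = [
    {"fear", "alone"}, {"fear", "alone"}, {"fear"},
    {"alone"}, {"sad"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """How often the rule lhs -> rhs holds when lhs appears."""
    return support(lhs | rhs) / support(lhs)

def lift(lhs, rhs):
    # lift > 1: lhs and rhs co-occur more often than if independent
    return confidence(lhs, rhs) / support(rhs)

print(support({"fear", "alone"}))       # 0.4
print(confidence({"fear"}, {"alone"}))  # 0.666...
print(lift({"fear"}, {"alone"}))        # ~1.11, i.e. greater than 1
```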

3 Experiment Design
In this study, the conceptual framework for extracting opinions and comments about depressive disorder can be divided into 3 main steps, namely (1) data collection, (2) data preparation, and (3) modeling, as shown in Fig. 2. The text is analyzed using cluster analysis with the k-means algorithm and association rule analysis with the FP-Growth algorithm.

3.1 Text Preprocessing, Transformation, and Feature Extraction
RapidMiner, through the Search Twitter operator, is used to retrieve data from Twitter matching the depression-disorder keyword (in Thai). Approximately 1,500 sentences were saved to a spreadsheet file; after duplicate sentences were removed in the data cleaning process, 1,354 sentences remained. Word segmentation converts sentences into words using PyThaiNLP, a natural language processing module for the Thai language. The word segmentation method in PyThaiNLP, called word_tokenize, uses a bi-directional recurrent neural network (RNN). Moreover, a set of frequent words that do not convey much meaning is designated as stop words; it is necessary to remove these words, for example several common Thai particles, from the sentences. Word stemming reduces inflectional and sometimes derivationally related forms of a word to a common base form; for example, several Thai words with similar meanings derive from the base word for "bird".


Fig. 2 Research framework

Partial code and results of the word segmentation implemented in Python are shown in Fig. 3 and Table 1.

3.2 Text Mining Techniques
K-means is the most popular centroid-based clustering algorithm. In principle, centroids are selected randomly for the k predefined clusters, and the remaining objects are assigned to the closest centroid on the basis of a distance measure. After that, the new centroid of every cluster is updated by calculating the mean value of the objects in the cluster. This process is repeated until there is no change in the centroid


Fig. 3 Word segmentation in the PyCharm Community Edition 2018.3.5
Table 1 Example of tokenization in Thai using the PyCharm Community Edition 2018.3.5


of the clusters. The RapidMiner operators employed in the cluster analysis are listed in Table 2.
The FP-Growth method finds association rules using a graph structure consisting of nodes and edges. It can be divided into two steps: (1) building a compact data structure called the FP-tree, and (2) extracting frequent itemsets directly from the FP-tree. The RapidMiner operators employed in the association rule analysis are listed in Table 3.

Table 2 Operators for clustering
• Retrieve: accesses the data stored in the repository and loads it into the process.
• Loop Parameters: iterates over all defined parameter combinations, which can be set through the wizard provided in the parameters.
• k-Means: performs clustering using the k-means algorithm.
• Cluster Distance Performance: evaluates the clustering by measuring the distances between data points within the same cluster.
• Log: stores data in a table; this can be almost anything, including operator parameters, apply counts, and execution time. The stored data can be plotted in the GUI when the process completes.


Table 3 Operators for association rule analysis
• Retrieve: accesses data stored in the repository and loads it into the process.
• Filter Examples: selects which examples from the example set are kept and which are removed.
• Numerical to Binominal: changes the type of numeric attributes to binominal and maps all attribute values to corresponding binominal values.
• FP-Growth: computes all frequently occurring itemsets in the example set using the FP-tree data structure.
• Create Association Rules: generates a set of association rules from the given frequent itemsets.

4 Result
A total of 788 distinct words were found: 401 verbs, 251 nouns, and 136 adjectives (Figs. 4, 5 and 6). Examples of verbs are regret, leave, whirl, suffer, and love; examples of nouns are mind, human, and body; examples of adjectives are inferior, wrong, and fine (all in Thai).


Fig. 4 Verb Thai word cloud expresses opinions about depressive disorders

Fig. 5 Noun Thai word cloud expresses opinions about depressive disorders

Texts were classified into 5 clusters with k-means clustering. Cluster 1 reflects weakness and tiredness; Cluster 2, efforts to get attention and to seek love and acceptance; Cluster 3, fear and nervousness; Cluster 4, thinking about past failures and efforts to get past that time; Cluster 5, self-deprecation. Most messages fall into Cluster 5, which corresponds to signs of depression


Fig. 6 Adjective Thai word cloud expresses opinions about depressive disorders

disorder. Depression can make a person feel as if they have no worth as a human being [12].
Association patterns of words seen frequently in the Thai-language comments on social media were extracted with the FP-Growth technique, using min support = 0.005 and min confidence = 0.8. Some of the resulting association rules are shown in Table 4. Only one association rule can be used for prediction, because its lift is greater than 1 (100%); it shows a relation of Thai words about fear.

Table 4 Association rules whose lift is not equal to 100% (Thai terms omitted)
• Smile → Smile, Time (support 0.005, confidence 1, lift 90.26%)
• Wait, Sad → Laugh (support 0.005, confidence 1, lift 90.26%)
• Crying, Good, Alone → Depressed (support 0.005, confidence 1, lift 96.71%)
• Fear → Chicken-hearted (support 0.005, confidence 1, lift 104.15%)

5 Conclusion
Depressive disorder is a disease that can occur in people of all ages. In several cases, patients do not realize that they are suffering from depression, or only become aware when the symptoms are so severe that they affect daily life. Symptoms of depression may seem like sadness, but they are more severe and last much longer. Depression patients are not insane; depression is an emotional illness that should be treated, because if the patients are ignored it may result in death. They may harm themselves, suffer short-term pain, or attempt suicide. Thus, everyone should pay attention to and try to understand such opinions and comments in order to help people with this disease. This research therefore applied text mining techniques to the analysis of Thai comments and opinions about depression posted on social media via hashtag. Two text mining techniques were employed: clustering comments and opinions using the k-means algorithm, and finding association rules among words involving depression using the FP-Growth algorithm. The analysis yields five sets of Thai words, about weakness and tiredness; effort to get attention and seeking love and acceptance; fear and nervousness; thinking about past failures; and low self-esteem. Moreover, it presents the association rules of Thai words through which people share opinions and feelings about depression.

References
1. Spector, N., Kappel, D.M.: Guidelines for using electronic and social media: the regulatory perspective. http://www.nursingworld.org/MainMenuCategories/ANAMarketplace/ANAPeriodicals/OJIN/TableofContents/Vol-17-2012/No3-Sept-2012/Guidelines-for-Electronic-and-Social-Media.html
2. Zocial: Thai social network that marketers should not miss. http://www.zocialinc.com/blog-social-network (in Thai)
3. Thawisuwan, P.: Twitter Sukhoi, 1st edn. Success Media, Bangkok (2009) (in Thai)
4. ETDA Thailand: Internet user's behavior in 2018: Thai people use the internet for 10 hours, 5 minutes per day. https://www.etda.or.th/content/etda-reveals-thailand-internet-user-profile2018.html (in Thai)
5. Kumar, A., Sharma, A., Arora, A.: Anxious depression prediction in real-time social data. In: Proceedings of the International Conference on Advanced Engineering, Science, Management and Technology (ICAESMT 2019) (2019)
6. Eichstaedt, J.C., Smith, R.J., Merchant, R.M., Ungar, L.H., Crutchley, P., Preoţiuc-Pietro, D., Schwartz, H.A.: Facebook language predicts depression in medical records. Proc. Natl. Acad. Sci. U.S.A. 115(44), 11203–11208 (2018)
7. De Choudhury, M., Gamon, M., Counts, S., Horvitz, E.: Predicting depression via social media. In: Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media (2013)
8. Charoenpornsawat, P.: Feature-based Thai word segmentation. Master of Engineering in Computer Engineering thesis, Chulalongkorn University (1998) (in Thai)
9. Tongaree, P.: A comparative study of word segmentation on Thai text categorization. Master of Computer Science thesis, Thammasat University (2008) (in Thai)
10. Rygielski, C., Wang, J.-C., Yen, D.C.: Data mining techniques for customer relationship management. Technol. Soc. 24(4), 483–502 (2002)


11. Thepmak, S.: The market basket analysis system using association rules: case study cosmetics. Master of Science thesis, Information Technology Faculty, King Mongkut's University of Technology North Bangkok (2011) (in Thai)
12. National Institute of Mental Health (NIMH): Depression. https://www.nimh.nih.gov/health/topics/depression/index.shtml

Building an X-ray Database for Mammography on Vietnamese Patients and Automatically Detecting ROI Using Mask R-CNN
Nguyen Duc Thang, Nguyen Viet Dung, Tran Vinh Duc, Anh Nguyen, Quang H. Nguyen, Nguyen Tu Anh, Nguyen Ngoc Cuong, Le Tuan Linh, Bui My Hanh, Phan Huy Phu, and Nguyen Hoang Phuong
Abstract This paper describes the method of building an X-ray database for mammography on Vietnamese patients, collected at Hanoi Medical University Hospital. The dataset has 4664 DICOM images corresponding to 1161 standard
N. D. Thang · N. T. Anh · P. H. Phu · N. H. Phuong (B): Thang Long University, Hanoi, Vietnam. e-mail: [email protected]
N. V. Dung · T. V. Duc · Q. H. Nguyen: Hanoi University of Science and Technology, Hanoi, Vietnam. e-mail: [email protected]
A. Nguyen: Auburn University, Auburn, AL, USA. e-mail: [email protected]
N. N. Cuong · L. T. Linh · B. M. Hanh: Hanoi Medical University Hospital, Hanoi, Vietnam. e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. Kreinovich and N. Hoang Phuong (eds.), Soft Computing for Biomedical Applications and Related Topics, Studies in Computational Intelligence 899, https://doi.org/10.1007/978-3-030-49536-7_27


patients, with a uniform distribution over BI-RADS categories 0 to 5. The paper also presents a method for detecting the region of interest (ROI) in mammograms based on the Mask R-CNN architecture. The ROI detection method achieves mAP@0.5 = 0.8109, and the accuracy of BI-RADS level classification is 58.44%.

1 Introduction
Recent studies [20] show that breast cancer is one of the leading cancers among Vietnamese women between the ages of 45 and 55. According to WHO health statistics, Vietnam had about 11,060 new cases in 2012, an increase of 30% compared to 2002 [21], of which 64% were women under 50 years old. In 2012, 1.677 million new breast cancer cases were discovered worldwide, accounting for 25.2% of all cancers seen in women. In 2011, more than 508,000 women died from breast cancer; in the United States alone there were approximately 316,120 new breast cancer cases and 40,610 deaths in 2017, and the number of breast cancer cases increased by 10 to 50% in the period from 1975 to 2014. Moreover, according to a 5-year US study, the survival rate at stages 0 and 1 was almost 100%, but this rate decreased significantly with treatment at a later stage: 93% at stage 2, 72% at stage 3, and 22% at stage 4. Therefore, the early detection of breast cancer is very important for treatment and for reducing the mortality rate of this disease.
Medical imaging is currently one of the most popular and effective methods used to detect breast cancer early [3, 5, 9–15]. However, its accuracy depends on the circumstances, the doctor's qualifications, and sometimes the doctor's subjective factors. One low-cost, popular method used to screen for breast cancer is mammography. However, this method has a sensitivity of only 84.4%, meaning that the remaining 15.6% of cases cannot be detected by it. The main reason is the density of mammary fat: the sensitivity increases to 87% when mammograms contain little fat tissue, while it drops to 62.9% when a mammogram has lots of fat tissue surrounding the mammary glands. The second main reason is the quality of the mammograms: X-rays are beamed through the mammary gland to record the mammogram, and the radiation intensity used also affects the quality of the image.
In addition, doctors' diagnoses often achieve an accuracy of only 62 to 80%, for reasons related to professional qualifications and the ability to work continuously. Consultation is a solution that increases accuracy, but it takes time, costs money, and is unnecessary in some cases. A CAD system can help doctors detect abnormal signs, increasing accuracy while saving time and reducing costs.
The paper is organized as follows. Section 2 describes related work on building X-ray databases for mammography around the world. Section 3 describes the building of an X-ray database for mammography on Vietnamese patients, collected at Hanoi Medical University Hospital, a leading hospital in Vietnam. Section 4 describes a new method for determining the location of ROIs based on Mask R-CNN. Section 5 gives conclusions and perspectives.


2 Related Works in X-ray Datasets for Mammography
In deep learning applications, the data plays a very important role in deciding the accuracy of the resulting model, and the diagnosis of breast cancer is no exception. There are currently a number of mammogram datasets, such as DDSM [22], CBIS-DDSM [23], MIAS [24], and the NYU breast cancer screening dataset [25].

2.1 DDSM and CBIS-DDSM Datasets
DDSM (Digital Database for Screening Mammography) [22] is a dataset with more than 2620 cases; each case includes the CC (craniocaudal) and MLO (mediolateral oblique) views of both breasts (Fig. 1), 10,480 mammograms in total. The dataset was categorized with three labels (normal, benign, and malignant) confirmed by a diagnostic imaging doctor. The DDSM images are saved in a lossless JPEG format, but the images in this dataset are still very noisy. The CBIS-DDSM dataset (Curated Breast Imaging Subset of DDSM) [23] is a subset of DDSM that was extracted, preprocessed, and saved in the standard DICOM format, with better image quality than the DDSM files. However, this dataset contains only abnormal X-ray images. CBIS-DDSM consists of 3101 high-resolution breast X-ray images labeled with the following distribution:
• 36% of images are labeled as benign mass
• 25% of images are labeled as benign calcification
• 24% of images are labeled as malignant mass
• 15% of images are labeled as malignant calcification
Figure 2 depicts the statistics of the CBIS-DDSM database according to BI-RADS, and Fig. 3 depicts statistics by pathology.

Fig. 1 Four breast X-ray images per case, including CC-L, CC-R, MLO-L, and MLO-R


2.2 The MIAS Dataset
MIAS (Mammographic Image Analysis Society) [24] is a small collection of about 322 mammograms with a resolution of 1024 × 1024 px. Images in the MIAS dataset are stored in PGM (x-portable-graymap) format and are labeled similarly to the CBIS-DDSM files. However, unlike CBIS-DDSM, the ROIs (regions of interest) in MIAS are given only through 3 parameters: the (x, y) coordinates of the ROI centre and the radius of a circle containing the ROI area. Given the relatively small number of images and this heterogeneous way of specifying ROIs, the dataset is not sufficient for use in model training, but it is appropriate for evaluating models.
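For illustration, the following hedged Python sketch converts a MIAS-style (x, y, radius) annotation into a rectangular bounding box clipped to the 1024 × 1024 image. The sample values are illustrative, and the exact coordinate-origin convention of the real MIAS files is glossed over here.

```python
# Hedged sketch: convert a MIAS-style ROI, given as a centre (x, y) and a
# radius r, into a rectangular bounding box clipped to image bounds.
# The coordinate-origin convention of the real MIAS files is not modeled.
def circle_to_bbox(x, y, r, width=1024, height=1024):
    x0 = max(0, x - r)
    y0 = max(0, y - r)
    x1 = min(width - 1, x + r)
    y1 = min(height - 1, y + r)
    return x0, y0, x1, y1

print(circle_to_bbox(535, 425, 197))  # (338, 228, 732, 622)
```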

Fig. 2 Number of mammograms of CBIS-DDSM dataset according to BI-RADS

Fig. 3 Number of mammograms of CBIS-DDSM dataset by pathology


2.3 The NYU Breast Cancer Screening Dataset
This dataset includes 229,426 digital screening mammography exams (1,001,093 images) from 141,473 patients [25]. Each exam contains at least four images, corresponding to the four standard views used in screening mammography: R-CC (right craniocaudal), L-CC (left craniocaudal), R-MLO (right mediolateral oblique), and L-MLO (left mediolateral oblique). However, this dataset does not define ROI areas, and it is not publicly available.

3 Building an X-ray Database for Mammography on Vietnamese Patients
Breast cancer is a disease with a local character, whose characteristics vary by region, country, environment, and so on. Therefore, a breast cancer detection system that makes accurate diagnoses for Vietnamese patients needs to be trained on the medical history of Vietnamese people. This study therefore needs an X-ray mammography dataset of Vietnamese volunteers, authorized for research use by Hanoi Medical University Hospital. All volunteer information is replaced with an identification number before being delivered to the research team. It is estimated that Hanoi Medical University Hospital will provide over 10,000 mammograms with detailed transcription.
Since 1993, the American College of Radiology (ACR) has provided the BI-RADS mammography classification, which aims to standardize diagnosis and the prognosis of malignancy risk. Hanoi Medical University Hospital also wants the system to be able to diagnose the 6 BI-RADS categories from 0 to 5: BI-RADS 0 corresponds to insufficient information to identify the disease, BI-RADS 1 to no lesion, BI-RADS 2 to benign lesions, BI-RADS 3 to highly probably benign lesions (0.8% malignant), BI-RADS 4 to suspicious abnormal signs (31.1% malignant), and BI-RADS 5 to high suspicion of malignancy (96.9% malignant).
Figure 4 illustrates the process of building the X-ray database for mammography on Vietnamese patients. This is the result of a joint collaboration between Hanoi Medical University Hospital and Thang Long University. The steps in the process include:
1. PACS: the image record-keeping system of Hanoi Medical University Hospital.
2. ICY: software that helps localize lesions with simple, effective, and time-saving operations.
3. Doctor: the radiologists at Hanoi Medical University Hospital.
4. Cloud storage: the service for image exchange between Hanoi Medical University Hospital and Thang Long University.


Fig. 4 Overview of the process of developing a X-ray Database for Mammography on Vietnamese patients

5. Checker: members of the AI-Brain group of Thang Long University, responsible for monitoring quality and standardizing errors before data enters the database.
6. Database: stores the standardized images.
Tasks performed in the process:
1. Task 1: export data from PACS and assign an ID tag to each patient.
2. Task 2: doctors localize lesions and identify BI-RADS categories.
3. Task 3: doctors export the lesion-area file and upload the .xml annotation together with the original .dcm image file to Cloud storage.
4. Task 4: a Thang Long AI-Brain team member monitors quality.
5. Task 5: standardized images are put into the database at Thang Long University.
The collected data are described in Fig. 5. The dataset is divided into 3 parts: 60% for the training set, 20% for the validation set, and 20% for the test set.


Fig. 5 Data summarization in Vietnamese mammograms dataset

4 Methods of Detecting ROI
This section describes the method we propose for detecting ROI areas in mammograms.

4.1 Proposed Method
Figure 6 describes the training and evaluation process of the ROI detection model using Mask R-CNN.
Data preprocessing step: when data is extracted, processed, and cleaned, system performance is greatly improved, so this is one of the most important steps of a deep learning problem in general and of breast cancer diagnosis in particular. In the dataset, the breast X-ray DICOM files often have different quality arising from many causes, such as the imaging environment, the mammography equipment, and the different breast structures of patients. In addition, the data is often labeled during a doctor's diagnosis, and mediolateral oblique (MLO) views often include part of the patient's chest muscle. These issues need to be handled before training and identifying mammograms. In this section, we present some preprocessing methods for mammograms: removing the label from the image, separating the background, increasing the contrast, and reducing noise.

Fig. 6 The process of training and evaluating the ROI detection model using Mask R-CNN


Removing the label from a mammogram is an important step to make the learning process better. The labels lie in non-breast areas, usually on the background, so to remove a label we just need to separate the breast region from the background of the image.

4.1.1 Method for International Data

The international data is non-standard and the images are blurry, but with the support of the OpenCV (cv2) library we can crop the images as desired through the following steps (Fig. 7):
• Convert the image to a black-and-white (binary) image using cv2.threshold.
• Select the largest white region.
• Combine the original image with the binary image using a bitwise AND to obtain a clean image region.
We then continue to reduce noise and increase the contrast of the X-ray image. For noise reduction, we recommend a nonlinear median filter. To increase the contrast, we use the CLAHE (contrast-limited adaptive histogram equalization) method (Fig. 8).
In mammograms, the breast area is usually much smaller than the area of the X-ray image, so the black background is often quite large. Cropping to the breast area therefore reduces database size and image-loading time during training [19]. The steps taken include (Fig. 9):
• Flip L-CC images into the same orientation as R-CC images, and similarly flip L-MLO images into the same orientation as R-MLO images.
• Binarize the whole image using cv2.threshold.
• Select the largest region in the image, since the largest area is the breast area.
• Determine the topmost and rightmost non-zero points of that region (as shown in Fig. 9).

Fig. 7 Remove the image tag and the information on the mammogram


Fig. 8 Increase the contrast and reduce noise in mammogram

Fig. 9 Method to crop the image

• From the two points above, crop the image and flip it back to its original orientation.

The computational complexity of this method is dominated by finding the cut-off points: O(hw), where h and w are the height and width of the image, respectively.
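As a rough illustration of the label-removal and cropping steps above, the following NumPy-only sketch binarizes the image, keeps only the largest connected white region (assumed to be the breast), masks everything else out, and crops to the breast's bounding box. In a real pipeline cv2.threshold and cv2.connectedComponentsWithStats would do this far more efficiently; the threshold value of 20 is an assumption for illustration only.

```python
import numpy as np
from collections import deque

def largest_region_mask(binary):
    """Keep only the largest 4-connected white region (assumed to be the
    breast); small disconnected blobs such as text labels disappear."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    best = np.zeros((h, w), dtype=bool)
    best_size = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and not seen[sy, sx]:
                comp, q = [], deque([(sy, sx)])
                seen[sy, sx] = True
                while q:                       # breadth-first flood fill
                    y, x = q.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(comp) > best_size:
                    best_size = len(comp)
                    best[:] = False
                    for y, x in comp:
                        best[y, x] = True
    return best

def clean_and_crop(img, thresh=20):
    """Binarize (the cv2.threshold step), mask with the largest white
    area (the "bitwise and" step), then crop to its bounding box."""
    mask = largest_region_mask(img > thresh)
    cleaned = np.where(mask, img, 0)
    ys, xs = np.nonzero(mask)
    return cleaned[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```

The flood fill here is O(hw), matching the complexity noted in the text.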

4.1.2 Preprocessing Vietnamese Mammograms

The Vietnamese mammograms provided by Hanoi Medical University are collected with modern, high-resolution technology (a Siemens Mammomat Inspiration system). The images are therefore highly accurate and almost noise-free, and we propose a new preprocessing method for this type of image (Fig. 10). The method includes the following steps:

• The breast boundary is determined by following the border that separates it from the image background. Because the data is standardized, this step has only O(h + w) complexity.


Fig. 10 Preprocessing data for Vietnamese mammograms

• To crop the image as tightly as possible, we identify three cutting points by finding one "maximum" point and two "minimum" points. This algorithm has O(h) complexity.

Making patches: One problem encountered when diagnosing breast cancer is that the abnormal regions (tumors) are quite small compared to the size of the entire image. In addition, training on large images (1000 × 5000 px) is inefficient and time-consuming. In practice, a diagnosis is based on the characteristics of the abnormal areas together with their surroundings (background). Using the mask image (the area identified by the doctor), we generate ROI patches that include the surrounding background. To ensure effective model training, we extract patches of 1024 × 1024 px, 512 × 512 px, and 256 × 256 px, and design the ConvNet to accept 256 × 256 px input; the effectiveness of this choice has been verified in previous studies [2, 17, 18]. To increase the size of the training set, we apply the following process:

• Crop the center region and eight corner-shifted regions of the original image, as depicted in Fig. 11.
• Apply rotation, flipping, deformation, contrast enhancement, etc. to the images produced in the step above, as shown in Fig. 12.
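The center/corner-shift cropping above can be sketched as follows. The 32 px shift amount is an assumption (the chapter does not state the offset), and the patches are clamped so they stay inside the image.

```python
import numpy as np

def shifted_patches(img, cy, cx, size=256, shift=32):
    """Cut the centre patch plus eight shifted copies around the ROI
    centre (cy, cx), mirroring the centre/eight-corner shifting described
    in the text; `shift` is an assumed offset, not taken from the chapter."""
    half = size // 2
    patches = []
    for dy in (-shift, 0, shift):
        for dx in (-shift, 0, shift):
            # clamp so every patch stays fully inside the image
            y = min(max(cy + dy, half), img.shape[0] - half)
            x = min(max(cx + dx, half), img.shape[1] - half)
            patches.append(img[y - half:y + half, x - half:x + half])
    return patches
```

The rotations, flips, and contrast changes of the second step would then be applied to each of the nine patches.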

Fig. 11 Example of patches when ROIs are moved and cropped on the original image


Fig. 12 Example of a set of ROIs when applying augmentation

Abnormal image extraction method: Analysis of the UCI data shows that the edge characteristics of abnormal areas are very important in determining the pathology and disease severity. Expert radiologists confirm that the surroundings of the abnormal area are also an important component of the diagnosis. To create the most effective dataset, each image is processed as follows:

• Determine the center of interest and compute the area of the ROI. Use square crops (1:1 aspect ratio) of 256 and 512 px centered on the ROI, flipped along the vertical, horizontal, and central axes at four angles.
• Keep a crop only if it satisfies: area(ROI) / area(crop) ≥ 0.7.

Model architecture: In this study we train a ResNet convolutional neural network that has already been pre-trained on the ImageNet/COCO datasets. Transfer learning is one of the popular techniques of deep learning: instead of retraining from scratch, we continue to fine-tune a previously trained model on the dataset of the problem at hand. This makes the model converge faster during training and yields better accuracy. It also mitigates the problem of small datasets, because training a deep learning model effectively from scratch typically requires at least 10,000 images per class. Our problem involves two tasks: identifying abnormal regions and determining the BI-RADS level of each abnormal region (on a scale from 0 to 5). The problem is therefore divided into two phases: phase 1 identifies suspicious regions and phase 2 classifies the BI-RADS level of each region.
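The ROI acceptance rule from the extraction method above (area(ROI)/area(crop) ≥ 0.7) together with the flip augmentation can be sketched as a NumPy-only mockup; the clamping of the crop to the image border is an assumption for illustration.

```python
import numpy as np

def roi_crops(img, mask, crop=256):
    """Sketch of the extraction rule: centre a square crop on the ROI,
    keep it only when area(ROI)/area(crop) >= 0.7, and return the crop
    plus its vertical, horizontal, and both-axis flips."""
    if mask.sum() / float(crop * crop) < 0.7:    # acceptance threshold
        return []
    ys, xs = np.nonzero(mask)
    half = crop // 2
    # centre on the ROI, clamped so the crop stays inside the image
    cy = min(max(int(ys.mean()), half), img.shape[0] - half)
    cx = min(max(int(xs.mean()), half), img.shape[1] - half)
    patch = img[cy - half:cy + half, cx - half:cx + half]
    return [patch, patch[::-1], patch[:, ::-1], patch[::-1, ::-1]]
```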


Recognizing that this problem can be addressed with object detection and segmentation techniques (with tumors as the regions to be identified), the suspicious areas become the objects to detect and classify. Among the most common methods for this problem are Faster R-CNN [16] and Mask R-CNN [8]; both build on CNN backbones such as ResNet, Inception, Xception, etc. Mask R-CNN extends Faster R-CNN with a masking branch for segmenting the objects in an image. The architecture of the model that detects and classifies ROIs in a mammogram is described in Fig. 13. It consists of two main phases:

• Phase 1: Classify mammograms into two groups, BI-RADS 0 and 1 versus the remaining BI-RADS levels, using a whole-image network such as ResNet-50/ResNet-101 or Inception.
• Phase 2: Identify, localize, and classify lesions into BI-RADS 2 to 5 using the Mask R-CNN method.
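The two-phase architecture described above can be expressed as a small piece of glue code. Both callables are hypothetical interfaces introduced here for illustration, not code from this chapter.

```python
def two_phase_diagnosis(image, phase1_classifier, phase2_detector):
    """Hypothetical glue for the two-phase pipeline: phase 1 screens the
    whole mammogram (a BI-RADS 0/1 result means no lesion search is
    needed); otherwise phase 2 localizes lesions and grades them
    BI-RADS 2-5. Both callables are assumed interfaces."""
    birad = phase1_classifier(image)        # whole-image CNN (e.g. ResNet-50)
    if birad in (0, 1):
        return birad, []                    # no suspicious regions reported
    lesions = phase2_detector(image)        # Mask R-CNN: boxes + BI-RADS 2-5
    return birad, lesions
```

This split keeps the expensive detector off the (common) normal cases that phase 1 already screens out.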

4.2 Testing and Evaluation

One problem is that the Vietnamese dataset is quite small. Among the datasets built worldwide, DDSM and CBIS-DDSM are public, so we used them to train the ROI detection model based on Mask R-CNN. The model was then tested on the Vietnamese dataset (Fig. 14).

Fig. 13 Model architecture for determining the ROI and classifying the BI-RADS level of mammograms


Fig. 14 Detecting the ROI on a Vietnamese mammogram (for one patch). The results show that the model correctly identified the ROI region in this image

The model training process is performed on a server with the following configuration:

• CPU: Core i9-7900X (3.30 GHz)
• RAM: 128 GB
• Hard disks: one 500 GB SSD and one 4 TB HDD
• GPUs: 2 × Nvidia RTX 2080 Ti (11 GB)

The server runs Ubuntu 16.04.3 LTS (Linux) with libraries supporting deep learning algorithms on the GPU, including Nvidia CUDA 8.0.61 and Nvidia cuDNN 6.0.21. The deep learning system was developed in Keras with a TensorFlow back-end [1, 4]. The training set contains a total of 69,828 patches, obtained from the preprocessed mammograms and then augmented with the additional transformations described in Sect. 4.1. The model was trained on this dataset for 300 epochs using SGD (stochastic gradient descent) with a batch size of 500 images, starting from a ResNet-101 model pre-trained on the COCO dataset. Training used a learning-rate reduction strategy from 10⁻⁵ to 10⁻⁹ (dividing by 10 at the epochs that are multiples of 50). Because the DDSM and CBIS-DDSM datasets describe benign and malignant cases, in the experiments on the Vietnamese dataset we identify only two groups of BI-RADS: BI-RADS 2 (corresponding to benign) and BI-RADS 4 and 5 (corresponding to malignant). The result over the entire test set is mAP@0.5 = 0.8109, with a BI-RADS classification accuracy of 58.44%. These results show that the system localizes the ROI regions accurately, but the classification of BI-RADS levels is still not highly accurate. The main reason may be that we applied transfer learning from the DDSM and CBIS-DDSM datasets to mammograms of Vietnamese patients, which has the following limitations:

• The above datasets consist of scanned images, so the image quality is not very good. Mammograms of Vietnamese patients, by contrast, are of very high quality,


as this data is collected and digitized directly from high-resolution equipment (a Siemens Mammomat Inspiration system).
• The breast structure of the international patients also differs from that of the Vietnamese female patients.
• The number of mammograms in the Vietnamese dataset is quite small.

Therefore, to increase the reliability and diagnostic capability of the system, the following tasks will be performed in the future:

• Collect additional mammograms of Vietnamese patients.
• Use the Vietnamese data to train the Mask R-CNN model directly and compare the new results with those above.
• Combine different CNN models to find the optimal result.
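For reference, the step learning-rate schedule used in the training above can be sketched as a simple function; interpreting the chapter's "10e−5 to 10e−9" as the range 10⁻⁵ to 10⁻⁹ is an assumption.

```python
def learning_rate(epoch, base=1e-5, floor=1e-9):
    """Step schedule from the training setup: start at 1e-5 and divide
    by 10 at every epoch that is a multiple of 50, never dropping below
    1e-9 over the 300 training epochs. The 1e-5/1e-9 reading of the
    chapter's "10e-5 to 10e-9" is an assumption."""
    return max(base * (0.1 ** (epoch // 50)), floor)
```

In Keras such a function would typically be attached to training via the LearningRateScheduler callback.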

5 Conclusions and Perspectives

This article described a method of building an X-ray mammography dataset of Vietnamese patients, collected at Hanoi Medical University Hospital, and a method of detecting and identifying ROIs based on Mask R-CNN. We collected 4664 DICOM images corresponding to 1161 standard patients, with a uniform distribution over BI-RADS levels 0 to 5. The ROI detection method achieved mAP@0.5 = 0.8109, and the BI-RADS classification accuracy was 58.44%. In the future, we will continue to collect more Vietnamese mammograms and adapt the model accordingly.

Acknowledgements This work is supported by the research and development contract between Thang Long University and Hanoi Medical University Hospital, Vietnam, dated 27 November 2018, on "Developing a support system for diagnosis of breast cancer based on X-Ray using Artificial Intelligence".

References

1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G., Davis, A., Dean, J., Devin, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems (2015)
2. Agarwal, R., Diaz, O., Llado, X., Yap, M.H., Mart, R.: Automatic mass detection in mammograms using deep convolutional neural networks. J. Med. Imaging 6, 1 (2019)
3. Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I., Bergeron, A., Bouchard, N., Warde-Farley, D., Bengio, Y.: Theano: new features and speed improvements. arXiv preprint arXiv:1211.5590 (2012)
4. Chollet, F., et al.: Keras (2015)
5. Collobert, R., Kavukcuoglu, K., Farabet, C.: Torch7: a Matlab-like environment for machine learning. In: BigLearn, NIPS Workshop, number EPFL-CONF-192376 (2011)


6. Fukushima, K., Miyake, S.: Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition. In: Competition and Cooperation in Neural Nets, pp. 267–285. Springer (1982)
7. Gulshan, V., Peng, L., Coram, M., Stumpe, M.C., Wu, D., Narayanaswamy, A., Venugopalan, S., Widner, K., Madams, T., Cuadros, J., et al.: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. J. Am. Med. Assoc. 316, 2402–2410 (2016)
8. He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV), October 2017
9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
10. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and